Transformation by interpreter specialisation


Science of Computer Programming 52 (2004) 307–339


Transformation by interpreter specialisation

Neil D. Jones

DIKU, University of Copenhagen, Denmark

Received 27 April 2003; received in revised form 30 October 2003; accepted 23 January 2004

Available online 4 June 2004

Abstract

A program may be transformed by specialising an interpreter for the language in which it is written. Efficiency of the transformed program is determined by the efficiency of the interpreter's dynamic operations; the efficiency of its static operations is irrelevant, since all will be "specialised away". This approach is automatic (as automatic as the specialiser is); general, in that it applies to all of the interpreter's input programs; and flexible, in that a wide range of program transformations are achievable since the transformed program inherits the style of the interpreter.

The chief practical challenge is understanding cause and effect: how should one write the interpreter so that specialisation produces efficient transformed programs? The core of this paper is a series of examples, both positive and negative, showing how the way the interpreter is written can influence the removal of interpretational overhead, and thus the efficiency and size of the target programs obtained by specialisation.

In principle this method can speed up source programs by an arbitrary amount; the limit depends only on how efficiently the interpreter can be written. An example shows that a "memoising" interpreter can yield asymptotic speedup, transforming an exponential-time program into one running in polynomial time. A final analysis is made of the speed of the output programs in relation to the speed of the interpreter from which they are derived. It is argued that this relative speedup is linear, with a constant coefficient depending on the program being transformed.

© 2004 Elsevier B.V. All rights reserved.

1. Introduction

Program specialisation (also known as partial evaluation or partial deduction) is a source-to-source optimising program transformation. Suppose the inputs to program p have

E-mail address: [email protected] (N.D. Jones).

0167-6423/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.scico.2004.03.010


been divided into two classes: static and dynamic. A specialiser is given program p together with values s for its static inputs. It yields as output a specialised program p_s.

Program p_s, when run on any dynamic inputs, will yield the same output that p would have yielded, if given both its static and dynamic inputs. Specialised program p_s may be significantly faster than p if many of p's computations that depend only on the static inputs have been pre-computed.

Ideally a partial evaluator is a black box: a program transformer able to extract nontrivial static computations whenever possible; that never fails to terminate; and always yields specialised programs of reasonable size and maximal efficiency, so all possible computations depending only on static data have been done. Practically speaking, program specialisers often fall short of this goal; they sometimes loop, sometimes pessimise, and can explode code size.

A program specialiser is analogous to a spirited horse: while impressive results can be obtained when it is used well, the user must know what he/she is doing. This paper's thesis is that this knowledge can be communicated to new users of program specialisers.

1.1. The phenomena

In my opinion there exist a number of real phenomena in specialisation, i.e., frequently occurring specialisation problems that are nontrivial to solve; there exist styles of programming that can be expected to give any program specialiser difficulties; and one can learn and communicate programming style that leads to good specialisation, i.e., high speedups without explosion of code size.

This paper communicates this knowledge via a checklist of advice, illustrated by a series of concrete examples that concentrate on a broad and on the whole quite successful application area: achieving program transformation by specialising an interpreter with respect to a source program. Examples are both positive and negative, showing how the way the interpreter is written affects the efficiency and size of the transformed programs. After each, we summarise what was learned, and analyse the problems that arose.

1.2. Overview of the article

Section 2 defines basic concepts, shows by equational reasoning that programs may be transformed by interpreter specialisation, and identifies two measures (Type 1 and Type 2) of the speedups obtained by specialisation. The nature of the transformations that can be realised by the approach is discussed, followed by descriptions of some successful applications.

Two small examples begin Section 3, whose main content is a list of the assumptions we make about program specialisation. Section 4 contains a (very) simple example interpreter, together with an imperative source program, and the result of transforming it into functional form by specialisation.

Section 5 is the heart of the paper: how to obtain good results from interpreter specialisation, in the form of a list of points of advice. It begins with problems that must be overcome to use the methodology. There are, unavoidably, references to concepts developed in later sections. Finally, the question of where relevant interpreters originate is discussed.


Section 6 focuses on the problem of ensuring that transformation will terminate. Section 7 contains several interpreter examples, after which Section 8 analyses their specialisation. Section 9 addresses the problem of code explosion. One cause is analysed: static parameters that vary independently. Two examples are given of the problem, and work-arounds are described.

The penultimate Section 10 investigates the question of how much speedup may be obtained in principle by program specialisation. Two speedup types are identified. It is shown that Type 2 speedup can be arbitrarily large, as it amounts to a change of algorithm, and an example displaying exponential speedup is given. Finally, it is shown that Type 1 speedup is linearly bounded, with a constant coefficient depending on the program being transformed.

1.3. Making implicit knowledge explicit

Some of the following examples will make the experienced reader grit his or her teeth, saying "I would never dream of programming my interpreter in such an absurd way!" This is indeed true—but inexperienced users often commit just these errors of thought or of programming style (I speak from experience with students). A major point of this article is to make manifest some of the implicit knowledge that we have acquired in our field ([9,17,18,20] and the PEPM conference series), so newcomers can avoid encountering and having to figure out how to solve the same problems "the hard way".

2. Concepts and notation from program specialisation

2.1. Main concepts

We use familiar notation (e.g., see [20]). Data set D is a set containing all first-order values, including all program texts. Inputs may be combined in pairs, so "(d, e)" is in D if both d and e are in D. For concreteness the reader may think of D as being Lisp/Scheme lists, and programs as being written in Scheme. However the points below are not at all language dependent.

The partial function [[p]] : D → D ∪ {⊥} is the semantic function of program p: if p terminates on d, then [[p]](d) is in D; and if p fails to terminate on d, then [[p]](d) = ⊥. We will assume the underlying implementation language is defined by an operational semantics, not specified here. Computations are given by sequences (for imperative languages) or, more generally, by computation trees.

Program running times

We assume that program run time is proportional to the size of the computation tree deducing [[p]](d) = e. (This will be relevant in the discussion of linear speedup.) Notation: the time taken to compute [[p]](d) is written as time_p(d) (a value in N ∪ {∞}, equal to ∞ iff p fails to terminate on d).

Other languages. If unspecified, we assume that a standard implementation language L is intended. If another language S is used, we add a superscript S to the semantic and running time functions, e.g., [[p]]^S(d) and time^S_p(d).


2.1.1. Program specialisation

Given program p and a "static" input value s ∈ D, p_s is a result of specialising p to s if, for every "dynamic" input value d ∈ D,

[[p_s]](d) ≐ [[p]](s, d)    (1)

We call p_s a specialised program for p with respect to s.¹

A partial evaluator (or program specialiser) is a program spec such that for every program p and "static" input s ∈ D, [[spec]](p, s) is a result p_s of specialising p to s. Correctness is specified by the equation²

[[p]](s, d) ≐ [[ [[spec]](p, s) ]](d)    (2)

A trivial spec is easy to build by "freezing" the static inputs (Kleene's s-m-n theorem of the 1930s did specialisation in this way). The practical goal is to make p_s = [[spec]](p, s) as efficient as possible, by performing many of p's computations—ideally, all of those that depend on s alone.
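As a concrete (if useless) illustration, here is a minimal Python sketch—mine, not the paper's—of such a trivial specialiser: it merely freezes the static input, performing no computation in advance. The two-input program p below is an arbitrary stand-in.

# A minimal sketch of the trivial specialiser described above: freeze the
# static input s, doing no work ahead of time.
from functools import partial

def p(s, d):
    # an arbitrary two-input program standing in for p
    return s * 10 + d

def trivial_spec(program, s):
    # [[spec]](p, s): return a one-input program p_s with [[p_s]](d) = [[p]](s, d)
    return partial(program, s)

p_s = trivial_spec(p, 3)
assert p_s(4) == p(3, 4)   # Eq. (2) holds, but nothing has been precomputed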

A number of practical program specialisers exist. Systems involved in recent articles include Tempo, Ecce, Logen and PGG [8,23,24,37], and significant earlier work has been reported concerning the systems C-mix, Similix and Mixtus [15,22,32].

2.1.2. Interpretation and its overhead

Program interp is an interpreter for language S if, for all S-programs source and data d ∈ D,

[[interp]](source, d) ≐ [[source]]^S(d)    (3)

Interpretational overhead. Interpreters are reputed to be slow. This is certainly true if interp slavishly imitates the operations of source; time_interp(source, d) may be ten times larger than time^S_source(d).

On the other hand, interp may be written cleverly using, for example, the memoisation device to save time when program source would solve the same problem repeatedly. Such an interpreter may in fact run faster than the program being interpreted.
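A minimal Python sketch (mine, not the paper's) of the memoisation device just mentioned; the function fib merely stands in for a source program that repeatedly solves identical subproblems.

# Sketch of the memoisation device: cache results so identical subproblems of
# the interpreted program are solved only once. fib is only an illustration.
def memoised(f):
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return wrapper

@memoised
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))   # a linear number of calls, although the naive algorithm is exponential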

2.1.3. Compilation and specialisation

A compiler transforms S-program source into L-program target such that

[[target]](d) ≐ [[source]]^S(d)    (4)

for any data d ∈ D. The first Futamura projection [12] is the equation

target ≝ [[spec]](interp, source)    (5)

¹ Symbol ≐ stands for equality of partial values: a ≐ b means either both sides are equal elements of D, or both sides are undefined. This definition is purely extensional, ignoring "intensional" features such as program efficiency or size. A more intensional view is often taken in logic programming [24,26].

² The right side of (2) can be undefined in two ways: either [[spec]](p, s) = ⊥ if the specialiser loops; or [[spec]](p, s) = q and [[q]](d) = ⊥ if the specialised program loops.


It states that one may compile by specialising an interpreter, since

[[source]]^S(d) ≐ [[interp]](source, d)               by (3)
                ≐ [[ [[spec]](interp, source) ]](d)   by (2)
                =  [[target]](d)                       by (5)

2.1.4. Type 1 speedup

The purpose of program specialisation is to make programs run faster. Type 1 speedup compares the speed of the two sides of Eq. (1).³ The Type 1 speedup realised by specialising program p to static input s is the ratio

speedup_s(d) = time_p(s, d) / time_{p_s}(d).    (6)

The Type 1 speedup is 1.7, for instance, if the specialised program runs 1.7 times as fast as the original program. (If less than 1, it might be better called a slowdown.) For each static input s, the Type 1 speedup is a function of d.

2.1.5. Type 2 speedup

The Type 1 speedup for a given interpreter interp would be

speedup_source(d) = time_interp(source, d) / time_target(d).

If source language S has a natural time measure, it may be more natural to compare the speed of the target program to the speed of the source program it was obtained from, giving the Type 2 speedup:

speedup(d) = time^S_source(d) / time_target(d).    (7)

We will see in Section 10 that for normal specialisers Type 1 speedup is at most a linear function of d, with a coefficient determined by the value of s. On the other hand, Type 2 speedup can be arbitrarily large, as shown by an example whose speedup is exponential.

2.2. Discussion: what transformations can be realised by specialisation?

Suppose target = [[spec]](interp, source). For all specialisers I know:

(1) Program target inherits the algorithm of program source.
(2) Program target inherits the programming style of interpreter interp.

The reason for Point 1 is that the interpreter interp faithfully executes the operations specified by program source, so its specialised version target will execute the same operations in the same order.

³ Type 1 speedup ignores the time time_spec(p, s) required to specialise p to s. This is reasonable if [[p]](s, d) is often computed with the same s value but with varying d values—natural if p is an interpreter and s is a source program.


The reason for Point 2 is that program target consists of specialised code taken from the interpreter interp: the operations it could not execute using only the values of its static input source. Some consequences are discussed in [28,35]. Further examples of Point 2: we can expect that

• if interp is written in a functional language, then target will also be functional, regardless of how source was written;

• if interp is written in tail-recursive style, then target will also be tail recursive;

• if interp is written in CPS (continuation-passing style), then target will also be in CPS;

• if interp uses memoisation to implement its function calls, then the corresponding specialised calls in target will also use memoisation.

A general-purpose program transformer can thus be built by programming a self-interpreter in a style appropriate to the desired transformation. Further, if some parts of the program should be handled differently to others (e.g., if some but not all calls should be memoised), this can be accomplished by adding program annotations to source, and programming interp to treat parts of source in accordance with their annotations.

The main challenges to overcome for using the approach in practice are:

• to write interp so it realises, while executing source, the optimisations the user wishes to obtain (style change, memoisation, instrumentation etc.); and

• to write interp so that as many operations as possible are given static binding times, so that target will be as efficient as possible.

2.2.1. The goal of this article

Our goal is pragmatically to understand cause and effect in the process of realising a program transformation by interpreter specialisation.

2.3. Applications of interpreter specialisation

2.3.1. Program transformation

If S = L then interp is a self-interpreter, and the first Futamura projection effects a transformation from L-programs to L-programs. This approach has been widely used in the partial evaluation community, e.g., [13,16,20,21,28,34,35,38]. In this case, the Type 2 speedup ratio is the natural measure of the benefit of specialisation.

2.3.2. Language extension

Self-interpreters have long been popular in the Lisp, Scheme and Prolog communities because the source language can be extended, e.g., to allow new language constructions, simply by extending the interpreter. However, this flexibility comes with a cost: programs must be executed interpretively, instead of being compiled into more efficient target code.

Specialisation allows the best of both worlds. Partially evaluating an extended self-interpreter with respect to an extended source program p′ has the effect of translating p′


into an equivalent program p in the base language, which can then be compiled by an existing compiler. The net effect is thus to remove all interpretational overhead.

2.3.3. Debugging

A self-interpreter can be augmented with instrumentation code to keep a log of value and control flow, to check for exceptional behaviour etc., to aid debugging. Although flexible, this approach is not ideal. Interpreters tend to be slow; and an interpreter's bookkeeping operations may change timing factors and so distort communication behaviour of a concurrent program.

Specialising an instrumented self-interpreter to a source program has the effect of inserting instrumentation code into the body of the source program [14]. One advantage over interpretation is faster execution. Another advantage, particularly useful when debugging multithreaded programs, is that the control flow in an instrumented program is closer to that of the original source program than that of an interpreter.

2.3.4. Aspects and reflection

Kiczales, Asai and others have used this approach to implement quite nontrivial extensions of the base language's semantics. Kiczales applied specialisation to an interpreter for an aspect-oriented language [27]. The effect of specialisation was automatically to achieve the effect of "weaving", a rather tricky process in most aspect-oriented languages. Further, Asai and others have used this approach to compile a reflective language by specialising a customised meta-circular interpreter [4].

2.3.5. Domain-specific languages

If interpreter interp implements a special-purpose language S, by the first Futamura projection a specialiser can compile S-programs into the specialiser's output language. In this case S-programs may have no "natural" running times, so Type 1 speedup measures the benefit gained by specialisation. Articles on domain-specific languages include [29,30,36,39].

3. Assumptions about the specialisation algorithm

The example series to be given can lead to remarks of the form "yes, but I can solve that problem using feature XXX of my program specialiser". A consequence is that the advice to be given seems, at least to some extent, related to which type of specialisation algorithm is used. Accounting online for such remarks, with responses ("even so, . . . "), could easily lead to a protracted discussion whose line would be hard to follow. Worse, it would make it hard at the end of the paper to say what had really been accomplished.

To avoid fruitless digressions, I now carefully detail the rather conservative assumptions about the specialisation algorithms used in this paper. Once readers have followed my reasoning under these assumptions, they are encouraged to devise better solutions. We begin with two examples, and then discuss specialisation techniques.


3.1. What partial evaluation can accomplish

The canonical first partial evaluation example is the "power" program, computing x^n and specialised to a known static n value. The program, and its specialisation to n = 5, are

power(n,x) = if n=0 then 1 else x * power(n-1,x)
power5(x)  = x * x * x * x * x * 1

From a program transformation viewpoint, specialisation has occurred by applying a single function definition, some unfold operations, and constant propagation. (The equality x * 1 = x could also be applied to remove the trailing "* 1".)
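The unfolding at work here can be imitated in a few lines of Python (an illustrative sketch of mine, not the paper's specialiser): the static recursion on n is executed at specialisation time, leaving residual code that mentions only x.

# Illustrative sketch: unfold the static recursion on n, producing residual code.
def specialise_power(n):
    # returns the residual expression for power(n, x), as a string
    return '1' if n == 0 else f'x * {specialise_power(n - 1)}'

print('power5(x) = ' + specialise_power(5))
# prints: power5(x) = x * x * x * x * x * 1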

A less trivial example program computing Ackermann's function illustrates some technical points. The program p to be specialised is

ack(m,n) = if m=0 then n+1
           else if n=0 then ack(m-1,1)
           else ack(m-1,ack(m,n-1))

Specialisation to static m = 2 and dynamic n yields this program p2:

ack2(n) = if n=0 then ack1(1) else ack1(ack2(n-1))
ack1(n) = if n=0 then ack0(1) else ack0(ack1(n-1))
ack0(n) = n+1

Program p2 performs about half as many base operations (+, -, =, . . . ) as program p. Specialisation occurred by creating new function definitions ack2, ack1, ack0 and folds to produce calls to the newly defined functions.
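For readers who want to check the claim of equivalence, both programs transcribe directly into runnable Python (the transcription is mine):

# Original program p and specialised program p2, transcribed for testing.
def ack(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

def ack0(n):
    return n + 1

def ack1(n):
    return ack0(1) if n == 0 else ack0(ack1(n - 1))

def ack2(n):
    return ack1(1) if n == 0 else ack1(ack2(n - 1))

assert all(ack2(n) == ack(2, n) for n in range(10))   # same results, fewer base operations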

3.2. How these examples can be specialised

In the power example the binding-time analysis (explained below) classifies n as static and x as dynamic, and annotates all the program's operations as follows:

power(nˢ, xᵈ) = ifˢ n =ˢ 0 thenˢ 1ᵈ elseˢ x *ᵈ powerˢ(n -ˢ 1, x)

The statically annotated operations involving n will be computed at specialisation time. The recursive call to power is also marked with s for static, now meaning "to be unfolded". The specialised program contains only *, x and 1.

The Ackermann example in binding-time annotated form is

ack(mˢ, nᵈ) = ifˢ m =ˢ 0 thenˢ n +ᵈ 1 elseˢ
              ifᵈ n =ᵈ 0 thenᵈ ackᵈ(m -ˢ 1, 1ᵈ)
                         elseᵈ ackᵈ(m -ˢ 1, ackᵈ(m, n -ᵈ 1))

The if-test on m is marked static, to be decided at specialisation time; but specialised code will be generated for the test on n. The recursive calls are all marked as "not to be unfolded". This causes the calls to ack2, ack1 and ack0 to be generated for the static input m = 2.


3.3. Assumptions about the specialisation algorithm

(1) The programs to be specialised are assumed expressed in an untyped first-order functional language (side-effect free), using an informal syntax. However, the languages they interpret may be functional, imperative, higher order, logic etc.

(2) Offline specialisation is assumed. Before the value of static input s is known, an automatic binding-time analysis is performed. This phase annotates all of program p's function parameters, operations and calls as
    • static: compute/perform/unfold at specialisation time; or
    • dynamic: generate code to be executed at run time.

(3) We assume the binding-time analysis is monovariant. Each function parameter is always treated as static, or always as dynamic. (A polyvariant binding-time analysis, or an online specialiser, could yield better results on the Ackermann program by unfolding the calls ack1(1) and ack0(1) to yield respectively 3 and 2.)

(4) Call unfolding. A call will be unfolded at specialisation time if it will not cause a recursive call in the specialised program.⁴

(5) Program point specialisation is assumed. Each control point in the specialised program corresponds to a pair (ℓ, static-store), where ℓ is a control point in the original program and static-store is a tuple of values of the program's static variables at point ℓ.

    The original Ackermann program has only one control point: entry to the function ack. The specialised program's three control points ack2, ack1, ack0 correspond to (ack, m = 2), (ack, m = 1), (ack, m = 0).

(6) No global optimisations are done, e.g. the classical dead code elimination, or elimination of common subexpressions, especially ones containing function calls.⁵

4. A simple interpreter example

The trivial imperative language Norma is described in Fig. 1. During Norma program execution, input and the current values of registers x, y are represented as unary or base-1 numbers, so x = 3 is represented by the length-3 list (1 1 1). Labels are represented by the same device: the final (GOTO 1 1) transfers to instruction 2 (the test ZERO-X?...).

4.1. How interpreter Norma-int of Fig. 2 works

Given the program and the initial value of x, function execute calls run. In run(pgtail, prog, x, y), parameters x and y are the current values of the two registers. The call from execute thus sets x to the outside input and y to 0, represented by the empty list.

⁴ A common strategy in more detail: unfold a source call that is nonrecursive or one that causes a static parameter to decrease; but avoid unfolding if it would cause duplication of specialised code. Most specialisers allow a "pragma" to ensure that specialisation terminates: any call may be marked by hand as "must not unfold".

⁵ Logic programming analogues: removal of code that is certain to fail, or removing duplicated goals.


Fig. 1. Norma program syntax and an example source program.

Parameter prog always equals the Norma program being interpreted. The current "control point" is represented by a suffix of prog called pgtail. Its first component is the next instruction to be executed. Thus the initial call to run has pgtail = prog, indicating that execution begins with the first instruction in prog.
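Figs. 1 and 2 are not reproduced in this transcript, so the following Python sketch only guesses at the shape of Norma-int from the surrounding description: registers hold unary numbers (lists of 1s), prog is the whole program, and the control point is a suffix pgtail of prog. The instruction names and the use of plain integer labels (instead of the paper's unary labels) are my own simplifications.

# A rough, hypothetical sketch of a Norma-like interpreter.
def execute(prog, x):
    return run(prog, prog, x, [])            # y starts at 0, i.e. the empty list

def run(pgtail, prog, x, y):
    if not pgtail:
        return y
    instr, rest = pgtail[0], pgtail[1:]
    if instr[0] == 'INCR-Y':                  # y := y + 1
        return run(rest, prog, x, [1] + y)
    if instr[0] == 'DECR-X':                  # x := x - 1
        return run(rest, prog, x[1:], y)
    if instr[0] == 'ZERO-X?':                 # if x = 0 then goto label
        return run(jump(prog, instr[1]), prog, x, y) if not x else run(rest, prog, x, y)
    if instr[0] == 'GOTO':
        return run(jump(prog, instr[1]), prog, x, y)
    raise ValueError(instr)

def jump(prog, label):
    return prog[label - 1:]                   # drop the instructions before the label

# e.g. a source program computing 2*x + 2 in this invented syntax (x in unary):
source = [('ZERO-X?', 6), ('DECR-X',), ('INCR-Y',), ('INCR-Y',),
          ('GOTO', 1), ('INCR-Y',), ('INCR-Y',)]
assert len(execute(source, [1, 1, 1])) == 2 * 3 + 2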

4.2. Binding-time analysis

The execute parameters are the interpreter program's inputs: a static input prog and a dynamic input x. Function run is thus initially called with static pgtail and prog, and dynamic x. For now we assume that y too is dynamic. All parts of the run body involving x or y are annotated as dynamic, e.g., cdr x or cons('1, y). The remainder is annotated as static, including the central "dispatch on syntax". Further, both arguments to jump are annotated as static.

4.3. How specialisation of Norma-int to source proceeds

The interpreter's statically annotated parts will be evaluated, and specialised program code will be generated for its dynamically annotated parts. In particular, any part involving x or y will be considered as dynamic.

The result target of specialising Norma-int to the source from Fig. 1 is shown in Fig. 3. It also computes 2*x + 2, but is a functional program. We see that target is


Fig. 2. The Norma interpreterNorma-int.

about ten times faster than Norma-int, if time is measured by the number of calls to base functions and to execute, run and jump.⁶

The example concretely illustrates two points made earlier:

(1) Program target inherits the algorithm of program source. (Here, to compute 2x + 2 by repeated addition.)

(2) Program target inherits the programming style of interpreter interp. (Here, imperative program source was translated to tail-recursive functional program target.)
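Fig. 3 itself is not reproduced in this transcript; the following Python sketch is only a plausible guess at the shape of such a target (using the hypothetical unary representation sketched above): one tail-recursive function per specialised control point, with the interpretive dispatch gone.

# A guess at the shape of the residual program, not the paper's Fig. 3.
def target(x):
    return loop(x, [])

def loop(x, y):
    if not x:                       # the ZERO-X? test of the source program
        return [1, 1] + y           # final two increments of y
    return loop(x[1:], [1, 1] + y)  # one source-level iteration: x--, y += 2

assert len(target([1, 1, 1])) == 2 * 3 + 2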

⁶ This would be less if GOTO were not implemented in such a simplistic way.


Fig. 3. target = [[spec]](Norma-int, source) for computing 2x + 2.

5. Advice on writing an interpreter for specialisation

5.1. Problems to overcome

Two key problems must be overcome in order to get the full benefit of this approach in practice.

(1) NONHALTING. The specialiser may fail to terminate; and
(2) QUALITY. Program target may be too slow or too large.

5.2. Points of advice

The following points are listed here for convenient reference. Some parts, however, refer to concepts developed later.

(1) Avoid the NONHALTING problem, i.e., nonterminating specialisation, by ensuring that every static parameter is of bounded static variation (explained in Section 6.3). If not, your specialiser is certain to fail to terminate when transforming some programs.

(2) If practical, write your interpreter by transliterating a big-step operational semantics (Section 7.3) or a denotational semantics (Section 7.4). This usually gives at least semicompositionality (Section 6.4). If you write your interpreter compositionally or semicompositionally then every syntactic argument (i.e., pieces of the interpreter's program input) will be of bounded static variation.

(3) If your interpreter is not based on a big-step operational or denotational semantics, there is no guarantee that all parameters depending only on static data will be of bounded static variation. You will thus have to do your own analyses and annotations to ensure that they are. Beware of static data values that can grow unboundedly under dynamic control. An example: the control stack C of Section 8.1.2.

(4) To avoid problems of QUALITY, e.g., code explosion, ensure that at any program point:
    • There are no (or as few as possible) independently varying static parameters (Section 9.1).
    • There are no dead static variables (Section 9.3).

(5) If your interpreter is based on a small-step operational semantics, then apply the techniques sketched in Section 7.3.

(6) Do not expect superlinear Type 1 speedup (as a function of dynamic input size), unless the specialiser used is rather sophisticated (Section 10.2.1).

(7) If you wish to achieve superlinear speedup by using an unsophisticated specialiser, then write your interpreter to incorporate statically recognisable optimisation techniques such as those seen in Section 10.2.2 (memoisation, static detection of semantically dead code etc.).


5.3. Where do interpreters originate?

This question is inevitable if one approaches program transformation by specialising an interpreter. There are several natural answers:

(1) A definitional interpreter [31] is a program derived from an operationally oriented definition of the source language's semantics [33,40]. Some variants: big-step operational semantics; small-step operational semantics; or denotational semantics.

(2) An interpreter may originate in a run-time execution model, e.g., Java bytecode.

(3) An interpreter may originate in an ad hoc execution model, e.g., one used for instruction. Our first example, the Norma interpreter, is ad hoc.

6. Bounded static variation: the key to specialiser termination

6.1. The termination problem

To understand the problem of specialiser termination, consider interpreter Norma-int with static program input prog and dynamic input x.

Clearly the Norma-int parameters pgtail, dest and prog can all be safely computed at specialisation time. Further, the recursive calls run(rest,...) or jump(cdr prog,...) decrease their static first parameters, so these calls may be unfolded without risk of nontermination. The effect is that all of the instruction dispatch code, and the entire auxiliary function jump, can be "specialised away", i.e., done at specialisation time. This leaves less to do in target, and so gives a significant speedup. On the other hand, all operations involving x depend on data not available at specialisation time. These must be performed by the specialised program.

Parameter y is, however, problematic. It is initialised to 0 (the list '()) by the call from execute, and otherwise only tested, incremented or decremented. Thus in principle all y computations could be done at specialisation time. However, in practice this would be disastrous. The reason: for the example source program computing 2*x + 2, the interpreter's y variable will take on the infinitely many possible values (), (1), (1 1), . . . .

A solution is to hand annotate Fig. 2, changing its first line to tell the specialiser that the y value should be considered as dynamic:

execute(prog, x) = run(prog, prog, x, generalise('()))

More generally, specialisation can fail to terminate because of:

(1) an attempt to build an infinitely large command or expression; or
(2) an attempt to build a specialised program containing infinitely many program points (labels, defined functions, or procedures); or
(3) an infinite static loop without generation of specialised code.

The basic cause is the same: the specialiser's unfolding strategy is the main cause of one or the other misbehaviour. For the Norma-int example with the suggested unfolding strategy, problem 2 arises if y is considered as a static parameter. Such nontermination misbehaviour makes a specialiser unsuitable for use by nonexperts, and quite unsuitable for


automatic use. Further, failure to terminate gives the user very little to go on for discovering the problem's cause, in order to repair it.

Two attitudes have been seen to nonterminating specialisers, and each view can be (and has been) both attacked and defended. In functional or imperative specialisers, infinite static loops are often seen as the user's fault, so there is no need for the specialiser to account for them. The other two problems are usually handled by hand annotations.

Logic programming most often requires that the specialiser always terminate, even if this involves high specialisation times (e.g., online tests for homeomorphic embedding). A reason is that Prolog's "negation as finite failure" semantics implies that changing termination properties can change semantics, and even answer sets.

Recent work on functional program specialisation shows how to carry out a binding-time analysis that is conservative enough to guarantee that specialisation always terminates, and still gives good results when specialising interpreters [19].

6.2. Pachinko, execution and specialisation

Normal execution is given complete input data and deterministically performs a single execution. On the other hand, a specialiser is given incomplete (static) input data; and it must account for all possible computations that could arise for any dynamic input values.

An analogy. The popular Japanese "Pachinko" entertainment involves steel balls that fall through a lattice of pins. Deterministic execution corresponds to dropping only one ball (the program input), which thus traverses only one trajectory. On the other hand, specialisation to static input s corresponds to dropping an infinite set of balls all at once—those that share s as given static input, but have any possible dynamic input value. Reasoning about specialisation, e.g., specialised program finiteness or efficiency, thus requires reasoning about sets of computations that are usually infinite.

6.2.1. Reachable states

Let function f defined in program p have a number of parameters denoted by "arity(f)", and let the parameter values range over the set V. The i-th parameter of f will be written as f(i).

A state is a pair (f, α) where α ∈ V^arity(f). Notation: α_i is the i-th component of parameter tuple α ∈ V^arity(f). Thus α_i is the value of f(i) in state (f, α). State (f, α) can reach state (g, β), written as (f, α) → (g, β), if computation of f on parameter values α requires the value of g on parameter values β.

Example. (f, 5) → (f, 4) and (f, 5) → (f, 3) in a program:

f(n) = if n < 2 then n else f(n-1) + f(n-2)

6.3. Definition of bounded static variation

Let ";" be the tuple concatenation operator, suppose f1 is the initial (entry) function of program p, and let s, d ∈ N, where s + d = arity(f1). We assume (for simplicity) that the first s parameters of f1 are static, and the remaining d parameters are dynamic. The set of statically reachable states SR(σ) consists of all states (f, α) reachable in 0, 1, or


more steps from some initial call with σ ∈ V^s as the static program input, and an arbitrary dynamic input δ:

SR(σ) = { (f, α) | ∃δ ∈ V^d : (f1, σ;δ) →* (f, α) }.

The i-th parameter f(i) of f is of bounded static variation, abbreviated to BSV, iff for all σ ∈ V^s the following set is finite:

StatVar(f(i), σ) ≝ { α_i | (f, α) ∈ SR(σ) }.

This captures the intuitive notion that f's i-th parameter only varies finitely during all computations whose inputs share a fixed static program input σ, but may have arbitrary dynamic input values. In the interpreter Norma-int, for example, parameters prog, pgtail and dest are of BSV, while neither x nor y is of BSV.

Theorem. If every parameter classified as "static" is of bounded static variation, then specialisation can be guaranteed to terminate.

Most BSV parameters are directly computable from the static program inputs, but exceptions do occur. An example of an expression of BSV, even if x is dynamic:

if x mod 2 = 0 then 'EVEN else 'ODD.

6.4. Compositionality and semicompositionality of interpreters

Henceforth we call the interpreted source program source. Classify each function parameter of an interpreter as syntactic if its values range only over phrases (e.g., names, expressions or commands) from the source language, and otherwise as nonsyntactic.

The interpreter is defined to be compositional if, for any function definition, the syntactic parameters of every recursive subcall are proper substructures of the calling function's syntactic parameters. Clearly, in a computation by a compositional interpreter, the syntactic parameters in a function call are necessarily substructures of the interpreted program source, or interpreter constants—and so obviously of BSV.

The compositionality requirement ensures that the interpreter can be specialised with respect to its static input source by simple unfolding—a process guaranteed to terminate since the "substructure" relation is well founded.

A weaker requirement, also guaranteeing BSV, is for the interpreter to be semicompositional, meaning that called parameters must be substructures of the original source program source, but need not decrease in every call. Semicompositionality can even allow syntactic parameters to grow; the only requirement is that their value set is limited to range over the set of all substructures of source (perhaps including the whole of source itself).

Some common instances of semicompositionality: the meaning of a while construct can be defined in terms of itself, and procedure calls may be elaborated by applying an execution function to the body of the called procedure. Both would violate strict compositionality.

Semicompositionality does not guarantee termination at run time; but it does guarantee finiteness of the set of specialised program points—so termination of specialisation can be


ensured by a natural unfolding strategy: unfold, unless this function has been seen before with just these static parameter values.
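The strategy can be pictured with a small Python sketch (mine, not the paper's) applied to the Ackermann example of Section 3: one residual definition is generated per static value encountered, and generation stops once no new static values appear. The string representation of residual code is purely illustrative.

# Sketch of program point specialisation with the "seen before?" test:
# specialise ack to a static m, creating one residual definition per value of m.
def specialise_ack(m0):
    seen, pending, residual = set(), [m0], {}
    while pending:
        m = pending.pop()
        if m in seen:
            continue                      # already specialised for this static value
        seen.add(m)
        if m == 0:
            residual['ack0'] = 'ack0(n) = n+1'
        else:
            residual[f'ack{m}'] = (f'ack{m}(n) = if n=0 then ack{m-1}(1) '
                                   f'else ack{m-1}(ack{m}(n-1))')
            pending.append(m - 1)         # a new specialised program point is required
    return residual

for definition in specialise_ack(2).values():
    print(definition)                     # the ack2, ack1, ack0 definitions of Section 3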

7. Several example interpreters

7.1. Interpreters derived from execution models

Many interpreters simply embody some model of run-time execution, e.g., Pascal's "stack of activation records" or Algol 60's "thunk" mechanism. The following is a simple example.

7.2. An instructional interpreter

The interpreter of Fig. 4 has been used for instruction at DIKU, and derives from lecture notes used at Edinburgh. Computation is by a linear sequence

(S1, M1, C1) → (S2, M2, C2) → · · ·

of states of form (S, M, C). S is a computation stack, C is a control stack containing bits of commands and expressions that remain to be evaluated/executed, and memory M : {x0, x1, . . .} → Value contains the values of variables xi. (C is in essence a materialisation in data form of the program's continuation from the current execution point.)

For readability we describe this interpreter (call it int-instruct) via a set of transition rules in Fig. 4. Note that the C component is neither compositional nor semicompositional. The underlines, e.g., n, indicate numeric run-time values, and M[xi → n] is a memory M′ identical to M, except that M′(xi) = n.
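Since Fig. 4 is not reproduced in this transcript, the following Python sketch only guesses at a few rules of an interpreter of this (S, M, C) shape; the concrete syntax (tuples tagged '+', ':=' and so on) is invented for the example.

# A guessed fragment of an (S, M, C) machine: S is the value stack, M the
# memory (a dict), and C the control stack of code pieces still to be done.
def step(S, M, C):
    item, C = C[0], C[1:]
    if isinstance(item, int):                    # constant: push its value
        return [item] + S, M, C
    if isinstance(item, str):                    # variable: push M[x]
        return [M[item]] + S, M, C
    if item[0] == '+':                           # ('+', e1, e2): evaluate both, then add
        return S, M, [item[1], item[2], ('add',)] + C
    if item[0] == 'add':
        a, b, *rest = S
        return [a + b] + rest, M, C
    if item[0] == ':=':                          # (':=', x, e): evaluate e, then store
        return S, M, [item[2], ('store', item[1])] + C
    if item[0] == 'store':
        v, *rest = S
        return rest, {**M, item[1]: v}, C
    raise ValueError(item)

def run(program, M):
    S, C = [], list(program)
    while C:
        S, M, C = step(S, M, C)
    return M

assert run([(':=', 'y', ('+', 'x', 1))], {'x': 4}) == {'x': 4, 'y': 5}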

7.3. Interpreters derived from operational semantics

Operational semantics are easier to program than denotational semantics, since emphasis is on judgements formed from first-order objects (although finite functions are allowed). They come in two flavours: big step, in which a judgement may, for example, be of the form

environment ⊢ Exp ⇒ Value

and small step, in which a judgement is of form

environment ⊢ Exp ⇒ Exp′

that describes expression evaluation by reducing one expression repeatedly to another, until a final answer is obtained. In both cases a language definition is a set of inference rules. A judgement is proven to hold by constructing a proof tree with the judgement as root, and where each node of the tree is an instance of an inference rule.

Transliterating these two types of operational semantics into program form gives interpreters of rather different characteristics. Big-step semantics are conceptually nearer denotational semantics than small-step semantics, since source syntax and program meanings are clearly separated. Small-step semantics work by symbolic source-to-source code reduction, and so are nearer equational or rewrite theories, for example traditional treatments of the λ-calculus.


Fig. 4. An instructional interpreter int-instruct.

7.3.1. An interpreter derived from a big-step operational semantics

The interpreter int-big-step of Fig. 5 implements a "big-step" semantics. A big-step judgement environment ⊢ Exp ⇒ Value is implemented as a function Eval constructing Value from arguments environment and Exp. The environment is represented by two values: a list ns of variable names, and a parallel list vs of variable values. The source program has only one input, but its internal functions may have any number of parameters, as in the interpretation of (call f es).
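Fig. 5 is not reproduced in this transcript; the following Python sketch only illustrates the shape just described, with an invented expression syntax, an Eval over (e, ns, vs, pgm), and pgm mapping function names to (parameter names, body).

# A hedged sketch of a big-step Eval with the environment as parallel lists.
def Eval(e, ns, vs, pgm):
    if isinstance(e, int):
        return e
    if isinstance(e, str):                           # variable lookup by position
        return vs[ns.index(e)]
    if e[0] == '+':
        return Eval(e[1], ns, vs, pgm) + Eval(e[2], ns, vs, pgm)
    if e[0] == 'if':
        return Eval(e[2] if Eval(e[1], ns, vs, pgm) != 0 else e[3], ns, vs, pgm)
    if e[0] == 'call':                               # (call f es)
        args = [Eval(a, ns, vs, pgm) for a in e[2]]
        params, body = pgm[e[1]]
        return Eval(body, params, args, pgm)         # ns is reset to f's parameter names
    raise ValueError(e)

pgm = {'double': (['x'], ('+', 'x', 'x'))}           # double(x) = x + x
assert Eval(('call', 'double', [21]), [], [], pgm) == 42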

7.3.2. Interpreters derived from small-step operational semantics

At first sight, a small-step operational semantics seems quite unsuitable for specialisation. A familiar example is the usual definition of reduction in the lambda calculus. This consists of a few notions of reduction, e.g., α, β, η, and context rules stating that these can be applied when occurring inside other syntactic constructs. Problems with direct implementation include:

• Nondeterminism. One λ-expression may have many reducible redexes;

• Nondirectionality of computation. Rewriting may occur either from left to right or vice versa (conversion rather than reduction); and


Fig. 5. The big-step semantics int-big-step in program form.

• Syntax rewriting. A subject program's syntax is continually changed in ways that are statically unpredictable during interpreter specialisation.


These are also problems for implementations, often resolved as follows:

• Nondeterminism: reduction may be restricted to occur only from the outermost syntactic operator, using call-by-value, call-by-name, lazy evaluation or some other fixed strategy;

• Nondirectionality: rewriting occurs only from left to right, until a normal form is obtained (one that cannot be rewritten further).

More generally, nondeterminism is often resolved by defining explicit evaluation contexts C[ ] from [11]. Each is an "expression with a hole in it", the hole indicated by [ ]. Determinism may be achieved by defining evaluation contexts so the unique decomposition property holds: any expression E has at most one decomposition of form E = C[E′] where E′ is a redex and C[ ] is an evaluation context. Given this, computation proceeds through a series of rewritings of such configurations.

The problem of syntax rewriting causes particular problems for specialisation. The reason is that rewriting often leads to an infinity of different specialisation-time values. For example, this can occur in the λ-calculus if β-reduction is implemented by substitution in an expression with recursion, either implicitly by the Y combinator or explicitly by a fix construct. The problem may be alleviated by using closures [31] instead of rewriting, essentially a "lazy" form of β-reduction.

One approach: some small-step semantics have a property that can be thought of as dual semicompositionality: every decomposition E = C[E′] that occurs in a given computation is such that C[ ] consists of the original program with a hole in it, and E′ is a normal form value, e.g., a number or a closure. Such a dual semicompositional semantics can usually be specialised well. However, the condition is violated if (for example) the semantics implements function or procedure calls by substituting the body of the called function or procedure in place of the call itself. A similar example is seen in Section 8.1.2 below.

7.4. Interpreters derived from denotational semantics

A denotational semantics [33,40] consists of a collection of mathematical domain equations, specifying the partially ordered value sets to which the meanings (denotations) of program pieces and auxiliaries such as environments will belong; and a collection of semantic functions, mapping program pieces to their meanings in a compositional way (as defined in Section 6.4). In practice a denotational semantics resembles a program in a modern functional language such as ML, Haskell, or Scheme. As a consequence, many denotational language definitions can simply be transliterated into functional programs.

One problem with a denotational semantics is that it necessarily uses a "universal domain", capable of expressing all possible run-time values of all programs that can be given meaning. This, combined with the requirement of compositionality, often leads to an (over)abundance of higher-order functions. These two together imply that domain definitions very often must be both recursive and higher order, creating both mathematical and implementational complications.


8. Analysis of the example interpreters

Some programs are well suited to specialisation, yielding substantial speedups, and others are not. Given a specialiser spec, a binding-time improvement is the transformation of program p into an equivalent program q such that [[p]] = [[q]], but specialisation of q to a static value s will do more static computation than specialisation of p to the same s. We show several examples of binding-time improvements of interpreters used for compilation by specialisation.

8.1. Analysis of the instructional interpreter

The interpreter int-instruct of Fig. 4 works quite well for computation and proof, and has been used for many exercises and proofs of program properties in courses such as "Introduction to Semantics". On the other hand, it does not specialise at all well!

8.1.1. Lack of binding-time separation

Suppose one is given the source program p, i.e., the initial control stack contents are C = p · (), but that the memory M is unknown, i.e., dynamic. Clearly the value stack S must also be dynamic, since it receives values from M as in the second transition rule. But this implies that anything put into S and then retrieved from it again must also be dynamic. In particular this includes: the index i of a variable to be assigned; the then and else branches of an if statement; and the test of a while statement. Consequently all of these become dynamic, and so appear in the specialiser output. In essence, no specialisation at all occurs, and source code appears in target programs generated by specialisation.

8.1.1.1. A binding-time improvement. The problem is easy to fix: just push the above-mentioned syntactic entities onto the control stack C instead of onto S: the index i of a variable to be assigned; the then and else branches of an if statement; and the test of a while statement. This leads to the interpreter of Fig. 6, again as a set of transition rules.

8.1.1.2. Bounded static variation of the control stack. In this version stack C is built up only from pieces of the original program. Further, for any given source program, C can take on only finitely many different possible values. (Even though C can grow when a while command is processed, it is easy to see that it cannot grow unboundedly.) Thus C may safely be annotated as "static", so source code fragments will not appear in specialised programs.

This version specialises much better. The effect is to yield a target program consisting of tests for realising transitions from one (S, M) pair to another—in essence with one transition (more or less) for each point in the original program.

8.1.2. Extension to include procedures

Interpretation of a program with parameterless procedure calls is straightforward. Let Pdefs be a list of mutually recursive procedure definitions, with entry P : p if procedure P has command p as its body.


Fig. 6. An instructional interpreter with modifications.

A procedure call is handled by looking up the procedure's name in Pdefs, and replacing the call by the body of the called procedure.

8.1.2.1. Rule for procedure calls

〈S, M, (call P) · C〉 ⇒ 〈S, M, p · C〉   if Pdefs contains P : p.

Even though this looks quite innocent, it causes control stack C no longer to be of bounded static variation. To see why, consider a recursive definition of the factorial function:

Procedure Fac:

if N = 0 then Result := 1

         else N := N-1; call Fac; N := N+1; Result := N * Result

The control stack's depth will be proportional to the initial value of N, which is dynamic! The result: an unpleasant choice between infinite specialisation on the one hand (for any recursive procedure); and, on the other hand, classifying C as dynamic, which will result in the existence of source code at run time, and very little speedup.


A solution using “the trick”.7

Add a fourth component R, a "return stack" (initially empty), so a state becomes 〈S, M, C, R〉. The rules given in Fig. 6 are used as is, except that an unchanged R component is added to each rule. Procedure calls and returns are treated as follows (in place of the rule above):

8.1.2.2. Rule for procedure calls

〈S, M, (call P) · C, R〉 ⇒ 〈S, M, p · return, [C] · R〉   if Pdefs contains P : p.

8.1.2.3. Rule for procedure returns

〈S, M, return, [C] · R〉 ⇒ 〈S, M, C, R〉.

The control stack C is now of bounded static variation, as it contains only commands in the original program (from the main program, or from a procedure body followed by the token return). The return stack R is not of BSV since its depth is not statically bounded; but it always has the form [C1] · · · [Ck] where the Ci are commands in the original program or procedure bodies. This can be exploited as follows.

Program the extended Fig. 6 so the interpreter code for the return transition pops the top [C] from R, and compares C against each of the commands C1, . . . , Cn found in the program immediately after a procedure call. Once some Ci = C is found (as it must be), the interpreter continues to apply the transitions of Fig. 6 using Ci as the new control stack.

The point of programming the interpreter this way is that Ci comes from the program being interpreted and so is a static value. Consequently the interpreter's control stack will be static, and will not appear in any specialised program.

The specialised code for return will compare the top R element with the source program commands that follow procedure calls, branching on equality to the specialised code for the selected Ci. The effect is that source text is used at run time only to implement the return.

The exact form of the source text Ci from R as used in these comparisons is clearly irrelevant, since its only function is to determine where, in the target program, control is to be transferred—and so each R element can be replaced by a token, one for each source program command following a procedure call. Such tokens are in effect return addresses. Moreover, the effect of the comparisons and branches could in principle be realised in constant time, by an indirect jump.⁸
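A small Python sketch (mine, not the paper's) of the return rule written with "the trick": the dynamic value popped from R is compared only against the statically known return points, so the continuation actually installed is always a static piece of the source program.

# Sketch of "the trick" for the return rule. return_points is the static list
# of commands C1..Cn that follow procedure calls in the source program.
def do_return(S, M, R, return_points):
    C_dyn, R = R[0], R[1:]              # dynamic: the saved continuation
    for C_static in return_points:      # static: each candidate return point
        if C_dyn == C_static:
            return S, M, C_static, R    # continue with the *static* continuation
    raise RuntimeError('no matching return point')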

Interestingly, use of "the trick" led to something close to a procedure execution device that is widely used in pragmatic compiler construction.

8.2. Analysis of the “big-step” interpreter

Interpreter int-big-step from Fig. 5 specialises well. Even though not compositional (as defined in Section 6.4), it is semicompositional since the parameters e, ns and pgm of

⁷ This is a programming device (described in [10,20]) for making otherwise dynamic values static, provided they are known to be of bounded static variation.

8 Although logical, I know of no specialiser that uses this device.


Fig. 7. The addition of LET.

Fig. 8. Dynamic name binding.

Eval are always substructures of the original source program, and so are static. In this case recursion causes no problems: there is no explicit stack, and nothing can grow unboundedly as in the extended instructional interpreter, or as in many small-step semantics.

A nonsemicompositional extension

Extend interpreter int-big-step of Fig. 5 to accept a "LET" construction. This modification is easy to do, by adding an extra CASE as in Fig. 7. With this change, parameter ns can take on values that are not substructures of pgm. Parameter ns can grow (and without limit) since more deeply nested levels of let expressions will give rise to longer lists ns. Nonetheless, parameter ns is of bounded static variation.

Why? Interpretation of a function call will reset ns to getparams(f, pgm), but this always yields a value that is part of pgm. Further, ns only grows when e shrinks, so for any fixed source program, ns can only increase by the number of nested let expressions in pgm. Thus parameter ns cannot grow unboundedly as a function of the size of int-big-step's static input pgm, and so is of bounded static variation.

Conclusion

The extended interpreter int-big-step will specialise well.

8.2.1. Dynamic binding

Suppose the "function call" construction in interpreter int-big-step of Fig. 5 were implemented differently, as in Fig. 8. This causes ns to be of unbounded static variation. Intuitively, the reason is that the length of ns is no longer tied to any syntactic property of pgm.

Interestingly, the dynamic name binding in the call has an indirect effect on specialisation of Eval: in order to avoid nontermination, the interpreter parameter ns must be classified as dynamic. The consequence is (as in Lisp) that target programs obtained by


Fig. 9. The binary search algorithm.

specialising int-big-step will contain parameter names as well as their values, and calls to function lookparams will involve a run-time loop every time a source language variable is accessed, substantially increasing target program running times and space usage.

9. On code explosion

The QUALITY problem can be significant when specialising an interpreter whose static parameters are all of bounded static variation. Before showing an interpreter that suffers from code explosion, we look at a simple noninterpretive program to illustrate the root of the problem.

9.1. Synchronicity in static parameter variation

9.1.1. A good example: binary search

The problem is efficiently to search a table whose size is statically known. Specialisation transforms generic table lookup code into more efficient code.

Program p of Fig. 9 performs binary search for goal parameter x in an ordered (increasing) global table T0, . . . , Tn where n = 2^m − 1. Parameter i points to a position in the table, and delta is the size of the portion of the table in which x is currently being sought.

If the table contains eight elements then n = 8 and Search(8, x) calls Find(T, 0, 4, x). We begin by assuming that delta is classified as static and that i is dynamic. Specialising with respect to static initial delta = 2^(3−1) = 4 and dynamic i gives program p_8 of Fig. 10.

If the table contains eight elements thenn = 8 andSearch(8, x) callsFind(T, 0,4, x). We begin by assuming thatdelta is classified as static and thati is dynamic.Specialising with respect to static initialdelta = 23−1 = 4 and dynamici gives programp8 of Fig. 10.

In generalpn runs intime O(log(n)), with a better constant coefficient than the originalgeneral program. Moreover, it has sizeO(log(n))—acceptable for all but extremely largetables.

9.1.2. A bad example: binary searchTo illustrate the code explosion problem, consider the binary search program above,

again with knownn = n. One may in principle also classify parameteri as static, since itsstatic variation is the range 0,1, . . . , n − 1. In the resulting program, however, the test onx affects the value of statici. A perhapsunexpected result iscode size explosion.


Fig. 10. A small program obtained by specialising binary search.

Fig. 11. A large program obtained by specialising binary search.

Specialisation with respect to static initial delta = 4 and i = 0 now gives the program in Fig. 11. The specialised program again runs in time O(log(n)), and with a slightly better constant coefficient than above. On the other hand, it has size O(n)—exponentially larger than the previous version! This is too large to be of practical use, except for small tables.

The cause of the problem is that parameters i and delta are independently varying static parameters. A specialiser must account for all possible combinations of their values—in this case, n − 1 combinations. This leads to specialised programs of size O(n) rather than O(log(n)).
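Continuing the reconstructed sketch above (so still under the same assumption about the exact shape of Find), a few lines suffice to enumerate the specialised program points that arise when i is also classified as static:

-- the (i, delta) pairs reachable from the initial call Find(T, 0, 4, x)
reachable :: [(Int, Int)]
reachable = go [(0, 4)]
  where
    go []              = []
    go ((i, d) : rest)
      | d == 0         = (i, d) : go rest
      | otherwise      = (i, d) : go ((i + d, d `div` 2) : (i, d `div` 2) : rest)

-- length reachable == 15 for the eight-element table: one residual function per
-- pair, a complete comparison tree of size O(n) instead of the O(log n) chain
-- find4 .. find0 above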

9.2. An unfortunate programming style

The interpreter sketched in Fig. 12 implements a tiny imperative language. Unfortunately it does not specialise at all well due to code explosion similar to that of the binary search program. First, note that Expression, Names and Program are all of BSV and so can all be classified as "static".


Fig. 12. An interpreter for a simple imperative language.

This interpreter uses a quite natural idea: if a variable has not already been bound in the store, then it is added, with initial value zero. This creates no problems when running the interpreter. However, this seemingly innocent trick can cause catastrophically large target programs to be generated when specialising the interpreter to a source program. The reason is that spec must take account of all execution possibilities (the "Pachinko syndrome").

For an example, consider the source program of Fig. 13. After test1 is processed at specialisation time, spec has to allow for either branch to have been taken, so the resulting value of ns could be either [] or [X1]. After the second test X2 may be either present or absent, giving four possibilities: ns = [], ns = [X1], ns = [X2], or ns = [X1,X2]. There are thus 2^n possible values for ns when -- some code -- is encountered, so specialisation will generate a target program whose size is (quite unnecessarily) Ω(2^n).
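Figs. 12 and 13 are not reproduced in this extract; the following sketch, with illustrative names and types, shows the store-extension step that causes the trouble.

type Names  = [String]
type Values = [Int]

-- bind on first assignment: harmless at run time, but at specialisation time
-- ns is static, so every dynamic conditional doubles its set of possible values
update :: String -> Int -> Names -> Values -> (Names, Values)
update x v ns vs = case lookup x (zip ns [0 ..]) of
  Just i  -> (ns, take i vs ++ [v] ++ drop (i + 1) vs)
  Nothing -> (ns ++ [x], vs ++ [v])

-- for a Fig. 13-style program with n dynamic tests, each guarding an assignment
-- to its own variable X1, X2, ..., the static ns may end up being any subset of
-- {X1, ..., Xn}: 2^n values, hence the Omega(2^n) residual program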

Fig. 13. A source program in the simple imperative language.

The problem is easily fixed, by revising the interpreter first to scan the entire source program, collecting an initial store in which every variable is bound to zero. The result will yield a target program whose size is proportional to that of the source program. A further optimisation now becomes possible: the type of Exec can be simplified to the following, with correspondingly smaller target code:

Exec : Command x Names x Values x Program -> Values
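A minimal sketch of that revision (again with illustrative syntax for the source language, since Fig. 12 is not shown; Names and Values as in the sketch above) collects the assigned variables in one pass:

import Data.List (nub)

data Cmd  = Assign String AExp | Seq Cmd Cmd | If AExp Cmd Cmd | While AExp Cmd
data AExp = Lit Int | Ref String | Plus AExp AExp

-- variables assigned anywhere in the program (reads of never-assigned variables
-- are ignored here for brevity)
varsOf :: Cmd -> [String]
varsOf (Assign x _) = [x]
varsOf (Seq c1 c2)  = varsOf c1 ++ varsOf c2
varsOf (If _ c1 c2) = varsOf c1 ++ varsOf c2
varsOf (While _ c)  = varsOf c

-- the initial store: every variable bound to zero, fixed by the source alone
initialStore :: Cmd -> (Names, Values)
initialStore pgm = (ns, map (const 0) ns)
  where ns = nub (varsOf pgm)

Since ns is now determined entirely by the static source program, Names can be computed at specialisation time, which is what licenses the simplified type of Exec above.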

9.3. Dead static parameters

This is yet another manifestation of the same problem: a combinatorial explosion in the set of static values.

A parameter is semantically dead at a program point if changes to its value cannot affect the output of the current program. Syntactic approximations to this property are widely used in optimising compilers, especially to minimise register usage.

Dead variable detection can be even more important in program specialisation than in compilation. The size of specialised program p_s critically depends on the number of its specialised program points: each has the form (pp, s1, . . . , sk), where pp is one of p's program points and (s1, . . . , sk) is a tuple of values of the static parameters of p whose scope includes pp.

Thus if p has a dead static parameter X, the size of p_s may be multiplied by the number of static values that X can assume. This increases the number of specialised program points (pp, s1, . . . , sk), even though the computations performed by p_s are not changed in any way.
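A tiny, hypothetical illustration (not from the paper): x below is static but dead in the loop, so specialising the enclosing call sites without dead-variable analysis produces one copy of the loop per value of x.

-- x is static and dead: it is passed along but never used
loop :: Int -> Int -> Int -> Int
loop k x d
  | k == 0    = d
  | otherwise = loop (k - 1) x (d + k)

-- a caller supplying 1000 different static values of x
caller :: Int -> Int
caller d = sum [ loop 5 x d | x <- [0 .. 999] ]

-- naive specialisation yields residual functions loop_5_0, loop_5_1, ...,
-- loop_5_999, a thousand copies of identical code, whereas reclassifying the
-- dead x as dynamic leaves a single loop_5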

Lesson. A specialiser can dramatically reduce the size of the specialised program by detecting the subject program's dead static parameters before specialisation time, and then making them dynamic. My first experience with this problem, for a simple imperative language (see [20], Section 4.9.2), was that not accounting for dead static variables led to a specialised program 500 times larger than necessary(!).

10. On speedup

Since a major motivation for program specialisation is speedup, we now examine more carefully the question of how much speedup can be obtained.

10.1. Speedup by change of evaluation algorithm

An interpreter has the freedom to execute its source program in any way it wishes, as long as the same input–output function is computed. This can be exploited to give asymptotically superlinear Type 2 speedups.


10.1.1. Selective memoisation

A common example from the program transformation literature [1,5,25] is the use of memo tables: a function value, once computed, is cached so a later function call with the same arguments can be replaced by a table search. While memoisation can give dramatic speedups, it can also use large amounts of memory, and is only of benefit when the same function is in fact frequently called with identical arguments.

Let interp be a self-interpreter for language L. A natural source language extension is to annotate function calls and return values in a program with remarks saying where to save values in a memo table, and where to search a memo table. For example, consider an annotated Fibonacci program:

f(n) = save (if n <= 1 then n else f(n-1) + srch f(n-2))

An interpreter could, on encountering srch f(exp), evaluate exp to value v and then search the f-memo table. If a pair (v,w) is found, then return w without calling f, else evaluate its body as usual. The effect of annotation f(n) = save result is to ensure that once result has been evaluated, its value is saved in the f-memo table. With such an evaluation strategy, the Fibonacci program runs in linear time with a table size linear in n (assuming constant-time memo table access). The effect of specialisation is to compile the program into target code that accesses memo tables or stores into them only where specified by the user.
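A minimal hand-written Haskell sketch of this evaluation strategy, specialised by hand to the annotated Fibonacci program (so an approximation of what such target code could look like, not the output of any actual specialiser), threads the memo table explicitly; Data.Map gives O(log n) rather than the constant-time access assumed above.

import qualified Data.Map as M

type Memo = M.Map Integer Integer

-- the body of the annotated f, with "save" applied to its result
f :: Integer -> Memo -> (Integer, Memo)
f n memo = save n result memo'
  where
    (result, memo')
      | n <= 1    = (n, memo)
      | otherwise = let (a, m1) = f (n - 1) memo
                        (b, m2) = srch (n - 2) m1
                    in  (a + b, m2)

-- srch: consult the f-memo table; call f only on a miss
srch :: Integer -> Memo -> (Integer, Memo)
srch v memo = case M.lookup v memo of
  Just w  -> (w, memo)
  Nothing -> f v memo

-- save: record the result in the f-memo table before returning it
save :: Integer -> Integer -> Memo -> (Integer, Memo)
save n w memo = (w, M.insert n w memo)

fib :: Integer -> Integer
fib n = fst (f n M.empty)

Because the call f(n-1) has already saved every smaller argument by the time srch (n-2) is consulted, almost every search is a hit, and fib runs in (near-)linear time instead of the exponential time of the unannotated program.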

It would be desirable to have a program analysis to determine which functions may have "overlap", i.e., be called repeatedly with the same arguments in a computation. This could be used to decide which annotations to place where.

The "tupling" transformation is one way to achieve the effect of memoisation purely functionally, without having a memo table. Progress towards automating this transformation is seen in [6]. Liu and Stoller have a semi-automated methodology to exploit memoisation opportunities [25], and Acar et al. use a modal type system to analyse data flow for memoisation purposes [1].

10.1.2. Limits to Type 2 speedups

A memoising interpreter is impressive because its effect can be to interpret programs faster than they actually run. On the other hand, there are some mathematical limits: if a problem has lower-bound exponential complexity (for example), then no program can solve it essentially faster, even if memoisation or other sophisticated evaluation mechanisms are used.

10.2. Target speed in relation to interpreter speed

We begin with a résumé (and improvement) of the argument in [3,20] that Type 1 speedup is at most a linear multiplicative factor for the usual specialisation algorithms. A subtle point, though, is that the size of the linear coefficient can depend on the values of the static inputs.

An example. A naive string matcher taking time O(|pattern| · |subject|) can be specialised to the pattern to yield a matcher with run time O(|subject|), with coefficient independent of the pattern [2,7]. In this case, larger static data leads to larger speedups as a function of the size of the dynamic input; but for each value of the static input, the speedup is linear.

The argument below also applies to deforestation and supercompilation.

10.2.1. Relating specialised and original computations

Consider a given program p, and static and dynamic inputs s, d. Let p_s be the result of specialising p to s. A first observation (on all functional or imperative program specialisers I have seen): there is an order-preserving injective mapping ψ

• from the operations done in the computation of [[p_s]](d)
• to the operations done in the original computation of [[p]](s, d).

The reason is that specialisers normally just sort computations by p into static operations: those done during specialisation, and dynamic operations: those done by the specialised program, for which code is generated. Most specialisers do not rearrange the order in which computation is performed, or invent new operations not in p, so specialised computations can be embedded 1–1 into original ones.

10.2.1.1. A second observation, in the opposite direction. Some operations done in the computation of [[p]](s, d) can be forced to be dynamic because they use parts of d, which is unavailable at specialisation time. We use the term forced dynamic for any such operation.

Since a forced dynamic operation cannot be done by spec, specialised code must be executed to realise its effect. It holds for most program specialisers that there exists a surjective order-preserving mapping φ from

• the forced dynamic operations in the computation of [[p]](s, d), onto
• the corresponding operations in the computation of [[p_s]](d).

Fig. 14 shows these correspondences between the original and specialised computations.

10.2.1.2. Why specialisers give at most linear speedup. Let time-force_p(s, d) be the number of forced dynamic operations done while computing [[p]](s, d). Since all of these must be performed by the specialised program, we get a lower bound on its running time:

time-force_p(s, d) ≤ time_{p_s}(d).    (8)

All other p operations can be performed without knowing d: they depend only on s. The maximum number of such operations performed by the specialiser⁹ is thus some function f(s) of the static program input. Thus at most f(s) static operations occur between any two forced dynamic operations, yielding

time_p(s, d) ≤ (1 + f(s)) · time-force_p(s, d).    (9)

Combining these, we see that time_p(s, d) ≤ (1 + f(s)) · time_{p_s}(d), so

speedup_s(d) = time_p(s, d) / time_{p_s}(d) ≤ 1 + f(s).    (10)

9 Assuming, of course, that specialisation terminates.


Fig. 14. Correspondences between original and specialised computations.

This observation explains the fact that, in general, program specialisers yield at most linear speedup.

10.2.2. Ways to break the linear speedup barrier

The argument just given rests on the existence of an order-preserving "onto" mapping of forced operations in original computations to specialised ones. This is not always the case, and specialisers working differently can obtain superlinear speedup. Three cases follow:

10.2.2.1. Dead code elimination. If spec can examine its specialised program and discover "semantically dead" code whose execution does not influence the program's final results, such code can be removed. If it contains dynamic operations, then there will be dynamic [[p]](s, d) operations not corresponding to any [[p_s]](d) operations at all. In this situation, between the two p operations corresponding to two p_s operations there can occur a number of p operations depending on dynamic input that were removed by the dead code elimination technique.

10.2.2.2. Specialisation-time detection of dead branches. A logic programming analogy is specialisation-time detection of branches in Prolog's SLD computation tree that are guaranteed to fail (and may depend on dynamic data). This is much more significant than in functional programming, where time-consuming sequences of useless operations are most likely a sign of bad programming—whereas the ability to prune useless branches of search trees is part of the essence of logic programming.

Removing repeated subcomputations is another way to achieve superlinear speedup, less trivial than just eliding useless computations [1,5,25].

11. Conclusions

Given an interpreter interp, any program source may be transformed by computing

target = [[spec]](interp, source)


This approach is appealing because of its top-down nature: rather than considering one transformation of one program at a time, a global transformation effect is obtained by the device of writing an appropriate interpreter.

The main challenges to overcome to use the approach in practice are:

• to write interp so it realises, while executing source, the optimisations the user wishes to obtain (style change, memoisation, instrumentation etc.); and

• to write interp so that as many operations as possible are given static binding times, so target will be as efficient as possible.

Once an interpreter interp has been written to allow good binding-time separation, the binding-time analysis and the specialisation phases in modern specialisers are completely automatic. Further, the "generating extension" interp-gen (explained in detail in [20]) allows target to be constructed directly from source, without having to run a slow and general-purpose specialiser.

Acknowledgements

Thanks are due to the referees for constructive comments; to many in the Topps group at DIKU, in particular Arne Glenstrup, A.-F. Le Meur and students in the 2003 course on Program Transformation; and to participants in the 1996 Partial Evaluation Dagstuhl meeting and the 1998 Partial Evaluation Summer School, in particular Olivier Danvy, Torben Mogensen, Robert Glück and Peter Thiemann.

This research was partially supported by the Danish DART and PLI projects.

References

[1] U.A.A. Acar, G.E. Blelloch, R. Harper, Selective memoization, in: Proceedings of the 30th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM Press, 2003, pp. 14–25.

[2] M.S. Ager, O. Danvy, H.K. Rohde, Fast partial evaluation of pattern matching in strings, in: Proceedings of the 2003 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, ACM Press, 2003, pp. 3–9.

[3] L.O. Andersen, C.K. Gomard, Speedup analysis in partial evaluation (preliminary results), in: Partial Evaluation and Semantics-Based Program Manipulation, New Haven, Connecticut (SIGPLAN Notices, vol. 26, no. 9, September 1991), 1992, pp. 1–7.

[4] K. Asai, S. Matsuoka, A. Yonezawa, Duplication and partial evaluation—for a better understanding of reflective languages—, Lisp and Symbolic Computation 9 (1996) 203–241.

[5] R.S. Bird, Improving programs by the introduction of recursion, Communications of the ACM 20 (11) (1977) 856–863.

[6] W.-N. Chin, S.-C. Khoo, P. Thiemann, Synchronization analyses for multiple recursion parameters (extended abstract), in: Selected Papers from the International Seminar on Partial Evaluation, Springer-Verlag, 1996, pp. 33–53.

[7] C. Consel, O. Danvy, Partial evaluation of pattern matching in strings, Information Processing Letters 30 (2) (1989) 79–86.

[8] C. Consel, J.L. Lawall, A.-F. Le Meur, A tour of Tempo: a program specializer for the C language, Science of Computer Programming (this issue). doi:10.1016/j.scico.2004.03.011.

[9] O. Danvy, R. Glück, P. Thiemann, Partial Evaluation, Dagstuhl Castle, Germany, Lecture Notes in Computer Science, vol. 1110, Springer-Verlag, 1996.

[10] O. Danvy, K. Malmkjær, J. Palsberg, Eta-expansion does the trick, ACM Transactions on Programming Languages and Systems 18 (6) (1996) 730–751.

[11] M. Felleisen, R. Hieb, The revised report on the syntactic theories of sequential control and state, Theoretical Computer Science 103 (2) (1992) 235–271.

[12] Y. Futamura, Partial evaluation of computation process—an approach to a compiler–compiler, Higher-Order and Symbolic Computation 12 (4) (1999) 381–391; Systems, Computers, Controls 2 (5) (1971) 45–50 (reprinted).

[13] J. Gallagher, Transforming logic programs by specialising interpreters, in: Proceedings of the 7th European Conference on Artificial Intelligence, ECAI-86, Brighton, 1986, pp. 109–122.

[14] J. Gallagher, M. Codish, E. Shapiro, Specialisation of Prolog and FCP programs using abstract interpretation, New Generation Computing 6 (2–3) (1988) 159–186.

[15] A.J. Glenstrup, H. Makholm, J.P. Secher, C-MIX: Specialization of C programs, in: Partial Evaluation—Practice and Theory, DIKU 1998 International Summer School, Springer-Verlag, 1999, pp. 108–154.

[16] R. Glück, J. Jørgensen, Generating transformers for deforestation and supercompilation, in: B. Le Charlier (Ed.), Static Analysis, Lecture Notes in Computer Science, vol. 864, Springer-Verlag, 1994, pp. 432–448.

[17] J. Hatcliff, T. Mogensen, P. Thiemann, Partial Evaluation—Practice and Theory, DIKU 1998 International Summer School, Lecture Notes in Computer Science, vol. 1706, Springer-Verlag, 1999.

[18] N.D. Jones, What not to do when writing an interpreter for specialisation, in: Selected Papers from the International Seminar on Partial Evaluation, Springer-Verlag, 1996, pp. 216–237.

[19] N.D. Jones, A.J. Glenstrup, Program generation, termination, and binding-time analysis, in: The ACM SIGPLAN/SIGSOFT Conference on Generative Programming and Component Engineering, Springer-Verlag, 2002, pp. 1–31.

[20] N.D. Jones, C.K. Gomard, P. Sestoft, Partial Evaluation and Automatic Program Generation, Prentice-Hall, Inc., 1993.

[21] J. Jørgensen, Compiler generation by partial evaluation, Master's Thesis, DIKU, University of Copenhagen, Denmark, 1991.

[22] J. Jørgensen, SIMILIX: A self-applicable partial evaluator for Scheme, in: Partial Evaluation—Practice and Theory, DIKU 1998 International Summer School, Springer-Verlag, 1999, pp. 83–107.

[23] M. Leuschel, Advanced logic program specialisation, in: Partial Evaluation—Practice and Theory, DIKU 1998 International Summer School, Springer-Verlag, 1999, pp. 271–292.

[24] M. Leuschel, Logic program specialisation, in: Partial Evaluation—Practice and Theory, DIKU 1998 International Summer School, Springer-Verlag, 1999, pp. 155–188.

[25] Y.A. Liu, S.D. Stoller, Dynamic programming via static incrementalization, Higher-Order and Symbolic Computation 16 (1–2) (2003) 37–62.

[26] J.W. Lloyd, J.C. Shepherdson, Partial evaluation in logic programming, Journal of Logic Programming 11 (3–4) (1991) 217–242.

[27] H. Masuhara, G. Kiczales, C. Dutchyn, A compilation and optimization model for aspect-oriented programs, in: Proceedings of the 12th International Conference on Compiler Construction, CC2003, Lecture Notes in Computer Science, vol. 2622, 2003, pp. 46–60.

[28] T. Mogensen, Inherited limits, in: J. Hatcliff, T. Mogensen, P. Thiemann (Eds.), Partial Evaluation: Practice and Theory. Proceedings of the 1998 DIKU International Summer School, Lecture Notes in Computer Science, vol. 1706, Springer-Verlag, 1998, p. 202.

[29] G. Muller, J.L. Lawall, S. Thibault, R.E.V. Jensen, A domain-specific language approach to programmable networks, Transactions on Systems, Man and Cybernetics 33 (3) (2003).

[30] C. Pu, A. Black, C. Cowan, J. Walpole, C. Consel, Microlanguages for operating system specialization, in: 1st ACM-SIGPLAN Workshop on Domain-Specific Languages, Computer Science Technical Report, University of Illinois at Urbana-Champaign, 1997.

[31] J.C. Reynolds, Definitional interpreters for higher-order programming languages, Higher-Order and Symbolic Computation 11 (4) (1998) 363–397.

[32] D. Sahlin, Mixtus: an automatic partial evaluator for full Prolog, New Generation Computing 12 (1) (1993) 7–51.

[33] D.A. Schmidt, Denotational Semantics: A Methodology for Language Development, McGraw-Hill Professional, 1987.

[34] M. Sperber, R. Glück, P. Thiemann, Bootstrapping higher-order program transformers from interpreters, in: K.M. George, J.H. Carroll, D. Oppenheim, J. Hightower (Eds.), Proceedings of the 1996 ACM Symposium on Applied Computing, ACM Press, 1996, pp. 408–413.

[35] S. Thibault, C. Consel, J.L. Lawall, R. Marlet, G. Muller, Static and dynamic program compilation by interpreter specialization, Higher-Order and Symbolic Computation 13 (3) (2000) 161–178.

[36] S. Thibault, R. Marlet, C. Consel, Domain-specific languages: from design to implementation—application to video device driver generation, IEEE Transactions on Software Engineering (TSE) 25 (3) (2003) 363–377.

[37] P. Thiemann, Aspects of the PGG system: Specialization for standard Scheme, in: Partial Evaluation—Practice and Theory, DIKU 1998 International Summer School, Springer-Verlag, 1999, pp. 412–432.

[38] V.F. Turchin, The concept of a supercompiler, ACM Transactions on Programming Languages and Systems (TOPLAS) 8 (3) (1986) 292–325.

[39] A. van Deursen, P. Klint, J. Visser, Domain-specific languages: an annotated bibliography, ACM SIGPLAN Notices 35 (6) (2000) 26–36.

[40] G. Winskel, The Formal Semantics of Programming Languages, The MIT Press, 1993.