Implementing SASL using Categorical Multi-combinators



SOFTWARE-PRACTICE AND EXPERIENCE, VOL. 20(0), 000-000 (1990)

Implementing SASL using Categorical Multi-Combinators

RAFAEL D. LINS

Dept. de Informática, Universidade Federal de Pernambuco, 50.739, Recife, Brazil

AND

SIMON J. THOMPSON

Computing Laboratory, The University of Kent, CT2 7NF, Canterbury, England

SUMMARY

Categorical multi-combinators form a rewriting system developed with the aim of providing efficient implementations of lazy functional languages. The core of the system of categorical multi-combinators consists of only two rewriting laws with a very low pattern-matching complexity. This system allows the equivalent of several β-reductions to be performed at once, and avoids the generation of trivially reducible sub-expressions. In this paper we present a method of introducing algebraic data-types and local recursion to categorical multi-combinators which is both efficient and in harmony with the original system. We also show how to compile a subset of SASL into categorical combinators. Some implementation issues are also addressed here. The performance of implementations of categorical multi-combinator SASL machines is analysed here and compared with other implementations of functional languages.

KEY WORDS Functional languages Categorical multi-combinators Abstract machines

INTRODUCTION

Functional programs consist of definitions of functions and other objects. The execution of a program in a functional language consists of the evaluation of an expression, and we can see this as proceeding by successive rewriting of an expression until it takes printable (or normal) form. (A normal form is one that cannot be rewritten further.) For example, if we say

    fac n = n * fac(n-1) , n>0        (1)
          = 1 , otherwise

then

fac 2

is rewritten thus

0038-0644/90/000000-00$05.00 © 1990 by John Wiley & Sons, Ltd.
Received 28 September 1987. Revised 18 May 1988 and 24 April 1990.


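$$
\begin{aligned}
\mathit{fac}\ 2 &\Rightarrow 2 \times \mathit{fac}\ (2 - 1) \qquad (2) \\
&\Rightarrow 2 \times \mathit{fac}\ 1 \\
&\Rightarrow 2 \times (1 \times \mathit{fac}\ (1 - 1)) \\
&\Rightarrow 2 \times (1 \times \mathit{fac}\ 0) \\
&\Rightarrow 2 \times (1 \times 1) \\
&\Rightarrow 2 \times 1 \\
&\Rightarrow 2
\end{aligned}
$$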

The definition (1) is used in the first line of (2), where we have to substitute the actual value 2 for the variable n. This process of parameter passing forms the major overhead in rewriting implementations of functional languages.

The method of compilation into combinators, first explored by Turner in Reference 1, provides a way of avoiding the problem by removing the variables from a program, transforming it into an applicative combination of constant functions or combinators. Turner used a set of combinators based on the combinatory logic of Curry. This is a formal theory of function definition and application.² Another theory of functions is provided by category theory,³ and we can see the notation used herein as providing an alternative set of combinators.

Categorical combinators form a formal system similar to combinatory logic. The original system was developed by Curien,⁴ inspired by the equivalence of the theories of typed λ-calculus and cartesian closed categories as shown by Lambek³ and Scott.⁵ An approach to the execution of categorical combinators which uses a stack machine is described in Reference 6.

Aiming to implement lazy functional languages in an efficient way using categorical combinators, we developed a new system of categorical combinators, called simplified categorical combinators.⁷ This system is based on the original system by Curien, but has two advantages: it gives a linear relationship between the size of a lambda-expression and its categorical combinator equivalent (the relationship in the original system is quadratic in the worst case, as shown in Reference 8), and it provides a very simple and small set of rewriting rules to execute the code. Simulations⁹ showed us that an implementation based on simplified categorical combinators was at least an order of magnitude more efficient than the original system of categorical combinators and presented time and space complexity of the same order of magnitude as Turner’s combinators.¹

Two optimizations were introduced in References 9 and 7 to simplified categorical combinators, forming a new rewriting system which was called linear categorical combinators. These modifications reduce the number of rewriting laws and increase the efficiency of the system by reducing the number of rewriting steps involved in taking an expression to normal form. These optimizations keep the complexity of the pattern-matching algorithm unchanged.

Categorical multi-combinators are a generalization of the concepts of linear categorical combinators. Each rewriting step of the multi-combinator code is equivalent to several rewritings of linear categorical combinators. The core of the system of categorical multi-combinators consists only of two rewriting laws with a very low pattern-matching complexity and avoids the generation of trivially reducible sub-expressions.

A work which has similar aims to the system of categorical multi-combinators is Hughes’ system of supercombinators.¹⁰ Both systems have the power to perform the equivalent of several β-reductions in a single step, and in both of them an expression needs to have all its arguments present before evaluation. There are, however, differences between these two systems. Categorical multi-combinators work with a fixed set of combinators, which allows us the possibility of a hardware implementation. Supercombinators are generated during the compilation process and do not give us this flexibility. The compilation algorithm for categorical multi-combinators is extremely simple and generates expressions whose size is linear in the size of the source code. Supercombinators use a more complex compilation algorithm due to the necessity of detecting maximal free expressions¹⁰ (m.f.e’s) in the code. The supercombinator translation of a program of size N has size O(N log N) (in the worst case). On the other hand, supercombinators are fully lazy.¹⁰ This means that any sub-expression will be reduced at most once. Categorical multi-combinators are not fully lazy. If in our code we have a shared occurrence of a partial application of a function, this sub-expression cannot be evaluated before being copied and therefore it may be evaluated more than once.

In this paper we present a method of introducing algebraic data-types and local recursion to categorical multi-combinators, and we also address some implementation issues. New types of cells are introduced into the categorical multi-combinator graph reduction machine as presented in Reference 11 to make the graph of expressions more compact as well as to decrease the complexity of pattern-matching. The dynamics of this rewriting system are analysed and further simplifications introduced.

SASL¹² (St. Andrews Static Language) is a fully lazy weakly typed functional language. It has a very simple syntax. In this paper we will present a translation of a subset of SASL into categorical multi-combinators. The performance of implementations of two interpreted categorical multi-combinator SASL machines is analysed and compared with Simon Croft’s¹³ implementation of KRC¹⁴ which uses an extended set of Turner’s combinators.

Compiled versions of functional languages run much faster on von Neumann machines than interpreted ones.¹⁵ Although we address problems related to the interpretation of categorical multi-combinators, we also compare the performance of these machines with GMC,¹⁶ a compiled categorical multi-combinator machine.

CATEGORICAL COMBINATORS AS A REWRITING SYSTEM

As we mentioned in the introduction, the essence of combinator compilation is to remove variables. One means for doing this was introduced by de Bruijn.¹⁷ Categorical combinators use de Bruijn’s representation for variables. In de Bruijn notation for the λ-calculus a variable is replaced by the number of λs in the parse tree lying between it and the λ to which it is bound. The compilation algorithm for categorical combinators from de Bruijn’s λ-calculus as introduced by Curien is:


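$$
\begin{aligned}
[[\lambda . a]] &\to \Lambda([[a]]) \\
[[a\,b]] &\to \mathit{App} \circ \langle [[a]], [[b]] \rangle \\
[[0]] &\to \mathit{Snd} \\
[[n]] &\to \mathit{Snd} \circ \mathit{Fst}^{n}, \quad \text{if } n \geq 1, \text{ where } \mathit{Fst}^{1} = \mathit{Fst} \text{ and } \mathit{Fst}^{n+1} = \mathit{Fst} \circ \mathit{Fst}^{n} \\
[[c]] &\to \Lambda(\mathit{Fst})\,c, \quad \text{where } c \text{ is a constant}
\end{aligned}
$$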
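The translation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of terms and the function names `to_de_bruijn` and `compile_ccl` are hypothetical, the categorical code is produced as text, and constants are left out for brevity.

```python
# Named lambda terms (an assumed encoding, not the paper's):
#   ("var", name) | ("lam", name, body) | ("app", fun, arg)

def to_de_bruijn(term, bound=()):
    """Replace each variable by the number of lambdas between it and its binder."""
    tag = term[0]
    if tag == "var":
        return ("var", bound.index(term[1]))      # innermost binder sits at position 0
    if tag == "lam":
        return ("lam", to_de_bruijn(term[2], (term[1],) + bound))
    return ("app", to_de_bruijn(term[1], bound), to_de_bruijn(term[2], bound))

def compile_ccl(term):
    """Translate a de Bruijn term into categorical code, rendered as a string."""
    tag = term[0]
    if tag == "lam":
        return "Λ(" + compile_ccl(term[1]) + ")"
    if tag == "app":
        return "App ∘ ⟨" + compile_ccl(term[1]) + ", " + compile_ccl(term[2]) + "⟩"
    n = term[1]
    # [[0]] -> Snd ; [[n]] -> Snd ∘ Fst^n, with Fst^n written out as n compositions
    return " ∘ ".join(["Snd"] + ["Fst"] * n)

# λx.λy.x y
term = ("lam", "x", ("lam", "y", ("app", ("var", "x"), ("var", "y"))))
print(compile_ccl(to_de_bruijn(term)))
```

Writing Fst^n out as a chain of n compositions makes visible why the size of the compiled code can grow faster than the size of the source term.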

Curien¹⁴ chooses particular orientations of the axioms and deduces equations of a cartesian closed category;³ different selections of them will generate several different rewriting systems for reducing the code generated by the compilation algorithm above. The system which he calls CCL uses the following laws, and simulates λ-calculus β-reduction by a sequence of elementary reduction steps of rewritings on the categorical code.

$$
\begin{aligned}
\text{(r.1)}\quad & (x \circ y) \circ z \Rightarrow x \circ (y \circ z) \\
\text{(r.2)}\quad & \mathit{Id} \circ x \Rightarrow x \\
\text{(r.3)}\quad & x \circ \mathit{Id} \Rightarrow x \\
\text{(r.4)}\quad & \mathit{Fst} \circ \langle x, y \rangle \Rightarrow x \\
\text{(r.5)}\quad & \mathit{Snd} \circ \langle x, y \rangle \Rightarrow y \\
\text{(r.6)}\quad & \langle x, y \rangle \circ z \Rightarrow \langle x \circ z, y \circ z \rangle \\
\text{(r.7)}\quad & \mathit{App} \circ \langle \Lambda(x), y \rangle \Rightarrow x \circ \langle \mathit{Id}, y \rangle \\
\text{(r.8)}\quad & \Lambda(x) \circ y \Rightarrow \Lambda(x \circ \langle y \circ \mathit{Fst}, \mathit{Snd} \rangle)
\end{aligned}
$$

A rewriting system is locally confluent¹⁸ if and only if for all terms M, N, and P, if P reduces in one step to M and P also reduces in one step to N, then there is some term Q such that both M and N reduce to Q in a finite sequence of steps. The system above is not locally confluent, as shown by Curien.⁴

In Reference 7 we have added to this rewriting system the following law, which can simulate a sequence of rewritings performed using the laws above:

$$
\mathit{App} \circ \langle \Lambda(x) \circ y, z \rangle \Rightarrow x \circ \langle y, z \rangle
$$

In the same paper we have also introduced a new compilation algorithm for categorical combinators from de Bruijn λ-calculus. This algorithm is based on the original one with some simplifications concerning the treatment of variables, constants and applications. The following algorithm exhibits a linear relationship between the size of source and compiled codes, while the original one, as was shown in Reference 8, has a relationship which is quadratic in the worst case.

$$
\begin{aligned}
[[\lambda . a]] &\to \Lambda([[a]]) \\
[[a\,b]] &\to {<} [[a]], [[b]] {>} \\
[[n]] &\to n, \quad \text{where } n \text{ is a variable} \\
[[c]] &\to c, \quad \text{where } c \text{ is a constant}
\end{aligned}
$$

The rewriting strategy we have adopted is leftmost-outermost, i.e. we search the syntax tree for the outermost pattern matching the left-hand side of any of the rewriting rules. If several exist at the same level, we choose the leftmost. When we find it, this pattern will be rewritten and rewriting will resume from the outermost level of the new expression. We have also analysed⁸ the dynamics of the rewriting system of categorical combinators so as to avoid moving to the outermost level of an expression after each rewriting. The minimal set of rewriting rules to execute by a leftmost-outermost strategy the code generated by the compilation algorithm above is presented in Reference 7, as follows:

$$
\begin{aligned}
\text{(R.1)}\quad & n \circ \langle x, y \rangle \Rightarrow (n - 1) \circ x, \quad \text{if } n > 0 \\
\text{(R.2)}\quad & 0 \circ \langle x, y \rangle \Rightarrow y \\
\text{(R.3)}\quad & {<} x, y {>} \circ z \Rightarrow {<} x \circ z, y \circ z {>} \\
\text{(R.4)}\quad & {<} \Lambda(x), y {>} \Rightarrow x \circ \langle \mathit{Id}, y \rangle \\
\text{(R.5)}\quad & {<} \Lambda(x) \circ y, z {>} \Rightarrow x \circ \langle y, z \rangle \\
\text{(R.6)}\quad & c \circ x \Rightarrow c, \quad \text{where } c \text{ is a constant}
\end{aligned}
$$

The rewriting system which comprises the set of rules above is called simplified categorical combinators.⁷ In that paper we have proved that leftmost-outermost reduction of simplified categorical combinators ‘mimics’ leftmost-outermost reduction of λ-terms, which is known to be a normalising strategy, i.e. a strategy which takes terms to normal form whenever they exist. Simulations⁹ show that simplified categorical combinators present a space and time complexity of at least one order of magnitude better than Curien’s CCL, and are similar in performance to Turner’s combinators.

The direct compilation of λ-expressions into simplified categorical combinators can be performed by the following algorithm:

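$$
\begin{aligned}
\text{(t.1)}\quad & [\lambda x_i . a] = \Lambda(R^{x_i}_{0}\ a) \\
\text{(t.2)}\quad & [a\ b] = {<} [a], [b] {>} \\
\text{(t.3)}\quad & [c] = c, \quad \text{where } c \text{ is a constant}
\end{aligned}
$$

$R^{x_i \ldots x_j}_{n_i \ldots n_j}$ is the replacement function. It will recursively perform the replacement of all occurrences of $x_i$ in $a$. The replacement function is defined as:

$$
\begin{aligned}
\text{(t.4)}\quad & R^{x_i, \ldots, x_j}_{n_i, \ldots, n_j}\ \lambda x_k . a = \Lambda(R^{x_i, \ldots, x_j, x_k}_{n_i + 1, \ldots, n_j + 1, 0}\ a) \\
\text{(t.5)}\quad & R^{x_i \ldots x_j}_{n_i \ldots n_j}\ a\ b = {<} (R^{x_i \ldots x_j}_{n_i \ldots n_j}\ a), (R^{x_i \ldots x_j}_{n_i \ldots n_j}\ b) {>} \\
\text{(t.6)}\quad & R^{x_i \ldots x_j}_{n_i \ldots n_j}\ b =
\begin{cases}
b, & \text{if } b \text{ is a constant} \\
n_k, & \text{if } b = x_k
\end{cases}
\end{aligned}
$$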

If, whenever applying rule (t.6) above, a variable $b$ can be associated with more than one $x_k$, then one must choose the minimum corresponding $n_k$. In doing so we preserve locality of binding because a greater $n_k$ means that a more internal binder is connected to the variable $x_k$.

Two optimizations have been introduced to simplified categorical combinators. In Reference 9 we show that rule (R.4) above is a simplified case of rule (R.5). At a cost of a simple compilation artifice, the Id-complementation of all the highest-level Λ-terms, we can remove rule (R.4) from the rewriting system above. To incorporate Id-complementation into the compilation algorithm above we need only to modify rule (t.1) to

$$
\text{(t'.1)}\quad [\lambda x_i . a] = \Lambda(R^{x_i}_{0}\ a) \circ \mathit{Id}
$$

The analysis of the behaviour of the categorical code allows us to unify rules (R.1) and (R.2) in a schema representing an infinite set of rules (which is presented as rule (L.1) below). These modifications not only reduce the number of rewriting laws but reduce the number of rewriting steps involved in taking an expression to normal form, keeping the complexity of the pattern matching algorithm unchanged. The optimizations above formed a new rewriting system, called linear categorical combinators, which uses the following rewriting laws:

$$
\begin{aligned}
\text{(L.1)}\quad & n \circ \langle\langle \cdots \langle x_{m+1}, x_m \rangle, \cdots \rangle, x_1 \rangle, x_0 \rangle \Rightarrow x_n \\
\text{(L.2)}\quad & {<} x, y {>} \circ z \Rightarrow {<} x \circ z, y \circ z {>} \\
\text{(L.3)}\quad & {<} \Lambda(x) \circ y, z {>} \Rightarrow x \circ \langle y, z \rangle \\
\text{(L.4)}\quad & c \circ x \Rightarrow c, \quad \text{where } c \text{ is a constant}
\end{aligned}
$$

The system above served as a basis for the development of categorical multi-combinators.
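The effect of unifying rules (R.1) and (R.2) into the schema (L.1) can be illustrated with a small Python sketch. The encoding of environments as nested Python pairs, the `"Id"` placeholder for the innermost component and the function names are assumptions made for this illustration; the step counts model rewriting steps only and say nothing about the cost of a real implementation of (L.1).

```python
def make_env(values):
    """Nested pair for the values [x0, x1, ..., xm]; x0 ends up outermost on the right."""
    env = "Id"                       # placeholder for the innermost component
    for v in reversed(values):
        env = (env, v)
    return env

def fetch_stepwise(n, env):
    """Rules (R.1)/(R.2): peel one pair per rewriting step."""
    steps = 0
    while n > 0:                     # (R.1)  n o <x, y>  =>  (n-1) o x
        n, env, steps = n - 1, env[0], steps + 1
    return env[1], steps + 1         # (R.2)  0 o <x, y>  =>  y

def fetch_linear(n, env):
    """Rule (L.1): the same value, counted as a single rewriting step."""
    for _ in range(n):
        env = env[0]
    return env[1], 1

env = make_env(["a", "b", "c"])      # x0 = "a", x1 = "b", x2 = "c"
print(fetch_stepwise(2, env), fetch_linear(2, env))
```

Both functions return the same value; only the number of rewriting steps attributed to the lookup differs.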

CATEGORICAL MULTI-COMBINATORS

Categorical multi-combinators are a generalization of linear categorical combinators. The code for a λ-expression compiled into categorical multi-combinators is more compact than its linear categorical combinators equivalent. Categorical multi-combinator reductions are of a coarser degree of computation than those of linear categorical combinators. Each rewriting step of the multi-combinator code is equivalent to several rewritings of linear categorical combinators. The core of the system of categorical multi-combinators consists only of two rewriting laws with a very low pattern-matching complexity and avoids the generation of trivially reducible sub-expressions. In categorical multi-combinators, function application is denoted by juxtaposition. We take juxtaposition to be left-associative.

The compilation algorithm¹¹ for translating λ-expressions into categorical multi-combinators is:


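$$
\begin{aligned}
\text{(T.1)}\quad & [\underbrace{\lambda x_i \ldots \lambda x_j}_{n} . a] = L^{n-1}(R^{x_i, \ldots, x_j}_{n-1, \ldots, 0}\ a) \\
\text{(T.2)}\quad & [a \ldots b] = [a] \ldots [b] \\
\text{(T.3)}\quad & [c] = c, \quad \text{where } c \text{ is a constant} \\
\text{(T.4)}\quad & R^{x_i \ldots x_j}_{n_i \ldots n_j}\ \underbrace{\lambda x_k \ldots \lambda x_l}_{m} . a = L^{m-1}(R^{x_i, \ldots, x_j, x_k, \ldots, x_l}_{n_i + m, \ldots, n_j + m, m-1, \ldots, 0}\ a) \\
\text{(T.5)}\quad & R^{x_i \ldots x_j}_{n_i \ldots n_j}\ (a \ldots b) = (R^{x_i \ldots x_j}_{n_i \ldots n_j}\ a) \ldots (R^{x_i \ldots x_j}_{n_i \ldots n_j}\ b) \\
\text{(T.6)}\quad & R^{x_i \ldots x_j}_{n_i \ldots n_j}\ b =
\begin{cases}
b, & \text{if } b \text{ is a constant} \\
n_k, & \text{if } b = x_k
\end{cases}
\end{aligned}
$$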

Again, if whenever applying rule (T.6) above a variable $b$ can be associated with more than one $x_k$, then one must choose the minimum corresponding $n_k$. There follows an example of the translation of a λ-expression into categorical multi-combinators using the algorithm above:

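$$
\begin{aligned}
[\lambda x \lambda y (y\ \lambda z (z\ z))]
&\overset{T.1}{=} (L^{1}(R^{x,y}_{1,0}\ y\ \lambda z (z\ z))) \\
&\overset{T.5}{=} (L^{1}((R^{x,y}_{1,0}\ y)\ (R^{x,y}_{1,0}\ \lambda z (z\ z)))) \\
&\overset{T.6}{=} (L^{1}(0\ (R^{x,y}_{1,0}\ \lambda z (z\ z)))) \\
&\overset{T.4}{=} (L^{1}(0\ L^{0}(R^{x,y,z}_{2,1,0}\ (z\ z)))) \\
&\overset{T.5}{=} (L^{1}(0\ L^{0}((R^{x,y,z}_{2,1,0}\ z)\ (R^{x,y,z}_{2,1,0}\ z)))) \\
&\overset{T.6}{=} (L^{1}(0\ L^{0}(0\ (R^{x,y,z}_{2,1,0}\ z)))) \\
&\overset{T.6}{=} (L^{1}(0\ L^{0}(0\ 0)))
\end{aligned}
$$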
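The compilation rules (T.1) to (T.6) can be sketched as a short recursive function. The tuple encoding of source terms, the textual output (with `L1(...)` standing for $L^{1}(\ldots)$) and the names `compile_term` and `as_argument` are assumptions made for this illustration, not part of the system described here.

```python
# Source terms (assumed encoding): ("var", name) | ("const", value)
#   | ("lam", [names], body)   multi-abstraction  λx_i ... λx_j . body
#   | ("app", [terms])         juxtaposition      a ... b

def compile_term(term, env=None):
    """Rules (T.1)-(T.6); env maps each bound name x_k to its number n_k."""
    env = env or {}
    tag = term[0]
    if tag == "const":                                   # (T.3), (T.6)
        return str(term[1])
    if tag == "var":                                     # (T.6)
        return str(env[term[1]])
    if tag == "app":                                     # (T.2), (T.5)
        return " ".join(as_argument(t, env) for t in term[1])
    names, body = term[1], term[2]                       # (T.1), (T.4)
    m = len(names)
    inner = {x: n + m for x, n in env.items()}           # outer binders move m levels away
    # a rebound name overrides the outer entry, which selects the minimum n_k
    inner.update({x: m - 1 - i for i, x in enumerate(names)})
    return f"L{m - 1}({compile_term(body, inner)})"

def as_argument(term, env):
    """Parenthesise nested applications so that juxtaposition stays unambiguous."""
    code = compile_term(term, env)
    return f"({code})" if term[0] == "app" else code

zz = ("lam", ["z"], ("app", [("var", "z"), ("var", "z")]))
example = ("lam", ["x", "y"], ("app", [("var", "y"), zz]))
print(compile_term(example))
```

Each sub-term is visited once and each visit emits a bounded amount of text, which is in line with the linear relationship between source and compiled code claimed for the algorithm.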

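The core of the categorical multi-combinator machine enriched with arithmetic operations is expressed by the following rewriting laws,

$$
\text{(M.1)}\quad \circ\ (x_0\, x_1\, x_2 \ldots x_n)\ (P\ y_m \ldots y_1\, y_0) \Rightarrow (x'_0\, x'_1 \ldots x'_n),
$$

$$
\text{where}\quad x'_i =
\begin{cases}
x_i & \text{if } x_i \text{ is a constant or of the form } L^{a}(b) \\
y_k & \text{if } x_i \text{ is a variable } k \\
\circ\ x_i\ (P\ y_m \ldots y_1\, y_0) & \text{otherwise}
\end{cases}
$$

$$
\text{(M.2)}\quad L^{n}(y)\ x_0\, x_1 \cdots x_n\, x_{n+1} \cdots x_z
\begin{cases}
\Rightarrow y\ x_{n+1} \ldots x_z & \text{if } y \text{ is a constant} \\
\Rightarrow x_{n-y}\ x_{n+1} \ldots x_z & \text{if } y \text{ is a variable} \\
\Rightarrow (\circ\ y\ (P\ x_0 \cdots x_n))\ x_{n+1} \cdots x_z & \text{otherwise}
\end{cases}
$$

$$
\text{(+)}\quad +\ x\ y \Rightarrow x + y
$$

$$
\text{(=)}\quad \mathit{Cond}\ x\ m\ n
\begin{cases}
\Rightarrow m, & \text{if } x = \mathit{True} \\
\Rightarrow n, & \text{otherwise}
\end{cases}
$$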


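Consider an example of the execution of a ‘program’ using categorical multi-combinators. The λ-expression

$$
(\lambda x \lambda y . y\ (\lambda z . z\ x)\ x)\ 2\ +
$$

translates into categorical multi-combinators using the compilation algorithm above giving

$$
L^{1}(0\ (L^{0}(0)\ 1)\ 1)\ 2\ +
$$

and using the laws above this expression can be rewritten

$$
\begin{aligned}
&\overset{M.2}{\Rightarrow} \circ\ (0\ (L^{0}(0)\ 1)\ 1)\ (P\ 2\ +) \\
&\overset{M.1}{\Rightarrow} +\ (\circ\ (L^{0}(0)\ 1)\ (P\ 2\ +))\ 2 \\
&\overset{M.1}{\Rightarrow} +\ (L^{0}(0)\ 2)\ 2 \\
&\overset{M.2}{\Rightarrow} +\ 2\ 2 \\
&\overset{+}{\Rightarrow} 4
\end{aligned}
$$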

As one can observe in the sequence of reductions above, the categorical multi-combinator code suffers a ‘metamorphosis’ under rewriting. The application of rule (M.2) changes the structure of the code and generates an ‘environment’, in which to each variable there is associated a (local) value. This ‘evaluation environment’ is distributed through the body of the multi-abstraction ($L^{n}$) and then variables fetch their value by successive application of rule (M.1).
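The behaviour of rules (M.1), (M.2) and (+) on the example can be reproduced with a minimal Python sketch. The tuple encoding of code, the names `close`, `m1` and `evaluate`, and the choice to make `+` evaluate both operands before adding are assumptions of this illustration; it is a toy term rewriter, not the graph reduction machine discussed in this paper, and the Cond rule is omitted.

```python
# Code (assumed encoding): ("const", v) | ("var", k) | ("L", n, body)
#   | ("app", [terms]) | ("comp", body, env)   standing for  ∘ body (P y_m ... y_0)

def close(x, env):
    """One component x_i of rule (M.1); env is the list [y_m, ..., y_0]."""
    if x[0] in ("const", "L"):
        return x
    if x[0] == "var":
        return env[len(env) - 1 - x[1]]          # variable k fetches y_k
    return ("comp", x, env)

def m1(t):
    """Rule (M.1): distribute the environment through the body."""
    _, body, env = t
    items = body[1] if body[0] == "app" else [body]
    return ("app", [close(x, env) for x in items])

def evaluate(t):
    """Rewrite at the head until no rule applies."""
    while True:
        if t[0] == "comp":
            t = m1(t)
        elif t[0] == "app":
            items = t[1]
            while items[0][0] == "app":          # juxtaposition is left-associative
                items = items[0][1] + items[1:]
            head, args = items[0], items[1:]
            if not args:
                t = head
            elif head[0] == "comp":
                t = ("app", [m1(head)] + args)
            elif head[0] == "L" and len(args) > head[1]:           # (M.2)
                n, y = head[1], head[2]
                xs, rest = args[:n + 1], args[n + 1:]
                if y[0] == "const":
                    new = y
                elif y[0] == "var":
                    new = xs[n - y[1]]
                else:
                    new = ("comp", y, xs)        # the environment P x_0 ... x_n
                t = ("app", [new] + rest)
            elif head == ("const", "+") and len(args) >= 2:        # (+)
                a, b = evaluate(args[0]), evaluate(args[1])
                t = ("app", [("const", a[1] + b[1])] + args[2:])
            else:
                return ("app", items)            # no rule applies
        else:
            return t

# L1(0 (L0(0) 1) 1) 2 +
inner = ("app", [("L", 0, ("var", 0)), ("var", 1)])
program = ("app", [("L", 1, ("app", [("var", 0), inner, ("var", 1)])),
                   ("const", 2), ("const", "+")])
print(evaluate(program))
```

As in rule (M.1), a component of the form $L^{a}(b)$ is passed on without an environment, which is sufficient for the example above.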

INTRODUCING NEW DATA TYPES

A simple way of introducing a new data type in a programming language is to declare a term algebra over already-defined types. In a term algebra there are no associated laws, such as a law associating a list with its reverse list, or a tree with its mirror image. Two elements of such a data type are equal if and only if they are constructed in exactly the same way. This way of introducing new data types using term algebras is employed in Miranda (see footnote 1) as described in Reference 19; according to Turner it has a long and respectable history. Because of its syntactic elegance and simplicity we adopt the Miranda syntax for introducing algebraic data types throughout this section.

The primitive or basic types in Miranda are: num, char, and bool. The type num comprises both integers and floating point numbers. As elements of the type bool we have the truth values True and False. The type char comprises the characters of the ASCII character set. Built into Miranda there are also more complex data type constructors such as ‘list’ and ‘tuple’.

A new algebraic type in Miranda is introduced by an equation using the symbol ‘::=’. For instance the type ‘tree’ can be introduced by the declaration of the type constructors as

    tree ::= Leaf num | Node num tree tree

Footnote 1: Miranda is a Trademark of Research Software Ltd.

The tree-constructors Leaf and Node in the example above have types num → tree and ((num → tree) → tree) → tree respectively. (The → operator denotes curryfication and is left-associative.) In Miranda the definition mechanism uses pattern matching on the constructors, so that there is no need to introduce names for the recognizers (as isnode or isleaf) or selectors in the declaration of functions over algebraic data types. For instance, we can define a function which gives the ‘weight’ of a tree as

    ssum (Leaf n) = n
    ssum (Node n t1 t2) = (ssum t1) + (ssum t2) + n

The type system of Miranda subsumes Pascal enumerated types, records, variant records, and some uses of pointers which are involved in the construction of recursive dynamic data structures. We will assume we have as primitive types the same primitive types as Miranda, i.e. num, bool and char.

INTRODUCING A DATA-BLOCK COMBINATOR

Consider the example we introduced in the last section,

    tree ::= Leaf num | Node num tree tree

During compilation this information will be used for type-checking. The domain sum (+) constructor, expressed above as ‘|’, during type-checking plays the role of the logical operator ‘or’, and states that an object of type tree is formed either by the constructor Leaf or by the constructor Node. The cartesian product operator (×) works as a primitive constructor for new types. For instance, the constructors Leaf and Node presented above can be read

    {Leaf} × num   and   {Node} × num × tree × tree

We can say that the cartesian product will build ‘blocks’ of data with name (or tag) Leaf and Node.

We can see the laws in the core of the categorical multi-combinator machine as forming an atomic structure to which extensions should be made in a modular fashion, without interfering with the basic behaviour of the system. We will explore the intrinsic characteristics of this system in order to introduce algebraic data types in a compact and efficient way. We will introduce a data block constructor ‘D’, which stands for a nested sequence of pairs forming a unique data structure. Each instance of a constructor will form a data block. A constructor name will be interpreted as a constant. For instance, if we have

    tt = (Node 1 (Leaf 5) (Node 2 (Leaf 6) (Leaf 7)))

compilation into categorical multi-combinators will give,

    tt = (D Node 1 (D Leaf 5) (D Node 2 (D Leaf 6) (D Leaf 7)))


The data-block combinator and the multi-pair combinator are two instances of the cartesian product, which play completely different roles in the system of categorical combinators. The multi-pair combinator forms ‘evaluation environments’, while the data-block combinator forms ‘data structures’. The syntactic distinction between them is convenient, because we extend the original system in a modular fashion without interfering with its normalising properties. The relationship between the data-block combinator ‘D’ and function composition (‘∘’) is given by the rule

$$
(\circ\ (D\ x_0\, x_1 \ldots x_n)\ z) \Rightarrow D\ (\circ\ x_0\ z)\ (\circ\ x_1\ z) \ldots (\circ\ x_n\ z)
$$

The rule above distributes an environment throughout a data structure. A trivially reducible sub-expression is a sub-expression which can be reduced whenever it appears in an expression without interfering with the normalising properties of a rewriting system. In general, the generation of trivially reducible sub-expressions represents waste of time and space. To avoid the generation of trivially reducible sub-expressions this rewriting law can be expressed in a similar fashion to rule (M.1) above, i.e.

$$
\text{(D.1)}\quad \circ\ (D\ x_0\, x_1 \ldots x_n)\ (P\ y_m \ldots y_1\, y_0) \Rightarrow (D\ x'_0\, x'_1 \ldots x'_n),
$$

$$
\text{where}\quad x'_i =
\begin{cases}
x_i, & \text{if } x_i \text{ is a constant or of the form } L^{a}(b) \\
y_k, & \text{if } x_i \text{ is a variable } k \\
\circ\ x_i\ (P\ y_m \ldots y_1\, y_0), & \text{otherwise}
\end{cases}
$$

Instead of using nested sequences of the Hd and Tl combinators as selectors we will introduce a multi-selector combinator, $S_n$, which will fetch arguments inside a data-block constructor via application. The multi-selector combinator is defined as

$$
\text{(S)}\quad S_n\ (D\ x_0\, x_1 \ldots x_i) \Rightarrow x_n
$$

We can observe that the laws introduced above do not interfere with the original structure of the rewriting system of categorical multi-combinators. These new rules are not going to affect the original set of laws of categorical multi-combinators because we do not add any critical pair to the system.¹⁸

Functions over data-types

Functions over algebraic data types in categorical multi-combinators will fetch as argument the whole data structure, which will be broken apart by the application of the selectors. Let us reconsider the example we examined above, in which we introduced a new data type tree as

    tree ::= Leaf num | Node num tree tree

and a function over trees ssum as,

    ssum (Leaf n) = n
    ssum (Node n t1 t2) = ssum t1 + ssum t2 + n

of type

    ssum :: tree → num

The constructors Leaf and Node in the example above will work, for matters of compilation, as parametric instances of a ‘tag’ variable. The implicit selections in the function above will be translated into applications of a selector $S_n$, where $n$ is the depth of the selected item in the data-block. The case switch implicit in the pattern match is translated into a conditional test on the tag variable. The example above is equivalent to the following λ-expression

$$
\mathit{ssum} \equiv \lambda a . (\mathit{Cond}\ \mathit{Leaf}\ (S_0\ a))\ (S_1\ a)\ (+\ (+\ (\mathit{ssum}\ (S_2\ a))\ (\mathit{ssum}\ (S_3\ a)))\ (S_1\ a))
$$

and this translates into categorical multi-combinators as

$$
\mathit{ssum} \equiv L^{0}((\mathit{Cond}\ \mathit{Leaf}\ (S_0\ 0))\ (S_1\ 0)\ (+\ (+\ (\mathit{ssum}\ (S_2\ 0))\ (\mathit{ssum}\ (S_3\ 0)))\ (S_1\ 0)))
$$

The categorical multi-combinator expression

    ssum tt

will recursively evaluate the weight of the tree tt presented above. The program we have presented above yields a result of basic type. Consider the case of a function returning a result of algebraic type; for instance, a function to reflect a tree is given by

    reflect (Leaf n) = (Leaf n)
    reflect (Node n t1 t2) = Node n (reflect t2) (reflect t1)

The evaluation is done by first decomposing algebraic type expressions and then reassembling the data item again. The function above will be translated into categorical combinators as

$$
\mathit{reflect} = L^{0}(\mathit{isleaf}\ (D\ \mathit{Leaf}\ (S_1\ 0)))\ (D\ \mathit{Node}\ (S_1\ 0)\ (\mathit{reflect}\ (S_3\ 0))\ (\mathit{reflect}\ (S_2\ 0)))
$$

where

$$
\mathit{isleaf} = \mathit{Cond}\ \mathit{Leaf}\ (S_0\ 0)
$$

In order to present the result of the evaluation of the categorical programs involving algebraicdata types in a readable way we need to unparse the results to ‘hide’ the data-block constructor.
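The use of data blocks and selectors can be mimicked directly in Python. In this sketch a data block is a tuple whose first component is the marker `"D"`, and `D`, `S`, `ssum`, `reflect` and `unparse` are illustrative names following the text; Python evaluates strictly, so the sketch shows only the decomposition and reassembly of data, not lazy reduction.

```python
def D(*fields):
    """Data-block constructor: the block D x0 x1 ... xi, with x0 the tag."""
    return ("D",) + fields

def S(n, block):
    """Multi-selector: S_n (D x0 x1 ... xi) => x_n."""
    return block[n + 1]

def ssum(a):
    """Weight of a tree: conditional test on the tag, then selection."""
    if S(0, a) == "Leaf":
        return S(1, a)
    return ssum(S(2, a)) + ssum(S(3, a)) + S(1, a)

def reflect(a):
    """Mirror image: decompose the block and reassemble it."""
    if S(0, a) == "Leaf":
        return D("Leaf", S(1, a))
    return D("Node", S(1, a), reflect(S(3, a)), reflect(S(2, a)))

def unparse(x, top=True):
    """Hide the data-block constructor when presenting a result."""
    if isinstance(x, tuple) and x and x[0] == "D":
        text = " ".join(unparse(f, False) for f in x[1:])
        return text if top else "(" + text + ")"
    return str(x)

tt = D("Node", 1, D("Leaf", 5), D("Node", 2, D("Leaf", 6), D("Leaf", 7)))
print(ssum(tt))
print(unparse(reflect(tt)))
```

The `unparse` function plays the role of the unparsing step mentioned above: it prints constructor applications without the `D` marker.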

Implementing lists

Using the notation we have introduced for the implementation of algebraic data types, lists can be implemented as a primitive type in a very natural way. The Head and Tail functions over lists will be translated respectively into the $S_0$ and $S_1$ combinators. As an optimization we shall not declare the constructor name ‘list’, and a list will be represented only by nesting the data-block combinator in a right-associative way. The empty list ([ ]) can be regarded as a constant in the categorical language. The relationship between this new combinator ([ ]) and the selector $S_n$ is

$$
S_n\ [\,] \Rightarrow \text{‘error’}
$$

in which we can see that decomposing an empty list will generate an error message and abort the evaluation process. Consider an example of an evaluation involving a simple list.

    x = [2, 4, 6, 8]

will be translated into categorical multi-combinators, and stored as

    [[x]] = D 2 (D 4 (D 6 (D 8 [ ])))

The program Head (Tail (Tail (x))) will be translated and executed as follows,

$$
\begin{aligned}
&= S_0\ (S_1\ (S_1\ (D\ 2\ (D\ 4\ (D\ 6\ (D\ 8\ [\,])))))) \\
&= S_0\ (S_1\ (D\ 4\ (D\ 6\ (D\ 8\ [\,])))) \\
&= S_0\ (D\ 6\ (D\ 8\ [\,])) \\
&= 6
\end{aligned}
$$

We use similar techniques for functions of algebraic type over lists. For example, considering the concatenation of two lists [1, 2, 3] and [4, 5], we have

    concat [1, 2, 3] [4, 5] = [1, 2, 3, 4, 5]

(this function is equivalent to the infix operator ++ in Miranda and SASL). The function above is defined by

concat [ ] b = b

concat a b = D (Head a) (concat (Tail a) b)

which translates into categorical combinators as

$$
\mathit{concat} \equiv L^{1}((\mathit{Cond}\ [\,]\ 1)\ 0\ (D\ (S_0\ (1))\ (\mathit{concat}\ (S_1\ (1))\ 0)))
$$
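The list representation can be sketched in Python as follows. The sentinel `NIL` for the empty list, the helper functions `from_python` and `to_python`, and the use of a Python exception for the ‘error’ result are assumptions of this illustration; evaluation here is strict, unlike in the machine.

```python
NIL = ("[]",)                      # the empty-list constant (assumed encoding)

def D(head, tail):
    """A list cell: data blocks nested in a right-associative way, without a tag."""
    return ("D", head, tail)

def S(n, block):
    """S_0 plays the role of Head and S_1 that of Tail."""
    if block == NIL:               # S_n [ ] => 'error'
        raise ValueError("error: selector applied to the empty list")
    return block[n + 1]

def from_python(items):
    result = NIL
    for item in reversed(items):
        result = D(item, result)
    return result

def to_python(block):
    items = []
    while block != NIL:
        items.append(S(0, block))
        block = S(1, block)
    return items

def concat(a, b):
    """concat [ ] b = b ; concat a b = D (Head a) (concat (Tail a) b)"""
    if a == NIL:
        return b
    return D(S(0, a), concat(S(1, a), b))

x = from_python([2, 4, 6, 8])
print(S(0, S(1, S(1, x))))
print(to_python(concat(from_python([1, 2, 3]), from_python([4, 5]))))
```

The first printed value corresponds to the program Head (Tail (Tail (x))), the second to the concatenation example.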

Lazy lists

One of the advantages of lazy evaluation is that it provides the power for the representation of infinite data objects such as infinite lists. In Miranda the list

    [1, 3..]

when evaluated will generate the infinite list of odd numbers

    [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, ...

In the general case a list [a, b..] can be introduced into categorical multi-combinators as a recursive function

    ilist (D a b) = D a (D b (ilist (D ((2 × b) − a) ((3 × b) − (2 × a)))))

which is translated into categorical combinators as

$$
\begin{aligned}
\mathit{ilist} &\equiv L^{0}(D\ (S_0\ 0)\ (D\ (S_1\ 0)\ (\mathit{ilist}\ \mathit{nlist}))) \\
\mathit{nlist} &= D\ (+\ (S_1\ 0)\ (b - a))\ (+\ (S_1\ 0)\ (\times\ 2\ (b - a)))
\end{aligned}
$$
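The recursive definition of ilist can be imitated with a Python generator, which produces elements only on demand. This is a sketch under the assumption that generators are an acceptable stand-in for lazy evaluation; the function takes the two starting values as separate arguments rather than as a data block.

```python
from itertools import islice

def ilist(a, b):
    """Lazily enumerate the list [a, b..] following the recursive definition."""
    yield a
    yield b
    # the next two elements continue the progression with step b - a
    yield from ilist(2 * b - a, 3 * b - 2 * a)

odds = list(islice(ilist(1, 3), 11))
print(odds)
```

Only the requested prefix is ever computed; each pair of elements adds one level of generator nesting, so this form is meant for short prefixes.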

Implementing tuples

One special kind of algebraic data type in Miranda is called tuple. In a strongly typed language all the elements of a list have the same type. In a tuple one can have elements of mixed types. An instance of a tuple is

    ("Russell", 1910, "Principia Mathematica", True)

The implementation of tuples in categorical combinators can be done using the same apparatus we used for implementing lists. In reality there is no difference between tuples and lists at execution time. Special attention must be taken to distinguish between the printing mechanisms for tuples and lists, however. We use the same data-block constructor D to introduce tuples into categorical multi-combinators. The tuple presented above is represented in categorical multi-combinators as

    D "Russell" 1910 "Principia Mathematica" True

Functions over tuples work in a similar fashion to functions on lists.

The strategy for the introduction of algebraic data types into categorical multi-combinators presented in this section is extremely simple and compatible with the original system, and it alters neither the complexity of pattern-matching nor the normalising properties of the system.

A SUBSET OF SASL

In the next section of this paper we show how to translate a subset of SASL into categorical multi-combinators. In this section we address the two features of SASL we do not treat in this paper, which are ZF-expressions and pattern-matching. We show how these aspects of the language can be translated into our limited language now.

ZF-expressions

ZF-expressions were not in the original version of SASL but have been borrowed from the later language KRC¹⁴ and incorporated in the version of this language described in Reference 12. The general form of a ZF-expression in SASL is

    {exp ; qualifiers}

where there can be any positive number of qualifiers. Each qualifier is either a generator or a filter. A generator is a list of values which will be attributed to a specific variable in the expression exp. A filter is an arbitrary boolean expression used to restrict further the range of a generator.

ZF-expressions are a convenient and elegant syntactic ‘sugaring’ of a class of expressions, but they do not bring any gain in terms of computational power. For instance, the list of all the squares of the integer numbers between 1 and 10 can be expressed in SASL by the following ZF-expression:

    { x*x; x <- (1..10)}

In our subset of SASL the same expression can be expressed as

    list [1..10]
    where
    list x = (x = [ ]) -> [ ]; a*a : list y
             where
             a = Hd x
             y = Tl x

ZF-expressions can be seen as an optional feature which can easily be added to the language by the implementor. No extensions need to be made to the core of the categorical multi-combinator machine to cope with them.
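The desugaring can be illustrated outside SASL. The following is a minimal Python sketch, not part of the original system: the name `squares` is hypothetical, and a Python list comprehension stands in for the ZF-expression. The recursive function mirrors the `list` definition above, with `a` and `y` playing the roles of `Hd x` and `Tl x`.

```python
def squares(xs):
    """Recursive form of the ZF-expression {x*x; x <- xs}."""
    if not xs:                 # (x = [ ]) -> [ ]
        return []
    a, y = xs[0], xs[1:]       # a = Hd x, y = Tl x
    return [a * a] + squares(y)


# The comprehension plays the role of the ZF-expression itself.
zf_form = [x * x for x in range(1, 11)]
print(squares(list(range(1, 11))) == zf_form)
```

Both forms denote the same list, which is the sense in which ZF-expressions add convenience but no computational power.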

Pattern-matching

Pattern-matching is another interesting simplification of the syntax of expressions which will be neglected in our implementation of SASL. SASL was one of the first programming languages to incorporate pattern-matching. Using pattern-matching one can place guards on the left-hand side of a function definition. For instance, the factorial function (using SASL syntax) can be defined as

fac n = (n=0) -> 1; n * fac(n-1)

Using pattern-matching the function definition above can be expressed more neatly as

fac 0 = 1
fac n = n * fac(n-1)

In the general case pattern-matching can become quite complex. SASL allows pattern-matching on lists and list elements. As we noted above, Miranda goes much further and allows pattern-matching on algebraic data-types in general. In these cases pattern-matching not only gives guards as presented in the example above but also selectors which will decompose members of algebraic data-types. The efficient compilation of pattern-matching in general is discussed in Reference 20.

The implementation of pattern-matching in our subset of SASL may necessitate the extension of the categorical multi-combinator machine. Turner has introduced two new combinators for implementing pattern-matching: TRY and MATCH. The inclusion of these new combinators in categorical multi-combinators can be made without interfering with the normalising properties of the original system.

A grammar for SASL

The SASL language manual¹² presents the grammar for SASL. Below we present a grammar for our subset of SASL, which was obtained from the original one by suppressing the parts relating to ZF-expressions and pattern-matching.

⟨system⟩       ::= ⟨funcdefs⟩ ⟨commands⟩
⟨commands⟩     ::= ⟨commands⟩ ⟨command⟩
⟨command⟩      ::= ⟨evaluation⟩ | OFF
⟨funcdefs⟩     ::= DEF ⟨defs⟩
⟨evaluation⟩   ::= ⟨exps⟩ ?
⟨exps⟩         ::= ⟨exps⟩ WHERE ⟨defs⟩ | ⟨condexpr⟩
⟨condexpr⟩     ::= ⟨opexpr⟩ -> ⟨condexpr⟩ ; ⟨condexpr⟩ | ⟨listexpr⟩
⟨listexpr⟩     ::= ⟨opexpr⟩ , ... , ⟨opexpr⟩ | ⟨opexpr⟩ , | ⟨opexpr⟩
⟨opexpr⟩       ::= ⟨prefix⟩ ⟨opexpr⟩ | ⟨opexpr⟩ ⟨infix⟩ ⟨opexpr⟩ | ⟨comb⟩
⟨comb⟩         ::= ⟨comb⟩ ⟨simple⟩ | ⟨simple⟩
⟨simple⟩       ::= ⟨name⟩ | ⟨constant⟩ | ( ⟨expr⟩ )
⟨defs⟩         ::= ⟨clause⟩ ; ⟨defs⟩ | ⟨clause⟩
⟨clause⟩       ::= ⟨namelist⟩ = ⟨expr⟩
⟨namelist⟩     ::= ⟨namelist⟩ ⟨name⟩ | ⟨name⟩
⟨constant⟩     ::= ⟨numeral⟩ | ⟨charconst⟩ | ⟨boolconst⟩ | [ ] | ⟨string⟩
⟨numeral⟩      ::= ⟨real⟩ ⟨scale-factor⟩ | - ⟨real⟩ ⟨scale-factor⟩
⟨real⟩         ::= ⟨digit⟩* | ⟨digit⟩* . ⟨digit⟩*
⟨scale-factor⟩ ::= e ⟨digit⟩* | e - ⟨digit⟩*
⟨boolconst⟩    ::= TRUE | FALSE
⟨charconst⟩    ::= % ⟨any char⟩ | SP | NL | NP | TAB
⟨string⟩       ::= ‘⟨any message not containing quotes⟩”
⟨name⟩         ::= ⟨any char⟩* ⟨digit⟩*

A list of operators in order of increasing binding power is as follows:

: ++ --                   infix (right associative)
..                        infix (non-associative)
...                       postfix
&                         infix
~                         prefix
>> > >= = ~= <= < <<      infix
+ -                       infix
+ -                       prefix
* / DIV REM               infix
**                        infix (right associative)
#                         prefix
.                         infix

The first three clauses of the grammar reflect the dynamics of the system. A call to the system entails the compilation of a set of definitions held in a file, followed by the execution of a series of expressions which are read from the standard input. A valid script (i.e. list of definitions) in our version of SASL is, for instance,

DEF
fac n = (n=0) -> 1; n * fac (n-1)
twice f x = f (f x)

This script is held in a file, say define. To invoke the SASL system using the script above one uses the command:

sasl define

The system compiles the definitions within the file into graphs of categorical multi-combinators. Once this has been completed, and provided no syntax errors have been encountered in the definitions held in the file, the user is prompted

SASL:

From this point onwards the system expects a series of expressions. Each expression must be followed by a ‘?’. For example,

SASL: twice fac 3 ?

The expression above is then executed, and the system prints the result of the evaluated expression followed by a new prompt:

720
SASL:

The system is terminated when an OFF token is received. The remaining clauses of the grammar describe the legal syntax of a SASL script.


COMPILING SASL INTO MULTI-COMBINATORS

In this section we present the compilation algorithm for our subset of SASL without local definitions, which will be introduced in the next section.

The core rules

The compilation of simple functions involving arithmetic operations and conditional expressions is performed by the following rules. The symbol $\equiv$ labels the expression on its right-hand side with the identifier on its left-hand side, and $\langle in_{op} \rangle$ and $\langle pre_{op} \rangle$ stand respectively for the infix and prefix arithmetic and boolean operators presented in the last section.

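$$
\begin{array}{ll}
(t.1) & [\![ a = b ]\!] \;\Rightarrow\; a \equiv [\![ b ]\!] \\
(t.2) & [\![ f\, \underbrace{x_i \ldots x_j}_{n} = y ]\!] \;\Rightarrow\; f \equiv L^{n-1}(R^{x_i, \ldots, x_j}_{n-1, \ldots, 0}\; y) \\
(t.3) & [\![ a \ldots b ]\!] \;\Rightarrow\; [\![ a ]\!] \ldots [\![ b ]\!] \\
(t.4) & [\![ a \rightarrow b;\, c ]\!] \;\Rightarrow\; \mathit{Cond}\; [\![ a ]\!]\; [\![ b ]\!]\; [\![ c ]\!] \\
(t.5) & [\![ a\, \langle in_{op} \rangle\, b ]\!] \;\Rightarrow\; \langle in_{op} \rangle\; [\![ a ]\!]\; [\![ b ]\!] \\
(t.6) & [\![ \langle pre_{op} \rangle\, a\, b ]\!] \;\Rightarrow\; \langle pre_{op} \rangle\; [\![ a ]\!]\; [\![ b ]\!] \\
(t.7) & [\![ \sim a ]\!] \;\Rightarrow\; \sim [\![ a ]\!] \\
(t.8) & [\![ c ]\!] \;\Rightarrow\; c, \text{ where } c \text{ is a constant or an identifier} \\
(t.9) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\; a \ldots b \;\Rightarrow\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; a) \ldots (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; b) \\
(t.10) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\, (a \rightarrow b;\, c) \;\Rightarrow\; \mathit{Cond}\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; a)\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; b)\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; c) \\
(t.11) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\, (a\, \langle in_{op} \rangle\, b) \;\Rightarrow\; \langle in_{op} \rangle\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; a)\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; b) \\
(t.12) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\, (\langle pre_{op} \rangle\, a\, b) \;\Rightarrow\; \langle pre_{op} \rangle\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; a)\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; b) \\
(t.13) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\, (\sim a) \;\Rightarrow\; \sim (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; a) \\
(t.14) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\; b \;\Rightarrow\;
\begin{cases}
b, & \text{if } b \text{ is a constant or an identifier} \\
n_k, & \text{if } b = x_k
\end{cases}
\end{array}
$$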

If we compare the set of rules above with the set presented for the compilation of λ-expressions, we can see that rule (T.4) is not present here. The reason for this is discussed in Reference 11 and has to do with the formation of internal redexes in the compilation process. For the sake of simplicity and efficiency we have imposed the condition that there are no explicit abstraction operations in our system. The remaining translation rules above extend the original translation rules to work with the grammar for the subset of SASL presented in the last section.

Consider an example of compilation of a function. The factorial function can be defined in SASL by

fac n = (n=0) -> 1; n * fac (n-1)

This expression translates into categorical multi-combinators using the set of rules above, where boldface type denotes constants.

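$$
\begin{array}{ll}
 & [\![ \mathit{fac}\; n = (n = \mathbf{0}) \rightarrow \mathbf{1};\; n * \mathit{fac}(n - \mathbf{1}) ]\!] \\
t.2 \Rightarrow & \mathit{fac} \equiv L^{0}(R^{n}_{0}((n = \mathbf{0}) \rightarrow \mathbf{1};\; n * \mathit{fac}(n - \mathbf{1}))) \\
t.10 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (R^{n}_{0}(n = \mathbf{0}))\; (R^{n}_{0}\, \mathbf{1})\; (R^{n}_{0}(n * \mathit{fac}(n - \mathbf{1})))) \\
t.11 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, (R^{n}_{0}\, n)\; (R^{n}_{0}\, \mathbf{0}))\; (R^{n}_{0}\, \mathbf{1})\; (R^{n}_{0}(n * \mathit{fac}(n - \mathbf{1})))) \\
t.14 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; (R^{n}_{0}\, \mathbf{0}))\; (R^{n}_{0}\, \mathbf{1})\; (R^{n}_{0}(n * \mathit{fac}(n - \mathbf{1})))) \\
t.14 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; \mathbf{0})\; (R^{n}_{0}\, \mathbf{1})\; (R^{n}_{0}(n * \mathit{fac}(n - \mathbf{1})))) \\
t.14 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; \mathbf{0})\; \mathbf{1}\; (R^{n}_{0}(n * \mathit{fac}(n - \mathbf{1})))) \\
t.11 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; \mathbf{0})\; \mathbf{1}\; (*\, (R^{n}_{0}\, n)\; (R^{n}_{0}(\mathit{fac}(n - \mathbf{1}))))) \\
t.14 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; \mathbf{0})\; \mathbf{1}\; (*\, 0\; (R^{n}_{0}(\mathit{fac}(n - \mathbf{1}))))) \\
t.9 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; \mathbf{0})\; \mathbf{1}\; (*\, 0\; ((R^{n}_{0}\, \mathit{fac})\; (R^{n}_{0}(n - \mathbf{1}))))) \\
t.14 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; \mathbf{0})\; \mathbf{1}\; (*\, 0\; (\mathit{fac}\; (R^{n}_{0}(n - \mathbf{1}))))) \\
t.11 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; \mathbf{0})\; \mathbf{1}\; (*\, 0\; (\mathit{fac}\; (-\, (R^{n}_{0}\, n)\; (R^{n}_{0}\, \mathbf{1}))))) \\
t.14 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; \mathbf{0})\; \mathbf{1}\; (*\, 0\; (\mathit{fac}\; (-\, 0\; (R^{n}_{0}\, \mathbf{1}))))) \\
t.14 \Rightarrow & \mathit{fac} \equiv L^{0}(\mathit{Cond}\; (=\, 0\; \mathbf{0})\; \mathbf{1}\; (*\, 0\; (\mathit{fac}\; (-\, 0\; \mathbf{1}))))
\end{array}
$$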
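The derivation above is mechanical, so it can be reproduced by a few lines of code. The following is a minimal Python sketch under stated assumptions: source expressions are encoded as nested tuples tagged `"cond"`, `"infix"` and `"app"`, variables are rendered as `("var", k)` to keep them apart from constants (the role played by boldface above), and the names `replace` and `compile_def` are hypothetical. Only rules (t.2), (t.9), (t.10), (t.11) and (t.14) are covered.

```python
def replace(index, expr):
    """The replacement operator R: rules (t.9), (t.10), (t.11) and (t.14)."""
    if isinstance(expr, tuple):
        tag = expr[0]
        if tag == "cond":                        # (t.10)
            return ("Cond",) + tuple(replace(index, e) for e in expr[1:])
        if tag == "infix":                       # (t.11)
            _, op, left, right = expr
            return (op, replace(index, left), replace(index, right))
        if tag == "app":                         # (t.9)
            return ("app",) + tuple(replace(index, e) for e in expr[1:])
        raise ValueError(f"unknown node {tag!r}")
    if isinstance(expr, str) and expr in index:  # (t.14), case b = x_k
        return ("var", index[expr])
    return expr                                  # (t.14), constant or other identifier


def compile_def(name, params, body):
    """Rule (t.2): f x_i ... x_j = y becomes f = L^(n-1)(R y)."""
    n = len(params)
    # The first parameter receives index n-1 and the last one index 0.
    index = {x: n - 1 - position for position, x in enumerate(params)}
    return name, ("L", n - 1, replace(index, body))


# fac n = (n=0) -> 1; n * fac (n-1)
FAC_SOURCE = ("cond", ("infix", "=", "n", 0), 1,
              ("infix", "*", "n", ("app", "fac", ("infix", "-", "n", 1))))
# twice f x = f (f x)
TWICE_SOURCE = ("app", "f", ("app", "f", "x"))

print(compile_def("fac", ["n"], FAC_SOURCE))
print(compile_def("twice", ["f", "x"], TWICE_SOURCE))
```

The result for `fac` has the same shape as the last line of the derivation: the parameter `n` becomes variable 0 everywhere, while the constants and the global identifier `fac` are left untouched.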

Compiling lists

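The compilation of lists and operators over lists is performed by the application of the following rules

$$
\begin{array}{ll}
(l.1) & [\![\, [\ ] \,]\!] \;\Rightarrow\; [\ ] \\
(l.2) & [\![ \mathit{Hd} ]\!] \;\Rightarrow\; S_{0} \\
(l.3) & [\![ \mathit{Tl} ]\!] \;\Rightarrow\; S_{1} \\
(l.4) & [\![ a : b ]\!] \;\Rightarrow\; D\; [\![ a ]\!]\; [\![ b ]\!] \\
(l.5) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\; [\ ] \;\Rightarrow\; [\ ] \\
(l.6) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\; \mathit{Hd} \;\Rightarrow\; S_{0} \\
(l.7) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\; \mathit{Tl} \;\Rightarrow\; S_{1} \\
(l.8) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\, (a : b) \;\Rightarrow\; D\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; a)\; (R^{x_i \ldots x_j}_{n_i \ldots n_j}\; b)
\end{array}
$$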

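The compilation of list operators is done by replacing these operators by the compiled code of their definitions, as follows:

$$
\begin{array}{ll}
(l.9) & [\![ a\, {+}{+}\, b ]\!] \;\Rightarrow\; [\![ \mathit{concat}\; a\; b ]\!], \text{ where} \\
 & \quad \mathit{concat}\; a\; b = (a = [\ ]) \rightarrow b;\; (\mathit{Hd}\; a : (\mathit{concat}\; (\mathit{Tl}\; a)\; b)) \\
(l.10) & [\![ a\, {-}{-}\, b ]\!] \;\Rightarrow\; [\![ \mathit{del}\; a\; b ]\!], \text{ where} \\
 & \quad \mathit{del}\; a\; b = (a = [\ ]) \rightarrow b;\; ((b = [\ ]) \rightarrow a;\; (((\mathit{Hd}\; a) = (\mathit{Hd}\; b)) \\
 & \quad\quad \rightarrow (\mathit{del}\; (\mathit{Tl}\; a)\; (\mathit{Tl}\; b));\; (\mathit{del}\; (\mathit{Tl}\; a)\; ((\mathit{Hd}\; b) : (\mathit{del}\; (\mathit{Hd}\; a)\; (\mathit{Tl}\; b)))))) \\
(l.11) & [\![\, [a\,..\,b] \,]\!] \;\Rightarrow\; [\![ \mathit{blist}\; a\; b ]\!], \text{ where} \\
 & \quad \mathit{blist}\; a\; b = (a > b) \rightarrow [\ ];\; (a : \mathit{blist}\; (a + 1)\; b) \\
(l.12) & [\![\, [a\,...] \,]\!] \;\Rightarrow\; [\![ \mathit{ilist}\; a ]\!], \text{ where} \\
 & \quad \mathit{ilist}\; a = (a : \mathit{ilist}\; (a + 1)) \\
(l.13) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\, (a\, {+}{+}\, b) \;\Rightarrow\; [\![ \mathit{concat}\; a\; b ]\!] \\
(l.14) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\, (a\, {-}{-}\, b) \;\Rightarrow\; [\![ \mathit{del}\; a\; b ]\!] \\
(l.15) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\; [a\,..\,b] \;\Rightarrow\; [\![ \mathit{blist}\; a\; b ]\!] \\
(l.16) & R^{x_i \ldots x_j}_{n_i \ldots n_j}\; [a\,...] \;\Rightarrow\; [\![ \mathit{ilist}\; a ]\!]
\end{array}
$$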

The set of compilation rules presented above was obtained by the direct application of the original compilation algorithm for λ-expressions to the syntax adopted for our subset of SASL.
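The auxiliary definitions used by the list operators are ordinary recursive functions, which the following Python sketch transcribes. It is an illustration only: Python lists are strict, so the infinite list built by `ilist` is modelled here with a generator (an assumption of this sketch, standing in for lazy evaluation), and `del` is omitted.

```python
from itertools import islice


def concat(a, b):
    """concat a b = (a = [ ]) -> b; (Hd a : (concat (Tl a) b))"""
    return b if not a else [a[0]] + concat(a[1:], b)


def blist(a, b):
    """blist a b = (a > b) -> [ ]; (a : blist (a+1) b)"""
    return [] if a > b else [a] + blist(a + 1, b)


def ilist(a):
    """ilist a = (a : ilist (a+1)), produced on demand."""
    while True:
        yield a
        a += 1


print(concat([1, 2], [3, 4]))
print(blist(1, 5))
print(list(islice(ilist(7), 3)))
```

Only as many elements of `ilist` are computed as the consumer asks for, which is the behaviour a lazy SASL implementation gives to `[a...]` for free.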

Now we present an example of compilation of a function involving lists. We can define a function which reflects a list in our subset of SASL as

ref n = (n=[ ]) -> [ ]; (ref (Tl n)) : (Hd n)

This function translates into categorical multi-combinators as

[[ref n = (n = [ ])�i[ ]; (ref (Tl n)) : (Hd n)]]

t:2) ref � L

0(R

n

0((n = [ ])�i[ ]; (ref (Tl n)) : (Hd n)))

t:10) ref � L

0(Cond (R

n

0(n = [ ])) (R

n

0 [ ]) (Rn

0(ref (Tl n)) : (Hd n)))

t:11) ref � L

0(Cond (= (R

n

0n) (Rn

0 [ ])) (Rn

0 [ ]) (Rn

0(ref (Tl n)) : (Hd n)))

t:14) ref � L

0(Cond (= 0 (R

n

0 [ ])) (Rn

0 [ ]) (Rn

0(ref (Tl n)) : (Hd n)))

20 R.D.LINS AND S.J.THOMPSON

l:5) ref � L

0(Cond (= 0 [ ]) (R

n

0 [ ]) (Rn

0(ref (Tl n)) : (Hd n)))

l:5) ref � L

0(Cond (= 0 [ ]) [ ] (R

n

0(ref (Tl n)) : (Hd n)))

l:8) ref � L

0(Cond (= 0 [ ]) [ ] (D (R

n

0 (ref (Tl n))) (Rn

0(Hd n))))

t:9) ref � L

0(Cond (= 0 [ ]) [ ] (D ((R

n

0ref ) (Rn

0((Tl n))) (Rn

0 (Hd n))))

t:14) ref � L

0(Cond (= 0 [ ]) [ ] (D (ref (R

n

0(Tl n))) (Rn

0 (Hd n))))

t:9) ref � L

0(Cond (= 0 [ ]) [ ] (D (ref ((R

n

0 Tl) (Rn

0 n))) (R

n

0(Hd n))))

l:7) ref � L

0(Cond (= 0 [ ]) [ ] (D (ref (S

1(R

n

0n))) (Rn

0(Hd n))))

t:14) ref � L

0(Cond (= 0 [ ]) [ ] (D (ref (S

10)) (Rn

0(Hd n))))

t:9) ref � L

0(Cond (= 0 [ ]) [ ] (D (ref (S

1 0)) ((Rn

0Hd) (R

n

0n))))

l:6) ref � L

0(Cond (= 0 [ ]) [ ] (D (ref (S

1 0) (S0(R

n

0n)))))

t:14) ref � L

0(Cond (= 0 [ ]) [ ] (D (ref (S

1 0)) (S0 0)))

INTRODUCING LOCAL DEFINITIONS

As we mentioned in the previous section, a feature of SASL we do treat is that of local definitions. We have already seen these in use, introduced by the keyword where. The general form of a definition will be:

$$
\begin{array}{l}
f\; x_0 \ldots x_n = e_0 \ \text{where} \\
\quad a_1\, v^{1}_{1} \ldots v^{1}_{p} = d_1 \\
\quad \vdots \\
\quad a_m\, v^{m}_{1} \ldots v^{m}_{q} = d_m
\end{array}
$$

The definitions of the functions $a_1, \ldots, a_m$ within the where block will be mutually recursive, in general. Their scope is restricted to the expression $e_0$ (and to the right-hand sides of their definitions, as they are recursive). Local definitions are used both to define auxiliary functions and to hold values of intermediate computations which may be referenced a number of times in the expression $e_0$.

The definitions of $a_1, \ldots, a_m$ may themselves contain local definitions: as is customary, a local definition will hide a more global one.

Our basic strategy is to constitute a block whose components form the compiled versions of the local definitions. This block will be passed as an argument to the compiled code for $e_0$, and references to the functions $a_1, \ldots, a_m$ will be compiled into selectors for members of this block.

The definition of the code for the block will itself be recursive (we exhibit an example in the appendix). This self-referencing code will be transformed into a cyclic graph. This method was first used by Turner¹ to compile uses of the fixed-point combinator Y.

Our compilation algorithm is presented as follows,

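(R.1)

$$
\left[\!\!\left[
\begin{array}{l}
e_0\; x_n \ldots x_0 = e_1 \ \text{where} \\
\quad a_1\, v^{1}_{1} \ldots v^{1}_{p} = d_1 \\
\quad \vdots \\
\quad a_m\, v^{m}_{1} \ldots v^{m}_{q} = d_m
\end{array}
\right]\!\!\right]
\;\Rightarrow\;
e_0 \equiv L^{n}(R^{x_n \ldots x_0}_{n \ldots 0}\, (I^{T_{a_1 \ldots a_m}}_{0 \ldots m-1}\; e_1))
$$

where

$$
\begin{array}{rl}
T_{a_1 \ldots a_m} \equiv D & (L^{n+p}(R^{x_n \ldots x_0,\, v^{1}_{1} \ldots v^{1}_{p}}_{n+p \ldots 0}\, (I^{T_{a_1 \ldots a_m}}_{0 \ldots m-1}\; d_1))\; n \cdots 0)\; \cdots \\
 & (L^{n+q}(R^{x_n \ldots x_0,\, v^{m}_{1} \ldots v^{m}_{q}}_{n+q \ldots 0}\, (I^{T_{a_1 \ldots a_m}}_{0 \ldots m-1}\; d_m))\; n \cdots 0)
\end{array}
$$

(R.2)

$$
R^{y_i \ldots y_j}_{n \ldots 0}\, \left( I^{T_{a_1 \ldots a_m}}_{0 \ldots m-1}
\left(
\begin{array}{l}
e_2 \ \text{where} \\
\quad b_1\, w^{1}_{1} \ldots w^{1}_{p} = g_1 \\
\quad \vdots \\
\quad b_f\, w^{f}_{1} \ldots w^{f}_{q} = g_f
\end{array}
\right) \right)
\;\Rightarrow\;
R^{y_i \ldots y_j}_{n \ldots 0}\, (I^{T_{a_1 \ldots a_m, b_1 \ldots b_f}}_{0 \ldots m+f-1}\; e_2)
$$

where

$$
\begin{array}{rl}
T_{a_1 \ldots a_m, b_1 \ldots b_f} \equiv T_{a_1 \ldots a_m}\, {+}{+}\, (D & (L^{n+p}(R^{y_i \ldots y_j,\, w^{1}_{1} \ldots w^{1}_{p}}_{n+p \ldots 0}\, (I^{T_{a_1 \ldots a_m, b_1 \ldots b_f}}_{0 \ldots m+f-1}\; g_1))\; n \cdots 0)\; \cdots \\
 & (L^{n+q}(R^{y_i \ldots y_j,\, w^{f}_{1} \ldots w^{f}_{q}}_{n+q \ldots 0}\, (I^{T_{a_1 \ldots a_m, b_1 \ldots b_f}}_{0 \ldots m+f-1}\; g_f))\; n \cdots 0))
\end{array}
$$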

$T_{a_1 \ldots a_m}$ is the ‘context’ of the where block formed by definitions $a_1$ to $a_m$. ++ is the operator which performs the concatenation of data-blocks. $I^{T_{a_1 \ldots a_m}}_{0 \ldots m+f-1}$ is the identifier operator. It works in a similar way to the replacement operator $R^{y_i \ldots y_j}_{n \ldots 0}$. The identifier operator is defined as follows,

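$$
\begin{array}{ll}
(I.1) & I^{T_{x_i \ldots x_j}}_{n_i \ldots n_j}\; a \ldots b = (I^{T_{x_i \ldots x_j}}_{n_i \ldots n_j}\; a) \ldots (I^{T_{x_i \ldots x_j}}_{n_i \ldots n_j}\; b) \\
(I.2) & I^{T_{x_i \ldots x_j}}_{n_i \ldots n_j}\; b =
\begin{cases}
b, & \text{if } b \text{ is a constant or a variable} \\
S_{n_k}(T_{x_i \ldots x_j}), & \text{if } b = x_k
\end{cases}
\end{array}
$$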

If, when applying rule (I.2) above, an identifier b can be associated with more than one $x_k$, then we must choose the maximum corresponding $n_i$. In doing so we preserve locality of definitions, because a greater $n_i$ means a more recent definition in our context. Rule R.1 above compiles the outermost where block in a function, while rule R.2 compiles the inner ones. The only difference between them resides in the need to merge the outer context with the internal ones. In the algorithm above one can also observe the existence of a series of variables applied to each function defined in a context. We pass the value of global variables explicitly by locally referencing them and increasing the arity of the λ-terms of each local function definition. Local functions will take global variables as ‘extra’ arguments. During rewriting, these variables will fetch the value of global variables.
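The rule for choosing among several candidate positions can be stated in a couple of lines of code. This is a minimal Python sketch under stated assumptions: a context is modelled as a list of names whose positions are the indices $n_i$, an inner where block appends its names to the outer context (as the ++ of rule R.2 does), and the function name `selector_index` is hypothetical.

```python
def selector_index(name, context):
    """Index n_k used in S_{n_k}(T) for `name`, or None if it is not local."""
    matches = [index for index, bound in enumerate(context) if bound == name]
    # The greatest index belongs to the most recent (innermost) definition.
    return max(matches) if matches else None


outer_context = ["suc", "dsuc"]
merged_context = outer_context + ["suc"]   # an inner block redefines suc

print(selector_index("suc", outer_context))
print(selector_index("suc", merged_context))
print(selector_index("n", merged_context))
```

With the merged context the inner `suc` hides the outer one, while `dsuc` keeps its original position; a name that is not in the context is left to the replacement operator.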


In the case of a where block not being embedded in a function but in a simple expression, rule R.1 needs to be modified slightly by omitting the replacement operator relative to the outermost variables.

Two optimizations can be incorporated in a compiler that uses the compilation algorithm above. The first one is to remove the selectors from the code at compile time by applying the rewriting law S. The second optimization, slightly more complex, consists of detecting when global variables are not referenced in internal blocks. In this case the instantiation of global variables can be removed, generating more compact code.

An example of compilation can be found in the appendix.

GRAPH REDUCTION

The version of SASL presented here has been implemented at the University of Kent at Canterbury as two individual undergraduate projects by Mr. J-A. Camilleri and Mr. C.S. Lewis. The projects consisted of two parts, which can be described as: Part I, parsing of expressions, compilation and graph generation; and Part II, the reduction machine and garbage collector. The implementation environment was UNIX². Yacc²¹ was used to generate a parser of SASL scripts. The reduction machine was written in C.²² We present here some of the experience gained from this implementation.

Due to the very experimental basis of the work, the very short time available, and also the fact that this was the largest C program ever written by the implementors, efficiency was not a major issue in the first stage of this project. The implementors were free to decide on implementation techniques. The set of categorical multi-combinators implemented was slightly different from the one presented in the first section of this paper. Id-complementation was needed in the system implemented, generating less compact graphs of compiled code.

Two major points, well known to C programmers and implementors of functional languages, were corroborated by this implementation. The first is the very high cost of routine (function) calls in C, which should be avoided whenever possible. In the simplest cases this can be done by introducing macro definitions. The second major issue concerns the size of the graph. The project confirmed the enormous overhead of graph traversal by following pointers.

The original machine written by Camilleri and Lewis was substantially modified by the first author, keeping their design decisions, however. Unnecessary tests were removed, macro definitions replaced function calls wherever possible, and some routines, such as the pattern-matcher and the cell allocation routine, were completely rewritten. These modifications increased the performance of the machine by an order of magnitude. In this paper, whenever we talk about Camilleri and Lewis’ implementation we refer to the modified machine and not the original one.

In the following subsections we present some simple modifications to our machine based on the experience gained from the project.

² UNIX is a trademark of AT&T Bell Labs.


Cells

A cell is the basic constituent of the dynamic storage system used to represent the graph. A cell is divided into fields in which information (pointer, type, data, etc.) is stored. In order to obtain an efficient implementation of categorical multi-combinators the use of variable-length cells is desirable. Each cell is formed by a contiguous block of memory with a tag. The information allocated in each memory space can be either a pointer, a variable, or a constant. In Reference 11 we suggested the structure of the different cells used in graph-reduction of categorical multi-combinators. In their implementation Camilleri and Lewis opted to have a fully boxed cell representation. This meant that the information fields would be filled only with pointers. Variables and constants would be represented as cells themselves. This has proved to be an unwise decision in performance terms, because the size of graphs is increased unnecessarily.

In the next subsection we extend the types of cell used in our machine to cope with the new combinators introduced in this paper and also to make the rewriting process more efficient.

Operator cells

In this paper we have introduced two new combinators which were needed to define and manipulate algebraic data-types. The data-block ‘D’ combinator can be represented in a similar way to the structure of the multi-application or multi-pair combinators above. The multi-selector combinator will be represented in a similar way to the multi-abstraction cell N above.

As we have already mentioned, the more compact the graph of expressions, the more efficient the implementation of a machine to reduce these expressions to normal form tends to be, assuming that the complexity of the rewriting laws and of pattern-matching remains unchanged. For this reason we now introduce explicit operator cells. Now the expression $+\, x_0\, x_1$ will be stored in one Plus cell which has only two information fields. The relationship between compositions and each operator must be stated explicitly. For instance, the expression $\circ\, (+\, x_0\, x_1)\, (P\, y_m \ldots y_0)$ rewrites as

$$
\circ\, (+\, x_0\, x_1)\; (P\, y_m \ldots y_0) \;\Rightarrow\; (+\; x'_0\; x'_1),
$$

where

$$
x'_i =
\begin{cases}
x_i, & \text{if } x_i \text{ is a constant or of the form } L^{a}(b) \\
y_k, & \text{if } x_i \text{ is a variable } k \\
\circ\, x_i\, (P\, y_m \ldots y_1\, y_0), & \text{otherwise}
\end{cases}
$$

These laws need to be introduced explicitly because, before the introduction of explicit operator cells, this sort of rewriting was covered by rule M.1 (remember that application is denoted by juxtaposition).

Camilleri and Lewis’ implementation makes use of operator cells.


Unifying compositions

As we have remarked in the last section, we had to introduce a rewriting law to state the relationship between compositions and each operator. In this section we show how to overcome this inconvenience.

If we denote application explicitly in rule M.1 above

$$
\circ\, (A\, x_0\, x_1\, x_2 \ldots x_n)\; (P\, y_m \ldots y_1\, y_0) \;\Rightarrow\; (A\; x'_0\; x'_1 \ldots x'_n)
$$

we can see the similarity between all the rewriting laws in the extended system of categorical multi-combinators which use compositions, and can therefore unify them thus:

$$
(C.1)\quad \circ\, (x_c\, x_0\, x_1 \ldots x_n)\; (P\, y_m \ldots y_1\, y_0) \;\Rightarrow\; (x_c\; x'_0\; x'_1 \ldots x'_n),
$$

where

$$
x'_i =
\begin{cases}
x_i, & \text{if } x_i \text{ is a constant or of the form } L^{a}(b) \\
y_k, & \text{if } x_i \text{ is a variable } k \\
\circ\, x_i\, (P\, y_m \ldots y_1\, y_0), & \text{otherwise}
\end{cases}
$$

The fact that this unification of rewriting laws can take place is not an accident. It restores the ‘real’ meaning of compositions in the categorical language as equivalent to substitutions in the λ-Calculus, as explained by Lambek,³ or, more rigorously speaking, substitutions in the λ-Calculus with Lazy Substitutions.⁷

This unification of compositions was not used in Camilleri and Lewis’ implementation.

Environment distribution

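The modifications presented in this paper show us that the categorical multi-combinator machine extended with arithmetic operators and algebraic data-types can be expressed as,

$$
(M.2)\quad L^{n}(y)\; x_0\, x_1 \cdots x_n\, x_{n+1} \cdots x_z \;\Rightarrow\;
\begin{cases}
y\; x_{n+1} \ldots x_z, & \text{if } y \text{ is a constant} \\
x_{n-y}\; x_{n+1} \ldots x_z, & \text{if } y \text{ is a variable} \\
(\circ\, y\, (P\, x_0 \cdots x_n))\; x_{n+1} \cdots x_z, & \text{otherwise}
\end{cases}
$$

$$
(C.1)\quad \circ\, (x_c\, x_0\, x_1\, x_2 \ldots x_n)\; (P\, y_m \ldots y_1\, y_0) \;\Rightarrow\; (x_c\; x'_0\; x'_1 \ldots x'_n),
$$

where

$$
x'_i =
\begin{cases}
x_i, & \text{if } x_i \text{ is a constant or of the form } L^{a}(b) \\
y_k, & \text{if } x_i \text{ is a variable } k \\
\circ\, x_i\, (P\, y_m \ldots y_1\, y_0), & \text{otherwise}
\end{cases}
$$

$$
(S)\quad S_n\, (D\, x_i \ldots x_1\, x_0) \;\Rightarrow\; x_n
$$

$$
(+)\quad +\, x\, y \;\Rightarrow\; x + y
$$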

Now let us examine how a pattern matching the left-hand side of rule (C.1) can be generated. Compositions are not generated by compilation of expressions. They are introduced by rewriting a pattern of the type of the left-hand side of rule (M.2) if y is not a constant or a variable. In this case y is a complex expression formed either by application of simpler expressions or by an algebraic data-type structure. The environment formed by the application of rule (M.2) will be distributed using rule (C.1). Since the pattern rewritten using rule (M.2) was the leftmost outermost pattern in an expression, and the new pattern generated matches only the left-hand side of rule (C.1), we know that an application of rule (C.1) will automatically follow the application of rule (M.2) if y is not a constant or a variable. This implies that there is no need to pattern-match in this case, making the rewriting process more efficient.
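The interplay between rules (M.2) and (C.1) can be tried out with a small evaluator. The following Python sketch is an illustration under several assumptions, not the machine described in this paper: the class names `Var`, `Name`, `L`, `App`, `Op`, `Comp` and the function `whnf` are hypothetical, top-level definitions are looked up in a dictionary, data-blocks are left out, and reduced sub-expressions are neither shared nor updated, so this is term rewriting rather than graph reduction.

```python
from dataclasses import dataclass
import operator


@dataclass(frozen=True)
class Var:            # variable with index k
    k: int

@dataclass(frozen=True)
class Name:           # reference to a top-level definition
    ident: str

@dataclass(frozen=True)
class L:              # multi-abstraction L^n(body), expecting n + 1 arguments
    n: int
    body: object

@dataclass(frozen=True)
class App:            # application of fn to one or more arguments
    fn: object
    args: tuple

@dataclass(frozen=True)
class Op:             # operator cell: "cond" or a strict binary primitive
    sym: str
    args: tuple

@dataclass(frozen=True)
class Comp:           # composition of a term with an environment (P x0 ... xn)
    term: object
    env: tuple


PRIMS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "=": operator.eq}


def whnf(term, defs):
    """Rewrite until the head is a constant, an L, or a partial application."""
    while True:
        if isinstance(term, Comp):
            t, env = term.term, term.env
            if isinstance(t, Var):            # C.1: variable k fetches y_k
                term = env[len(env) - 1 - t.k]
            elif isinstance(t, App):          # C.1: distribute the environment
                term = App(Comp(t.fn, env), tuple(Comp(a, env) for a in t.args))
            elif isinstance(t, Op):           # same distribution for operator cells
                term = Op(t.sym, tuple(Comp(a, env) for a in t.args))
            else:                             # constants, names and L^a(b) are unchanged
                term = t
        elif isinstance(term, Name):
            term = defs[term.ident]
        elif isinstance(term, App):
            fn = whnf(term.fn, defs)
            if isinstance(fn, App):           # flatten nested partial applications
                term = App(fn.fn, fn.args + term.args)
            elif isinstance(fn, L) and len(term.args) > fn.n:
                env, rest = term.args[:fn.n + 1], term.args[fn.n + 1:]
                body = Comp(fn.body, env)     # M.2: bind n + 1 arguments at once
                term = App(body, rest) if rest else body
            else:
                return App(fn, term.args)     # partial application, nothing to rewrite
        elif isinstance(term, Op):
            if term.sym == "cond":
                test, then, other = term.args
                term = then if whnf(test, defs) else other
            else:
                a, b = (whnf(x, defs) for x in term.args)
                return PRIMS[term.sym](a, b)
        else:
            return term


# fac = L^0(Cond (= 0 0) 1 (* 0 (fac (- 0 1)))) and twice = L^1(1 (1 0))
FAC = L(0, Op("cond", (Op("=", (Var(0), 0)), 1,
                       Op("*", (Var(0), App(Name("fac"), (Op("-", (Var(0), 1)),)))))))
TWICE = L(1, App(Var(1), (App(Var(1), (Var(0),)),)))
DEFS = {"fac": FAC, "twice": TWICE}

print(whnf(App(Name("twice"), (Name("fac"), 3)), DEFS))
```

In this sketch a `Comp` node is created only by the (M.2) step and is consumed immediately by the (C.1) cases at the top of the loop, which reflects the observation above that (C.1) always follows (M.2) when the body is neither a constant nor a variable. Because arguments are re-evaluated every time a variable fetches them, the sketch also shows the cost of the missing sharing.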

By observing rule (C.1) we can see that compositions can also be generated whenever we distribute an environment throughout a complex expression. One can introduce ‘eager environment distribution’, i.e. whenever $x_i$ is a complex expression under application or a data-block we recursively call (C.1). As we have already shown in Reference 11, this does not cause any theoretical problems (of termination etc.) for the system, because the only risk we take is that of passing parameters to an expression which would be discarded during the execution process. Eager environment distribution can be desirable in a machine with a slow memory management system, since it completely removes the need for the generation of composition and multi-pair cells.

The categorical multi-combinator implementation of SASL at Kent does not contain any of the optimizations above.

PERFORMANCE CONSIDERATIONS

In this section we compare the performance of two categorical combinator interpreters and one compiled implementation with several other language implementations.

Camilleri & Lewis

The categorical multi-combinator version of SASL implemented at Kent was compared for performance with the standard version of SASL, which uses a Turner combinator graph reduction machine. Using several standard benchmark programs, as discussed below, we compared the performance of these two implementations. The categorical multi-combinator version of SASL was about 2 to 3 times slower than the Turner combinator one. The reasons for this poor performance can be attributed to a series of implementation decisions, together with the short time available and the inexperience of the implementors with C programming and functional languages.

Amongst the unwise implementation decisions taken we can point out the use of fully-boxed cells, which generated unnecessarily large graphs, and the absence of sharing of computations. Sharing was incorporated during compilation, but if a shared sub-expression was reduced, a copy of this expression would be made without updating the original pointers. This means that a shared sub-expression could be evaluated more than once. Function calls were widely used in the C code instead of macro definitions. None of the optimizations above, except the use of operator cells, was incorporated in this implementation.


Musicante & Lins

The experience gained from the implementation by Camilleri and Lewis served as a basis for a new implementation of an interpreted machine based on categorical multi-combinators. This implementation is fully described in Reference 23. Four simple programs which make extensive use of the most important features of lazy functional languages, such as recursion, higher-order functions, and lazy evaluation, have been used to compare this machine with some other implementations. They are:

Fibonacci: the Fibonacci number of 20.

Sieve: Eratosthenes’ sieve to find all prime numbers up to 300.

InsOrd: sorting by insertion a list of 100 numbers generated at random.

SimLog: a program which transforms a list of 100 random numbers into a list of 100 random boolean values.

We compare the performance of the interpreted machine (CMC) with a categorical multi-combinator compiled machine (GM-C¹⁶), and also with Simon Croft’s implementation of Turner’s KRC¹³, and ML²⁴, a strict functional language. Our version of ML corresponds to the Edinburgh implementation of Standard ML by FAM version 3.3. This implementation of KRC makes use of Turner’s combinators and displays a performance at least as good as Turner’s implementation of SASL. We also provide performance figures for two of the chosen benchmark programs in C. These programs were implemented in a functional style. A different implementation of these algorithms in C may bring a better performance.

InsOrd and SimLog make use of lazy evaluation. For this reason we do not give their performance figures in ML and C.

Table I. Time performance in seconds

Program implementation   Fibonacci   Sieve   InsOrd   SimLog
KRC                          65.84   30.40    30.21     4.11
CMC                          24.88   14.20    15.60     2.73
GM-C                          2.19    2.89     2.85     1.05
ML                            8.58    6.68        -        -
C                             0.83    1.28        -        -

For the sake of simplicity and portability our implementations are written in C running under VMS. All data presented here were obtained using a Vax 750. For each test program at least five time measurements were taken. We present the worst ones. The time figures above correspond to user c.p.u. time in seconds. The table above shows us that the performance of an interpreted categorical multi-combinator machine is as good as that of an interpreted machine running Turner’s combinators. A compiled categorical multi-combinator machine can run several times faster than an interpreted one, reaching a performance comparable with efficient implementations of imperative languages.
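The relative speeds implied by Table I can be made explicit with a little arithmetic. The Python sketch below only divides figures taken from the table; the names `TIMES` and `speedup` are hypothetical, and the KRC time for InsOrd is read as 30.21 (its last digit is unclear in the source, so this value is an assumption).

```python
# Times in seconds from Table I (user c.p.u. time on a Vax 750).
TIMES = {
    "KRC":  {"Fibonacci": 65.84, "Sieve": 30.40, "InsOrd": 30.21, "SimLog": 4.11},
    "CMC":  {"Fibonacci": 24.88, "Sieve": 14.20, "InsOrd": 15.60, "SimLog": 2.73},
    "GM-C": {"Fibonacci": 2.19,  "Sieve": 2.89,  "InsOrd": 2.85,  "SimLog": 1.05},
}


def speedup(slower, faster):
    """Ratio of running times, program by program."""
    return {program: TIMES[slower][program] / TIMES[faster][program]
            for program in TIMES[slower]}


for pair in (("KRC", "CMC"), ("CMC", "GM-C")):
    ratios = speedup(*pair)
    print(pair, {program: round(ratio, 2) for program, ratio in ratios.items()})
```

On these figures the interpreted machine CMC is between roughly 1.5 and 2.6 times faster than KRC, and the compiled machine GM-C is between roughly 2.6 and 11 times faster than CMC, depending on the program.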


ACKNOWLEDGEMENTS

The authors would like to thank John-Albert Camilleri, Carl Lewis, and Martin Musicante for their implementation of SASL. Gratitude is also due to Prof. David Turner and John Cupitt for several discussions and comments.

This work had the financial support of the British Council and C.N.Pq. (Brazil) grants No 40.9110/88.4. and 46.0782/89.4.

APPENDIX

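Example of compilation of a simple program using local recursion. The program

exp n = suc n + dsuc n
where
suc a = a+1
dsuc b = suc(suc b)

compiles into categorical multi-combinators as,

$$
\left[\!\!\left[
\begin{array}{l}
\mathit{exp}\; n = \mathit{suc}\; n + \mathit{dsuc}\; n \ \text{where} \\
\quad \mathit{suc}\; a = a + 1 \\
\quad \mathit{dsuc}\; b = \mathit{suc}(\mathit{suc}\; b)
\end{array}
\right]\!\!\right]
$$

$$
\begin{array}{ll}
R.1 \Rightarrow & \mathit{exp} \equiv L^{0}(R^{n}_{0}(I^{T_{suc,dsuc}}_{0,1}\, (\mathit{suc}\; n + \mathit{dsuc}\; n))) \\
I.1 \Rightarrow & \mathit{exp} \equiv L^{0}(R^{n}_{0}((I^{T_{suc,dsuc}}_{0,1}\, \mathit{suc})\; (I^{T_{suc,dsuc}}_{0,1}\, n)\; (I^{T_{suc,dsuc}}_{0,1}\, +)\; (I^{T_{suc,dsuc}}_{0,1}\, \mathit{dsuc})\; (I^{T_{suc,dsuc}}_{0,1}\, n))) \\
I.2 \Rightarrow & \mathit{exp} \equiv L^{0}(R^{n}_{0}((S_{0}\, T_{suc,dsuc})\; n + (S_{1}\, T_{suc,dsuc})\; n)) \\
t.9 \Rightarrow & \mathit{exp} \equiv L^{0}((R^{n}_{0}(S_{0}\, T_{suc,dsuc}))\; (R^{n}_{0}\, n)\; (R^{n}_{0}\, +)\; (R^{n}_{0}(S_{1}\, T_{suc,dsuc}))\; (R^{n}_{0}\, n)) \\
t.14 \Rightarrow & \mathit{exp} \equiv L^{0}((S_{0}\, T_{suc,dsuc})\; 0 + (S_{1}\, T_{suc,dsuc})\; 0)
\end{array}
$$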

where,

T

suc;dsuc

� D (L

1(R

n;a

1;0 (IT

suc;dsuc

0;1 (a+ 1))))0) (L1(R

n;b

1;0(IT

suc;dsuc

0;1 (suc(suc b))))0)

I:1) T

suc;dsuc

� D (L

1(R

n;a

1;0 ((IT

suc;dsuc

0;1 a) (I

T

suc;dsuc

0;1 +) (I

T

suc;dsuc

0;1 1)))0)

(L

1(R

n;b

1;0(IT

suc;dsuc

0;1 (suc(suc b))))0)

I:2) T

suc;dsuc

� D (L

1(R

n;a

1;0 (a+ 1))0)(L1(R

n;b

1;0(IT

suc;dsuc

0;1 (suc(suc b))))0)

t:9) T

suc;dsuc

� D (L

1((R

n;a

1;0a) (Rn;a

1;0+) (R

n;a

1;0 1))0)(L1(R

n;b

1;0(IT

suc;dsuc

0;1 (suc(suc b))))0)

t:14) T

suc;dsuc

� D (L

1(0+ 1)0)(L1

(R

n;b

1;0(IT

suc;dsuc

0;1 (suc(suc b))))0)

28 R.D.LINS AND S.J.THOMPSON

I:1) T

suc;dsuc

� D (L

1(0+ 1)0)(L1

(R

n;b

1;0((IT

suc;dsuc

0;1 suc)(I

T

suc;dsuc

0;1 (suc b))))0)

I:2) T

suc;dsuc

� D (L

1(0+ 1)0)(L1

(R

n;b

1;0((S0T

suc;dsuc

) (I

T

suc;dsuc

0;1 (suc b))))0)

I:1) T

suc;dsuc

� D (L

1(0+ 1)0)(L1

(R

n;b

1;0((S0T

suc;dsuc

)((I

T

suc;dsuc

0;1 suc) (I

T

suc;dsuc

0;1 b))))0)

I:2) T

suc;dsuc

� D (L

1(0+ 1)0)(L1

(R

n;b

1;0((S0T

suc;dsuc

)((S

0T

suc;dsuc

)b)))0)

t:9) T

suc;dsuc

� D (L

1(0+ 1)0)(L1

(R

n;b

1;0((S0T

suc;dsuc

))(R

n;b

1;0(S0T

suc;dsuc

)b))0)

t:14) T

suc;dsuc

� D (L

1(0+ 1)0)(L1

((S

0T

suc;dsuc

)(R

n;b

1;0((S0T

suc;dsuc

)b)))0)

t:9) T

suc;dsuc

� D (L

1(0+ 1)0)(L1

((S

0T

suc;dsuc

)((R

n;b

1;0(S0T

suc;dsuc

))(R

n;b

1;0b)))0)

t:14) T

suc;dsuc

� D (L

1(0+ 1)0)(L1

((S

0T

suc;dsuc

) ((S

0T

suc;dsuc

) 0))0)
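The compiled code above can be mimicked in ordinary Python to see why the block must refer to itself. In this sketch (the names `make_block`, `T` and `exp_` are illustrative and not part of the paper) the data-block is a list, the selectors $S_0$ and $S_1$ become list indexing, and each local function takes the global variable n as an extra first argument, as rule R.1 prescribes.

```python
def make_block():
    """Build the data-block for suc and dsuc; dsuc refers back to the block."""
    block = []
    # suc a = a + 1, with the global variable n passed as an extra argument
    block.append(lambda n, a: a + 1)
    # dsuc b = suc (suc b); each use of suc is the selector S_0 on the block
    block.append(lambda n, b: block[0](n, block[0](n, b)))
    return block


T = make_block()


def exp_(n):
    # exp n = suc n + dsuc n, i.e. (S_0 T) n + (S_1 T) n
    return T[0](n, n) + T[1](n, n)


print(exp_(3))
```

The second component of the list closes over the list itself, which is the Python counterpart of the cyclic graph produced for the self-referencing block.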

REFERENCES

1. D.A. Turner, ‘A new implementation technique for applicative languages’,Software — Practice andExperience,9, 31 50 (1979).

2. H.P. Barendregt, The Lambda Calculus: Its Syntax and Semantics, North-Holland, 1984.

3. J. Lambek, ‘From lambda-calculus to cartesian closed categories’, in J.P. Seldin and J.R. Hindley (eds), To H.B. Curry: Essays on Combinatory Logic, Lambda-Calculus and Formalism, Academic Press, 1980.

4. P-L. Curien, Categorical Combinators, Sequential Algorithms and Functional Programming, Research Notes in Theoretical Computer Science, Pitman Publishing Ltd., 1986.

5. D. Scott, ‘Relating theories of the lambda-calculus’, in J.P. Seldin and J.R. Hindley (eds), To H.B. Curry: Essays on Combinatory Logic, Lambda-Calculus and Formalism, Academic Press, 1980.

6. G. Cousineau, P-L. Curien and M. Mauny, ‘The categorical abstract machine’, in J-P. Jouannaud (ed.), Functional Programming Languages and Computer Architecture, Springer-Verlag, LNCS 201, 1985.

7. R.D. Lins, ‘A new formula for the execution of categorical combinators’, Proceedings of 8th International Conference on Automated Deduction, Springer-Verlag, July 1986, LNCS 230, pp. 89–98.

8. R.D. Lins, ‘On the efficiency of categorical combinators in applicative languages’, Ph.D. Thesis, The University of Kent at Canterbury, October 1986.

9. R.D. Lins, ‘On the efficiency of categorical combinators as a rewriting system’, Software: Practice and Experience, 17, 547–559 (1987).

10. R.J.M. Hughes, ‘The design and implementation of programming languages’, Ph.D. Thesis, Oxford Univ. Comp. Lab., July 1983.

11. R.D. Lins, ‘Categorical multi-combinators’, in Gilles Kahn (ed.), Functional Programming Languages and Computer Architecture, Springer-Verlag, September 1987, LNCS 274, pp. 60–79.

12. D.A. Turner, ‘SASL Language Manual’, UKC Computing Lab. Report, The University of Kent at Canterbury, 1983. Revised version: November 1983.

13. Simon Croft, ‘Functional language implementation’, Master’s Thesis, The University of Kent, Computing Laboratory, 1984.

14. D.A. Turner, ‘Recursion equations as a programming language’, in Functional Programming and its Applications, Cambridge University Press, 1982.

15. T. Johnsson, ‘Compiling lazy functional languages’, Ph.D. Thesis, Chalmers Tekniska Högskola, Göteborg, Sweden, January 1987.

16. M.A. Musicante and R.D. Lins, ‘Gmc: a graph categorical multi-combinators machine’, Proc. of 8th Congress of the Brazilian Computing Society, July 1989.

17. N.G. de Bruijn, ‘Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation’, Indag. Math., 34, 381–392 (1972).

18. G. Huet and D. Oppen, ‘Equations and rewrite rules: a survey’, Formal Language Theory, pp. 349–405, 1980.

19. D.A. Turner, ‘Miranda: a non-strict functional language with polymorphic types’, in J-P. Jouannaud (ed.), Functional Programming Languages and Computer Architecture, Springer-Verlag, 1985.

20. S. Peyton Jones, The Implementation of Functional Languages, Prentice Hall, 1987.

21. S.C. Johnson, ‘Yacc - yet another compiler compiler’, Technical Report 32, Bell Labs., 1975. Also in UNIX Programmer’s Manual, Volume 2B.

22. B.W. Kernighan and D.M. Ritchie, The C Programming Language, Prentice-Hall, Englewood Cliffs, N.J., 1978.

23. M.A. Musicante and R.D. Lins, ‘Implementing a categorical multi-combinators machine’, Proceedings of XIV Latin American Conference on Informatics, Buenos Aires, Argentina, September 1988.

24. R. Milner, ‘Standard ML proposal’, The ML/LCF/Hope Newsletter, 1, (3), January 1984.