A practical approach to type-sensitive parsing


Transcript of A practical approach to type-sensitive parsing

Pergamon

Comput. Lang. Vol. 20, No. 2, pp. 101-116, 1994 Copyright © 1994 Elsevier Science Ltd

Printed in Great Britain. All rights reserved 0096-0551/94 $7.00 + 0.00

A PRACTICAL APPROACH TO TYPE-SENSITIVE PARSING

KEN SAILOR and CARL MCCROSKY†

Department of Computational Science, University of Saskatchewan, Saskatoon, Canada S7N 0W0

(Received 4 November 1993; revision received 25 April 1994)

Abstract--Type-sensitive parsing of expressions is context-sensitive parsing based on type. Previous research reported a general class of algorithms for type-sensitive parsing. Unfortunately, these algorithms are impractical for languages that infer type. This paper describes a related algorithm which is much more efficient--its incremental cost is linear (with a small constant) in the length of the expression, even when types must be deduced. Our method can be applied to any statically typed language, solving a variety of problems associated with conventional parsing techniques, including problems with operator precedence and the interaction between infix operators and higher-order functions.

context-sensitive parsing type-sensitive parsing type inference programming language syntax

1. INTRODUCTION

Human interpretation and generation of language are guided by context. Although "meaning" in natural languages is not well-defined, meaning is the context that guides our parsing of natural languages. If one were to say "time flies like a racehorse" in a conversation about how quickly the term has ended, the phrase is most likely interpreted as an appropriate metaphor of how quickly time seems to pass. In this case, "time" is a noun and "flies" is a verb. It is also possible that the speaker might be engaged in a discussion of how to determine the speed insects travel. In this case, the sentence is a suggestion of how this speed may be determined, and the word "time" is the verb while "flies" is the noun. A not inconceivable possibility is that the phrase concerns the preferences of a particular kind of fly. These flies prefer racehorses. In this case, "time" is an adjective while "flies" is a noun. Each of the interpretations is valid. The only way to determine the meaning of the sentence is to consider it in context, or guess a likely interpretation in the absence of context.

Although natural language processing remains an unsolved computational problem, humans perform natural language processing with ease. It is not generally considered a challenge to converse even though the rules of conversation are not explicit nor is it understood how we generate or interpret utterances. Considering that programming languages are a type of formal language with simple syntax and well-defined semantics, it is curious that many students have difficulty writing syntactically correct programs. Given the best programming languages and the best programmers, programs are notoriously hard to understand. Programs are annotated with natural language comments to facilitate understanding even though the language of those comments is not syntactically simple nor is its semantics well-defined. In light of these observations, we question the current belief that a simpler syntax is a better syntax. It is certainly true that there are bad complicated syntaxes, but a complicated syntax is not in itself a detriment to the usability of a programming language.

When we describe a syntax as simple, we mean that it has a context-free grammar, and a short one at that. A programming language syntax is generally considered good if it has a simple syntax and bad otherwise. Further, it is usually a routine task to construct a context-free grammar that links program text and program meaning with a parse tree. In this way, context-free grammars join the syntax and semantics of a language. Although there are benefits to the context-free approach, the fact remains that humans have a talent for parsing far in excess of simple grammars.

We believe there is value in exploring context-sensitive parsing techniques for programming languages. Although this may result in more computationally complicated methods, it is possible

†Author for correspondence.



that these methods will allow syntaxes that are more easily generated and understood by humans. If there is a correspondence between a context-sensitive method and the way we naturally generate language, then it might be that such a method would make it easier to read and write programs. Anyone who has ever taught introductory programming knows that the rules of context-free parsing are a source of frustration for many beginning programmers.

Consider the Pascal expression

b or x < y and c,

where b and c are boolean variables and x and y are integer variables. The choice of precedence in Pascal makes the boolean operators bind tighter than the relational operator, forcing a type error when or is applied to b and x. How many times must instructors explain that although the phrase makes sense (to us) and has only one valid interpretation, the parser ignores meaning when determining meaning!

What kind of context-sensitive parsing is possible and what benefits would such methods have? In a previous paper [1], we showed that context-sensitive parsing of expressions is possible in programming languages where types are supplied by the programmer. The benefits of this method are a simpler and type-safe form of precedence (precedence will never force a type error) and a reduced need for disambiguating tokens like parentheses. For this technique, expression type is used as an approximation of meaning to guide the parsing of expressions. The cost of the method is O(n²), where n is the number of elements in an expression, when types are supplied by the programmer, and exponential in n when types must be inferred. In this paper, we report a new approach that yields a practical context-sensitive algorithm with type inference. This algorithm infers types while parsing expressions at no increase in the cost of type inference. In a language including Hindley-Milner type inference, this algorithm makes context-sensitive parsing available with no additional cost. In a language with restricted polymorphism and programmer-supplied types, the new method reduces the complexity of context-sensitive expression parsing to O(n). Similar to the method reported in Ref. [1], our new method provides type-safe and simpler precedence and a reduced need for disambiguating syntactic tokens.

The new algorithm, like the old, uses type as an approximation of expression meaning. Given an expression consisting of a list of elements, various groupings of elements are tested to determine if they form validly typed function applications. When a grouping satisfies the typing restriction, then one element of the grouping is assumed to be a function and the other elements arguments to that function. This grouping becomes a single element in the original expression, and new groupings are tested until there is one element remaining in the expression being processed. When one element remains, the expression has been completely parsed. The new algorithm is based on a pushdown automaton. It never considers more than the first two elements of an expression and the top of the stack to determine the next grouping. This restricted lookahead (and behind) is the source of the efficiency gain over the previous algorithm.

Section 2 gives motivation for and an overview of type-sensitive parsing. Section 3 provides preliminary definitions. Section 4 specifies the algorithm. Section 5 gives example parses of the algorithm. Section 6 discusses the time complexity of the algorithm and relates practical experience with our implementation. Possible extensions are discussed in Section 7. Section 8 presents our conclusions.

2. LET TYPING BE THE GUIDE

Our interest in context-sensitive parsing grew out of the inability of context-free grammars to parse expressions obvious to us. We use four simple examples as an introduction to parsing using type context.

2.1. Interpreting "sin cos x"

Prefix function application is typically a tightly binding form of function application in programming languages. Languages like Pascal make prefix function application explicit through


the use of parentheses and commas: sin(cos(x)). Languages like Haskell [2] and Miranda† [3] do not require arguments to be gathered in parenthesized lists, but treat function application as left-associative. In these languages sin cos x is a typing error since left-associative function application would apply sin to cos and apply the result to x: an error because sin is a function of a number, not another function. The correct form of the sample expression for these languages would be sin (cos x). Regardless of the presence or absence of parentheses, the example expression has only one reasonable interpretation, that cos is applied to x before sin is applied to the result. This meaning can be determined from the type of the expression elements involved.

2.2. Interpreting "true or 4 < 5 and false"

In this second example, the weakness of standard precedence is exposed. Precedence determines a binding hierarchy of operator symbols used to determine the parse of infix operator applications. If the relative binding of < is less than that of or (as it is in Pascal) then the expression is an error because these precedences force the application of or to true and 4. This problem is somewhat mollified by more clever choices of precedence to ease the construction of standard phrases, but this cleverness results in complicated precedence hierarchies that many programmers choose not to learn. A common response to the precedence hierarchy of the C programming language is to ignore precedence and parenthesize expressions completely (just to be on the safe side), resulting in the unreadable expressions precedence is meant to eliminate. Allowing the context of type to guide parsing, we can introduce two rules: that infix operators may only be applied to correctly typed arguments; and that precedence is only considered among homogeneously typed function applications. These two rules collapse complicated hierarchies into simple ones. For example, it is only necessary to know the relative precedences of the boolean operators instead of understanding how the booleans compare to all other operators. The simple Pascal scheme of assigning and a multiplicative precedence and or an additive precedence is enough to guide parsing without stumbling over simple examples like the one above. In the example expression, the relational operator must be applied to the two numbers first. Once this application has been formed, the relative precedence of the boolean operators can be considered to complete the parse.

2.3. Interpreting "3 + 4" and "reduce + 0"

In higher-order programming languages, operators can be arguments to functions as well as functions that are applied infix to arguments. If + is addition, the single sensible meaning of the first expression above is the application of + to 3 and 4. If reduce (foldr in Miranda or Haskell) is the standard higher order function that applies a binary function between every element in a list and an identity value, then the single sensible meaning of the second expression is the application of reduce to + and 0 (the standard definition of the function that sums a list of numbers). It is impossible for a context-free parser to distinguish the two expressions (other than for a fixed number of functions and operators hard-coded in a grammar). Functional languages that allow higher-order function application use additional symbols to signal the two cases for their context-free parsers. In Miranda or Haskell, the second phrase must be written reduce (+) 0, using parentheses as magic characters to override the usual meaning of + to the parser. Similar magic characters are used to indicate that a function not requiring the magic parentheses is to be interpreted as infix. Miranda requires a prepended "$" character while Haskell uses surrounding backquotes. If the context of type is introduced to parsing, then it is possible to distinguish the two cases without magic characters. The first example must be infix operator application while the second must be prefix function application.

2.4. Parsing guided by type

Type-sensitive parsing, as introduced in Ref. [1], uses type context for a subtler interpretation of syntax. In this scheme, a traditional context-free parser gathers an expression into a list of tokens; reduce + 0 would be parsed to a list further annotated by token type (from a symbol table), such as [reduce : (t → t → t) → t → [t] → t, + : Int → Int → Int, 0 : Int]. Then a set of conditional, prioritized rewrite rules is applied to the list, creating application combinations until the list is reduced to one element.

†Miranda is the trademark of Research Software Ltd.


In this example, the highest priority rule with a satisfied conditional would be partial prefix application, resolving the list into the single application (reduce + 0). The worst-case quadratic efficiency cost for this scheme results from the fact that the entire expression may be scanned n times to determine the next rewrite rule to apply. As expressions tend to be short, this cost is practical and allows for a sophisticated interpretation of syntax. Rules are easily developed to handle the examples above as well as multiplication by juxtaposition and other notational practices. Unfortunately, efficiency costs for this method are unacceptably large when types are not given but must be inferred. Our interest in higher-order functional programming languages where type inference is a standard feature led to the development of a type-sensitive parsing method practical in this domain.

Like Ref. [1], we use a context-free parser to parse a program into a form where expressions are gathered into lists of sub-expressions. A modified version of Milner's type inference algorithm is then applied to the program, returning a program annotated by type in which function application is fully resolved if the program is determined to be well-typed. Although type inference has a worst-case exponential cost [4], it is widely recognized as a practical compiler analysis. Our extensions providing type-sensitive parsing do not increase the practical costs. Rather than scanning the entire string of expressions, this version of type-sensitive parsing moves left-to-right through a list of expressions with no backup. Once an application is formed, it is never broken. As there are a constant number of possibilities for each position considered, the degradation of standard type inference is at most a constant factor. This algorithm has been implemented in our experimental functional language, Falafel, which has a syntax closely related to the syntax of Miranda. After defining the algorithm, we include experimental results demonstrating the low cost of the method in comparison with strictly prefix, left-associative parsing.

3. PRELIMINARY DEFINITIONS

In this section, definitions are given for standard terms like precedence and associativity, as well as new terms useful in the context of type-sensitive parsing and our algorithm.

Given an expression that is a pair of infix function applications such as x * y + z or in general arg1 op1 arg2 op2 arg3, the expression is parsed into prefix form (or order of application is determined) respecting the binding properties of the operators involved. For example, if op1 has a higher binding priority than op2, then the prefix form is op2 (op1 arg1 arg2) arg3. If op2 has the higher binding priority, then the prefix form is op1 arg1 (op2 arg2 arg3). This higher binding priority is the definition of precedence. If op1 and op2 have the same precedence and if they are left-associative, the left-most operator binds tightest, resolving to op2 (op1 arg1 arg2) arg3. If the operators have the same precedence but are both right-associative, then the expression yields op1 arg1 (op2 arg2 arg3). It is not allowed that operators have the same precedence but different associativities, since this results in ambiguity: either operator could be chosen as most tightly binding. Precedence is a useful strategy for eliminating unnecessary parentheses and improving the readability of expressions, although complicated precedence hierarchies tend not to be used (as noted above). We modify the standard definition of precedence so that the precedence of operators is only considered in a type-safe context. A classification of functions is introduced to explain this modification to precedence.
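The binding rules just described can be written out directly. The Python sketch below is our illustration (the tuple trees and the precedence/associativity tables are not from the paper): it resolves arg1 op1 arg2 op2 arg3 into prefix form.

```python
# Illustrative sketch: prefix trees are tuples (op, left, right); prec maps
# operator names to binding priority; right_assoc lists right-associative ops.
def resolve(a, op1, b, op2, c, prec, right_assoc=()):
    """Resolve 'a op1 b op2 c' into a prefix tree."""
    if prec[op1] > prec[op2]:                      # op1 binds tighter
        return (op2, (op1, a, b), c)
    if prec[op1] < prec[op2]:                      # op2 binds tighter
        return (op1, a, (op2, b, c))
    if op1 in right_assoc and op2 in right_assoc:  # equal, right-associative
        return (op1, a, (op2, b, c))
    if op1 not in right_assoc and op2 not in right_assoc:
        return (op2, (op1, a, b), c)               # equal, left-associative
    raise ValueError("equal precedence but mixed associativity is ambiguous")
```

For example, resolve('x', '*', 'y', '+', 'z', {'*': 7, '+': 6}) yields ('+', ('*', 'x', 'y'), 'z'), i.e. the form op2 (op1 arg1 arg2) arg3.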

Functions of two or more arguments can be divided into two categories: homogeneous binary and heterogeneous. Homogeneous binary functions take two arguments and return a result, all of the same type. If f is a homogeneous binary function, then f : t → t → t (read: f is a function from t to t to t, for some type t). A heterogeneous function's argument and result types are not required to be the same, or it may be a function of more than two arguments (in a curried language).

Given a string of infix homogeneous binary function applications, arg1 op1 arg2 op2 ... opn-1 argn, if every operator is of type t → t → t and each arg is of type t, then regardless of the order of application of the operators, the expression is well-typed. The standard concept of precedence can be used to reduce the need for parentheses in type safety. For example, 3 * 4 + 5 div 7 - 5 is well-typed regardless of the order the operators are applied. Conversely, given two homogeneous binary functions of incompatible types, no interpretation based solely on precedence can safely resolve order of application.


Precedence is not type-safe in resolving the order of application of strings of infix heterogeneous function applications. Given the expression arg1 het1 arg2 het2 arg3, and let het1 : t1 → t2 → t3 and het2 : t4 → t5 → t6; if t3 unifies with t4 then het2 (het1 arg1 arg2) arg3 is a possible interpretation. If t2 unifies with t6, then het1 arg1 (het2 arg2 arg3) is a possible interpretation. It is unusual for both interpretations to be valid. This would require the typings het1 : t1 → t2 → t2 and het2 : t2 → t3 → t2.

For example, given bn : Bool → Int → Int and nb : Int → Bool → Int, then either interpretation of true bn 5 nb false is valid in terms of type. In this case precedence could choose between valid interpretations preserving type safety. This is rare and complicated, however. Commonly, there is a single valid interpretation of a string of infix, heterogeneous function applications, and precedence serves no purpose in these cases other than to introduce typing errors. For this reason, we restrict precedence to infix application of compatibly typed, homogeneous binary functions.

We distinguish between heterogeneous operators that gather arguments, and those that do not. The relational operators, like <, are typical of heterogeneous operators that gather their arguments. In expressions like 3 + 4 < 5 * 6, we choose to allow the numeric operators to be applied before the comparison is made. There are other heterogeneous functions for which the gathering behavior is less attractive. Where pick is the array indexing function, we choose to interpret 3 pick A + 5 as a pick of A, not of A + 5 (assuming A + 5 has meaning). Different preferences on this level have little effect on the algorithm that follows, although there is an aesthetic effect on the language accepted. In our work on syntax, the relational operators are the only operators we are aware of that are generally expected to gather arguments. A question of language design is whether gathering should be restricted to a specific set of operators or whether it should be under programmer control. As long as the algorithm can determine the behavior of a particular symbol, it makes no difference whether the set is open or closed.

We choose prefix application to bind more tightly than infix. An expression like sin 3 + 4 is interpreted as (sin 3) + 4 rather than sin (3 + 4). This choice forces prefix function application to be non-gathering.

4. AN INFORMAL PRESENTATION

The goal of type-sensitive parsing is to provide a parsing method for languages more easily parsed by humans than the languages accepted by context-free parsers. This goal is not well defined, because how we (humans) parse or even whether there is just one way all people parse remains unknown. Indeed, it is ironic that mathematically more complicated languages--natural languages--appear easier for us to parse than context-free programming languages. If a parsing algorithm is a success, its workings should go unnoticed. That is, it should do what we expect; when it surprises us, it has failed. Writing software to such demanding and vague specifications is difficult indeed. Evidence that we have succeeded is given in the form of expressions our algorithm parses to our satisfaction.

Context-free grammars can be concise and, when one is versed in the formalism, easily understood. If context-sensitive techniques such as ours are to be used, there must be some method of explaining the technique concisely. In the section that follows, the complete method is given in all its detail. In this section, we give a less formal description to introduce the reader to the technique as well as to demonstrate that the method has a concise description. A description such as this could be used in presenting the technique within a particular language definition.

The algorithm works left-to-right through a list of expressions. Using a place marker, the algorithm identifies an expression in the list as the current expression. Given a list of expressions, we can informally denote the state of the algorithm by

w _x_ y z

where underlining (rendered here as _x_) indicates the current expression is x. In any step, the algorithm considers at most two expressions: the current expression and the adjacent expression to its right or left. The expressions to the right of the current expression are expressions yet to be considered. The expressions to the left of the current expression have been passed over for one of two reasons: either the expression represents an application of an operator that gathers its arguments; or the expression


could not be applied to the expression on its right or left. As the algorithm progresses, function applications are introduced, denoted by parenthesized groupings. For example,

(f x) (g y z) s t

denotes that the current expression is the application of g to y and z. The application (f x) has been formed and passed over. The expressions s and t are yet to be considered.

The algorithm is a list of conditional rules. If the condition of a rule is satisfied, then the expression list and placemarker are changed as specified by the rule. The rules are attempted in order of presentation. If the condition of the rule is not satisfied, then the next rule in the list is attempted. The ordering of the termination rules is arbitrary. No other rule would apply when either termination rule is satisfied. The ordering of the remaining rules reflects our preference that prefix application be more tightly binding than infix application. An example is given with each rule. The first two rules govern termination. (Rule names match the formal algorithm.)

1. (Termination) If there is only one expression in the expression list, then the algorithm has successfully determined function application.

(reduce + 0) ⇒ (reduce + 0)

2. (Forced Prefix) If the current expression marker has been moved beyond the last expression in the expression list, check if the expressions to the left form an application in prefix form. If not, a type error is reported. Figure 3 presents the full processing of the example and includes a motivating discussion.

map (< 3) nums _ ⇒ (map (< 3) nums)

The next five rules are the inner workings of the algorithm. Rule 3 is prefix function application. Rules 4 and 6 test when the expression on the left of the current expression applies to the current expression. Rule 5 is infix function application. Rule 7 applies when no other rule does.

3. (Prefix) If the current expression applies to the expression on its right, form the application (merge the two expressions into a single expression represented by the application of the first to the second).

double 3 ⇒ (double 3)

4. (Non-gathering) If the expression to the left is not a gathering expression and that expression applies to the current expression, form the application.

(+ 3) 4 ⇒ (+ 3 4)

5. (Infix) If the current expression could be an argument of the expression to its right, form the application. If the expression on the right is a gathering operator, move the current marker ahead to the next expression. In the examples, + is non-gathering and < is gathering.

3 + 4 ⇒ (+ 3) 4

3 < 4 ⇒ (< 3) 4

6. (Gathering) If the expression to the left is a gathering expression that applies to the current expression, form the application.

(< 3) 4 ⇒ (< 3 4)

7. (Block) If none of the above rules apply, move the place marker to the next expression.

map 3 < nums ⇒ map 3 < nums
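Taken together, the seven rules admit a compact executable sketch. The Python below is our illustration, not the paper's implementation: it is monomorphic (exact type matching, so there is no unification or inference), it encodes a function type dom → cod as the pair (dom, cod), and it tags each expression with its gathering behavior.

```python
# A sketch of the seven rules, not the paper's implementation: types must
# match exactly (no unification), and each expression carries its type.
def mk(name, typ, gathers=False):
    """An atomic expression: a name, an explicit type, a gathering flag."""
    return {"head": name, "args": [], "type": typ, "gathers": gathers}

def applies(f, x):
    """True if f is a function whose argument type matches x's type."""
    return isinstance(f["type"], tuple) and f["type"][0] == x["type"]

def apply(f, x):
    """Form the application of f to x (flattened, as in (+ 3 4))."""
    return {"head": f["head"], "args": f["args"] + [x],
            "type": f["type"][1], "gathers": f["gathers"]}

def show(e):
    if not e["args"]:
        return e["head"]
    return "(" + " ".join([e["head"]] + [show(a) for a in e["args"]]) + ")"

def parse(es):
    es, i = list(es), 0
    while True:
        if len(es) == 1:                               # 1. Termination
            return es[0]
        if i >= len(es):                               # 2. Forced Prefix
            while len(es) > 1:
                if not applies(es[0], es[1]):
                    raise TypeError("no prefix interpretation")
                es[:2] = [apply(es[0], es[1])]
            return es[0]
        cur = es[i]
        left = es[i - 1] if i > 0 else None
        right = es[i + 1] if i + 1 < len(es) else None
        if right is not None and applies(cur, right):  # 3. Prefix
            es[i:i + 2] = [apply(cur, right)]
        elif left is not None and not left["gathers"] and applies(left, cur):
            es[i - 1:i + 1] = [apply(left, cur)]       # 4. Non-gathering
            i -= 1
        elif right is not None and applies(right, cur):
            es[i:i + 2] = [apply(right, cur)]          # 5. Infix
            if es[i]["gathers"]:
                i += 1
        elif left is not None and left["gathers"] and applies(left, cur):
            es[i - 1:i + 1] = [apply(left, cur)]       # 6. Gathering
            i -= 1
        else:                                          # 7. Block
            i += 1
```

Applied to the examples of Section 2, this sketch reproduces the parses traced below: sin cos x resolves to (sin (cos x)), 3 + 4 to (+ 3 4), reduce + 0 to (reduce + 0), and map 3 < nums terminates through the Forced Prefix rule.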

Figure 1 demonstrates how these rules are used to interpret the example expressions from Section 2. The rule name and number producing the line are given on the right.

In processing sin cos x, the value of the Block Rule is shown. Because sin cos cannot be interpreted as a valid prefix or infix application, the Block Rule inserts an implicit open parenthesis


sin cos x
  sin cos x                          Block (7)
  sin (cos x)                        Prefix (3)
  (sin (cos x))                      Non-gathering (4)

3 + 4
  (+ 3) 4                            Infix (5)
  (+ 3 4)                            Prefix (3)

reduce + 0
  (reduce +) 0                       Prefix (3)
  (reduce + 0)                       Prefix (3)

true or 3 < 4 and false
  (or true) 3 < 4 and false          Infix (5)
  (or true) 3 < 4 and false          Block (7)
  (or true) (< 3) 4 and false        Infix (5)
  (or true) (< 3 4) and false        Gathering (6)
  (or true (< 3 4)) and false        Non-gathering (4)
  (and (or true (< 3 4))) false      Infix (5)
  (and (or true (< 3 4)) false)      Prefix (3)

Fig. 1. The informal algorithm applied to the examples.

before cos by moving the current expression marker to the right. cos then applies to x, satisfying the condition of the Prefix Rule, and sin applies to the result by the Non-gathering Rule.

The second and third examples show how infix operator application is distinguished from higher-order function application. In the first case, 3 does not apply to +, so infix application is detected. In the second case, reduce does apply to +, signalling prefix application.

The example of true or 3 < 4 and false shows the recognition of infix operators. In this case, the Infix Rule forms the application of or to true. The next operand for or is not found, so the current expression is bumped to the right. < applies to 3, and because < is a gathering operator, 4 becomes the current expression. Although < is gathering, the application to 4 is completed because 4 and and are incompatibly typed. Now the partial application of or to the left is completed with the second argument (< 3 4). Finally, the infix application of and is recognized. This case demonstrates the recognition of gathering and non-gathering infix operators. It also demonstrates that this naive version does not consider precedence. Precedence can be added by delaying the forming of function applications in the case of infix homogeneous binary functions. The Infix Rule is used to recognize such applications, but function applications are not formed until the longest list of homogeneous binary applications has been parsed. Using angle brackets to represent this delayed application form, this example expression is processed as shown in Fig. 2. At the end of

true or 3 < 4 and false
<true or> 3 < 4 and false              Infix (5)
<true or> 3 < 4 and false              Block (7)
<true or> (< 3) 4 and false            Infix (5)
<true or> (< 3 4) and false            Gathering (6)
<true or (< 3 4)> and false            Non-gathering (4)
<true or (< 3 4) and> false            Infix (5)
<true or (< 3 4) and false>

Fig. 2. Precedence and operators.



map 3 < nums
map 3 < nums                           Block (7)
map (< 3) nums                         Infix (5)
map (< 3) nums                         Block (7)
(map (< 3) nums)                       Forced Prefix (2)

Fig. 3. The Forced Prefix Rule.

the processing in Fig. 2, the expression enclosed in angle brackets is of the form arg1 bin1 arg2 bin2 arg3, and function application respecting precedence can be resolved by standard techniques [5] (assuming and has a higher precedence than or) to:

(or true (and (< 3 4) false))
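The standard technique can be sketched as a small precedence-climbing pass over the flat list of operands and operators. The Python below is an illustrative sketch only, not the paper's implementation; the precedence table and function name are our assumptions for the example.

```python
# Illustrative sketch: resolve a flat, alternating [operand, operator,
# operand, ...] list into nested prefix applications, respecting
# precedence.  The precedence table is an assumption for this example.
PREC = {"or": 1, "and": 2, "<": 3}

def fix_prec(items, min_prec=1):
    """Precedence climbing; operators here are left-associative."""
    lhs = items.pop(0)                       # first operand
    while items and PREC[items[0]] >= min_prec:
        op = items.pop(0)
        rhs = fix_prec(items, PREC[op] + 1)  # bind tighter operators first
        lhs = (op, lhs, rhs)                 # prefix application (op lhs rhs)
    return lhs
```

On the example above, fix_prec(["true", "or", ("<", "3", "4"), "and", "false"]) yields ("or", "true", ("and", ("<", "3", "4"), "false")), i.e. (or true (and (< 3 4) false)).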

Figure 3 provides an example of the Forced Prefix Rule, using the expression map 3 < nums, where nums is a list of numbers. Because < is a gathering operator, the current expression is moved right, preventing the Non-gather Rule that would test whether map applies to the partial application (< 3). Since nums cannot be the second operand to <, the current expression falls off the right side of the expression list. The Forced Prefix Rule checks that the entire expression list forms a prefix application.

The example expressions have so far included only tokens of explicit type. Tokens of as yet unknown type---tokens that require type inference---introduce one complication. Consider the expression x + y where the types of x and y are unknown and + is the usual infix operator. The expression is ambiguous, having at least two validly typed interpretations: one obvious and the other less so. The first interpretation is that x and y are numbers, and the entire expression is an application of addition resulting in a number. In a second interpretation, x is a higher-order function taking + as its first argument and y as its second. The type of y and the result of the expression remain unknown in this interpretation. The naive rules given always choose the second interpretation because the Prefix Rule applies before the Infix Rule. This choice, however, is not satisfying, failing the criterion presented in the introduction to this section: it surprises us. We choose the first interpretation because the resulting types are simpler: the type of x is shorter and the types of y and the result of the entire expression are known.

To guide the algorithm to the desired interpretation, we add a condition to the Prefix Rule to prevent the application of the rule when the current expression is of unknown type and it has a function on its left or an infix function on its right. If this condition prevents the Prefix Rule from applying, then one of the three following rules will always apply, depending on whether there is a function to the left or right. This condition guides the algorithm to our desired interpretation in the absence of type information. For example, given the definition of a function f :

f x y = x + y

f is interpreted to be a function that adds its two arguments. In the case of the definition of the S combinator,

S f g h = f h (g h),

the Prefix Rule is not suppressed and the interpretation matches the standard definition of S, since no more is known about h than about f or g. Acknowledging that there may be cases where more interesting interpretations of functions like f and S are desirable, such cases require explicit type declarations to guide the algorithm (as well as the human reader) to the alternate interpretation.

This brief introduction has two purposes: to argue that this approach to type-sensitive parsing is simple enough to provide a language feature that can be understood and used by programmers, and to provide an intuitive model to make the description of the exact method accessible. The next section presents the exact algorithm. Every example given to demonstrate the exact algorithm can


be understood in the context of the rules of this section together with the extension to handle precedence and the heuristic to suppress the Prefix Rule in connection with unknown types.

5. TYPE-SENSITIVE PARSING WITHIN TYPE INFERENCE

Static typing is a compile time analysis which determines if a program is well-typed. Milner's type inference algorithm determines whether each function application is valid within the Hindley/Milner polymorphic type discipline [6]. A well-typed program need not check data types at run time because correct typing is guaranteed by the analysis. Other optimizations may take advantage of the type annotations provided by the algorithm [7].

Given an environment supplying the types of predefined identifiers, Milner's algorithm deduces the type of an expression and the type substitution necessary for the deduction. We modify the algorithm to return an output expression when the input expression is well-typed (as in Ref. [8]). In the input expression, function application is undetermined. Function application is fully resolved in the output expression. A context-free parser is used to translate expressions of concrete syntax into lists of expressions that our algorithm translates into fully-determined abstract syntax.

Milner's type inference algorithm can be described as a set of inference rules [9]. There are rules for abstraction, instantiation, and definition. In particular the rule for function application is:

A |- e' : (t → t')        A |- e : t
------------------------------------
          A |- (e' e) : t'

This rule is read: "if under assumptions A it may be derived that e' is of type (t → t') and if under the same assumptions it may be derived that e is of type t, then, under assumptions A, the application of e' to e is of type t'." The working of Milner's algorithm is the mechanical application of these rules of inference, along with the necessary bookkeeping, until an expression is fully typed.
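The application rule can be mechanized with first-order unification. The following Python sketch uses a toy representation of our own (type variables as primed strings, function types as ("->", t, t') tuples); it is not the paper's implementation, only an illustration of how the side condition of the rule is tested.

```python
# Toy sketch of the application rule via unification.  Type variables are
# strings beginning with "'"; function types are ("->", t, t'); base
# types are plain strings.  All names here are illustrative.

def walk(t, subst):
    """Chase a type variable through the substitution."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst):
    """Return a substitution making t1 and t2 equal, or None."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.startswith("'"):
        return {**subst, t1: t2}
    if isinstance(t2, str) and t2.startswith("'"):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

def apply_rule(fun_ty, arg_ty, subst):
    """If fun_ty unifies with arg_ty -> r, return (subst', r); else None."""
    result = "'r"                            # fresh variable (sketch only)
    s = unify(fun_ty, ("->", arg_ty, result), subst)
    return None if s is None else (s, walk(result, s))
```

For example, apply_rule(("->", "Int", "Bool"), "Int", {}) succeeds with result type Bool, while apply_rule("Int", "Int", {}) returns None, mirroring the side condition of the rule.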

Our algorithm for type-sensitive parsing is a hybrid of Milner's basic algorithm and freedom of expression. The application inference rule of the basic algorithm is used in a new way although the rule itself remains unchanged. In standard type inference, function application is determined before type inference, and the application rule ensures that the predetermined application is correct. In type-sensitive parsing, the type inference algorithm is given a list of expressions. It then tests possible applications until the order of applications is fully deduced, or no possible correctly typed application remains.

Let an expression be given by (e0, e1, ..., en) where the ei represent arbitrary expressions. In the example (reduce + 0), each expression is simply a token. Using parentheses (and various syntactic sugarings such as list notation), compound expressions may be formed, as in (map (5 +) (tail nums)). A context-free parser can gather expressions into nested lists of expressions. (reduce + 0) parses to [reduce, +, 0]. (map (5 +) (tail nums)) parses to [map, [5, +], [tail, nums]]. In general, (e0, e1, ..., en) parses to [e0, e1, ..., en] where each ei may itself be a list of expressions.
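This bracket-gathering step can be sketched as a tiny recursive pass; the tokenization and names below are ours, and deliberately naive (whitespace and parentheses only).

```python
import re

# Sketch: gather a parenthesized token string into nested lists of
# tokens, leaving the order of function application undetermined.
def parse(text):
    tokens = re.findall(r"[()]|[^\s()]+", text)
    expr, rest = parse_list(tokens)
    assert not rest, "unbalanced parentheses"
    return expr

def parse_list(tokens):
    out = []
    while tokens:
        tok = tokens.pop(0)
        if tok == "(":
            sub, tokens = parse_list(tokens)
            out.append(sub)
        elif tok == ")":
            return out, tokens
        else:
            out.append(tok)
    return out, tokens
```

Here parse("map (5 +) (tail nums)") produces ["map", ["5", "+"], ["tail", "nums"]], the nested-list form described above.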

As in Milner's algorithm, before an expression is checked, its subexpressions are processed. In FOE, this means that the elements of a list of expressions, [e0, e1, ..., en], are individually resolved before the enclosing list is processed. If each element is successfully typed and function application determined, then the list is annotated with types. [e0, e1, ..., en] becomes [e'0 : t0, e'1 : t1, ..., e'n : tn], where e'i : ti indicates that e'i has inferred type ti and e'i is the fully determined abstract syntax corresponding to ei. A substitution φ, returned by the typing of the elements of the list, is also applied to the types of each of the elements in the list, providing a unified type context for the types of all the elements of the original list. At this point, order of application can be inferred from the list of expressions.

The algorithm is given as a four state machine with a stack. The input to the machine is a typed list, a substitution and an empty stack. If the machine succeeds on an expression, the output is a fully determined tree of function applications, the expression's type, and the resulting substitution. If the machine fails, an error is reported. In the first state, the machine moves left-to-right through the input expression looking for the next function to apply. In the second and third states, the machine collects a string of homogeneous binary infix applications, and in the fourth state the machine has reached the end of the expression and empties the stack. We begin

110 K~N SAILOR and CARL MCCROSKY

the description of the machine by describing the first state, which is also the initial state of the machine.

The machine is given an English description, followed by functional pseudo-code to provide a precise definition. The pseudo-code is similar to simplified Miranda, using standard features of functional languages like pattern matching. Lists are enclosed in square brackets, with elements separated by commas. The pattern [] denotes the empty list, and [h | t] represents a list made by adding element h to the head of the list t. Tuples are denoted by parenthesized, comma-separated elements. The syntax e : t is the pair of an expression and its deduced type. Where clauses are used for local definitions. A parenthesized grouping (e0 e1) represents the application of e0 to e1.

The stack used by the algorithm is a list of sum-of-products types defined as:

Stackelem = homo (List Expr) Type + nonGather Expr Type + gather Expr Type

where the constructors homo, nonGather and gather create stack elements, Expr is an abstract syntax expression and Type is the type of an abstract syntax expression. Each element on the stack is a partial function application. The stack contains expressions to the left of the current expression in the informal algorithm. The constructors label the expressions to distinguish between expressions that gather, those that don't gather, and those that are partial homogeneous function applications. The homo stack elements are strings of homogeneous infix function applications where function application will not be fixed until precedence can be considered.
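In a language without declared sum types, the same stack alphabet can be sketched as tagged records; this Python rendering is ours, for illustration only.

```python
# Sketch of the three stack-element variants as tagged records.
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Homo:                # a partial string of homogeneous infix applications
    exprs: List[Any]       # alternating operands and operators
    ty: Any                # the type of the string collected so far

@dataclass
class NonGather:           # a blocked expression that does not gather
    expr: Any
    ty: Any

@dataclass
class Gather:              # a partial application of a gathering infix function
    expr: Any
    ty: Any
```

A stack is then simply a Python list of these records, with the variant tag telling the machine which rules may consume the element.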

Certain helper functions are used in the pseudo-code. In particular, tcap tests if Milner's standard inference rule applies for given types. The expression tcap φ t1 t2 es S returns the tuple (resp, φ', t', es', S'), where resp is true iff t1 is the type of a function that can be applied to t2 under the substitution φ. If resp is true, then φ' is the substitution needed to form the application, t' is the result type of the application, and es' and S' are the input expression list es and the stack S with all included types updated by the substitution φ'. If resp is false, then the other elements of the tuple are undefined. The predicate homobinary determines if a type is an infix homogeneous binary function type. heterobinary determines whether a type is an infix heterogeneous binary function type. gathers determines if an infix function gathers its arguments. Other helper functions will be described as they are used.

State 1 of the machine is encoded as a function, foe, with seven rules corresponding to the rules of the informal algorithm. The rules are tried in order, until the conditions of a rule are satisfied. One rule will always apply. If there is a type error, it will only be reported when the machine is in its fourth state. The rules are:

1. (Termination) Termination is indicated by an expression list of length 1 and an empty stack. There are no more applications to determine.

foe φ [e : t] [] = (φ, e, t)   (1.1)

2. (Forced Prefix) If the entire expression list has been pushed on the stack, move to State 4, foestk, to test if the elements of the stack form a valid prefix application. The function convert pops the elements of the stack, converting them into expression:type pairs.

foe φ [] S = foestk φ (convert S)   (1.2)

3. (Prefix) If the first expression applies to the second, form the application and leave this application as the first in the list of applications. As discussed in Section 4, the Prefix Rule is restricted: it does not apply when the type of the expression to be applied is unknown and it has a function to its left or right. The function to its left is on the top of the stack. The function to its right is the second element of the list. The predicate prefOK examines the types of the first and second expressions as well as the top of the stack to see if the rule should be permitted to apply. In particular, prefOK guards against the undesirable application of an unknown to an infix operator, as previously discussed.

A practical approach to type-sensitive parsing I I 1

foe φ [e0 : t0, e1 : t1 | es] S
  = foe φ' [(e0 e1) : t' | es'] S'   when (prefOK e0 : t0 e1 : t1 S) and resp   (1.3)
  where (resp, φ', t', es', S') = tcap φ t0 t1 es S

4. (Non-gathering) If the top element of the stack is not a gather element, and its type applies to the head of the expression list, then form the application. This rule has two forms: one for each of the stack elements nonGather and homo. The nonGather form pops the stack putting the application at the head of the expression list.

foe φ [e0 : t0 | es] [nonGather pe pt | S]
  = foe φ' [(pe e0) : t' | es'] S'   when resp   (1.4.1)
  where (resp, φ', t', es', S') = tcap φ pt t0 es S

In the homo case, the next argument of a string of infix applications has been found. The machine moves to the third state looking for the next infix function.

foe φ [e0 : t0 | es] [homo bs bt | S]
  = foe3 φ' es' S' [e0 | bs] t'   when resp   (1.4.2)
  where (resp, φ', t', es', S') = tcap φ bt t0 es S

5. (Infix) If the second element of the expression list is a function that applies to the first, form the application. There are three cases. If the function is homogeneous binary, the machine moves into State 2 looking for the next argument. If the function is heterogeneous and gathering, then the application is pushed on the stack. If the function is heterogeneous but not gathering, then the application is left as the head of the expression list.

foe φ [e0 : t0, e1 : t1 | es] S
  = foe2 φ' es' S' [e1, e0] t'   when homobinary t1 and resp   (1.5.1)
  = foe φ' es' [gather (e1 e0) t' | S']   when gathers e1 and resp   (1.5.2)
  = foe φ' [(e1 e0) : t' | es'] S'   when heterobinary t1 and resp   (1.5.3)
  where (resp, φ', t', es', S') = tcap φ t1 t0 es S

6. (Gathering) If the top element of the stack is a gathering function, test if it applies to the head of the expression list. This rule is similar to Rule 4, but its decreased priority is what allows gathering functions to collect compound arguments.

foe φ [e0 : t0 | es] [gather he ht | S]
  = foe φ' [(he e0) : t' | es'] S'   when resp   (1.6)
  where (resp, φ', t', es', S') = tcap φ ht t0 es S

7. (Blocked) No rule applies for the head of the expression list. Push the head onto the stack as a nonGather stack element.

foe φ [e0 : t0 | es] S = foe φ es [nonGather e0 t0 | S]   (1.7)
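Taken together, the rules of State 1 can be rendered as a small dispatcher. The Python below is a deliberately simplified toy of our own: types are monomorphic (so substitutions disappear), and the homo-string, gathering and precedence machinery is dropped, leaving only Rules 1, 2, 3, 4, 5 and 7. It illustrates the rule ordering, not the full algorithm.

```python
# Toy rendition of the State 1 rule ordering with monomorphic types.
# Base types are strings; function types are ("->", arg, result).
# Substitutions, precedence, homo strings and gathering are omitted.

def applies(fun_ty, arg_ty):
    return isinstance(fun_ty, tuple) and fun_ty[1] == arg_ty

def foe(es, stack=None):
    stack = [] if stack is None else stack
    if len(es) == 1 and not stack:                 # Rule 1: Termination
        return es[0]
    if not es:                                     # Rule 2: Forced Prefix
        conv = list(reversed(stack))
        while len(conv) > 1:                       # fold as prefix applications
            (e0, t0), (e1, t1) = conv[0], conv[1]
            assert applies(t0, t1), "type error"
            conv[:2] = [((e0, e1), t0[2])]
        return conv[0]
    (e0, t0), rest = es[0], es[1:]
    if rest and applies(t0, rest[0][1]):           # Rule 3: Prefix
        e1, t1 = rest[0]
        return foe([((e0, e1), t0[2])] + rest[1:], stack)
    if stack and applies(stack[-1][1], t0):        # Rule 4: Non-gathering
        pe, pt = stack.pop()
        return foe([((pe, e0), pt[2])] + rest, stack)
    if rest and applies(rest[0][1], t0):           # Rule 5: Infix
        e1, t1 = rest[0]
        return foe([((e1, e0), t1[2])] + rest[1:], stack)
    return foe(rest, stack + [(e0, t0)])           # Rule 7: Blocked
```

On sin cos x (both functions of type Float → Float) the toy blocks sin, applies cos to x by the Prefix Rule, and finishes by the Non-gathering Rule, yielding (sin (cos x)); on 3 + 4 the Infix Rule fires; on reduce + 0 the Prefix Rule takes priority.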

The second and third states, represented by the functions foe2 and foe3, are states in which strings of homogeneous binary infix functions and arguments are recognized. With foe2, an operator has been found and an argument is being looked for. With foe3, an argument has been found and the next operator in the string is being looked for. Strings are collected to the maximum length before precedence is used to determine order of application. Given a string of operators and arguments, fixPrec is any function that returns an expression respecting the precedence and associativity (left or right) of the operators involved (see Ref. [5], for example).

112 KEN SAILOR and CARL McCgosgr

The rules for foe2 are:

1. (Argument) If the next element in the expression list is a valid argument for the string of infix applications, then add it to the string. If the next element is not an argument, then push the string as a homo stack element.

foe2 φ [e0 : t0 | es] S bs bt
  = foe3 φ' es' S' [e0 | bs] t'   when resp   (2.1.1)
  = foe φ [e0 : t0 | es] [homo bs bt | S]   when not resp   (2.1.2)
  where (resp, φ', t', es', S') = tcap φ bt t0 es S

2. (Homo Partial) If the input list is empty, then the current string is a partial infix application--such as (x + y +)--and the string collected is converted into an expression by fixPrec. Control is returned to the first state, foe, to handle the possibly non-empty stack.

foe2 φ [] S bs bt = foe φ [(fixPrec bs) : bt] S   (2.2)

The rules for foe3, representing State 3, follow; foe3 looks for the next operator.

1. (Operator) If the expression list is not empty, test if the next expression is a valid operator for the string gathered so far. If so, add it to the string and go back to foe2 for the next argument. If not, the end of the infix string has been reached; collapse it into an expression and return to State 1, foe.

foe3 φ [e0 : t0 | es] S bs bt
  = foe2 φ' es' S' [e0 | bs] t'   when homobinary t0 and resp   (3.1.1)
  = foe φ [(fixPrec bs) : bt, e0 : t0 | es] S   otherwise   (3.1.2)
  where (resp, φ', t', es', S') = tcap φ t0 bt es S

2. (Collapse) If the expression list is empty, collapse the string collected into a single application and return to State 1.

foe3 φ [] S bs bt = foe φ [(fixPrec bs) : bt] S   (3.2)

If the entire expression list has been pushed on the stack through the application of Rule 7 in State 1, then the machine moves to State 4. In this state, the converted, reversed stack is tested to determine if it represents a valid prefix application. This is determined through repeated application of a rule much like the Prefix Rule of State 1.

1. (Prefix 4) As long as the list has at least two elements, apply this rule to create the next application. Typing errors are reported when prefix application fails.

foestk φ [e0 : t0, e1 : t1 | es]
  = foestk φ' [(e0 e1) : t' | es']   when resp   (4.1.1)
  = report error   when not resp   (4.1.2)
  where (resp, φ', t', es', []) = tcap φ t0 t1 es []

2. (Termination 4) Similar to State 1, foe, foestk terminates when the expression is reduced to a length of 1.

foestk φ [e : t] = (φ, e, t)   (4.2)
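State 4 is essentially a left fold of prefix applications over the converted stack. A minimal sketch with monomorphic types (unification and substitutions omitted; representation and names are ours):

```python
# Sketch of State 4: fold the converted stack left-to-right as prefix
# applications.  Base types are strings, function types are
# ("->", arg, result); the real algorithm uses unification instead.
def foestk(es):
    expr, ty = es[0]
    for e, t in es[1:]:
        if not (isinstance(ty, tuple) and ty[1] == t):
            raise TypeError("prefix application fails")   # Rule 4.1.2
        expr, ty = (expr, e), ty[2]                       # Rule 4.1.1
    return expr, ty                                       # Rule 4.2
```

For instance, folding [map, f, xs] with compatible types yields ((map f) xs), while an ill-typed list raises the error of Rule 4.1.2.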

6. EXAMPLES

Freedom of expression allows for a type-sensitive interpretation of syntax, easily handling the examples of Section 2. We present several cases where type information is available to give an


"3 > 4 o r 5 > 6 "

foe [ ] [3:Int, >:ot--~ot--cBool, 4:Int, or:.Bool--*Bool--¢Booi, 5:lMt, >:a.--,,~--t.Bool, 6:Int] [ ]

foe [ ] [4:Int, orz.Bool--tBool--¢BooL 5:Int, >:ct---~--~Boei, 6:Int] ~ (> 3) Int--~Bool] (1.5.2)

foe [ ] [(> 3 4):Booi, Or:Bool--*Bool--cBool, 5:InL >:a-cax--~BooL 6:lnt] [ ] (1.6)

=* foe2 [ ] [5:Int, >:ct---~a--~Bool, 6:Int] [ ] [or, (> 3 4)] Bool-~Bool (1.5.1)

=* foe [ ] [5:InL >:ot---)ot---cBool, 6:Int] [homo [or, (> 3 4)] Bool.--cBool] (2.1.2)

:=t, foe [ ] [6:Int] [gath~ (> 5) lnt---)BooL homo [or, (> 3 4)] Bool--*Bool] (1.5.2)

=--* foe [ ] [(> 5 6):Bool] [homo [or, (> 3 4)] Bool---~Bool] (1.6)

~ f o e 3 [ ] [ ] [ ] [(> 56) ,o r , (>34 ) ] Bool (1A.2)

foe [ ] [(or (> 34)(> 56)):Booll [ ] (3,2)

~ ( [ l, (or (> 34) (> 56)) , Bool) (1.1)

Fig. 4. Resolving 3 > 4 or 5 > 6 to or ( > 3 4 ) ( > 5 6).

introduction to the workings of the algorithm and then discuss the role of the prefOK constraint in type inference.

Consider 3 > 4 or 5 > 6. A trace of this example is given in Fig. 4. The numbers in the right hand column refer to the clause of the algorithm producing the rewrite on that line.

Compare the algorithm's progress through (3 + 4) and (reduce + 0) in Fig. 5. The second case requires a non-empty substitution, represented by an association list of type variables and types. In the first case, addition is applied infix. In the second case, reduce does apply to addition, and even if addition could be applied to reduce, the Prefix Rule (Rule 3 of State 1) would take priority.

Figure 6 gives a variety of function definitions together with their inferred order of application and typing. The algorithm output is indented beneath each definition. f1 and f2 demonstrate that FOE interprets expressions similarly to standard precedence techniques. f3, f4 and f5 are definitions of the same function, isalpha, a predicate to determine if a character is alphabetic. In the first comparison of x to 'a' in f3, although the type of x is not constrained by the type of >=, the operator and cannot be applied to the character 'a', constraining x to a character variable. In f4, parentheses are required around this comparison because the positions of 'a' and x are switched: <= is a gathering operator and the type of x is unconstrained, so and can apply to x, which would result in a mistyping if the parentheses were eliminated. Parentheses are not required for the second occurrence of the substring "x and x" because the left-most occurrence of x has determined the type before the substring is considered. Alternatively, if the programmer supplies the typing, as in f5, then f5's first parameter must be of type Character, enabling the correct interpretation without parentheses. f6 gives an example of a compound infix function.

"3 + 4" foe [ ] [3:Int, +:Int--*Int---~Int, 4:Int] [ ] =* foe2 [ ] [4:Int] [+, 3] Int--t, Int (1.5.1) =~ foe3 [ ] [ ] [4,+, 3] Int (2.l .1)

foe [ ] [((+ 3) 4):Int] 0 .2 ) ( [ ], ((4- 3) 4), In0 (1.1)

'~+o" foe [ ] [reduce:(ct-~-~[3)--~ff--~List ct--~p,+:Int--~Int--*InL 0:Int] [ ]

foe [(a,lnt), (ll,lnt-~In0] [(reduce +):Int--~List Int---~InL 0:lnt] [ ] =* foe [(otJn0, (~,Int--~In0] [((reduce 4-) 0):List Int--~Int] [ ]

([(a,In0, (l~,Int--~In0], ((reduce +) 0), List Int--~In0

(1.3)

(1.3)

(1.1)

Fig. 5. Different contexts for addition.


f1 x y z == x + z * y < y * y + x
    f1 x y z == < (+ x (* z y)) (+ (* y y) x)
    f1 : Int → Int → Int → Bool

f2 x y z == x + y * z ^ 2 * x + 3
    f2 x y z == + (+ x (* (* y (^ z 2)) x)) 3
    f2 : Int → Int → Int → Int

f3 x == x >= 'a' and x <= 'z' or 'A' <= x and x <= 'Z'
    f3 x == or (and (>= x 'a') (<= x 'z')) (and (<= 'A' x) (<= x 'Z'))
    f3 : Character → Bool

f4 x == ('a' <= x) and x <= 'z' or 'A' <= x and x <= 'Z'
    f4 x == or (and (<= 'a' x) (<= x 'z')) (and (<= 'A' x) (<= x 'Z'))
    f4 : Character → Bool

f5 : Character → Bool
f5 x == 'a' <= x and x <= 'z' or 'A' <= x and x <= 'Z'
    f5 x == or (and (<= 'a' x) (<= x 'z')) (and (<= 'A' x) (<= x 'Z'))
    f5 : Character → Bool

/* eachboth : (A → B → C) → Array A → Array B → Array C */
f6 a b c == a (eachboth +) b (eachboth +) c
    f6 a b c == eachboth + (eachboth + a b) c
    f6 : Array Int → Array Int → Array Int → Array Int

/* pick : Array A → Array A → A */
f7 a == [2] pick a < [3] pick a
    f7 a == < (pick [2] a) (pick [3] a)
    f7 : Array A → Bool

Fig. 6. Example freedom of expression results.

7. IMPLEMENTATION AND PERFORMANCE IN THE FALAFEL TRANSLATOR

The Falafel translator is a full translator for the Falafel language, used for more than research on freedom of expression. It is implemented in LML [10] as a translator from Falafel to LML. This implementation uses the flat array primitives of LML as a basis for the multidimensional arrays of Falafel. A parser generator is used to generate an LR(1) parser from a yacc-like grammar, decorated with action routines coded in LML. All other aspects of the translator are hand-coded, comprising 6500 lines of LML code. The translator has been used in other research, including two theses [11, 12], and has proven to be both reliable and practical. Translation time is acceptable.

The time complexity of the pushdown automaton version of freedom of expression is at best the complexity of type inference, because freedom of expression is an extension to the type inference algorithm. Although type inference is exponential (because the size of the type of an expression may be exponential in the size of the expression), the cost of type inference in practice is reasonable. FOE-extended type inference adds a cost to type inference which is worse by at most a constant factor. In each of the first three states, for each expression in the expression list, either one expression is pushed onto the stack or two expressions are merged into one. In the fourth state there is no stack, so two expressions must be merged at each step. Each expression is considered at the head of the list at most twice. Since standard type inference must consider each application anyway, FOE-extended type inference considers each expression at most 8 times: at most seven times in State 1 and once again in State 4. Slowing down type checking by a constant factor is not desirable, but checking the application rule is not the sole work of type inference. Only a portion of the type inference algorithm suffers the cost of FOE. In practice, the cost of


             heap (Mb)       GCs          CPU time        user time
             pref    FOE    pref  FOE    pref    FOE     pref    FOE
ath.fal      4.45   4.46     16    16   20.71  20.17     22.1   22.0
mlib.fal     1.01   1.04     11    12    5.17   5.08      6.0    5.9
mlib2.fal    1.99   2.02     16    16   10.69  10.75     11.8   11.8
db.fal       1.83   1.86     14    15   10.58   9.91     11.5   10.9
totals       9.28   9.38     57    59   47.15  45.91     51.4   50.6

Fig. 7. Statistics on the translation of sample programs.

FOE-extended type inference is negligible. In our implementation FOE appears to be slightly more heap intensive than straight prefix parsing with standard type inference. If the extra heap required happens to trigger garbage collection, then the cost is at most 5% higher. If an extra garbage collection is not triggered, as is most often the case, standard and FOE-extended type inference run in approximately the same time. Sometimes FOE is actually faster.

Versions of the Miranda standard environment, definitions of More's array theoretic functions [13], and sample database and splay tree programs have been used as test cases for FOE. Figure 7 presents translation statistics for a set of test programs for the experimental Falafel compiler [14] as run on a Sun Sparc 1 +. The columns labelled "pref" are statistics gathered for prefix versions of the programs using standard type inference. The columns labelled "FOE" are statistics for versions of the programs taking advantage of full FOE. The statistics are for full translation which includes parsing and translation into the current target language, LML [10], but are informative in showing how inexpensive FOE is in practice. The heap figures are megabytes of heap used; the GC column is the number of garbage collections required; CPU and user time are elapsed time in seconds. It can be seen that heap use for FOE is slightly greater while processing time is slightly less.

8. CONCLUSION

Freedom of expression delivers practical type-sensitive parsing. We have presented an algorithm that achieves this and in the worst-case costs a constant factor (in the length of the expression) more than standard type inference without any compromise to type inference. In practice, our implementation's running costs are indistinguishable from standard type inference. This is a significant practical advance over the work reported in [1], and we believe this to be a significant advance over current parsing techniques.

This paper presents a portion of the first author's thesis work (Sailor, 1993). In the thesis, freedom of expression is examined in a variety of settings. One of the more surprising results is that the extensions to the Hindley/Milner type system in the programming language Haskell do not affect freedom of expression: exactly the techniques developed in this paper provide freedom of expression for Haskell. Similarly, freedom of expression parsing techniques have been developed for Pascal [15] and the object-oriented language Eiffel [16]. This lends further credibility to our claim that freedom of expression is a generally applicable technique. In addition, it provides yet another argument for the value of typed over untyped programming languages: typed programming languages provide the opportunity for sophisticated parsing.

Concern has been expressed that freedom of expression will give the malicious or inept programmer the ability to write obscure and misleading code. There is no way under any discipline, however, to prevent a programmer from writing bad code. It is our intention to provide a tool that corresponds more to the way humans think about programs. When we write programs, we maintain an intimate awareness of the meaning of the tokens used. To a certain extent, type-sensitive parsing achieves this mechanically. It is our hope that programmers can use this tool to write more readable code.

The implementation of this algorithm is available for distribution.


REFERENCES

1. McCrosky, C. and Sailor, K. A synthesis of type-checking and parsing. Comput. Lang. 18: 241-250; 1992.
2. Hudak, P., Jones, S. P. and Wadler, P. Report on the programming language Haskell. ACM SIGPLAN Notices 27(5): Section R; 1992.
3. Turner, D. A. Miranda--a non-strict functional language with polymorphic types. In Conference on Functional Programming Languages and Computer Architecture. Nancy: Springer Verlag; 1985.
4. Mairson, H. G. Deciding ML typability is complete for deterministic exponential time. In 17th ACM Symposium on the Principles of Programming Languages. San Francisco: ACM; 1990.
5. Tremblay, J.-P. and Sorenson, P. G. The Theory and Practice of Compiler Writing. New York: McGraw-Hill Book Company; 1985.
6. Milner, R. A theory of type polymorphism in programming. J. Comput. Sys. Sci. 17: 348-375; 1978.
7. Peyton Jones, S. L. The Implementation of Functional Programming Languages. Prentice-Hall Series in Computer Science (Edited by Hoare, C. A. R.). London: Prentice-Hall; 1987.
8. Wadler, P. and Blott, S. How to make ad hoc polymorphism less ad hoc. In 16th ACM Symposium on the Principles of Programming Languages. Austin, Texas; 1989.
9. Damas, L. and Milner, R. Principal type-schemes for functional programs. In 9th ACM Symposium on the Principles of Programming Languages. Albuquerque, NM; 1982.
10. Augustsson, L. and Johnsson, T. The Chalmers Lazy-ML compiler. Comput. J. 32(2): 127-141; 1989.
11. Ghavamnia, M. Generating parallel distributed code for a functional array language. University of Saskatchewan; 1993.
12. Wang, L. Static inference of permissible destructive updates in functional languages. University of Saskatchewan; 1992.
13. More, T. Notes on the diagrams, logic, and operations of array theory. In Structures and Operations in Engineering and Management Systems (Edited by Franksen, B. A.). Trondheim, Norway: Tapir; 1981.
14. McCrosky, C. The elimination of intermediate containers in the evaluation of first-class array expressions. In IEEE International Conference on Computer Languages. Miami Beach: IEEE Computer Society Press; 1988.
15. Jensen, K. and Wirth, N. Pascal User Manual and Report, ISO Pascal Standard, 3rd Edn. New York: Springer-Verlag; 1985.
16. Meyer, B. Eiffel: The Language (Version 3.0). New York: Prentice Hall; 1992.

About the Author--KEN SAILOR received a PhD (1993) and an MSc (1990) in Computational Science from the University of Saskatchewan. He received his BA from the University of California at Berkeley and did further undergraduate work in Computer Science at the University of Toronto. Ken is presently employed by Digital Systems Group Inc, Saskatoon, Saskatchewan, Canada where he works on control software for an advanced ATM switch. As part of his duties he also serves half time as an Adjunct Professor in Computational Science at the University of Saskatchewan. Dr Sailor's research interests include: (1) type-sensitive parsing algorithms; (2) the implementation of functional array languages; and (3) the use of partial evaluation to optimize telecommunications software stacks.

About the Author--CARL McCROSKY received a PhD (1985) in Electrical Engineering and an MSc in Computer Science from Queen's University, Kingston, Canada. He received a BA in political science from the Centre College of Kentucky in Danville. Carl is presently a Professor of Computational Science at the University of Saskatchewan (since 1985) and an Adjunct Professor at TR Labs in Saskatoon, Saskatchewan. He was a founding partner of Andyne Computing Ltd, Kingston, Canada. Professor McCrosky has several research interests: (1) high speed cell relay switches, networks, and congestion control algorithms; (2) synthesis and testing of digital systems; and (3) the definition, optimization, and parallel evaluation of first-class array languages.