Refinement Types for Program Analysis

Mario Coppo, Ferruccio Damiani and Paola Giannini
Università di Torino, Dipartimento di Informatica, Corso Svizzera 185, 10149 Torino (Italy)

Abstract. In this paper we introduce a system for the detection and elimination of dead code in typed functional programs. The main application of this method is the optimization of programs extracted from proofs in logical frameworks, but it could also be used to eliminate dead code produced by program specialization. Our algorithm is based on a type inference system suitable for reasoning about dead code information. This system relies on refinement types, which make it possible to exploit the type structure of the language for the investigation of program properties. The detection of dead code is obtained via type inference, which can be performed in an efficient and complete way by reducing it to the solution of a system of inequalities between type variables. A key feature of our method is that program analysis can be performed in a strictly incremental way. Even though the language considered in the paper is a simply typed $\lambda$-calculus, the approach can be generalized to polymorphic languages like ML. Although focused on dead code elimination, our type inference method can also be applied to the investigation of other program properties like binding time and strictness. Some hints on these applications are given.

Introduction

Types have been recognized as useful in programming languages because they provide a semantical (non context free) analysis of programs. Such analysis is usually incorporated in the compiling process, and is used on one side to check the consistency of programs and on the other to improve the efficiency of the code produced.

In addition to preventing run-time errors, type systems can be useful for characterizing run-time properties of programs. For instance intersection types, see [4] (and also [1]), in their full generality, provide a characterization of strong normalization.
As a consequence, their type checking is undecidable, and they therefore cannot be used in the compiling process of functional languages.

Type systems tailored to specific analyses, such as strictness, totality and binding time analysis, have been introduced; see [17, 12, 5, 13, 18]. In this perspective types represent program properties, and their inference systems are systems for reasoning formally about them. In this paper we keep a clear distinction between the type structure of the language (types in the usual sense) and the refinement types, which represent particular properties inside the type structure of the language. This distinction is very useful in the design of inference algorithms.

A similar view is taken in [8], in which the authors attempt to unify the analyses, identifying a basic type structure that can be extended to incorporate the various analyses.

The attraction of the type based approach is the possibility of designing efficient algorithms to do the analysis, as compared to the classical approach of semantical analysis using abstract interpretation, which is quite inefficient for higher order functions. Type based analyzers are based on an implicit representation of types, either via type inequalities, see [11], or via lazy (implicit) types, see [8]. We pursue the first approach, reducing the inference problem to the solution of a system of inequalities between atomic types. An interesting feature of our method is that it is naturally compositional.

Type analysis is also used in the area of program extraction from formal proofs. The programs extracted from proofs are usually very inefficient, as they contain parts that are useless for the computation of the final result; they therefore require some sort of simplification. One of the more effective simplification techniques is "pruning", developed by Berardi, see [2]. In this technique useless terms are discovered by analyzing the type of terms. Such terms are called "dead code". The method was improved in [3] with the use of type inclusion. With type inclusion an application is well typed if the argument has a type included in the input type of the corresponding function. There are some basic problems with the method of [3]. The optimization algorithm is rather difficult to understand, and this makes its proof of correctness even more difficult to follow. The method presented in this paper seems much more self-evident. Moreover, Berardi's algorithm assumes that the input term is typed, and the input type information is used for the analysis.
So there does not seem to be a clear way to generalize the analysis to untyped terms (or, better, to do type inference and "pruning" at the same time). In [16] another formalism to represent information about dead code, based on the idea of marking terms and types, is presented. However, no marking algorithm is given. This approach seems related to ours.

In our paper we present an inference system for detecting "dead code", and an algorithm that simplifies $\lambda$-terms based on the system of [3]. Although our system is tailored for dead code analysis, we will show in the last section of the paper how its basic ideas can be applied to the design of inference algorithms for the investigation of other kinds of program properties.

The inference system for dead code relies mostly on Berardi's ideas. The language we consider is a typed (à la Church) $\lambda$-calculus with constants for natural numbers, pairs and recursor. The types are then built from the type of natural numbers, nat, using arrow and cartesian product. Starting from a typed term we infer properties of the term. We call such properties refinement types. For the dead code analysis we need two refinements for the basic type nat. The first, $\delta$, corresponds to the idea that the value may be used, and so it could only be replaced with a term with the same behaviour ($\beta$-equal). The second, $\omega$, corresponds to the fact that the value is not used, and so it does not matter what the term is (it could be any constant of the same type). We consider an order relation $\sqsubseteq$ on refinement types whose interpretation is that a type is more informative than another, in particular $\delta \sqsubseteq \omega$. These properties are propagated to higher types: for instance, if a function of type $\mathrm{nat} \to \mathrm{nat}$ has the refinement type $\delta \to \omega$ or $\omega \to \omega$ then the whole term will not be used (and neither will any free assumption on which it depends). When we apply a function to an argument we require that the argument have a refinement type that allows its use in every occurrence of the parameter. So it must be more specific, $\sqsubseteq$, than the refinement type of the formal parameter. The soundness of this system and of the optimizing transformation that it induces is proved via a partial equivalence relation semantics of the refinement types, showing that the optimized programs are observationally equivalent to the original ones.

Let us first consider a simple example. Let $M = (\lambda x^{\mathrm{nat}}.\bar{3})P$ where $\bar{3}$ is the numeral 3 and $P$ is a term of type nat. Since $x$ is never used in the body of the lambda we can assign the refined type $\omega \to \mathrm{nat}$ to $\lambda x.\bar{3}$, so we discover that $P$ is not used in the computation of $M$ and could be replaced by any constant of the right type. The information about dead code can be propagated. Consider a $\lambda$-calculus with pairs $\langle\ ,\ \rangle$ and projections $\pi_1$ and $\pi_2$. Let $M$ be the term $(\lambda x.\pi_1\langle P, Q\rangle)N$ for some terms $P$, $Q$ and $N$. Since the projection $\pi_1$ returns the first component of the pair, the term $Q$ is dead code. Moreover, assume that $P$ does not contain any occurrence of $x$; in this case $N$ is also not relevant to the computation of the final result. So $M$, in a call-by-name language, behaves like the term $P$, which is of course simpler.

The main difference between our approach and that of [2] and [3] is in the algorithm that finds the optimized version of a given term. The algorithm presented in these papers is a kind of "data flow" algorithm that analyzes a term by implicitly building a directed graph which represents the input-output relation between the subterms of a given term.
In our approach instead we assign to each term a system of inequalities between refinement type variables which can be seen as representing the whole flow of input-output information of the term. This system always has a maximal solution corresponding to the most informative typing of the term. Detecting the best optimization of a term corresponds to looking for a solution that maximizes the number of $\omega$'s, with the restriction that the whole term must be useful (cannot be refined to $\omega$). Such a solution can be found in a time proportional to the number of atomic type occurrences in the complete typing of a term. The proof of correctness and completeness of the algorithm is rather easy. An important feature of our algorithm is that it is naturally compositional, while that of [2] and [3] is not.

Our analysis is for a strongly normalizing language, so we did not consider termination issues. To apply the method to languages including a fixed point operator we need a richer model including some sort of termination order on the terms.

It would be easy to extend the algorithm to deal with Curry-style polymorphism, see [9]. So, for instance, one could think of doing both type and refinement type inference at the same time. From the algorithmic point of view this extension is quite easy. In this framework it is, however, more difficult to understand the system from a semantical point of view. We can also deal with Milner-style polymorphism, see [15], that is, the presence of the "let x = N in M" construct. Here $x$ in $M$ can be assigned different types, all instances of the same type scheme. In this case we have to handle the refinements of the different types of $N$ in $M$. The optimizations possible for $N$ would be those that eliminate code that is useless according to all the different refinements of $N$.

The underlying type system is not part of the analysis. So the technique could be applied to more informative type systems such as the decidable restrictions of intersection types, like the rank 2 restriction (see [14]) or the simple types of [6] and [7]. Of course, with more information already available at the level of the type system, the same refinement type system would be more informative. We think that, by keeping the two issues (typing and refining the type) orthogonal, we can get the best out of each of them.

The technique presented can be extended to deal with other properties useful in program optimization, such as binding time analysis and strictness analysis.

The first section introduces the language we are dealing with and the refinement type assignment system along with its semantics. In the second section we introduce a code optimization based on refinement type information; in particular we show that a term and its optimized version are equivalent. In the third section we introduce an algorithm for refinement type inference. The algorithm is articulated in two phases. First, given a term, we define a set of inequalities whose solutions induce all the derivations of refinement types for the term. Then we give an algorithm that finds the maximum solution of a system of inequalities.
In the last section we outline how the technique of refinement types can be used to study binding time and strictness analysis.

1 A Type Assignment for Proving Properties

The aim of this section is to introduce a typed functional language (basically a typed $\lambda$-calculus with cartesian product and arithmetic constants) and a type assignment system for deriving flow properties of typed terms. The set of types is defined assuming as unique basic type nat, the set of natural numbers.

Definition 1 (Types). The language of types ($\mathcal{T}$) is defined by the following grammar: $R ::= \mathrm{nat} \mid R \to R \mid R \times R$.

Types are ranged over by $R$, $S$, ... Typed terms are defined from a set of typed term constants

$$\mathcal{K} = \{\, 0^{\mathrm{nat}},\ \mathrm{succ}^{\mathrm{nat} \to \mathrm{nat}},\ \mathrm{rec}^{\mathrm{nat} \to R \to (\mathrm{nat} \to R \to R) \to R},\ \mathrm{it}^{\mathrm{nat} \to R \to (R \to R) \to R},\ \mathrm{case}^{\mathrm{nat} \to R \to (\mathrm{nat} \to R) \to R},\ \mathrm{if}^{\mathrm{nat} \to R \to R \to R} \mid R \in \mathcal{T} \,\}$$

(ranged over by $C^S$), and a set $\mathcal{V}$ of typed term variables (ranged over by $x^R, y^S, \ldots$). The constants have been chosen in view of an application to the optimization of terms extracted from proofs. We write $\Gamma \vdash_T M^R$ to mean that $M$ is a typed term of type $R$ whose free variables are among the variables in the context $\Gamma$. We use this notation since it allows us to attach a type to all subterms of $M$. Note the difference with the more usual notation $M : R$, in which this is not possible. The set of well typed (well decorated) terms is defined by the rules of the following definition.

Definition 2 (Typed terms). A typing statement is an expression $\Gamma \vdash_T M^R$ where $\Gamma$ is a context, i.e., a set of variables $x^R$ containing all the free variables of $M$. The rules for term formation are the following:

$$(\mathrm{Var})\ \ \Gamma/x \cup \{x^R\} \vdash_T x^R \qquad (\mathrm{Con})\ \ \Gamma \vdash_T C^R$$

$$(\to I)\ \ \frac{\Gamma/x \cup \{x^R\} \vdash_T M^S}{\Gamma/x \vdash_T (\lambda x^R.M^S)^{R \to S}} \qquad (\to E)\ \ \frac{\Gamma \vdash_T M^{R \to S} \quad \Gamma \vdash_T N^R}{\Gamma \vdash_T (M^{R \to S} N^R)^S}$$

$$(\times I)\ \ \frac{\Gamma \vdash_T M_1^{R_1} \quad \Gamma \vdash_T M_2^{R_2}}{\Gamma \vdash_T \langle M_1^{R_1}, M_2^{R_2} \rangle^{R_1 \times R_2}} \qquad (\times E_l)\ \ \frac{\Gamma \vdash_T M^{R_1 \times R_2}}{\Gamma \vdash_T (\pi_l M^{R_1 \times R_2})^{R_l}}\ \ l \in \{1, 2\}.$$

Note that with this notation we explicitly mention in $M$ the types assigned to all its subterms. In the following we often omit types which are understood. As usual a substitution is a finite function mapping term variables to terms, denoted $[x_1 := N_1, \ldots, x_n := N_n]$, which respects the types, i.e., each $x_i^{R_i}$ is substituted with a term $N_i^{R_i}$ of the same type.

Let $\Lambda_T$ be the set of all typed terms which are defined according to the previous rules, i.e., $\Lambda_T = \{M^R \mid \Gamma \vdash_T M^R \text{ for some basis } \Gamma\}$. We provide $\Lambda_T$ with a standard operational semantics defined by a notion of reduction. Let $\to_\beta$ denote the usual $\beta$ reduction relation and $\to_\pi$ denote the following reduction relation for pairs: $\pi_l\langle M_1, M_2\rangle \to_\pi M_l$, for $l \in \{1, 2\}$.

The reduction $\to_C$ determined by the constants is defined by the clauses:

$$\begin{array}{ll}
\mathrm{rec}\ 0\ M\ F \to_C M & \mathrm{it}\ 0\ M\ F \to_C M \\
\mathrm{rec}\ (\mathrm{succ}\ n)\ M\ F \to_C F\ n\ (\mathrm{rec}\ n\ M\ F) & \mathrm{it}\ (\mathrm{succ}\ n)\ M\ F \to_C F\ (\mathrm{it}\ n\ M\ F) \\
\mathrm{case}\ 0\ M\ F \to_C M & \mathrm{if}\ 0\ M\ N \to_C M \\
\mathrm{case}\ (\mathrm{succ}\ n)\ M\ F \to_C F\ n & \mathrm{if}\ (\mathrm{succ}\ n)\ M\ N \to_C N.
\end{array}$$

Let $\to_r$ denote the union of $\to_\beta$, $\to_\pi$ and $\to_C$, let $\to_r^*$ denote its reflexive and transitive closure, and let $=_r$ be the equivalence relation induced by $\to_r^*$. Note that every term in $\Lambda_T$ is strongly normalizable.

The closed term model $\mathcal{M}$ of $\Lambda_T$ is defined by interpreting each type $R$ as the set of the equivalence classes of the relation $=_r$ on the closed terms of type $R$.
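As an informal illustration of the reduction clauses for the constants given above, the following Python sketch reads rec, it, case and if as ordinary functions on machine integers. The encoding (Python ints for numerals, curried Python functions for $F$, the trailing underscore in `if_`) is an assumption made for this example and is not part of the calculus.

```python
# Sketch only: naturals are Python ints, and F is passed as a (curried) Python function.

def rec(n, m, f):
    """rec 0 M F -> M ;  rec (succ n) M F -> F n (rec n M F)."""
    acc = m
    for k in range(n):
        acc = f(k)(acc)      # F k (rec k M F)
    return acc

def it(n, m, f):
    """it 0 M F -> M ;  it (succ n) M F -> F (it n M F)."""
    acc = m
    for _ in range(n):
        acc = f(acc)
    return acc

def case(n, m, f):
    """case 0 M F -> M ;  case (succ n) M F -> F n."""
    return m if n == 0 else f(n - 1)

def if_(n, m, alt):
    """if 0 M N -> M ;  if (succ n) M N -> N."""
    return m if n == 0 else alt

# Two small uses: factorial needs the predecessor, addition does not.
factorial = lambda n: rec(n, 1, lambda k: lambda acc: (k + 1) * acc)
add = lambda a, b: it(a, b, lambda x: x + 1)
```

The iterator `it` never passes the predecessor to its step function, whereas `rec` does; this is the difference exploited when a recursor whose step function ignores that argument is replaced by an iterator.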
Let $I(R)$ denote the interpretation of type $R$ in this model, and let $[M]$ denote the equivalence class of term $M$. An environment is a mapping $e : \mathcal{V} \to \bigcup_{R \in \mathcal{T}} I(R)$ which respects types, i.e., such that, for each $x^R$, $e(x^R) \in I(R)$. The interpretation of a term $M$ in an environment $e$ is defined in a standard way by: $[\![M]\!]_e = [M[x_1 := N_1, \ldots, x_n := N_n]]$, where $\{x_1, \ldots, x_n\} = \mathrm{FV}(M)$ and $[N_l] = e(x_l)$ ($1 \le l \le n$).

We want to be able to represent more information about terms of a given type. If we consider the basic type nat, for instance, we want to distinguish the terms of type nat whose evaluation will possibly be useful to get the final result of a computation from the terms (of the same type) whose evaluation will certainly be useless to that aim. For instance take the term of type nat: $M = (\lambda x^{\mathrm{nat}}.Q^{\mathrm{nat}})P^{\mathrm{nat}}$, where $x$ does not occur in $Q$. The evaluation of $P$ will be useless to the evaluation of $M$. To this aim we define two refinement types of nat: $\delta$ and $\omega$, which represent, respectively, the notion of values which are (possibly) necessary or (certainly) useless for the determination of the final value of a computation, i.e., we identify $\delta$ with (possibly) live and $\omega$ with dead. So we will assign, for instance, type $\delta$ to $Q$ and type $\omega$ to $P$.

Refinement types are defined from $\{\delta, \omega\}$ following the type construction rules. So, for instance, $\omega \to \delta$ is a refinement type of $\mathrm{nat} \to \mathrm{nat}$ (we denote this with $\omega \to \delta :: \mathrm{nat} \to \mathrm{nat}$) which informally represents the set of all functions $f$ which yield a useful output whenever applied to an argument which is not useful for the determination of this output, like $\lambda x.Q$ above. This means that $\omega \to \delta$ characterizes all constant functions of type $\mathrm{nat} \to \mathrm{nat}$. In general we write $\rho :: R$ to mean that $\rho$ refines type $R$.

Definition 3 (Refinement types). The language $\mathcal{R}$ of refinement types (r-types for short) and the refinement relation $::$ are defined by the following rules:

$$(\mathrm{nat})\ \ \frac{\varphi \in \{\delta, \omega\}}{\varphi :: \mathrm{nat}} \qquad (\to)\ \ \frac{\rho :: R \quad \sigma :: S}{\rho \to \sigma :: R \to S} \qquad (\times)\ \ \frac{\rho :: R \quad \sigma :: S}{\rho \times \sigma :: R \times S}.$$

It is easy to see that each refinement type $\rho$ refines a unique type $R$, denoted by $\mathbf{T}(\rho)$, i.e., we have $\rho :: \mathbf{T}(\rho)$. Moreover, if $R$ is a type and $\varphi \in \{\delta, \omega\}$, let $\mathbf{R}_\varphi(R)$ denote the refinement type obtained from $R$ by replacing each occurrence of the basic type nat by $\varphi$. We have obviously $\mathbf{R}_\varphi(R) :: R$.

$\omega$-refinement types ($\omega$-r-types for short) formalize the notion of not being relevant to the computation at higher types.

Definition 4 ($\omega$-r-types). The set $\mathcal{O}$ of $\omega$-r-types is inductively defined by: $\omega \in \mathcal{O}$, and if $\rho, \sigma \in \mathcal{O}$ and $\tau \in \mathcal{R}$ then $\tau \to \rho \in \mathcal{O}$ and $\rho \times \sigma \in \mathcal{O}$.

We now introduce a notion of inclusion between r-types, denoted $\sqsubseteq$, where $\rho \sqsubseteq \sigma$ means that $\sigma$ is less informative than $\rho$.

Definition 5 (Inclusion relation).
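The inclusion relation $\sqsubseteq$ between refinement types is defined by the following rules:

$$(\delta)\ \ \delta \sqsubseteq \delta \qquad (\omega)\ \ \frac{\sigma \in \mathcal{O} \quad \mathbf{T}(\rho) = \mathbf{T}(\sigma)}{\rho \sqsubseteq \sigma}$$

$$(\to)\ \ \frac{\rho_1 \sqsubseteq \rho_2 \quad \sigma_1 \sqsubseteq \sigma_2}{\rho_2 \to \sigma_1 \sqsubseteq \rho_1 \to \sigma_2} \qquad (\times)\ \ \frac{\rho_1 \sqsubseteq \rho_2 \quad \sigma_1 \sqsubseteq \sigma_2}{\rho_1 \times \sigma_1 \sqsubseteq \rho_2 \times \sigma_2}.$$

Note that all $\omega$-r-types which refine the same type are considered equivalent with respect to the $\sqsubseteq$ relation. Moreover we have immediately that for all $\rho_1, \rho_2 \in \mathcal{R}$, $\rho_1 \sqsubseteq \rho_2$ implies $\mathbf{T}(\rho_1) = \mathbf{T}(\rho_2)$.

Refinement types are assigned to typed $\lambda$-terms by a set of type inference rules similar to those of the ML type inference system. If $x^R$ is a term variable of type $R$, an assumption for $x^R$ is an expression of the form $x^R : \rho$, or $x : \rho$ for short, where $\rho :: R$. A basis is a set $\Sigma$ of assumptions, and we write $\Sigma :: \Gamma$ to mean that $\Sigma$ contains an assumption $x^R : \rho$ only if $x^R$ is a variable in $\Gamma$. We will prove judgements of the form $\Sigma \vdash_R M^\rho$ where $M$ is a typed term in a context $\Gamma$, i.e., such that $\Gamma \vdash_T M^R$, $\Sigma :: \Gamma$ and $\rho :: R$. Note that we do not write the types of terms explicitly. Note also that we do not follow the more usual notation $\Sigma \vdash_R M : \rho$ since we are interested in keeping track of the r-types assigned to the subterms of $M$ in a derivation.

The type assignment rules are similar to the rules for term formation with the only exception that we take into account the inclusion relation. Notice that the types of all the constants $C$ can be described by $T_C[R]$, where $T_C[\ ]$ is a type context (with some, possibly none, holes) and $R$ is a type. For instance $T_0[\ ] = \mathrm{nat}$ and $T_{\mathrm{if}}[\ ] = \mathrm{nat} \to [\ ] \to [\ ] \to [\ ]$. With $\delta_C[\ ]$ we denote the r-type context obtained from $T_C[\ ]$ by replacing nat with $\delta$. For instance $\delta_0[\ ] = \delta$ and $\delta_{\mathrm{if}}[\ ] = \delta \to [\ ] \to [\ ] \to [\ ]$. Let $\rho :: R$, then $\delta_C[\rho] :: T_C[R]$ is the r-type obtained from $T_C[R]$ by replacing nat with $\delta$ and $R$ with $\rho$.

Definition 6 (Refinement type assignment system). The rules for refined type assignment are the following:

$$(\mathrm{Var})\ \ \frac{\rho_1 \sqsubseteq \rho_2}{\Sigma/x \cup \{x : \rho_1\} \vdash_R x^{\rho_2}} \qquad (\mathrm{Con})\ \ \frac{\rho :: R \quad \delta_C[\rho] \sqsubseteq \sigma}{\Sigma \vdash_R C^{\sigma}}$$

$$(\to I)\ \ \frac{\Sigma/x \cup \{x : \rho\} \vdash_R M^{\sigma}}{\Sigma/x \vdash_R (\lambda x^{\rho}.M^{\sigma})^{\rho \to \sigma}} \qquad (\to E)\ \ \frac{\Sigma \vdash_R M^{\rho_1 \to \sigma} \quad \Sigma \vdash_R N^{\rho_2} \quad \rho_1 \to \sigma \sqsubseteq \rho_2 \to \sigma}{\Sigma \vdash_R (M^{\rho_1 \to \sigma} N^{\rho_2})^{\sigma}}$$

$$(\times I)\ \ \frac{\Sigma \vdash_R M_1^{\rho_1} \quad \Sigma \vdash_R M_2^{\rho_2}}{\Sigma \vdash_R \langle M_1^{\rho_1}, M_2^{\rho_2} \rangle^{\rho_1 \times \rho_2}} \qquad (\times E_l)\ \ \frac{\Sigma \vdash_R M^{\rho_1 \times \rho_2}}{\Sigma \vdash_R (\pi_l M^{\rho_1 \times \rho_2})^{\rho_l}}\ \ l \in \{1, 2\}.$$

If $\Sigma \vdash_R M^\rho$ then $M^\rho$ has written in it the r-types assigned to its subterms. We say that $M^\rho$ is an annotated term. It is worth mentioning that, in the rule $(\to E)$, the condition $\rho_1 \to \sigma \sqsubseteq \rho_2 \to \sigma$ is used instead of $\rho_2 \sqsubseteq \rho_1$. This is because if $\rho_1 \to \sigma$ is an $\omega$-r-type $M$ can take any argument. Note that, $\vdash_R$ being an inference system, the same term can have different annotations.

Notice that the previous system is equivalent to the one obtained by removing the use of inclusion in rules (Var), (Con), $(\to E)$ and by adding an explicit inclusion rule.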
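The $\omega$-r-types of Definition 4 and the inclusion rules of Definition 5 are simple enough to check mechanically. The following Python sketch is one possible reading of them; the tuple encoding of r-types and the names `erase`, `is_omega` and `included` are assumptions made for this example.

```python
# r-types: "delta", "omega", ("->", arg, res) or ("x", left, right).

def erase(r):
    """T(r): the underlying type, with both delta and omega erased to nat."""
    if isinstance(r, str):
        return "nat"
    return (r[0], erase(r[1]), erase(r[2]))

def is_omega(r):
    """Membership in the set O of omega-r-types (Definition 4)."""
    if isinstance(r, str):
        return r == "omega"
    if r[0] == "->":
        return is_omega(r[2])          # only the result of an arrow matters
    return is_omega(r[1]) and is_omega(r[2])

def included(r1, r2):
    """r1 ⊑ r2 (Definition 5)."""
    if erase(r1) != erase(r2):
        return False
    if is_omega(r2):                   # rule (omega)
        return True
    if isinstance(r1, str):            # rule (delta); here r2 == "delta"
        return r1 == "delta"
    if r1[0] == "->":                  # rule (->): contravariant in the argument
        return included(r2[1], r1[1]) and included(r1[2], r2[2])
    return included(r1[1], r2[1]) and included(r1[2], r2[2])   # rule (x)

arrow = lambda a, b: ("->", a, b)
```

With this encoding, `included(arrow("delta", "omega"), arrow("omega", "omega"))` holds although `included("omega", "delta")` does not, which illustrates why the side condition of $(\to E)$ compares the two arrow types rather than the two argument types.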
We chose the presentation of Definition 6 to get a syntax directed system, which is more suitable for the definition of the type inference algorithm.

The functions $\mathbf{T}$ and $\mathbf{R}_\varphi$ defined above can naturally be extended to annotated terms. $\mathbf{T}(M^\rho)$ in particular is simply the term $M$ in which all r-type annotations have been erased. It is immediate to see that: $\Sigma \vdash_R M^\rho$ implies $\mathbf{T}(\Sigma) \vdash_T \mathbf{T}(M^\rho)$, and $\Gamma \vdash_T M^R$ implies, for $\varphi \in \{\delta, \omega\}$, $\mathbf{R}_\varphi(\Gamma) \vdash_R \mathbf{R}_\varphi(M^R)$. This last deduction is obtained without the use of the $\sqsubseteq$ relation.

We now introduce a notion of semantics for our type assignment system. We interpret each basic refinement type $\delta$ or $\omega$ as a partial equivalence relation (p.e.r. for short) over the interpretation of type nat, i.e., the set of equivalence classes of closed terms of type nat with respect to $=_r$. Let $\times$ denote the cartesian product of sets and $[M]$ denote the equivalence class of $M$ in $=_r$.

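Definition 7.
1. The interpretation $[\![\rho]\!]$ of an r-type is defined by:
$$[\![\delta]\!] = \{\langle [n], [n] \rangle \mid [n] \in I(\mathrm{nat})\} \qquad [\![\omega]\!] = I(\mathrm{nat}) \times I(\mathrm{nat}) \qquad [\![\rho \times \sigma]\!] = [\![\rho]\!] \times [\![\sigma]\!]$$
$$[\![\rho \to \sigma]\!] = \{\langle [M], [N] \rangle \mid \forall \langle [P], [Q] \rangle \in [\![\rho]\!].\ \langle [MP], [NQ] \rangle \in [\![\sigma]\!]\}.$$
2. By $\sim_\rho$ we denote the equivalence relation $[\![\rho]\!]$ on $I(\mathbf{T}(\rho))$. Two environments $e_1$, $e_2$ are $\Sigma$-related if and only if, for all $x : \rho \in \Sigma$, $e_1(x) \sim_\rho e_2(x)$.
3. Let $\Sigma \vdash_R M^\rho$ and $\Sigma \vdash_R N^\rho$. We write $M \sim_\rho N$ to mean that for all $e_1$, $e_2$, if $e_1$ and $e_2$ are $\Sigma$-related, then $[\![M]\!]_{e_1} \sim_\rho [\![N]\!]_{e_2}$.

The $\sqsubseteq$ relation between refinement types corresponds to inclusion of p.e.r.s. In fact: $\rho \sqsubseteq \sigma$ if and only if $[\![\rho]\!] \subseteq [\![\sigma]\!]$. Moreover, if $\rho :: R$ is an $\omega$-r-type, then $[\![\rho]\!] = I(R) \times I(R)$, i.e., $[\![\rho]\!]$ is the p.e.r. which relates all pairs of elements of $I(R)$.

We now state the main theorem for the p.e.r. interpretation, which is standard (in various forms) in the literature. The proof is by induction on terms.

Theorem 1. Let $\Sigma \vdash_R M^\rho$. Then $M \sim_\rho M$.

Let us now identify a subset of r-typings that assures a correct use of the optimization mapping introduced in the next section.

Definition 8 (Faithful refinement). $\Sigma \vdash_R M^\rho$ is a faithful refinement typing statement if $\rho = \mathbf{R}_\delta(\mathbf{T}(\rho))$, and for all $x : \sigma \in \Sigma$, if $\sigma \notin \mathcal{O}$ then $\sigma = \mathbf{R}_\delta(\mathbf{T}(\sigma))$.

Let $(C[\ ]^R_\Gamma)^S$ denote a typed context of type $S$ with a hole of type $R$ in it which (possibly) binds variables in $\Gamma$. If $\Gamma \vdash_T M^R$ and $\Gamma \vdash_T N^R$ we say that $M$ and $N$ are observationally equivalent ($M =_{obs} N$) if for all closed contexts $(C[\ ]^R_\Gamma)^{\mathrm{nat}}$ we have $C[M] =_r C[N]$.

2 Dead Code Elimination

In this section we introduce an optimization mapping $\mathbf{W}$ that, given an annotated term $M^\rho$, defines an optimized version of it. Define $\mathbf{W}(M^\rho)$ to be the term obtained by replacing all maximal subterms of $M$ which are assigned an $\omega$-r-type $\sigma :: R$ by a dummy placeholder of that r-type (written $\bullet^\sigma$ here), and define $\mathbf{W}(\Sigma) = \{x : \sigma \mid x : \sigma \in \Sigma \text{ and } \sigma \notin \mathcal{O}\}$, where $\Sigma$ is an r-type assignment basis. We have immediately that if $\Sigma \vdash_R M^\rho$ then $\mathbf{W}(\Sigma) \vdash_R \mathbf{W}(M^\rho)$ and $\mathbf{W}(\Sigma) \subseteq \Sigma$.

Example 1. Let $\Gamma \vdash_T M^R$ where $\Gamma = \{x^{\mathrm{nat}}\}$, $R = (\mathrm{nat} \to \mathrm{nat}) \to \mathrm{nat}$ and
$$M^R = \lambda f^{\mathrm{nat} \to \mathrm{nat}}.f((\lambda z^{\mathrm{nat}}.3)(f\,x)).$$
It is easy to check that $\Sigma \vdash_R M'^\rho$, where $\Sigma = \{x : \omega\}$, $\rho = (\delta \to \delta) \to \delta$ and
$$M'^\rho = (\lambda f^{\delta \to \delta}.(f^{\delta \to \delta}((\lambda z^{\omega}.3^{\delta})^{\omega \to \delta}(f^{\omega \to \omega}x^{\omega})^{\omega})^{\delta})^{\delta})^{\rho},$$
is a faithful r-typing. Applying the $\mathbf{W}$ optimization mapping we get $\mathbf{W}(\Sigma) \vdash_R \mathbf{W}(M'^\rho)$, where $\mathbf{W}(\Sigma) = \{\}$ and
$$\mathbf{W}(M'^\rho) = (\lambda f^{\delta \to \delta}.(f^{\delta \to \delta}((\lambda z^{\omega}.3^{\delta})^{\omega \to \delta}\,\bullet^{\omega})^{\delta})^{\delta})^{\rho},$$
and, erasing the r-type annotations, $\mathbf{T}(\mathbf{W}(\Sigma)) \vdash_T \mathbf{T}(\mathbf{W}(M'^\rho))$, where $\mathbf{T}(\mathbf{W}(\Sigma)) = \{\}$ and $\mathbf{T}(\mathbf{W}(M'^\rho)) = \lambda f^{\mathrm{nat} \to \mathrm{nat}}.f((\lambda z^{\mathrm{nat}}.3)\,\bullet^{\mathrm{nat}})$. $\Box$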
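To make the mapping of Example 1 concrete, here is a minimal Python sketch of the replacement step on annotated terms. The tuple encoding of terms, the `dummy` node, the bullet used when printing, and the names `W`, `show` and `is_omega` are assumptions made for this illustration; pairs and projections are left out to keep it short.

```python
# Annotated terms (hypothetical encoding); every node carries its r-type as last field:
#   ("var", name, r)   ("con", name, r)   ("lam", x, x_rtype, body, r)   ("app", fun, arg, r)
# r-types: "delta", "omega" or ("->", arg, res).

def is_omega(r):
    """An arrow is an omega-r-type exactly when its result is."""
    if isinstance(r, str):
        return r == "omega"
    return is_omega(r[2])

def W(term):
    """Replace every maximal subterm that has an omega-r-type by a dummy node."""
    r = term[-1]
    if is_omega(r):
        return ("dummy", r)
    if term[0] == "lam":
        _, x, xr, body, _ = term
        return ("lam", x, xr, W(body), r)
    if term[0] == "app":
        _, fun, arg, _ = term
        return ("app", W(fun), W(arg), r)
    return term                      # variables and constants with a live r-type

def show(term):
    """Print a term with all r-type annotations erased."""
    tag = term[0]
    if tag == "dummy":
        return "•"
    if tag in ("var", "con"):
        return term[1]
    if tag == "lam":
        return "(λ" + term[1] + "." + show(term[3]) + ")"
    return "(" + show(term[1]) + " " + show(term[2]) + ")"

d, w = "delta", "omega"
arrow = lambda a, b: ("->", a, b)

# The annotated term M' of Example 1: λf.(f ((λz.3) (f x)))
f_x = ("app", ("var", "f", arrow(w, w)), ("var", "x", w), w)
const3 = ("lam", "z", w, ("con", "3", d), arrow(w, d))
body = ("app", ("var", "f", arrow(d, d)), ("app", const3, f_x, d), d)
m_prime = ("lam", "f", arrow(d, d), body, arrow(arrow(d, d), d))
```

Calling `show(W(m_prime))` replaces only the inner application `(f x)`, since it is the one maximal subterm carrying an $\omega$-r-type.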

The following result follows easily from the r-type semantics.

Theorem 2. If $\Sigma \vdash_R M^\rho$ then $\mathbf{T}(M^\rho) \sim_\rho \mathbf{T}(\mathbf{W}(M^\rho))$.

This means that if $\Sigma \vdash_R M^\rho$ then $M$ and its optimized version are equivalent with respect to $\sim_\rho$. The function $\mathbf{T}$ has been inserted just to point out that the r-type information has a purely static nature and is not relevant in the formation rules and in the operational semantics of terms.

This result is especially interesting when the typing of $M$ is faithful since, using the above theorem, we can prove that if $\Sigma \vdash_R M^\rho$ is a faithful typing statement then $\mathbf{T}(M^\rho)$ and $\mathbf{T}(\mathbf{W}(M^\rho))$ are observationally equivalent.

Theorem 3. Let $\Sigma \vdash_R M^\rho$ be a faithful typing. Then $\mathbf{T}(M^\rho) =_{obs} \mathbf{T}(\mathbf{W}(M^\rho))$.

Remark 1. A strong optimization function $\mathbf{S}$ could be inductively defined on terms in such a way that, if $\Sigma \vdash_R M'^\rho$ is the faithful r-typing of Example 1, then $\mathbf{T}(\mathbf{S}(\Sigma)) = \{\}$ and $\mathbf{T}(\mathbf{S}(M'^\rho)) = \lambda f^{\mathrm{nat} \to \mathrm{nat}}.f\,3$. The mapping $\mathbf{S}$ could be defined following [2]. The optimizations performed by $\mathbf{S}$ can be much stronger than those performed by $\mathbf{W}$. In certain cases $\mathbf{S}$ can also replace the constants of the language, transforming, for instance, a recursor $\mathrm{rec}^{\delta \to \delta \to (\omega \to \delta \to \delta) \to \delta}$ into an iterator $\mathrm{it}^{\delta \to \delta \to (\delta \to \delta) \to \delta}$. A deeper analysis of the r-type structure of the term can make it possible to detect where this stronger optimization can be done. We do not go into those details since this kind of analysis simply uses the r-type structure determined by our algorithm. $\Box$

3 An Algorithm for Refinement Types Inference

In this section we deal with the problem of defining an algorithm for program optimization based on the system defined in the previous section. To this aim the main problem is to use the inference rules to detect the maximal subterms to which $\omega$-r-types can be assigned. The application of the optimization function $\mathbf{W}$ is then trivial. The algorithm, given a typing of a term, returns a decoration of the term with refinement patterns and a set of inequalities between refinement variables.
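The output of the algorithm characterizes all the possible r-typings of the term.

We start by defining the notions of r-type pattern and r-type scheme.

Definition 9 (Refinement type schemes).
1. Let $\mathcal{U}$ be the set of atomic variables, ranged over by $\alpha$, $\beta$, $\gamma$, ... The language $\mathcal{P}$ of refinement type patterns (r-patterns for short) is defined from the rules of Definition 3 by replacing rule (nat) by the following rule:
$$(\mathcal{U})\ \ \frac{\alpha \in \mathcal{U}}{\alpha :: \mathrm{nat}};$$
r-patterns are ranged over by $\theta$, $\eta$, ...
2. A constraint is a formula of one of the following shapes:
   - $\alpha_1 \sqsubseteq \alpha_2$, where $\alpha_1, \alpha_2 \in \{\delta\} \cup \mathcal{U}$
   - $G \Rightarrow \mathcal{E}$, where $G$ is a finite non-empty subset of $\{\delta\} \cup \mathcal{U}$ and $\mathcal{E}$ is a finite set of constraints.
3. A refinement type scheme is a pair $\langle \theta, \mathcal{E} \rangle$ where $\theta$ is an r-pattern and $\mathcal{E}$ is a finite set of constraints.

R-types and r-typings can be obtained from patterns by instantiation. A constraint is simply an inequality (between atomic variables or the constant $\delta$) or a guarded set of constraints. For instance the set of constraints
$$\{\ \alpha_3 \sqsubseteq \alpha_1,\ \{\alpha_1, \alpha_2\} \Rightarrow \{\alpha_3 \sqsubseteq \alpha_4,\ \alpha_5 \sqsubseteq \delta\}\ \}$$
can be read as "$\alpha_3 \sqsubseteq \alpha_1$ and if $\alpha_1 = \delta$ or $\alpha_2 = \delta$, then $\alpha_3 \sqsubseteq \alpha_4$ and $\alpha_5 \sqsubseteq \delta$". To give a meaning to constraints we have to say which are the solutions of a set of constraints.

Definition 10 (Renaming and instantiations).
1. A renaming is a one-to-one mapping $r : \mathcal{U} \to \mathcal{U}$.
2. An instantiation is a mapping $i : \mathcal{U} \to \{\delta, \omega\}$.
Both renaming and instantiation can be extended to constants by defining $i(\varphi) = \varphi$ and $r(\varphi) = \varphi$, for $\varphi \in \{\delta, \omega\}$.

Definition 11. Let $\langle \theta, \mathcal{E} \rangle$ be a scheme. An instantiation $i$ satisfies $\mathcal{E}$ if
   - $\alpha_1 \sqsubseteq \alpha_2 \in \mathcal{E}$ implies $i(\alpha_1) \sqsubseteq i(\alpha_2)$, and
   - $G \Rightarrow \mathcal{E}' \in \mathcal{E}$ implies that, if $\delta \in i(G)$, then $i$ satisfies $\mathcal{E}'$.
The set of all the instantiations that satisfy $\mathcal{E}$ is denoted by $\mathrm{SAT}(\mathcal{E})$. A scheme $\langle \theta, \mathcal{E} \rangle$ represents all the refinement types $i(\theta)$, for any $i \in \mathrm{SAT}(\mathcal{E})$.

Definition 12. Let $i_1$, $i_2$ be instantiations. $i_1 \sqsubseteq i_2$ means that, for all $\alpha \in \mathcal{U}$, $i_1(\alpha) \sqsubseteq i_2(\alpha)$.

Let $\mathcal{E}$ be a finite set of constraints. The set $\mathrm{SAT}(\mathcal{E})$ is not empty and has a maximum element.

Example 2. Consider the sets of constraints:
$$\mathcal{E} = \{\ \{\alpha'_3\} \Rightarrow \{\alpha_3 \sqsubseteq \alpha'_3,\ \alpha'_2 \sqsubseteq \alpha_2\},\ \ \{\alpha''_3\} \Rightarrow \{\alpha_3 \sqsubseteq \alpha''_3,\ \alpha''_2 \sqsubseteq \alpha_2\},\ \ \alpha_1 \sqsubseteq \alpha'_1,\ \ \{\alpha''_3\} \Rightarrow \{\alpha'_1 \sqsubseteq \alpha''_2\},\ \ \{\alpha_5\} \Rightarrow \{\alpha''_3 \sqsubseteq \alpha_4\},\ \ \{\alpha'_3\} \Rightarrow \{\alpha_5 \sqsubseteq \alpha'_2\}\ \}$$
$$\mathcal{E}_1 = \{\ \{\alpha_1\} \Rightarrow \{\alpha_1 \sqsubseteq \delta\},\ \ \alpha_2 \sqsubseteq \delta,\ \ \alpha_3 \sqsubseteq \delta,\ \ \alpha'_3 \sqsubseteq \delta\ \}.$$
To find the maximum element $i_0$ of $\mathrm{SAT}(\mathcal{E} \cup \mathcal{E}_1)$ observe that from the last three constraints of $\mathcal{E}_1$ we get $i_0(\alpha_2) = i_0(\alpha_3) = i_0(\alpha'_3) = \delta$. Then from the first constraint of $\mathcal{E}$ we get $i_0(\alpha'_2) = \delta$, and finally from the last constraint of $\mathcal{E}$ we have $i_0(\alpha_5) = \delta$. Let $I = \{\alpha_2, \alpha'_2, \alpha_3, \alpha'_3, \alpha_5\}$, then $i_0$ defined by: $i_0(\alpha) = \delta$ if $\alpha \in I$ and $i_0(\alpha) = \omega$ otherwise, is the maximum instantiation in $\mathrm{SAT}(\mathcal{E} \cup \mathcal{E}_1)$. $\Box$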
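The hand computation of Example 2 can be mechanized by propagating $\delta$ along the inequalities until nothing changes. The Python sketch below does this with a naive fixpoint loop; it is not tuned for efficiency, and the tuple encoding of constraints and the names `le`, `guard` and `solve` are assumptions made for this illustration.

```python
DELTA = "delta"   # the constant "live"; every other string is an atomic variable

def le(a, b):
    """The atomic constraint a ⊑ b."""
    return ("le", a, b)

def guard(g, body):
    """The guarded constraint G => body."""
    return ("guard", frozenset(g), tuple(body))

def solve(constraints):
    """Return the set I of atomic variables forced to delta (all others map to omega)."""
    live = {DELTA}
    pending = list(constraints)
    changed = True
    while changed:
        changed = False
        remaining = []
        for c in pending:
            if c[0] == "guard":
                _, g, body = c
                if live & g:                 # some element of G is delta: activate the body
                    remaining.extend(body)
                    changed = True
                else:
                    remaining.append(c)
            else:
                _, left, right = c
                if right in live and left not in live:
                    live.add(left)           # a ⊑ delta forces a = delta
                    changed = True
                remaining.append(c)
        pending = remaining
    return live - {DELTA}

# The two constraint sets of Example 2 (primes are written with apostrophes).
E = [
    guard({"a3'"}, [le("a3", "a3'"), le("a2'", "a2")]),
    guard({"a3''"}, [le("a3", "a3''"), le("a2''", "a2")]),
    le("a1", "a1'"),
    guard({"a3''"}, [le("a1'", "a2''")]),
    guard({"a5"}, [le("a3''", "a4")]),
    guard({"a3'"}, [le("a5", "a2'")]),
]
E1 = [guard({"a1"}, [le("a1", DELTA)]), le("a2", DELTA), le("a3", DELTA), le("a3'", DELTA)]
```

On this input `solve(E + E1)` yields the five variables of the set $I$ computed by hand above; every variable outside the returned set is mapped to $\omega$ in the maximum instantiation.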

R-type inference of a term is reduced to the solution of a finite set of constraints. A maximal instantiation then corresponds to the typing that shows the maximal amount of dead code. The algorithm for finding the maximal instantiation $i$ that satisfies a finite set of constraints $\mathcal{E}$ is presented in natural semantics style using judgements $\mathcal{E} \leadsto I$ where $I$ is the set of atomic variables that represents $i$, i.e., such that $\alpha \in I$ if and only if $i(\alpha) = \delta$. The idea is simply that of recognizing, following the inequalities, all the variables that are forced to represent $\delta$. All other atomic variables are then replaced by $\omega$ in the maximal solution.

Definition 13 ("Natural semantics" rules for constraints solution).

$$(\mathrm{AX})\ \ \mathcal{E} \leadsto \emptyset\ \ \text{if no other rule can be applied}$$

$$(\mathrm{ATOM})\ \ \frac{\mathcal{E}[\delta/\alpha] \leadsto I}{\mathcal{E} \cup \{\alpha \sqsubseteq \delta\} \leadsto I \cup \{\alpha\}} \qquad (\mathrm{GUARD})\ \ \frac{\mathcal{E} \cup \mathcal{E}'' \leadsto I \quad \delta \in G}{\mathcal{E} \cup \{G \Rightarrow \mathcal{E}''\} \leadsto I}$$

It is easy to see that, given a finite set of constraints $\mathcal{E}$, we can find $I$ such that $\mathcal{E} \leadsto I$ in a time linear in the number of constraints which occur in $\mathcal{E}$.

Proposition 1. Let $\mathcal{E}$ be a finite set of constraints. Then $\mathcal{E} \leadsto I$ if and only if $I$ represents the maximum of $\mathrm{SAT}(\mathcal{E})$.

We can now proceed to define the refinement type inference algorithm. This algorithm is presented in the natural semantics style using judgements $\langle \Gamma, M^R \rangle \Longrightarrow \langle \Theta, M'^\theta, \mathcal{E} \rangle$ where $\Gamma \vdash_T M^R$, $\Theta$ is a basis that associates to each term variable in $\Gamma$ a pattern, and $M'^\theta$ is a term annotated with patterns. The natural semantics rules follow the inference rules of Definition 6 in a natural way. We will prove that $\langle \theta, \mathcal{E} \rangle$ is a scheme that represents exactly the refined types assignable to $M^R$. More precisely, for any $\Sigma$ and $M''^\rho$ such that $\mathbf{T}(\Sigma) = \Gamma$ and $\mathbf{T}(M''^\rho) = M^R$, we have that $\Sigma \vdash_R M''^\rho$ if and only if $\Sigma = i(\Theta)$ and $M''^\rho = i(M'^\theta)$, for some $i$ that satisfies $\mathcal{E}$.

To define the algorithm we need some preliminary notations. Let $R$ be a type. By $\mathrm{fresh}(R)$ we denote a pattern of the same shape as $R$ such that to each occurrence of any atom in $R$ is associated a fresh atomic variable. For example: $\mathrm{fresh}(\mathrm{nat} \to \mathrm{nat}) = \alpha \to \beta$.
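For a basis $\Gamma$, $\mathrm{fresh}(\Gamma) = \{x : \mathrm{fresh}(R) \mid x^R \in \Gamma\}$. The function vars maps an r-pattern $\theta$ to its finite set of atomic variables. The function tail, that maps r-patterns and r-types (not containing $\omega$) to finite subsets of $\{\delta\} \cup \mathcal{U}$, is inductively defined by: $\mathrm{tail}(\alpha) = \{\alpha\}$ (for $\alpha \in \{\delta\} \cup \mathcal{U}$), $\mathrm{tail}(\theta \times \eta) = \mathrm{tail}(\theta) \cup \mathrm{tail}(\eta)$, and $\mathrm{tail}(\theta \to \eta) = \mathrm{tail}(\eta)$.

Let $\theta'$, $\theta''$ be r-patterns or r-types (not containing $\omega$) of the same shape; $\mathrm{at}(\theta' \sqsubseteq \theta'')$ denotes the set of constraints inductively defined by

$$\mathrm{at}(\alpha_1 \sqsubseteq \alpha_2) = \{\alpha_1 \sqsubseteq \alpha_2\}, \quad \text{if } \alpha_1, \alpha_2 \in \{\delta\} \cup \mathcal{U}$$

$$\mathrm{at}(\theta_1 \times \eta_1 \sqsubseteq \theta_2 \times \eta_2) = \mathrm{at}(\theta_1 \sqsubseteq \theta_2) \cup \mathrm{at}(\eta_1 \sqsubseteq \eta_2)$$

$$\mathrm{at}(\theta'_1 \to \cdots \to \theta'_n \to \theta' \sqsubseteq \theta''_1 \to \cdots \to \theta''_n \to \theta'') = \Big\{\, \mathrm{tail}(\theta'') \Rightarrow \Big( \mathrm{at}(\theta' \sqsubseteq \theta'') \cup \bigcup_{1 \le l \le n} \mathrm{at}(\theta''_l \sqsubseteq \theta'_l) \Big) \Big\},$$

where $\theta'$, $\theta''$ are not arrow r-patterns or arrow r-types.

For all instantiations $i$, $i(\theta) \sqsubseteq i(\eta)$ if and only if $i \in \mathrm{SAT}(\mathrm{at}(\theta \sqsubseteq \eta))$.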
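The functions tail and at defined above translate directly into code. The following Python sketch is one way to write them; the tuple encoding of patterns and constraints and the helper `uncurry` are assumptions made for this illustration.

```python
# Patterns: atoms are strings ("delta" or an atomic variable name);
# compound patterns are ("->", arg, res) or ("x", left, right).

def le(a, b):
    """The atomic constraint a ⊑ b."""
    return ("le", a, b)

def guard(g, body):
    """The guarded constraint G => body."""
    return ("guard", frozenset(g), tuple(body))

def tail(p):
    """tail(a) = {a}; tail(p x q) = tail(p) ∪ tail(q); tail(p -> q) = tail(q)."""
    if isinstance(p, str):
        return {p}
    if p[0] == "x":
        return tail(p[1]) | tail(p[2])
    return tail(p[2])

def uncurry(p):
    """Split p1 -> ... -> pn -> q (q not an arrow) into ([p1, ..., pn], q)."""
    args = []
    while not isinstance(p, str) and p[0] == "->":
        args.append(p[1])
        p = p[2]
    return args, p

def at(left, right):
    """Constraints equivalent to left ⊑ right, for two patterns of the same shape."""
    if isinstance(left, str):
        return [le(left, right)]
    if left[0] == "x":
        return at(left[1], right[1]) + at(left[2], right[2])
    args_l, res_l = uncurry(left)
    args_r, res_r = uncurry(right)
    body = at(res_l, res_r)
    for a_l, a_r in zip(args_l, args_r):
        body += at(a_r, a_l)          # arguments are compared contravariantly
    return [guard(tail(res_r), body)]

arrow = lambda a, b: ("->", a, b)
```

For instance, `at(arrow("a2", "a3"), arrow("a2'", "a3'"))` produces a single guarded constraint with guard `{"a3'"}` and body `a3 ⊑ a3'`, `a2' ⊑ a2`: the inequalities only matter when the result of the larger pattern is live.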

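Definition 14 ("Natural semantics" rules for refinement typings inference).

$$(\mathrm{VAR})\ \ \frac{\Theta = \mathrm{fresh}(\Gamma) \quad \theta_1 = \mathrm{fresh}(R) \quad \theta_2 = \mathrm{fresh}(R)}{\langle \Gamma/x \cup \{x^R\},\ x^R \rangle \Longrightarrow \langle \Theta/x \cup \{x : \theta_1\},\ x^{\theta_2},\ \mathrm{at}(\theta_1 \sqsubseteq \theta_2) \rangle}$$

$$(\mathrm{CON})\ \ \frac{\Theta = \mathrm{fresh}(\Gamma) \quad \theta = \mathrm{fresh}(T_C[R]) \quad \eta = \mathrm{fresh}(R)}{\langle \Gamma,\ C^{T_C[R]} \rangle \Longrightarrow \langle \Theta,\ C^{\theta},\ \mathcal{E} \rangle} \quad \text{where } \mathcal{E} = \mathrm{at}(\delta_C[\eta] \sqsubseteq \theta)$$

$$(\mathrm{ABS})\ \ \frac{\langle \Gamma/x \cup \{x^R\},\ M^S \rangle \Longrightarrow \langle \Theta/x \cup \{x : \theta\},\ M'^{\eta},\ \mathcal{E} \rangle}{\langle \Gamma/x,\ (\lambda x^R.M^S)^{R \to S} \rangle \Longrightarrow \langle \Theta/x,\ (\lambda x^{\theta}.M'^{\eta})^{\theta \to \eta},\ \mathcal{E} \rangle}$$

$$(\mathrm{APP})\ \ \frac{\langle \Gamma, M^{R \to S} \rangle \Longrightarrow \langle \Theta_1, M'^{\theta_1 \to \eta}, \mathcal{E}_1 \rangle \quad \langle \Gamma, N^R \rangle \Longrightarrow \langle \Theta_2, N'^{\theta_2}, \mathcal{E}_2 \rangle \quad \Theta_1 = \{x : r(\theta) \mid x : \theta \in \Theta_2\}}{\langle \Gamma,\ (M^{R \to S} N^R)^S \rangle \Longrightarrow \langle \Theta_1,\ (M'^{\theta_1 \to \eta} N'^{\theta_2})^{\eta},\ \mathcal{E} \rangle}$$
where $\mathcal{E} = \mathcal{E}_1 \cup r(\mathcal{E}_2) \cup \{\mathrm{tail}(\eta) \Rightarrow \mathrm{at}(\theta_2 \sqsubseteq \theta_1)\}$ and $r$ is a renaming of the atomic variables in $\Theta_2$

$$(\mathrm{PAIR})\ \ \frac{\langle \Gamma, M_1^{R_1} \rangle \Longrightarrow \langle \Theta_1, M_1'^{\theta_1}, \mathcal{E}_1 \rangle \quad \langle \Gamma, M_2^{R_2} \rangle \Longrightarrow \langle \Theta_2, M_2'^{\theta_2}, \mathcal{E}_2 \rangle \quad \Theta_1 = \{x : r(\theta) \mid x : \theta \in \Theta_2\}}{\langle \Gamma,\ \langle M_1^{R_1}, M_2^{R_2} \rangle^{R_1 \times R_2} \rangle \Longrightarrow \langle \Theta_1,\ \langle M_1'^{\theta_1}, M_2'^{\theta_2} \rangle^{\theta_1 \times \theta_2},\ \mathcal{E}_1 \cup r(\mathcal{E}_2) \rangle}$$
where $r$ is a renaming of the atomic variables in $\Theta_2$

$$(\mathrm{PROJ}_l)\ \ \frac{\langle \Gamma,\ M^{R_1 \times R_2} \rangle \Longrightarrow \langle \Theta,\ M'^{\theta_1 \times \theta_2},\ \mathcal{E} \rangle}{\langle \Gamma,\ (\pi_l M^{R_1 \times R_2})^{R_l} \rangle \Longrightarrow \langle \Theta,\ (\pi_l M'^{\theta_1 \times \theta_2})^{\theta_l},\ \mathcal{E} \rangle}\ \ l \in \{1, 2\}$$

The rules are in this form to make them as readable as possible. They generate inequalities that could easily be avoided in a real implementation. Correctness and completeness of the inference are expressed by the following theorem.

Theorem 4. $\Gamma \vdash_T M^R$ and $\langle \Gamma, M^R \rangle \Longrightarrow \langle \Theta, M'^\theta, \mathcal{E} \rangle$ imply
1. for all instantiations $i$, if $i$ satisfies $\mathcal{E}$, then $i(\Theta) \vdash_R i(M'^\theta)$
2. for all refinement type assignment statements $\Sigma \vdash_R M''^\rho$ such that $\mathbf{T}(\Sigma) = \Gamma$ and $\mathbf{T}(M''^\rho) = M^R$ there exists $i \in \mathrm{SAT}(\mathcal{E})$ such that $i(\Theta) = \Sigma$ and $i(M'^\theta) = M''^\rho$.

We are interested in faithful refinements, so we want to restrict the set of solutions of the constraints generated by the algorithm to those that correspond to faithful refinements. This can be done as shown by the following corollary.

Corollary 1. Let $\Gamma \vdash_T M^R$ and $\langle \Gamma, M^R \rangle \Longrightarrow \langle \Theta, M'^\theta, \mathcal{E} \rangle$. Then $i(\Theta) \vdash_R i(M'^\theta)$ is a faithful refinement type assignment if and only if the instantiation $i$ satisfies the set of constraints $\mathcal{E} \cup \mathrm{faithful}(\Theta, \theta)$, where

$$\mathrm{faithful}(\Theta, \theta) = \bigcup_{x : \eta \in \Theta} \{\, \mathrm{tail}(\eta) \Rightarrow \{\gamma \sqsubseteq \delta \mid \gamma \in \mathrm{vars}(\eta)\} \,\} \ \cup\ \{\alpha \sqsubseteq \delta \mid \alpha \in \mathrm{vars}(\theta)\}.$$

The constraint $\mathrm{tail}(\eta) \Rightarrow \{\gamma \sqsubseteq \delta \mid \gamma \in \mathrm{vars}(\eta)\}$ means that either $\eta$ is an $\omega$-r-type or it contains only $\delta$'s.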
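As a small illustration, the set $\mathrm{faithful}$ of Corollary 1 can be generated as follows. This is a sketch under the same assumed tuple encoding of patterns and constraints used for illustration only; the basis is represented as a Python dict from variable names to patterns, and `vars_of` plays the role of vars.

```python
DELTA = "delta"

def le(a, b):
    return ("le", a, b)

def guard(g, body):
    return ("guard", frozenset(g), tuple(body))

def tail(p):
    """Atoms in result position of a pattern."""
    if isinstance(p, str):
        return {p}
    if p[0] == "x":
        return tail(p[1]) | tail(p[2])
    return tail(p[2])

def vars_of(p):
    """All atomic variables of a pattern (the constant delta is not a variable)."""
    if isinstance(p, str):
        return set() if p == DELTA else {p}
    return vars_of(p[1]) | vars_of(p[2])

def faithful(basis, pattern):
    """Constraints forcing a faithful instance of the basis and of the term's pattern."""
    constraints = []
    for _, p in sorted(basis.items()):
        # each assumption is either dead as a whole or made of deltas only
        constraints.append(guard(tail(p), [le(g, DELTA) for g in sorted(vars_of(p))]))
    # the r-type of the whole term must contain only deltas
    constraints += [le(a, DELTA) for a in sorted(vars_of(pattern))]
    return constraints

arrow = lambda a, b: ("->", a, b)
```

With the basis `{"x": "a1"}` and the pattern `arrow(arrow("a2", "a3"), "a3'")`, the function returns one guarded constraint for `x` followed by `a2 ⊑ delta`, `a3 ⊑ delta` and `a3' ⊑ delta`.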

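Let $\Gamma \vdash_T M^R$ and $\langle \Gamma, M^R \rangle \Longrightarrow \langle \Sigma, M'^{\phi}, E \rangle$. If $i_0$ is the maximum element of $\mathrm{SAT}(E \cup \mathrm{faithful}(\Sigma, \phi))$ then, by the results of Sect. 1, $T(W(i_0(M'^{\phi})))$ is the best simplification (in our sense) of $M^R$ with the same operational meaning w.r.t. $=_{obs}$.

Example 3. Let $\Gamma \vdash_T M^R$ be the typing of Example 1. Then $\langle \Gamma, M^R \rangle \Longrightarrow \langle \Sigma, M'^{\phi}, E \rangle$ where $\Sigma = \{x : \alpha_1\}$, $\phi = (\alpha_2 \to \alpha_3) \to \alpha_3'$,
$$M'^{\phi} = (\lambda f^{\alpha_2 \to \alpha_3}.(f^{\alpha_2' \to \alpha_3'}((\lambda z^{\alpha_4}.3^{\alpha_5})^{\alpha_4 \to \alpha_5}(f^{\alpha_2'' \to \alpha_3''}\, x^{\alpha_1'})^{\alpha_3''})^{\alpha_5})^{\alpha_3'})^{\phi},$$
and
$$E = \mathrm{at}(\alpha_2 \to \alpha_3 \sqsubseteq \alpha_2' \to \alpha_3') \cup \mathrm{at}(\alpha_2 \to \alpha_3 \sqsubseteq \alpha_2'' \to \alpha_3'') \cup \{\alpha_1 \sqsubseteq \alpha_1',\; \{\alpha_3''\} \Rightarrow \{\alpha_1' \sqsubseteq \alpha_2''\},\; \{\alpha_5\} \Rightarrow \{\alpha_3'' \sqsubseteq \alpha_4\},\; \{\alpha_3'\} \Rightarrow \{\alpha_5 \sqsubseteq \alpha_2'\}\}$$
is the first set of constraints introduced in Example 2.

The set $\mathrm{faithful}(\Sigma, \phi)$ is the set $E_1$ in Example 2, so $E \cup \mathrm{faithful}(\Sigma, \phi) \leadsto I$, where $I$ is $\{\alpha_2, \alpha_2', \alpha_3, \alpha_3', \alpha_5\}$.

Let $i_0$ be defined by: $i_0(\alpha) = \delta$ if $\alpha \in I$ and $i_0(\alpha) = \omega$ otherwise. We have that $i_0(\Sigma) \vdash_R i_0(M'^{\phi})$, where $i_0(\Sigma) = \{x : \omega\}$, $i_0(\phi) = (\delta \to \delta) \to \delta$ and
$$i_0(M'^{\phi}) = (\lambda f^{\delta \to \delta}.(f^{\delta \to \delta}((\lambda z^{\omega}.3^{\delta})^{\omega \to \delta}(f^{\omega \to \omega}\, x^{\omega})^{\omega})^{\delta})^{\delta})^{i_0(\phi)},$$
is the faithful r-typing used in Example 1. □

When the inference method is used to analyze a term $M$, its output is a set of constraints that characterizes all possible r-typings of $M$. The subterms of $M$ that are assigned $\omega$-r-types in a faithful typing are certainly useless in any use of $M$, and can therefore be simplified. Moreover, the constraints associated with them are irrelevant to further analysis of $M$ and can be eliminated from the set of constraints for $M$. More precisely, if $\Gamma \vdash_T M^R$, $\langle \Gamma, M^R \rangle \Longrightarrow \langle \Sigma, M'^{\phi}, E \rangle$ and $i_0$ is the maximum element of $\mathrm{SAT}(E \cup \mathrm{faithful}(\Sigma, \phi))$ then the set of constraints $E$ can be replaced by any set of constraints $E'$ such that
$$\{i \mid i \in \mathrm{SAT}(E') \text{ and } i_0 \sqsubseteq i\} = \{i \mid i \in \mathrm{SAT}(E) \text{ and } i_0 \sqsubseteq i\}. \qquad (1)$$

Example 4. Let $E$ and $i_0$ be the set of constraints and the instantiation of Example 3. Consider the following subset $E'$ of $E$:
$$E' = \{\, \{\alpha_3'\} \Rightarrow \{\alpha_3 \sqsubseteq \alpha_3',\; \alpha_2' \sqsubseteq \alpha_2\},\; \{\alpha_3'\} \Rightarrow \{\alpha_5 \sqsubseteq \alpha_2'\} \,\}.$$
It is easy to see that any instantiation $i'$ that maps to $\omega$ more atomic variables than $i_0$, i.e., such that $i_0 \sqsubseteq i'$, satisfies all the constraints in $E$ that are not in $E'$. So, since only the instantiations $i' \in \mathrm{SAT}(E)$ such that $i_0 \sqsubseteq i'$ are relevant to further analysis, we can replace $E$ by the simplified set of constraints $E'$ without loss of information. □

The following algorithm, given a finite set of constraints $E$ and an instantiation $i_0 \in \mathrm{SAT}(E)$, returns a simplified set of constraints $E'$ such that (1) holds, and so can be used to eliminate from $E$ all the constraints which are not relevant for further analysis.

The algorithm is presented in natural semantics style using judgements $E; I \triangleright E'$, where $I$ represents $i_0$.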
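Before the formal rules, the intended effect of the simplification can be sketched as a recursive filter over the constraint set. The Python below is only an illustration under stated assumptions: the encoding (`Le`, `Guard`, `simplify`, the variable names) is hypothetical, the set I is read as the set of atomic variables mapped to delta by i0, and the rule called ATOM1 is read as discarding constraints whose left-hand side is the constant delta. The rule names in the comments refer to Definition 15.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set, Tuple, Union

DELTA = "delta"  # the constant refinement delta


@dataclass(frozen=True)
class Le:
    """Atomic constraint left ⊑ right (hypothetical encoding)."""
    left: str
    right: str


@dataclass(frozen=True)
class Guard:
    """Guarded constraint guard ⇒ body (hypothetical encoding)."""
    guard: FrozenSet[str]
    body: Tuple["Constraint", ...]


Constraint = Union[Le, Guard]


def guard(names, *body) -> Guard:
    return Guard(frozenset(names), tuple(body))


def simplify(constraints: Sequence[Constraint], I: Set[str]) -> List[Constraint]:
    """Keep only the constraints that can still fail for some i above i0."""

    def may_be_delta(v: str) -> bool:
        return v == DELTA or v in I

    kept: List[Constraint] = []
    for c in constraints:
        if isinstance(c, Le):
            if c.left == DELTA:
                continue  # ATOM1 (as read here): trivially satisfied
            if not may_be_delta(c.right):
                continue  # ATOM2: right-hand side is omega for every i above i0
            kept.append(c)
        else:
            if not any(may_be_delta(g) for g in c.guard):
                continue  # GUARD1: the guard can never become active
            body = simplify(c.body, I)
            if not body:
                continue  # GUARD2: nothing relevant is left under the guard
            kept.append(Guard(c.guard, tuple(body)))  # GUARD3
    return kept


# Constraints E and set I of Example 3, with at(...) between arrow types
# expanded into a constraint guarded by the tail of the right-hand side.
E = [
    guard(["a3'"], Le("a3", "a3'"), Le("a2'", "a2")),
    guard(["a3''"], Le("a3", "a3''"), Le("a2''", "a2")),
    Le("a1", "a1'"),
    guard(["a3''"], Le("a1'", "a2''")),
    guard(["a5"], Le("a3''", "a4")),
    guard(["a3'"], Le("a5", "a2'")),
]
I = {"a2", "a2'", "a3", "a3'", "a5"}

E_SIMPLIFIED = simplify(E, I)
for c in E_SIMPLIFIED:
    print(sorted(c.guard), [(a.left, a.right) for a in c.body])
```

With this reading, the two constraints that survive are the ones listed as E' in Example 4.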

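Definition 15 ("Natural semantics" rules for simplification of constraints).

(AX) $E; I \triangleright E$, if no other rule can be applied.

(ATOM1)
$$\frac{E; I \triangleright E'}{E \cup \{\delta \sqsubseteq \zeta\}; I \triangleright E'}$$

(ATOM2)
$$\frac{E; I \triangleright E' \qquad \zeta_2 \notin I \cup \{\delta\}}{E \cup \{\zeta_1 \sqsubseteq \zeta_2\}; I \triangleright E'}$$

(GUARD1)
$$\frac{E; I \triangleright E' \qquad G \cap (I \cup \{\delta\}) = \emptyset}{E \cup \{G \Rightarrow E_0\}; I \triangleright E'}$$

(GUARD2)
$$\frac{E; I \triangleright E' \qquad E_0; I \triangleright \emptyset}{E \cup \{G \Rightarrow E_0\}; I \triangleright E'}$$

(GUARD3)
$$\frac{E; I \triangleright E' \qquad E_0; I \triangleright E'' \qquad G \cap (I \cup \{\delta\}) \neq \emptyset \qquad E'' \neq \emptyset}{E \cup \{G \Rightarrow E_0\}; I \triangleright E' \cup \{G \Rightarrow E''\}}$$

Proposition 2. Let $E$ be a finite set of constraints and let $I$ represent $i_0 \in \mathrm{SAT}(E)$. Then $E; I \triangleright E'$ implies that (1) holds.

Example 5. Consider the output of the refinement type inference algorithm in Example 3, i.e., the triple $\langle \Sigma, M'^{\phi}, E \rangle$. The dead code shown by the maximum faithful typing can be immediately removed and, since $E; I \triangleright E'$, the set of constraints can be simplified as shown in Example 4. The triple $\langle \emptyset, M''^{\phi}, E' \rangle$, where
$$M''^{\phi} = (\lambda f^{\alpha_2 \to \alpha_3}.(f^{\alpha_2' \to \alpha_3'}((\lambda z^{\alpha_4}.3^{\alpha_5})\, d^{\alpha_3''})^{\alpha_5})^{\alpha_3'})^{\phi},$$
$d$ stands for the dummy term that replaces the dead subterm, and
$$E' = \{\{\alpha_3'\} \Rightarrow \{\alpha_3 \sqsubseteq \alpha_3',\; \alpha_2' \sqsubseteq \alpha_2\},\; \{\alpha_3'\} \Rightarrow \{\alpha_5 \sqsubseteq \alpha_2'\}\},$$
is indeed all we need in the analysis of programs that use $M$. □

We now have all the components for the definition of an efficient incremental analysis of a term $M^R$ with free variables in $\Gamma$. Let $\langle \Gamma, M^R \rangle \Longrightarrow \langle \Sigma, M'^{\phi}, E \rangle$ and $E \cup \mathrm{faithful}(\Sigma, \phi) \leadsto I$. We can define an optimization mapping $W'$ such that $W'(\langle \Sigma, M'^{\phi}, E \rangle, I) = \langle \Sigma', M''^{\phi}, E' \rangle$, where $\Sigma' = \{x : \psi \mid x : \psi \in \Sigma \text{ and } \mathrm{tail}(\psi) \cap I \neq \emptyset\}$, $M''^{\phi}$ is obtained from $M'^{\phi}$ by replacing the maximal subterms annotated by a pattern $\psi$ such that $\mathrm{tail}(\psi) \cap I = \emptyset$ by a dummy term (written $d$ here), and $E; I \triangleright E'$. The triple $\langle \Sigma', M''^{\phi}, E' \rangle$ is all we need to perform further analysis of a program containing $M$. For instance, $M$ could be applied to a term or could itself be the argument of a function. In such contexts further optimizations of $M$ could be possible. The optimizations performed by $W'$ are those which are possible in any context.

4 Binding Time and Strictness Analysis

The interpretation of $\delta$ and $\omega$ can be understood in a different way, as suggested in [10].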
The r-type $\delta$ (interpreted as the diagonal relation on $I(nat)$) characterizes values which are "known" and have a precise identity, while $\omega$ characterizes values which are completely unknown (they can be any value in $I(nat)$). These properties are naturally propagated to higher types. For instance, if a term of type $nat \to nat$ has the refinement type $\delta \to \omega$ or $\omega \to \omega$ then it is totally unknown, since it can be equated to anything, whereas the refinements $\delta \to \delta$ and $\omega \to \delta$ are given to terms that uniquely identify a function. In $\delta \to \delta$ two functions are equated only if they are extensionally equivalent. In the case of $\omega \to \delta$ we know even more, namely that the function is a constant function, since it uniquely identifies a value from an unknown input. Of course there are refinements that belong to neither of these classes, e.g., $(\omega \to \delta) \to \omega \to \delta$, which identifies all the functions that return a constant function when applied to a constant function. In general $\omega$-r-types correspond to the concept of being totally unknown; indeed all of them represent the relations that identify all the elements of a given type.

With the previous interpretation, detecting dead code in a term $M$ can be seen as assuming that $M$ uniquely identifies a value and trying to assign the maximum number of $\omega$'s to its subterms (keeping, of course, the refinement consistent). A subterm with an $\omega$-r-type can then be totally unknown, and so it is useless to the evaluation of the final result.

This interpretation led us to explore the use of the system for binding time analysis. For binding time analysis the question is: given a description of the parameters of a function that will be known, determine which parts of the program depend solely on the known parts. Known parameters correspond to terms identifying a unique value, and unknown parameters correspond naturally to terms having $\omega$-r-types. The former are called static and the latter dynamic in the binding time literature. For this analysis we look for a refinement of the term that respects the static and dynamic information of the parameters and that maximizes the number of $\delta$'s in the term. A subterm of type $\delta$ is a subterm whose value does not depend on the values of the variables with dynamic type, so it is a natural candidate for evaluation when all static values are known. Our analysis gives the same results as [10] and [8].

Our framework can also be applied to strictness analysis. If we replace the rule (Zero) of Definition 6 by the rule "(Zero') $\Sigma \vdash_R 0^{\omega}$" we get a system that can be used to study strictness properties for the $\lambda$-calculus with unary and constant operators. The refinement $\delta$ now means being the undefined value of the natural numbers and the refinement $\omega$ means being any value. For a functional type, say $nat \to nat \to nat$, saying that a function is strict in its first argument means that $\delta \to \omega \to \delta$ is a refinement of the function. This is because, if the result is undefined whenever the first argument is undefined, regardless of the value of the second, then the first argument must be evaluated somewhere in the body of the function. In this context the equivalence of $\omega$-r-types says that non-informative types are all equivalent. Of course strictness analysis makes sense only in a language including a fixed point operator. It is not difficult to add to our language a typed fixed point constant, with a suitable refinement rule.

The difference between strictness analysis and dead code (or binding time) analysis arises with the use of constants such as $\mathrm{if}^{\,nat \to R \to R \to R}$, whose strictness behaviour cannot be described, as for dead code or binding time, by all the refinements $\psi$ such that $\delta \to \phi \to \phi \to \phi \sqsubseteq \psi$ for any $\phi :: R$.

For strictness analysis, "if" has all the refinements $\psi$ such that either $\omega \to \phi \to \phi \to \phi \sqsubseteq \psi$ or $\delta \to \phi_1 \to \phi_2 \to \phi_3 \sqsubseteq \psi$. (The second case corresponds to a certainly undefined value of the test.) This adds a sort of and/or structure to the system of inequalities. A version of the system tailored to strictness analysis is in preparation.
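The classification of refinements of functional types used throughout this section can be made concrete with a small Python sketch. It is an illustration only: the names `Base`, `Arrow`, `tail` and `is_omega_r_type` are hypothetical, only base and arrow refinements are covered, and the sketch assumes that a refinement is an omega-r-type exactly when the base refinement at the end of its chain of arrows is omega.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Base:
    """A base refinement: either "delta" or "omega"."""
    name: str


@dataclass(frozen=True)
class Arrow:
    """A functional refinement arg -> res."""
    arg: "RType"
    res: "RType"


RType = Union[Base, Arrow]

DELTA = Base("delta")
OMEGA = Base("omega")


def tail(t: RType) -> str:
    """Name of the base refinement reached by following results of arrows."""
    while isinstance(t, Arrow):
        t = t.res
    return t.name


def is_omega_r_type(t: RType) -> bool:
    """True when the refinement carries no information (assumed criterion)."""
    return tail(t) == "omega"


def show(t: RType) -> str:
    if isinstance(t, Base):
        return t.name
    left = show(t.arg)
    if isinstance(t.arg, Arrow):
        left = "(" + left + ")"
    return left + " -> " + show(t.res)


# The refinements of nat -> nat discussed in the text, plus the
# higher-order example (omega -> delta) -> omega -> delta.
SAMPLES = [
    Arrow(DELTA, OMEGA),
    Arrow(OMEGA, OMEGA),
    Arrow(DELTA, DELTA),
    Arrow(OMEGA, DELTA),
    Arrow(Arrow(OMEGA, DELTA), Arrow(OMEGA, DELTA)),
]

for t in SAMPLES:
    kind = "unknown (dynamic)" if is_omega_r_type(t) else "identifies a value (static)"
    print(show(t), ":", kind)
```

In the binding time reading, the first two samples are dynamic and the remaining three are static; in the dead code reading, subterms annotated with the first two are candidates for removal.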

References

1. H. P. Barendregt, M. Coppo, and M. Dezani-Ciancaglini. A filter lambda model and the completeness of type assignment. Journal of Symbolic Logic, 48:931-940, 1983.
2. S. Berardi. Pruning Simply Typed Lambda Terms. Journal of Symbolic Computation, to appear.
3. S. Berardi and L. Boerio. Using Subtyping in Program Optimization. In Typed Lambda Calculus and Applications, 1995.
4. M. Coppo and M. Dezani-Ciancaglini. An extension of basic functional theory for lambda-calculus. Notre Dame Journal of Formal Logic, 21(4):685-693, 1980.
5. M. Coppo and A. Ferrari. Type inference, abstract interpretation and strictness analysis. In M. Dezani-Ciancaglini et al., editors, A collection of contributions in honour of Corrado Böhm, pages 113-145. Elsevier, 1993.
6. M. Coppo and P. Giannini. Principal Types and Unification for Simple Intersection Types Systems. Information and Computation, 122(1):70-96, 1995.
7. F. Damiani and P. Giannini. A Decidable Intersection Type System based on Relevance. In Theoretical Aspects of Computer Software, LNCS 789. Springer, 1994.
8. C. Hankin and D. Le Metayer. A Type-Based Framework for Program Analysis. In Static Analysis, LNCS 864, pages 380-394. Springer, 1994.
9. R. Hindley. The principal types schemes for an object in combinatory logic. Transactions of American Mathematical Society, 146:29-60, 1969.
10. L.S. Hunt and D. Sands. Binding Time Analysis: A New PERspective. In Proceedings of the ACM Symposium on Partial Evaluation and Semantics-based Program Manipulation, 1991.
11. J. Palsberg and P. O'Keefe. A Type System Equivalent to Flow Analysis. In Principles of Programming Languages, 1995.
12. T. P. Jensen. Strictness Analysis in Logical Form. In J. Hughes, editor, Proceedings of the 5th ACM Conference on Functional Programming Languages and Computer Architecture, pages 98-105, 1991.
13. K. L. Solberg, H. R. Nielson, and F. Nielson. Strictness and Totality Analysis. In Static Analysis, LNCS 864, pages 408-422. Springer, 1994.
14. D. Leivant. Polymorphic Type Inference. In Principles of Programming Languages. ACM, 1983.
15. R. Milner. A Theory of Type Polymorphism in Programming. Journal of Computer and System Science, 17:348-375, 1978.
16. F. Prost. Marking techniques for extraction. Technical report, Ecole Normale Supérieure de Lyon, Lyon, December 1995.
17. T. M. Kuo and P. Mishra. Strictness analysis: a new perspective based on type inference. In Functional Programming Languages and Computer Architecture. ACM, 1989.
18. D. A. Wright. A New Technique for Strictness Analysis. In Proceedings of TAPSOFT'91, LNCS 494, pages 260-272. Springer, 1991.
