Interprocedural and Flow-Sensitive Type Analysis for Memory and Type Safety of C Code

J Autom Reasoning (2009) 42:265–300
DOI 10.1007/s10817-009-9121-1

Syrine Tlili · Mourad Debbabi

Received: 26 February 2009 / Accepted: 26 February 2009 / Published online: 21 March 2009
© Springer Science + Business Media B.V. 2009

Abstract Explicit memory management and type conversion endow the C language with flexibility and performance that render it the de facto language for system programming. However, these appealing features come at the cost of programs' safety. Due to the permissiveness of the C language, highly skilled but inadvertent programmers often introduce insidious programming errors that yield exploitable code. In this paper, we present a novel type and effect analysis for detecting memory and type errors in C source code. We extend the standard C type system with effect, region, and host annotations that hold valuable safety information. We also define static safety checks to detect safety errors using the aforementioned annotations. Our analysis proceeds in an intraprocedural phase and an interprocedural phase. The flow-sensitive and alias-sensitive intraprocedural phase propagates type annotations and applies safety checks at each program point. The interprocedural phase generates and propagates unification constraints on type annotations across function boundaries. We present an inference algorithm that automatically infers type annotations and applies safety checks to programs without programmers' interaction.

Keywords Type and effect analysis · Memory safety · Type safety · C language

This research is the result of a fruitful collaboration between CSL (Computer Security Laboratory) of Concordia University, DRDC (Defense Research and Development Canada) Valcartier and Bell Canada under the NSERC DND Research Partnership Program.

S. Tlili (B) · M. Debbabi
Concordia University, 1455 De Maisonneuve Blvd. West, Montreal, Quebec, Canada H3G 1M8
e-mail: [email protected]

M. Debbabi
e-mail: [email protected]


1 Introduction

Growing assurance requirements for applications and systems have raised the stakes on software safety and security. The software development process should take safety and security attributes into account at early stages [15]. Special emphasis should be put on the implementation phase, since the root causes of many security vulnerabilities are programming errors that yield readily exploitable code. In the case of the C language, coding errors are even more prevalent because of its flexible memory management and its lack of type safety. These memory and type errors are very insidious and can lead to critical flaws such as denial of service, buffer overflow, format string attacks, and code injection [35]. Nevertheless, the C language provides performance, strong support, and portability that make it the de facto standard for system programming. It is the language of choice for embedded systems. Moreover, much legacy software written in C cannot be easily ported to type-safe languages. Therefore, automated tools for memory and type error detection are very helpful for programmers in building secure and safe C code.

There is a range of error detection approaches that can be broadly classified into dynamic analysis and static analysis [2–4, 9, 22]. Dynamic analysis monitors program execution to spot errors as they occur. Precision and accuracy are its key features. However, they come at the cost of a significant performance overhead induced by the runtime monitoring. Thus, dynamic analysis is unsuitable for low-level software that has performance requirements, such as operating systems and drivers. Moreover, dynamic approaches suffer from incomplete path coverage as they consider one execution path at a time. The exploration of all execution paths requires the challenging definition of a large number of test cases.

On the other hand, static analysis operates on source code without executing the program. It offers the cost-saving advantage of early detection of software errors. Unlike dynamic analysis, static analysis can perform exhaustive path coverage of the code in order to predict runtime errors. As static analysis does not introduce runtime overhead, it is more suitable and efficient for thorough error detection in operating systems and low-level software.

Nevertheless, static analysis requires a tradeoff between the precision and the simplicity of the code analysis algorithm. Precise analysis often requires complex algorithms that cannot be easily applied to large software, whereas simple analysis may generate false positives or miss some errors. We further explore the issues and tradeoffs of static analysis through the discussion of existing static error detection tools. These tools aim at providing a good precision-vs-simplicity tradeoff, as we do in the present work.

We can find in the literature a wide range of static analysis tools that use different approaches to detect safety and security errors in C source code. Tools such as MOPS [9] and MC [3] define a flow-based analysis to verify security properties of C source code. They use an approach that is mainly based on syntactical pattern matching to track unsafe sequences of program actions and scales to large software. Unfortunately, the heavy use of pointers and the pitfalls of pointer aliasing make this kind of syntactical analysis unsuitable for detecting type and memory errors. Some other tools, such as the Lint family tools [14], offer a simple analysis that requires user annotations to detect buffer overflows and memory leaks. The annotation duty left to programmers makes these tools unsuitable for large-scale applications. Moreover, their simplicity comes with a high rate of false positives.


We also find in the literature hybrid approaches [1, 4, 22] that combine static and dynamic analyses for memory and type error detection. The combination aims at overcoming the lack of precision of static analysis by injecting runtime checks in the analyzed code. The gain in analysis precision of hybrid approaches is appealing. Nevertheless, these approaches still require an effective static analysis to reduce the number of runtime checks and their induced overhead. CCured [22] is a well-established hybrid tool that combines type inference and run-time checking to ensure safe execution of C programs. CCured provides an efficient approach to ensure that a pointer dereference is within the bounds of its referred memory location and to detect NULL pointer dereferencing. However, Fig. 1 shows sample code (memoryErrors.c) checked with CCured: CCured neither detects nor prevents memory errors such as accessing freed memory and using uninitialized values. From the analysis of the existing tools, we believe that there is enough room to define a new static approach for type and memory error detection that offers a good precision-vs-simplicity tradeoff.

In this paper, we describe our type and effect analysis for detecting memory and type errors in C code. The core idea is to decorate the standard type system of C with safety annotations. We also endow the type system with static safety checks that use the aforementioned annotations to detect safety violations. The flow-sensitive nature of our approach allows type annotations to change at each program statement in order to deal with the destructive updates of the imperative C language, such as dynamic allocation and dynamic deallocation of memory. Furthermore, we address the pitfalls of aliasing and indirect assignments by endowing our analysis with flow-sensitive alias information. As such, a modification of the annotations of a program expression is propagated to all its aliases. The annotation inference algorithm presented in this paper operates in two phases. The intraprocedural phase propagates type annotations and verifies memory and type operations of each function. The interprocedural phase instantiates the annotation-polymorphic type of each declared function according to its actual argument type.

The main contributions of the work presented in this paper are the following:

• A new type system based on lightweight region, effect, and host annotations for detecting memory and type errors in C source code.

• A set of static safety checks enforced by our type system in order to detect memory and type errors.

• An intraprocedural and interprocedural inference algorithm that automatically propagates type and effect annotations to program expressions.

• A prototyped GCC extension that statically type-checks C programs for memory and type errors.

    #include <stdio.h>
    #include <stdlib.h>

    int main() {
        int *p;
        int x;

        p = &x;
        printf("Uninit value of x:%d\n", *p);   /* read of an uninitialized variable */
        p = malloc(sizeof(int));
        printf("Uninit memory:%d\n", *p);       /* read of uninitialized heap memory */
        free(p);
        *p = 5;                                 /* write through a freed pointer */

        printf("Use after free:%d\n", *p);      /* read through a freed pointer */
        return 0;
    }

    $ ./ccured --alwaysStopOnError memoryErrors.c -o memoryErrors.exe
    $ ./memoryErrors.exe
    Uninit value of x:1628796619
    Uninit memory:0
    Use after free:5

Fig. 1 Motivating example

This paper is organized as follows: Section 2 outlines the safety type annotations of our imperative language, which captures the essence of the C language. Section 3 describes the typing rules for program declarations, program expressions, and program statements. Section 4 presents our algorithms for handling direct assignments and indirect assignments through aliasing. Section 5 outlines the static safety checks performed during our type analysis. Section 6 is dedicated to our annotation inference algorithm. We illustrate our implemented prototype with a case study and experimental results in Section 7. Section 8 discusses the related work, and we conclude with Section 9.

2 Safety Type Annotations

In this section, we present the imperative language that we use to illustrate our formalism. We outline the annotation extensions we made to the standard C type system to ensure memory and type safety.

2.1 An Imperative Language

The imperative language defined in Table 1 captures the essence of the C language. A program π contains variable declarations and function declarations in δ, followed by program statements s. Without loss of generality, a function id has only one argument variable x, local declarations δ, and a body s. Program expressions comprise lvalues that refer to memory locations and rvalues that refer to the content of memory locations. Lvalues provide access to memory locations through variables x, lvalue dereferences ∗lv, and structure fields lv.ϕ. Rvalues include integer scalars n, rvalue dereferences ∗rv, the address of an lvalue &lv, cast operations (κ)e, pointer arithmetic e op e′, and memory allocations malloc(e). Statements s include deallocation operations free(lv), assignment operations lv = e, function calls lv = id(y), return statements return e, and control flow constructs (sequencing, conditionals, and loops).

Table 1 Syntax of an imperative language that captures the essence of the C language

2.2 Type Annotations

We present in Table 2 the type algebra of the imperative language defined above. Our type system propagates lightweight region, effect, and host annotations that are relevant for the safety analysis of C programs. They are inserted at the outermost constructor of types in order to facilitate the inference algorithm defined in Section 6. The following paragraphs present the static domains of our safety annotations.

Table 2 Type and effect annotations for memory and type safety

– The domain of regions abstracts dynamic memory locations allocated on the heap and variables' memory locations assigned on the stack. The symbols ρ, ρ′ represent values drawn from this domain; a fresh symbol is derived at each memory allocation. The symbol ϱ stands for a region variable with a currently unknown value. The memory location of a given variable x is given the symbolic identifier r_x, where x corresponds to the unique identifier of the declared variable (we use alpha-renaming to prevent collisions [20]). The notation ρ.o denotes an offset within a region ρ of a structure type. We assume that the first field is at offset 0 of the hosting location. The remaining fields are located at different offsets from the first field. As we define a flow-sensitive analysis, a pointer may refer to different regions depending on the branches followed. Hence, we use the notation ρ ∪ ρ′ to represent the set of disjoint regions a pointer may refer to at a given program point.

– The domain of declared types defines a representative subset of the C language types. It includes the empty type void, the integer type int, the pointer type ref(κ), the structure type struct{(ϕ_i, κ_i)}_{i=1..n}, and the function type κ → κ′. Without loss of generality, we assume that functions take just one argument.

– The domain of inferred types decorates the declared types with effect, region, and host annotations. A pointer type is annotated with the memory location ρ it refers to. Moreover, a pointer type ref_ρ(κ)_η and an integer type int_μ are subject to type conversions. Thus, we annotate these types with host annotations that keep track of their actual type. For a pointer type ref_ρ(κ)_η, a cast operation is captured when η indicates a type that is different from κ. For an integer type int_μ, the annotation μ distinguishes between an integer derived from a converted pointer and a genuine integer. More details on type conversion are given later in this section. The term struct{(ϕ_i, τ_i, o_i)}_{i=1..n} is the type of a structure of n elements. Each field ϕ_i is decorated with an offset o_i from the first field at offset 0. The function type τ −σ→ τ′ is annotated with a latent effect σ that is generated when the corresponding function expression is evaluated. The conditional type construct if(τ, τ′) denotes the type of an expression after a branching statement. Type τ is inferred on the true branch, whereas type τ′ is inferred on the false branch. The declared types and the inferred types are related by means of two operators. The operator "ˆ" decorates types at declaration time with host annotations set to [wild] and fresh region variables ϱ. Conversely, the operator "¯" suppresses the annotations of inferred types and recovers their original declared types. These two operators are defined in Appendix B.

– The domain of effects captures memory operations and type conversions encountered at each program statement [13, 23]. We use ∅ to denote the absence of effects, and ς to denote an effect variable. Each effect records the program point ℓ where it is produced. The terms alloc(ρ, ℓ) and dealloc(ρ, ℓ) denote memory allocation and memory deallocation, respectively. The effect read(ρ, τ, ℓ) represents the dereference of a pointer to region ρ. The effect assign(ρ, τ, ℓ) represents the assignment of a value of type τ to region ρ. The effect arith(ρ, ρ′, ℓ) captures pointer arithmetic operations on region ρ that result in a region ρ′. Moreover, we define effects that capture control flow constructs of programs. The term σ; σ′ denotes the sequencing of σ and σ′. The effect if(σ, σ′) refers to a branching statement where the effects σ and σ′ are produced at the true branch and the false branch, respectively. Hence, the collected effects σ provide a tree-based model of the analyzed program that captures safety-relevant operations. The static safety checks defined in Section 5 refer to the generated effect model in order to verify temporal properties related to the bad sequencing of memory and type operations.
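The annotation domains above can be pictured as ordinary C data types. The following is only a sketch under stated assumptions: the names `HostKind`, `RegionSet`, `PtrType`, `Effect`, and the helper functions are hypothetical and are not taken from the authors' GCC prototype, and a region set ρ ∪ ρ′ is simplified to a bitmask over at most 32 region identifiers.

```c
#include <assert.h>

/* Host annotation values (hypothetical encoding of the values named in Section 2.3). */
typedef enum { HOST_NONE, HOST_WILD, HOST_MALLOC, HOST_DANGLING, HOST_ARITH, HOST_ADDR } HostKind;

/* A region set such as rho U rho', simplified to a bitmask over region ids 0..31. */
typedef unsigned int RegionSet;

RegionSet region_single(unsigned id) {
    assert(id < 32);              /* limit of this simplified encoding */
    return 1u << id;
}

RegionSet region_union(RegionSet a, RegionSet b) { return a | b; }

/* Number of regions a pointer may refer to at a program point. */
int region_count(RegionSet r) {
    int n = 0;
    while (r != 0) { n += (int)(r & 1u); r >>= 1; }
    return n;
}

/* Annotated pointer type ref_rho(kappa)_eta; kappa is kept as a type name. */
typedef struct {
    const char *declared;   /* kappa */
    RegionSet   regions;    /* rho */
    HostKind    host;       /* eta */
} PtrType;

/* Effects record the operation, the regions involved, and the program point. */
typedef enum { EFF_ALLOC, EFF_DEALLOC, EFF_READ, EFF_ASSIGN, EFF_ARITH } EffectKind;

typedef struct {
    EffectKind kind;
    RegionSet  regions;
    int        point;       /* program point where the effect is produced */
} Effect;

Effect make_effect(EffectKind kind, RegionSet regions, int point) {
    Effect e = { kind, regions, point };
    return e;
}
```

In this encoding, a pointer that may refer to region 1 on one branch and region 3 on the other carries `region_union(region_single(1), region_single(3))`. Sequencing and branching effects (σ; σ′ and if(σ, σ′)) would need a tree structure and are left out of the sketch.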

2.3 Host Annotation for Type Conversions

The flexibility of the C language allows arbitrary type conversions for pointer and integer types without performing any safety checks. These explicit type casts are misleading about the actual type of a memory location and may cause unexpected program behavior. To tackle insidious type casting errors, we refer to the host annotations of pointer and integer types to derive their actual types. As defined in Table 2, a host annotation can take the following values: (1) [malloc] indicates an allocated pointer, (2) [dangling] indicates a freed pointer, (3) [wild] indicates an uninitialized pointer or uninitialized integer, (4) [arith] indicates an arithmetic pointer, and (5) [&struct{_}] stands for a region that stores a value of structure type. Notice that an integer host annotation can refer to an integer type or to a pointer type, as conversion between integer and pointer types is allowed. The empty set denotes the absence of host annotations and the symbol γ stands for a host annotation variable.

Fig. 2 Example to illustrate host annotations for dealing with insidious type casts

We define two auxiliary functions that use host annotations in order to deal with type conversions: (1) function castType(τ, κ) derives an annotated type τ′ by converting from type τ to κ, (2) function strTypeOf(τ) yields the actual type referred to by pointer type τ. The algorithms of these two functions are given in Appendix B.

Consider the example of Fig. 2. At the declaration of structure Pnt, we infer the annotated type Pnt = {(x, int_[wild]), (y, int_[wild])}. The host annotations of fields x and y are initially set to [wild], i.e., not yet initialized integer values. At statement (1), our analysis infers for pointer p the type τ_p = ref_{r_pt}(Pnt)_{η_p} where η_p = [&Pnt]. It indicates that pointer p refers to region r_pt of variable pt, which holds a value of type Pnt. At program point (2), pointer p is cast from type τ_p to type ref(CPnt). Notice that the destination type of the conversion is defined by the programmer and thus does not have any annotation. The cast operation yields type τ_cp = castType(τ_p, ref(CPnt)) = ref_{r_pt}(CPnt)_{η_p}. The host annotation η_cp of the converted pointer cp is derived from the host annotation η_p of the source pointer p. We assume that cast operations do not change the content of memory locations, hence η_cp = η_p = [&Pnt]. It indicates that pointer cp, initially declared to refer to a CPnt value, is actually referring to a value of type Pnt. With the precision of the host annotation, our analysis cannot be misled about the actual type referred to by a given pointer. Hence, we can detect that the dereference of field c at program point (3) is unsafe since cp is not referring to a CPnt structure. Section 5 illustrates our safety checks based on host annotations for detecting type errors. Notice that a pointer may actually refer to a single type or to multiple types defined in an if(_) type construct. The latter is derived from branching statements such as conditionals and loops. The unique type or the different possible types of a pointer may differ from its declared type due to cast operations.
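The cast scenario above can be mimicked with a small model in which a pointer type carries both its declared pointee and its host annotation. This is a sketch only: `PtrType`, `cast_type`, and `deref_matches_host` are hypothetical names, types are compared by name, and the rule "safe only when declared and host types coincide" is a simplifying assumption rather than the safety check the paper defines in Section 5.

```c
#include <string.h>

/* Annotated pointer type: declared pointee name plus the host annotation
   naming the type actually stored in the region (hypothetical model). */
typedef struct {
    const char *declared;  /* kappa in ref(kappa) */
    const char *host;      /* eta, e.g. "Pnt" for [&Pnt] */
} PtrType;

/* Sketch of castType: the declared type changes, the host annotation does
   not, because a cast does not change the content of memory. */
PtrType cast_type(PtrType src, const char *target) {
    PtrType out = { target, src.host };
    return out;
}

/* Simplified stand-in for a type check on dereference: accept only when
   the declared pointee is the type actually hosted by the region. */
int deref_matches_host(PtrType t) {
    return strcmp(t.declared, t.host) == 0;
}
```

With `PtrType p = { "Pnt", "Pnt" }` and `PtrType cp = cast_type(p, "CPnt")`, the host of `cp` is still `"Pnt"`, so `deref_matches_host(cp)` returns 0, which corresponds to flagging the access to field c at program point (3).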

3 Typing Rules

This section outlines our annotation inference rules, which are inspired by the type system for imperative languages presented in [27]. We define a type environment E that maps each declared variable to an annotated type. Our type analysis infers initial annotations that are updated at each program statement to capture imperative destructive updates. On the other hand, we assign the most general type to declared functions. We do not enforce any restrictions on the annotations of the function argument type and the function return type.


Our analysis performs an intraprocedural pass and an interprocedural pass. The intraprocedural pass defines a unification-based [26] and flow-sensitive analysis that evaluates each function body. The interprocedural pass propagates unification constraints across function calls and returns. The unification constraints are defined to unify pairs of region variables, host variables, program point variables, and effect variables at function boundaries. At each function call, our interprocedural analysis requires that the actual argument type and return type be equal, modulo type annotations, to the declared argument type and return type, respectively. To facilitate the understanding of our typing rules, we define the following auxiliary functions:

• Function typeOf(e) returns the inferred type of expression e.

• Function regionOf(τ) returns the region annotations of pointer type τ.

• Function addressOf(e) returns the memory location of an lvalue e. For a given pointer e of type τ, we have regionOf(τ) = addressOf(∗e).

• Function fldType(τ, ϕ) returns the type of field ϕ in structure type τ.

• Function strTypeOf(τ) extracts the actual type stored in the region of a pointer of type τ.

The algorithms of these functions are detailed in Appendix B. Throughout this paper, we write E † E′ to denote the overwriting of E by E′, i.e., the domain of E † E′ is Dom(E) ∪ Dom(E′), and we have (E † E′)(x) = E′(x) if x ∈ Dom(E′) and E(x) otherwise.
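The overwriting operator E † E′ can be read as a right-biased map merge. The sketch below illustrates it with a small fixed-capacity association list; the names `Env`, `env_bind`, `env_lookup`, and `env_overwrite` are hypothetical, annotated types are represented as plain strings, and bindings beyond `ENV_MAX` are silently dropped, which a real implementation would not do.

```c
#include <stddef.h>
#include <string.h>

#define ENV_MAX 16

typedef struct { const char *var; const char *type; } Binding;
typedef struct { Binding items[ENV_MAX]; int len; } Env;

/* E(x): returns the type bound to var, or NULL if var is not in Dom(E). */
const char *env_lookup(const Env *e, const char *var) {
    for (int i = 0; i < e->len; i++)
        if (strcmp(e->items[i].var, var) == 0) return e->items[i].type;
    return NULL;
}

/* Bind var to type, replacing an existing binding for var if there is one. */
void env_bind(Env *e, const char *var, const char *type) {
    for (int i = 0; i < e->len; i++) {
        if (strcmp(e->items[i].var, var) == 0) { e->items[i].type = type; return; }
    }
    if (e->len < ENV_MAX) {
        e->items[e->len].var = var;
        e->items[e->len].type = type;
        e->len++;
    }
}

/* E † E': start from E, then let every binding of E' take precedence. */
Env env_overwrite(const Env *e, const Env *e2) {
    Env out = *e;
    for (int i = 0; i < e2->len; i++)
        env_bind(&out, e2->items[i].var, e2->items[i].type);
    return out;
}
```

The result has domain Dom(E) ∪ Dom(E′): a variable bound in both environments takes its type from E′, and a variable bound only in E keeps its type.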

3.1 Typing Rules for Program Declarations

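Table 3 illustrates the typing rules for programs, variable declarations, function declarations, and call sites. The sequent E, ℓ ⊢ (δ, s) indicates that the program containing declarations δ and statements s is well-typed. We augment the initially empty environment E from program declarations in δ. The deduction E, ℓ ⊢ δ : E′ evaluates variable declarations and function declarations at program point ℓ, then it yields a new environment E′ that is used to type-check program statements s. The judgment E′, ℓ ⊢ s : E″, σ, θ evaluates statements s, then yields an updated environment E″, an effect σ that records all memory operations and type conversions in statements s, and a set of annotation mappings θ. The latter instantiates all annotation variables of the program expressions as detailed later in this section.

Table 3 Typing rules for program, declarations, and call sites

\[
\dfrac{E, \ell \vdash \delta : E' \qquad E', \ell \vdash s : E'', \sigma, \theta}
      {E, \ell \vdash (\delta, s)}
\quad \text{(Program)}
\]

\[
\dfrac{\cdot}{E, \ell \vdash \mathit{nil} : E}
\quad \text{(Nil-decl)}
\]

\[
\dfrac{E, \ell \vdash \delta : E' \qquad \hat{\kappa} = \tau}
      {E, \ell \vdash \kappa\ x;\ \delta : E' \dagger [x \mapsto \tau]}
\quad \text{(Var-decl)}
\]

\[
\dfrac{E, \ell \vdash \delta : E' \qquad
       \tau_1 \xrightarrow{\varsigma} \tau_2 = \mathit{fresh}(\mathit{annot}(\kappa_1 \rightarrow \kappa_2)) \qquad
       \overline{\tau_1 \rightarrow \tau_2} = \kappa_1 \rightarrow \kappa_2 \qquad
       v_{1..n} = \mathit{fv}(\tau_1 \xrightarrow{\varsigma} \tau_2)}
      {E, \ell \vdash \kappa_2\ \mathit{id}(\kappa_1\ x) = s;\ \delta : E' \dagger [\mathit{id} \mapsto \forall v_{1..n}.\ \tau_1 \xrightarrow{\varsigma} \tau_2]}
\quad \text{(Func-decl)}
\]

\[
\dfrac{E(\mathit{id}) = \forall v_{1..n}.\ \tau_1 \xrightarrow{\varsigma} \tau_2 \qquad
       \tau'_1 \xrightarrow{\varsigma'} \tau'_2 = \mathit{fresh}(\tau_1 \xrightarrow{\varsigma} \tau_2)}
      {E, \ell \vdash \mathit{call}_{\mathit{id}} : \tau'_1 \xrightarrow{\varsigma'} \tau'_2}
\quad \text{(Call-site)}
\]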

– The rule (Var-decl) maps a declared variable x of type κ to the annotated type κ̂ in E. The operator "ˆ" sets host annotations to [wild] and region annotations of pointers to unknown regions ϱ (Appendix B). We assume that alpha-conversion is used to rename colliding variables and avoid conflicts during the analysis [20].

– The rule (Func-decl) assigns the most general type to functions in environment E using fresh annotation variables. We define type schemes of the form ∀v_{1..n}.τ where each v_i can be a region, effect, host, or program point annotation variable. The function fresh() takes the annotated type annot(κ1 → κ2) and replaces its annotation variables with fresh variables. Finally, we extend environment E with a mapping from the declared function id to a polymorphic type where all free region, effect, host, and program point variables in function type τ1 −ς→ τ2 are quantified. We define the function fv(τ) to derive the set of free variables of a given type τ. Notice that the inferred function type should be equal to the declared type modulo type annotations.

– The rule (Call-site) instantiates the type of a function id with fresh annotation variables each time the function is called. As such, we introduce a label call_id that captures each invocation of function id at a given program point ℓ. Then, we assign a fresh function type to that label. The fresh annotation variables of the label are unified with the actual argument type and return type of the current call; more details on the unification process are given in the remainder of this section.
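The effect of (Call-site) can be illustrated by a toy instantiation routine. This is a sketch under strong simplifications: a function type is reduced to three annotation variables (argument region, result region, latent effect), variables are plain integers drawn from one shared counter, and the names `FuncAnnot`, `fresh_var`, and `instantiate` are hypothetical. The paper's fresh() operates on full annotated types.

```c
/* A function type tau1 -sigma-> tau2 reduced to its annotation variables. */
typedef struct {
    int arg_region;   /* region variable of the argument type */
    int ret_region;   /* region variable of the return type */
    int effect;       /* latent effect variable */
} FuncAnnot;

int next_var = 0;

/* Returns an annotation variable that has not been used before. */
int fresh_var(void) { return next_var++; }

/* Instantiate a type scheme for one call site: every quantified variable is
   renamed to a fresh one, and a variable shared by argument and result stays
   shared after renaming. */
FuncAnnot instantiate(FuncAnnot scheme) {
    FuncAnnot inst;
    inst.arg_region = fresh_var();
    inst.ret_region = (scheme.ret_region == scheme.arg_region)
                          ? inst.arg_region
                          : fresh_var();
    inst.effect = fresh_var();
    return inst;
}
```

Two calls to `instantiate` on the same scheme yield disjoint variables, which mirrors the remark that constraints generated at different call sites apply to different annotation variables.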

3.2 Typing Rules for Expressions

The sequent E, ℓ ⊢ e : τ, σ defines the typing rules for expressions presented in Table 4. It states that under environment E and at program point ℓ, the evaluation of expression e returns type τ and effect σ. Some of the expressions refer to critical memory and type operations. In order to ensure type and memory safety, the evaluation of these expressions is guarded by safety checks that are defined in Section 5.

Table 4 Typing rules for program expressions

\[
\dfrac{E(x) = \tau}{E, \ell \vdash x : \tau, \emptyset}
\quad \text{(Var)}
\qquad
\dfrac{\cdot}{E, \ell \vdash n : \mathit{int}_{[\&\mathit{int}]}, \emptyset}
\quad \text{(Int)}
\]

\[
\dfrac{E, \ell \vdash lv : \tau, \sigma \qquad \tau' = \mathit{refTypeTo}(lv, \tau)}
      {E, \ell \vdash \&lv : \tau', \sigma}
\quad \text{(Ref)}
\]

\[
\dfrac{E, \ell \vdash e : \tau, \sigma \qquad \tau = \mathit{ref}(\_) \qquad \mathit{drfChk}(\tau) \qquad
       \rho = \mathit{regionOf}(\tau) \qquad \tau' = \mathit{strTypeOf}(\tau)}
      {E, \ell \vdash {*e} : \tau', (\sigma;\ \mathit{read}(\rho, \tau', \ell))}
\quad \text{(Deref)}
\]

\[
\dfrac{E, \ell \vdash e : \tau, \sigma \qquad E, \ell \vdash e' : \mathit{int}_{\mu}, \sigma' \qquad \tau = \mathit{ref}(\_) \qquad
       \rho = \mathit{regionOf}(\tau) \qquad \tau' = \mathit{ref}_{\rho'}(\_)_{[\mathit{arith}]} \qquad \rho'\ \text{fresh}}
      {E, \ell \vdash e\ \mathit{op}\ e' : \tau', (\sigma;\ \sigma';\ \mathit{arith}(\rho, \rho', \ell))}
\quad \text{(Arith)}
\]

\[
\dfrac{E, \ell \vdash e : \tau, \sigma \qquad \mathit{castChk}(\tau, \kappa) \qquad \tau' = \mathit{castType}(\tau, \kappa)}
      {E, \ell \vdash (\kappa)e : \tau', \sigma}
\quad \text{(Cast)}
\]

\[
\dfrac{E, \ell \vdash e : \tau, \sigma \qquad \tau = \mathit{struct}\{\_\} \qquad \mathit{fldChk}(\tau, \varphi) \qquad
       \tau' = \mathit{fldType}(\tau, \varphi) \qquad \rho = \mathit{addressOf}(e.\varphi)}
      {E, \ell \vdash e.\varphi : \tau', (\sigma;\ \mathit{read}(\rho, \tau', \ell))}
\quad \text{(Field)}
\]

\[
\dfrac{E, \ell \vdash e : \tau, \sigma \qquad \tau = \mathit{int} \qquad \rho\ \text{fresh}}
      {E, \ell \vdash \mathit{malloc}(e) : \mathit{ref}_{\rho}(\mathit{void})_{[\mathit{malloc}]}, (\sigma;\ \mathit{alloc}(\rho, \ell))}
\quad \text{(Malloc)}
\]

– The rules (Var) and (Int) are standard rules that produce no effect. The host annotation of a constant is set to [&int] as it actually refers to an integer value.

– The rule (Ref) derives a pointer to the region that hosts the lvalue lv. The function refTypeTo(lv, τ), defined in Appendix B, derives a pointer type τ′ that refers to the region of expression lv.

– The rule (Deref) dereferences a pointer expression e of type τ and generates the effect read(ρ, τ′, ℓ), where τ′ = strTypeOf(τ) is the actual type referred to by pointer e. Because of cast operations, type τ′ may be different from the pointer's declared type. The safety of the dereference operation is guarded by the static check drfChk(τ) detailed in Section 5.

– The rule (Arith) evaluates pointer arithmetic and generates the effect arith(ρ, ρ′, ℓ), where ρ denotes the set of regions pointer e may refer to. We assume that pointer arithmetic results in a fresh region ρ′ with a host annotation set to [arith].

– The rule (Cast) performs type conversion from type τ to type κ. Note that the destination type κ, as specified by the programmer, does not have annotations. In Appendix B, we define function castType(τ, κ), which derives an annotated type τ′ from the conversion from type τ to type κ such that stripping the annotations of τ′ yields κ. In order to detect and prevent type cast errors, we define safety requirements in the static check castChk(τ, κ) that should be met at each cast operation, as specified in Section 5.

– The rule (Field) returns the type of field ϕ of a structure expression e of type τ. The field access is guarded by the fldChk(τ, ϕ) safety check defined in Section 5.

– The rule (Malloc) returns a void pointer to a fresh region location ρ. The host annotation of the allocated pointer is set to [malloc]. The allocation generates the effect alloc(ρ, ℓ).
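As an illustration of how a guarded rule produces both a type and an effect, the sketch below models (Malloc) and (Deref) as C functions. All names (`type_malloc`, `type_deref`, `drf_chk`) are hypothetical. In particular, `drf_chk` is only an assumed stand-in that rejects [wild] and [dangling] pointers; the actual drfChk is defined in Section 5 and may impose different or additional conditions.

```c
typedef enum { H_WILD, H_MALLOC, H_DANGLING, H_ARITH, H_ADDR } Host;
typedef struct { int region; Host host; } PtrType;

typedef enum { EFF_ALLOC, EFF_READ } EffKind;
typedef struct { EffKind kind; int region; int point; } Effect;

int next_region = 0;

/* (Malloc): a fresh region, host annotation [malloc], effect alloc(rho, l). */
PtrType type_malloc(int point, Effect *out) {
    PtrType t = { next_region++, H_MALLOC };
    out->kind = EFF_ALLOC;
    out->region = t.region;
    out->point = point;
    return t;
}

/* Assumed stand-in for drfChk: reject uninitialized and freed pointers. */
int drf_chk(PtrType t) {
    return t.host != H_WILD && t.host != H_DANGLING;
}

/* (Deref): when the check passes, emit read(rho, tau', l) and return 1;
   otherwise report the violation by returning 0 and leave *out untouched. */
int type_deref(PtrType t, int point, Effect *out) {
    if (!drf_chk(t)) return 0;
    out->kind = EFF_READ;
    out->region = t.region;
    out->point = point;
    return 1;
}
```

Under these assumptions, dereferencing the result of `type_malloc` succeeds, while the same pointer is rejected once its host annotation has been set to `H_DANGLING`.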

3.3 Typing Rules for Statements

Table 5 presents the typing rules for statements. The statement judgment is of the form E, ℓ ⊢ s : E′, σ, θ. It states that under type environment E and at program point ℓ, the evaluation of statement s yields a new environment E′, an effect σ, and substitution constraints θ.

We define a flow-sensitive type inference that generates unification constraints at each function call and function return. Unification constraints in θ define mappings that instantiate region variables, host variables, effect variables, and program point variables as defined in Section 6. The flow-sensitivity of our analysis allows us to cope with destructive updates of our imperative language by inferring new types for variables with new annotation instantiations at each program statement. As in [18], the flow-sensitivity is restricted to type annotations in order not to complicate the inference algorithm. As such, we compute a new type environment E′ at each program point ℓ. Note that region annotations carry aliasing information in the sense that aliased pointers should have the same region annotations [31, 32]. We use this aliasing information in order to propagate annotation modifications of an lvalue to all its aliases that refer to the same updated region. We define in Section 4 the function updEnv(E, s) that evaluates the argument statement s under environment E and yields the updated environment E′.

Table 5 Typing rules for program statements

– The rule (Free) conservatively deallocates all memory locations in ρ of pointerlv and generates the effect dealloc(ρ, �). The deallocation is guarded by thestatic check freeChk(τ, σ ) as specified in Section 5. After a free operation, thefunction updEnv() yields a new environment E ′ where host annotations of lv andof all its aliases are set to [dangling].

– The rule (Assign) assigns an expression e of type τ ′ to an lvalue lv of typeτ . The assignment is guarded by the static check asgnChk(τ, τ ′). If the checksucceeds, the function updEnv() updates the type annotations of all variablesthat are directly or indirectly involved in the assignment statement. The effectassign(ρ, τ ′, �) is generated, where ρ is the set of possible regions of the updatedlvalue.

– The rule (Func-call) evaluates the statement of the callee function id andgenerates unification constraints on the generic type τ1

ς−→τ2 of its correspondingcall site callid. The unification algorithm U is given in Section 6. First, we unify

276 S. Tlili, M. Debbabi

the generic argument type τ1 with the actual argument type τ . The generatedconstraint θ is used to evaluate the function statement s. The transitive closureof the statement evaluation yields a new environment E ′, an effect σ ′, and a set ofunification constraints θ ′. The latter is used to instantiate the annotation variablesof the generic return type τ2.

– The rule (Func-return) evaluates the return statement of the current callee function. It derives a substitution θ that instantiates the annotation variables of the return type instance τ2. Notice that each function call generates new instances of the argument type and the return type with new annotation variables. Therefore, the substitution constraints generated at function boundaries are applied to different annotation variables, and the unification constraints generated for the interprocedural analysis do not overlap.

– The rule (Seq) defines the sequencing of statements, where the generated effect is the sequencing of the effects of s′ and s′′.

– The rule (Cond) evaluates a branching condition. We define the merge operator “⊔” that assigns the type if(E′(x), E′′(x)) to a variable x at the merge point of a branching condition. It indicates that variable x is of type E′(x) on the true branch and of type E′′(x) on the false branch. We use a similar effect construct if(σ′, σ′′) to denote the effect generated at the merge point of a branching condition.

– The rule (Loop) evaluates a loop construct. The resulting environment is equal to E ⊔ E′; it denotes that the environment remains unchanged if the loop is not entered. Otherwise, the environment E′ refers to the type mappings generated when the loop is entered at least once.
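The merge performed by the (Cond) and (Loop) rules can be sketched in a few lines of Python. This is only an illustration under assumed encodings: an environment is a dictionary from variable names to types, a conditional type is the tuple ("if", t_true, t_false), and the helper names merge_types, merge_envs, and loop_env are hypothetical. Collapsing identical branch types into a single type is a simplification made by this sketch; the rules above do not state it.

```python
def merge_types(t_true, t_false):
    """Merge the types a variable has on the two branches of a condition."""
    if t_true == t_false:
        # Assumption of this sketch: identical types need no conditional wrapper.
        return t_true
    return ("if", t_true, t_false)   # keep both alternatives


def merge_envs(env_true, env_false):
    """Pointwise merge of two environments that share the same domain."""
    return {x: merge_types(env_true[x], env_false[x]) for x in env_true}


def loop_env(env_before, env_after_body):
    """Environment after a loop: zero iterations versus at least one."""
    return merge_envs(env_before, env_after_body)
```

For instance, if a pointer p is freed on one branch only, the merged environment records both of its possible types, so that later checks can examine each alternative.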

4 Dealing with Aliasing

To increase the precision of our type annotation inference, we consider alias information that enables us to propagate annotation updates of an lvalue to all its aliases. Pointer alias analysis has been widely investigated in recent years [5, 10, 30, 36]. It is possible to integrate one of these analysis techniques as a plug-in into our type system in order to obtain aliasing information. Nevertheless, since the region inference of our analysis carries flow-sensitive aliasing information, we advocate using it to account for pointer aliasing in programs. This section outlines the algorithms, given in Algorithm 1, that handle direct and indirect assignments and update the static environment E at each program point. All auxiliary functions used in these algorithms are defined in Appendix B.

– The function updEnv(E, s, τ) in Algorithm 1 updates the current environment E according to the argument statement s and the argument type τ. It invokes the function directUpd(E, lv, τ) defined in the following paragraph, and the auxiliary function updHost(τ, η) that sets all host annotations in type τ to η.

– The function directUpd(E, lv, τ) takes as arguments the current environment E, the lvalue lv to be updated, and its new type τ. After changing the annotations of the argument lvalue lv, the function aliasUpd() propagates the annotation update to all aliases of lv. Notice that modifying the annotations of a structure field implies updating the annotations of its enclosing structure type as well. The auxiliary function updFld() handles the annotation update of aggregate types.


– The function aliasUpd(E, ρ, η) takes as arguments the current static environment, the updated memory location ρ, and the host annotation η to set on all alias variables that refer to ρ. We illustrate in Fig. 3 the different aliasing cases that we consider in our approach:

Algorithm 1 Function updEnv() updates the static type environment at each program statement

Function updEnv(E, s, τ) =
begin
  case s of
    free(lv) ⇒ directUpd(E, lv, updHost(τ, [dangling]))
    lv = e ⇒ directUpd(E, lv, τ)
    lv = id(_) ⇒ directUpd(E, lv, τ)
  end
  return E
end

Function directUpd(E, lv, τ) =
begin
  case lv of
    x ⇒ E † [x → τ]
    x.ϕ ⇒ E † [x → updFld(E(x), ϕ, τ)]
          aliasUpd(E, addressOf(x), hostOf(E(x)))
    ∗lv′.ϕ ⇒ τ′ = updFld(typeOf(∗lv′), ϕ, τ)
            aliasUpd(E, addressOf(∗lv′), hostOf(τ′))
  end
  aliasUpd(E, addressOf(lv), hostOf(τ))
  return E
end

Function aliasUpd(E, ρ, η) =
begin
  for all y ∈ Dom(E) do
    if addressOf(y) ⊆ ρ then E † [y → updHost(E(y), η)]
    else E † [y → indirectUpd(E(y), ρ, η)]
end

Function indirectUpd(τ, ρ, η) =
begin
  let τ′ = τ in
    for all (ρ′, η′) ∈ regHostof(τ′) do
      if ρ′ ⊆ ρ then τ′ = updRegHost(τ′, ρ′, η)
      else if η′ = [&τ′′] then τ′ = indirectUpd(τ′′, ρ, η)
    end
    return τ′
  end
end


Fig. 3 Examples to illustrate annotation update of aliased variables
Sample (a): q = &x; *q = 5;
Sample (b): q = malloc(buf); if (c) p = &x; else p = q; *q = 5;
Sample (c): q = malloc(buf); p = malloc(buf); *q = p; *p = 5;

– A variable x resides in the updated location ρ, as illustrated in Sample (a) of Fig. 3. The invocation aliasUpd(E, rx, η) updates the host annotation of variable x in E.

– A pointer p refers to the updated location ρ with one level of indirection (one dereference operator). The aliasing information is extracted from the region annotation of pointer p, as illustrated in Sample (b) of Fig. 3. The invocation indirectUpd(E(p), ρ, [&int]) updates the conditional type of pointer p. It uses the following two auxiliary functions: (1) The function regHostof(ref ρ(τ)η) returns the pair (ρ, η) of region and host annotations of a pointer type. For a conditional type if(_), the function returns a set of pairs where each pair corresponds to one of the enclosed pointer types. (2) The function updRegHost(τ, ρ, η) sets to η the host annotation of the pointer type τ that refers to region ρ.

– A pointer refers to the updated location with multiple levels of indirection (more than one dereference operator). The aliasing information is extracted from the host annotation of the pointer, as shown in Sample (c) of Fig. 3, where pointer q reaches the updated region ρ′ of pointer p. The invocation indirectUpd(E(q), ρ′, [&int]) traverses the types specified in the host annotation of pointer q to update their host annotations.
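The three aliasing cases can be made concrete with a small Python model. It is a sketch under stated assumptions, not the implementation of Algorithm 1: the classes Scalar, Ptr, and Addr, the dictionary address_of, and the string encoding of host annotations are all hypothetical, conditional types are left out, and the nested case rebuilds the outer type around the updated inner type.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scalar:
    name: str            # e.g. "int"
    host: object         # host annotation, e.g. "wild" or "&int"


@dataclass(frozen=True)
class Ptr:
    regions: frozenset   # regions the pointer may refer to
    host: object         # "wild", "dangling", "&int", ... or Addr(...)


@dataclass(frozen=True)
class Addr:
    inner: object        # models [&tau] when tau is itself a pointer type


def indirect_upd(tau, rho, eta):
    """Set host eta on every pointer type inside tau that refers into rho."""
    if not isinstance(tau, Ptr):
        return tau
    if tau.regions <= rho:                     # one level of indirection
        return replace(tau, host=eta)
    if isinstance(tau.host, Addr):             # several levels: follow the host
        inner = indirect_upd(tau.host.inner, rho, eta)
        return replace(tau, host=Addr(inner))
    return tau


def alias_upd(env, address_of, rho, eta):
    """Return a new environment after the content of region rho changed."""
    updated = {}
    for y, tau in env.items():
        if address_of[y] <= rho:               # y itself resides in rho
            updated[y] = replace(tau, host=eta)
        else:                                  # y may reach rho through pointers
            updated[y] = indirect_upd(tau, rho, eta)
    return updated
```

With Sample (c) encoded this way, updating the region of p also rewrites the pointer type stored in the host annotation of q, which is the multi-level case described above.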

5 Static Safety Checks

This section outlines the static safety checks performed by our type system to detect and prevent memory and type errors. All safety-related operations are guarded by a corresponding static check. Owing to the conservative nature of our analysis, operations that pass the checks never cause a runtime error during program execution. Those that fail may violate memory or type safety during program execution. Some of the static checks refer to the generated effect model σ of the program. We define the function allTraces(μ, σ) that returns true when all paths extracted from the tree model σ contain the effect μ.

5.1 Safe C Memory Access

Our flow-sensitive type system is augmented with static safety checks defined in Table 6 to ensure safe pointer dereferencing, safe pointer assignment, and safe pointer deallocation.

Table 6 Static safety checks for detecting memory errors


5.1.1 Safe Pointer Dereference

The memory check drfChk(τ) verifies that a pointer of type τ can be safely dereferenced. It fails in the following cases:

– Dereference of void pointers: The C language disallows dereferencing void pointers since their size and their type are unknown.

– Dereference of uninitialized pointers: An uninitialized pointer has an indeterminate value that can cause harmful effects when accessed.

– Dereference of dangling pointers: A dangling pointer keeps referring to a memory location that has already been freed or even reallocated to another process. By dereferencing a dangling pointer, the original program may access memory locations that no longer belong to its address space, leading to unexpected program behaviors.

– Dereference of arithmetic pointers: These pointers are nasty as they may refer to out-of-bounds locations. Since we do not perform bounds checking during our type analysis, we disallow dereferencing arithmetic pointers.

Notice that our static type analysis can be combined with a dynamic analysis in order to perform dynamic bounds checking as presented in our work [33]. Moreover, our lightweight type annotations can be extended to carry bounds information generated by existing static bounds checking techniques [21, 25, 28]. As such, we can increase the precision of our static approach and enhance the performance of our aforementioned hybrid approach by reducing the number of runtime checks.
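The four failure cases of the dereference check can be summarized by a small Python sketch. The function drf_chk below and its string encoding of pointee types and host annotations are assumptions made for illustration; the actual check is the one defined in Table 6.

```python
# Assumption: host annotations are encoded as plain strings, where "wild"
# stands for an uninitialized pointer and "arith" for pointer arithmetic.
UNSAFE_HOSTS = {
    "wild": "uninitialized pointer",
    "dangling": "dangling pointer",
    "arith": "pointer obtained by arithmetic",
}


def drf_chk(pointee, host):
    """Return (ok, reason) for dereferencing a pointer ref(pointee)[host]."""
    if pointee == "void":
        return False, "void pointer"
    if host in UNSAFE_HOSTS:
        return False, UNSAFE_HOSTS[host]
    return True, "safe"
```

Returning a reason alongside the verdict is a design choice of this sketch; it mirrors the fact that each failure case corresponds to a distinct class of error that a warning can name.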

5.1.2 Safe Pointer Deallocation

The memory check freeChk(τ, σ) verifies that a pointer of type τ can be safely deallocated given the effect model σ of the program. It fails in the following cases:

– Deallocation of uninitialized pointers: These pointers do not have any assigned address, and thus any attempt to free such pointers can cause unexpected behaviors.

– Deallocation of dangling pointers: Deallocating the same memory location twice corrupts the system memory and may lead to buffer overflow attacks.

– Deallocation of pointers that are not dynamically allocated: These pointers refer to stack memory locations that have not been dynamically allocated with malloc(), and thus cannot be dynamically freed.

– Deallocation of arithmetic pointers: These pointers may refer to out-of-bounds locations; our conservative analysis disallows deallocating such pointers.

Notice that a pointer type ref ρ(κ)[&τ] indicates that its region ρ holds a value of type τ. However, it does not indicate that region ρ is dynamically allocated. For that reason, we define the function allTraces(alloc(ρ, ℓ), σ) that tracks the presence of an effect alloc(ρ, ℓ) in the current effect model σ to confirm that ρ is a dynamic region that can be freed.
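A minimal Python sketch of how allTraces and freeChk could interact is given below. The tree encoding of the effect model is an assumption of the sketch: ("seq", [...]) for sequencing, ("if", s1, s2) for a branching effect, and any other tuple, such as ("alloc", "rz"), for an atomic effect. Program points are omitted from the effects for brevity.

```python
def all_traces(mu, sigma):
    """True when every path through the effect tree sigma contains effect mu."""
    kind = sigma[0]
    if kind == "seq":
        # A sequence contains mu on all paths iff one of its parts does.
        return any(all_traces(mu, s) for s in sigma[1])
    if kind == "if":
        # Both branches must contain mu.
        return all_traces(mu, sigma[1]) and all_traces(mu, sigma[2])
    return sigma == mu                      # atomic effect


# Assumption: host annotations are encoded as strings.
UNSAFE_HOSTS = {"wild", "dangling", "arith"}


def free_chk(regions, host, sigma):
    """Allow free() only on initialized, live, dynamically allocated regions."""
    if host in UNSAFE_HOSTS:
        return False
    return all(all_traces(("alloc", r), sigma) for r in regions)
```

Under this encoding, a region that is allocated on only one branch of a condition does not pass the check, which reflects the conservative reading of "all paths" in the definition of allTraces.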


5.1.3 Safe Pointer Assignment

The memory check asgnChk(τ, τ′) verifies that an lvalue of type τ can safely be assigned a right-hand-side value of type τ′. It fails in the following cases:

– Assigning an uninitialized right-hand-side value.

– Assigning mismatched declared types: the types of the right-hand-side and the left-hand-side operands must explicitly match. As such, we avoid nasty implicit cast operations that often mislead programmers.

Notice that this check does not consider the effect model σ since the host annotation is enough to determine the content of the operands’ regions.

5.2 Safe Type Cast

Explicit type casts are misleading since they make pointers refer to types that are different from their declared type. These insidious type conversions are a common source of system crashes. Our approach to type casts is based on data memory layout and physical subtyping as defined in [29]. We define in Table 7 the static check castChk(τ, κ) that takes as input the source type τ and the un-annotated destination type κ of the cast operation. Notice that the destination type of a cast operation is defined by the programmer and does not have any annotation. Our analysis uses the function castType(τ, κ) to derive an annotated type from the conversion (see Appendix A). The following paragraphs outline the type cast operations considered in our analysis.

Table 7 Static safety checks for type cast operations

5.2.1 Cast from Object Pointers

Type conversion from a pointer type τ = ref ρ(κ)η to an un-annotated pointer type ref(κ′) is allowed provided that κ and κ′ are in a physical subtyping relationship (κ ⪯ κ′ or κ′ ⪯ κ) as defined in [29]. The subtyping takes into account the layouts of objects in memory. A type κ is considered a subtype of type κ′, denoted κ ⪯ κ′, if the memory layout of κ is a prefix of the memory layout of κ′. We use the notation κ ≈ κ′ to express that κ′ is a subtype of κ or vice versa. Since our approach does not change data representation, an annotated type τ and its un-annotated counterpart have the same memory layout.
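The prefix condition can be illustrated with a short Python sketch. The layout encoding, a list of (offset, field type) pairs, and the 4-byte size assumed for int are hypothetical choices of this example; the function names are not taken from the paper.

```python
def is_physical_subtype(k, k2):
    """k ⪯ k2 in the sense used above: the layout of k is a prefix of k2's."""
    k, k2 = list(k), list(k2)
    return k2[:len(k)] == k


def cast_compatible(k, k2):
    """k ≈ k2: one layout is a prefix of the other."""
    return is_physical_subtype(k, k2) or is_physical_subtype(k2, k)


# Assumed layouts, with 4-byte int fields.
PNT = [(0, "int"), (4, "int")]                 # fields x, y
CPNT = [(0, "int"), (4, "int"), (8, "int")]    # fields x, y, c
```

With these layouts, cast_compatible(PNT, CPNT) holds in both argument orders, whereas a structure that starts with a differently typed field is rejected.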

5.2.2 Cast from Void Pointers

As stated in the ANSI-C standard, any pointer can be cast to a void pointer. On the other hand, our analysis applies additional safety restrictions when casting a void pointer: (1) A freshly allocated void pointer ref ρ(void)[malloc] can always be cast to any pointer type ref(κ). (2) A converted void pointer ref ρ(void)[&ref ρ(κ)η] can be cast to type ref(κ) or to any pointer type ref(κ′) provided that κ ≈ κ′.

5.2.3 Cast Between Pointers and Integers

Cast between pointers and integers is allowed provided that the integer type is large enough to hold a pointer value. However, we require that only integers derived from pointers can be cast back to a pointer type. An integer of type int[&ref ρ(κ)η] indicates an integer derived from the pointer type τ = ref ρ(κ)η. This integer can be converted to pointer type ref(κ) or to any pointer type ref(κ′), where κ′ ≈ κ.
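The provenance requirement on integers can be sketched as follows in Python. The encoding is an assumption of the example: a host annotation of the form ("from_ptr", layout) marks an integer derived from a pointer whose pointee has the given layout, and any other host (for instance "&int") marks an ordinary integer. The function name int_to_ptr_chk is hypothetical.

```python
def int_to_ptr_chk(int_host, target_layout):
    """Allow int -> pointer casts only for integers derived from pointers."""
    derived = isinstance(int_host, tuple) and int_host[0] == "from_ptr"
    if not derived:
        return False                       # plain integers never become pointers
    source_layout = tuple(int_host[1])
    target_layout = tuple(target_layout)
    # One layout must be a prefix of the other, i.e. they agree on the
    # first min(len) entries.
    n = min(len(source_layout), len(target_layout))
    return source_layout[:n] == target_layout[:n]
```

An integer read from an untrusted source carries no pointer provenance in this model, so casting it to any pointer type is rejected regardless of the destination layout.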

5.2.4 Safe Field Dereference

The static check fldChk(τ, ϕ) verifies that a field ϕ can be safely accessed through a pointer type τ. Due to cast operations, a pointer to a structure type κ can actually refer to a structure type κ′ where κ′ ≈ κ. According to the definition given in [29], a structure type κ is a physical subtype of structure type κ′ if: (1) all the fields of κ are present in κ′, and (2) the offset of each field in κ is the same in κ′. Hence, the physical subtyping relation establishes a hierarchy for structure types. As such, a pointer of type τ = ref ρ(κ)[&τ′] can only access the common fields between the declared type κ and the type τ′ stored in its region ρ. In the example of Fig. 2, pointer cp is of type τcp = ref rpt(CPnt)ηcp, where ηcp = [&Pnt]. The dereference of pointer cp returns a value of type strTypeOf(τcp) = Pnt. The dereference of field c through cp is not safe, since the list of fields fldList(Pnt) = [x, y] does not contain field c.
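The field check on the cp example can be reproduced with a few lines of Python. The representation of a structure type by its list of field names and the function name fld_chk are assumptions of this sketch; offsets are ignored because the two structures of the example agree on them.

```python
def fld_chk(declared_fields, stored_fields, field):
    """A field is accessible only if the declared and stored types share it."""
    common = [f for f in declared_fields if f in stored_fields]
    return field in common


CPNT_FIELDS = ["x", "y", "c"]   # declared type of cp
PNT_FIELDS = ["x", "y"]         # type actually stored in its region
```

Here fld_chk(CPNT_FIELDS, PNT_FIELDS, "x") succeeds, while the access to field c is rejected because the stored Pnt value has no such field.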

5.3 Static Analysis Limitations

As with all static techniques, our conservative type analysis generates false positives and faces undecidability issues when runtime information is required. Undecidability may occur when static safety checks are performed on a may-aliased expression that has different possible types depending on the conditional branches followed. Since we cannot statically determine the executed branches, we require both types τ and τ′ of a conditional construct if(τ, τ′) to pass the safety check. If one type succeeds whereas the other fails, we face an undecidable case. Moreover, our analysis performs an exhaustive traversal of all execution paths of the program. However, the analysis is path-insensitive in the sense that it cannot remove infeasible paths. Hence, we may generate false positives by considering paths that are never actually executed. We defined in [33] a hybrid approach where our static analysis resorts to a dynamic counterpart, resulting in an increased precision of the overall analysis. Other tools such as CCured [22] and SafeC [4] use a hybrid approach as well to eliminate false positives. In order to reduce the number of false positives, the MC approach [3] uses a static technique that prunes infeasible paths.
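The treatment of conditional types can be pictured as a three-valued check, sketched below in Python. The tuple encoding ("if", t1, t2) of conditional types, the outcome labels, and the function name check_conditional are assumptions of the sketch; reporting "unsafe" when both alternatives fail is also an assumption, since the text only discusses the mixed case.

```python
def check_conditional(check, tau):
    """Apply a boolean safety check to a possibly conditional type."""
    if isinstance(tau, tuple) and tau[0] == "if":
        results = {check_conditional(check, t) for t in tau[1:]}
        if results == {"safe"}:
            return "safe"          # every alternative passes
        if results == {"unsafe"}:
            return "unsafe"        # every alternative fails
        return "undecided"         # depends on the branch taken at runtime
    return "safe" if check(tau) else "unsafe"
```

The "undecided" outcome is where a hybrid approach would insert a runtime check instead of rejecting the operation outright.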

6 Type Annotations Inference

This section is dedicated to the algorithm for inferring region, effect, and host annotations for program expressions. In Appendix A, we detail the proof of soundness of the inference algorithm with respect to our static typing rules. The annotation inference algorithm proceeds by case analysis on the structure of expressions and statements. We divide the inference algorithm into three different categories: (1) annotation inference for program declarations, (2) annotation inference for program expressions, and (3) annotation inference for program statements.

The inference algorithm for program declarations is presented in Algorithm 2. It takes as input a 3-tuple made of program declarations δ, a program point ℓ, and an initial static type environment E. The algorithm evaluates the declarations and outputs a new static environment E′ that assigns annotated types to variables and annotation-polymorphic types to functions. For call sites, the algorithm instantiates the type of the callee function with fresh annotation variables.

The inference algorithm for program expressions is also presented in Algorithm 2. It takes as input a 3-tuple made of a static environment E, a program point ℓ, and an expression e. It evaluates the input expression and decorates its type with effect, region, and host annotations. When evaluating a safety-relevant expression (pointer dereferencing, type casting, or structure field access), the algorithm applies the required safety checks. When these checks fail, the inference algorithm fails as well.

The inference algorithm for program statements is presented in Algorithm 3. It takes as input a 3-tuple made of a static environment E, a program point ℓ, and a statement s. The algorithm fails when the safety checks related to the considered statement fail. Otherwise, the algorithm terminates successfully, producing a 3-tuple whose components are a set of unification constraints θ, a new static environment E, and an effect σ. Function calls and returns generate unification constraints that are propagated across function boundaries for interprocedural analysis. The algorithm U that performs the unification is defined in Algorithm 4. It uses a syntactic unification procedure à la Robinson [26]. The proofs of soundness and completeness of U are standard and can be found in [12, 26]. Notice that our analysis generates fresh annotation variables for the argument and the return types of each function call. As such, each unification constraint generated at function boundaries is applied to fresh variables and does not override constraints related to the previous function calls. In the algorithm U, we use the vector η⃗ to denote the sequence of host annotations of a type τ and γ⃗ to denote the sequence of fresh host annotation variables. Similarly, the vector ρ⃗ denotes the sequence of region annotations of a type τ, and ϱ⃗ denotes a sequence of fresh region annotation variables.

Algorithm 2 Annotation inference algorithm for program declarations and expressions

Infer(δ, ℓ, E) =
  case δ of
    nil ⇒ []
    κ x; δ′ ⇒ let E′ = Infer(δ′, ℓ, E) in E′ † [x → κ] end
    κ′ id(κ x); δ′ ⇒
      let E′ = Infer(δ′, ℓ, E)
          τ1 −ς→ τ2 = fresh(annot(κ → κ′))
          v1..n = fv(τ1 −ς→ τ2)
      in E′ † [id → ∀v1..n. τ1 −ς→ τ2]
      end
  end

Infer(E, ℓ, callid) =
  let ∀v1..n. τ1 −ς→ τ2 = E(id)
      τ1′ −ς′→ τ2′ = fresh(τ1 −ς→ τ2)
  in τ1′ −ς′→ τ2′
  end

Infer(E, ℓ, x) =
  let τ = E(x) in (τ, ∅) end

Infer(E, ℓ, n) = (int[&int], ∅)

Infer(E, ℓ, &lv) =
  let (τ, σ) = Infer(E, ℓ, lv)
      ρ = addressOf(lv)
      τ′ = refTypeTo(lv, τ)
  in (τ′, σ)
  end

Infer(E, ℓ, ∗e) =
  let (τ, σ) = Infer(E, ℓ, e)
  in
    if drfChk(τ) then
      let τ = ref(_)
          τ′ = strTypeOf(τ)
          ρ = regionOf(τ)
      in (τ′, (σ; read(ρ, τ′, ℓ)))
    else fail: unsafe deref
  end

Infer(E, ℓ, e op e′) =
  let (τ, σ) = Infer(E, ℓ, e)
      τ = ref(_)
      ρ = regionOf(τ)
      (intμ, σ′) = Infer(E, ℓ, e′)
      ρ′ fresh
  in (ref ρ′(_)[arith], (σ; σ′; arith(ρ, ρ′, ℓ)))
  end

Infer(E, ℓ, (κ)e) =
  let (τ, σ) = Infer(E, ℓ, e)
  in
    if castChk(τ, κ) then
      let τ′ = castType(τ, κ)
      in (τ′, σ)
    else fail: unsafe cast
  end

Infer(E, ℓ, e.ϕ) =
  let (τ, σ) = Infer(E, ℓ, e)
      τ = struct{_}
  in
    if fldChk(τ, ϕ) then
      let τ′ = fldType(τ, ϕ)
          ρ = addressOf(e.ϕ)
      in (τ′, (σ; read(ρ, τ′, ℓ)))
    else fail: unsafe field access
  end

Infer(E, ℓ, malloc(e)) =
  let (τ, σ) = Infer(E, ℓ, e)
      τ = intμ
      ρ fresh
      τ′ = ref ρ(void)[malloc]
  in (τ′, (σ; alloc(ρ, ℓ)))
  end

Algorithm 3 Annotation inference algorithm for program statements

Infer(E, ℓ, free(lv)) =
  let (τ, σ) = Infer(E, ℓ, lv)
      τ = ref(_)
  in
    if freeChk(τ, σ) then
      let E′ = updEnv(E, free(lv), τ)
          ρ = regionOf(τ)
      in (∅, E′, (σ; dealloc(ρ, ℓ)))
    else fail: unsafe free
  end

Infer(E, ℓ, lv = e) =
  let (τ, σ) = Infer(E, ℓ, lv)
      (τ′, σ′) = Infer(E, ℓ, e)
  in
    if asgnChk(τ, τ′) then
      let E′ = updEnv(E, lv = e, τ′)
          ρ = addressOf(lv)
      in (∅, E′, (σ; σ′; assign(ρ, τ′, ℓ)))
    else fail: unsafe assign
  end

Infer(E, ℓ, lv = id(y)) =
  let τ1 −ς→ τ2 = Infer(E, ℓ, callid)
      (τ, σ) = Infer(E, ℓ, lv)
      (τ′, ∅) = Infer(E, ℓ, y)
      θ = U(τ1, τ′)
      body(id(x)) = s
      (θ′, E′, σ′) = Infer(E † [x → θτ1], ℓ, s)
      θ′′ = θ ∪ θ′ ∪ [ς → σ′′]
      asgnChk(τ, θ′′τ2)
      ρ = addressOf(lv)
      E′′ = updEnv(E′, lv = id(y), θ′′τ2)
  in (θ′′, E′′, (σ; σ′; assign(ρ, θ′′τ2, ℓ)))
  end

Infer(E, ℓ, return e) =
  let τ1 −ς→ τ2 = Infer(E, ℓ, callid)
      (τ, σ) = Infer(E, ℓ, e)
      θ = U(τ2, τ)
  in (θ, E, σ)
  end

Infer(E, ℓ, s′; s′′) =
  let (θ′, E′, σ′) = Infer(E, ℓ, s′)
      (θ′′, E′′, σ′′) = Infer(E′, ℓ′, s′′)
  in (θ′ ∪ θ′′, E′′, (σ′; σ′′))
  end

Infer(E, ℓ, if e then s′ else s′′) =
  let (intμ, σ) = Infer(E, ℓ, e)
      (θ′, E′, σ′) = Infer(E, ℓ′, s′)
      (θ′′, E′′, σ′′) = Infer(E, ℓ′′, s′′)
  in (θ′ ∪ θ′′, E′ ⊔ E′′, (σ; if(σ′, σ′′)))
  end

Infer(E, ℓ, while e do s) =
  let (intμ, σ) = Infer(E, ℓ, e)
      (θ, E′, σ′) = Infer(E, ℓ′, s)
  in (θ, E ⊔ E′, if(σ, (σ; σ′)))
  end

In order for the annotation inference algorithm to serve as a static detection system for memory and type errors, it must be sound with respect to the typing rules defined in Section 3. In other words, a typing judgment inferred by the type annotation inference algorithm must be deducible by the typing rules, as stated by the Soundness Theorem hereafter:

Theorem (Soundness) Given a typing environment E, we have:

– For an expression e, if Infer(E, ℓ, e) = (τ, σ), then E, ℓ ⊢ e : τ, σ

– For a statement s, if Infer(E, ℓ, s) = (θ, E′, σ), then E, ℓ ⊢ s : E′, σ, θ

In Appendix A, we establish this desired error detection property by proving the Soundness Theorem.
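To give an intuition of the syntactic unification used at function boundaries, the following Python sketch implements a textbook Robinson-style unifier. It is not Algorithm 4: the term encoding is an assumption (annotation variables are strings starting with "?", constants are other strings, and a constructed type is a tuple whose first element is the constructor name), and all function names are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def walk(t, subst):
    """Follow variable bindings until a non-variable or unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Occurs check: does variable v appear inside term t?"""
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False


def unify(a, b, subst=None):
    """Return a substitution unifying a and b, or None if none exists."""
    subst = dict(subst or {})
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        if occurs(a, b, subst):
            return None
        subst[a] = b
        return subst
    if is_var(b):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None                      # clash between different constants
```

Unifying a generic argument type such as ("ref", "?r", "CPnt", "?h") with an actual type binds the region and host variables, while a mismatch on the declared pointee type makes the unification fail.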

7 Extending the GCC Compiler

In this section, we present a detailed overview of the implementation of our type and effect analysis. We prototyped our approach as an extension of the GCC compiler for the C programming language. We adopt a summary-based implementation that fits the GCC modus operandi and scales to large programs.

7.1 Summary-based Implementation

The GCC compiler analyzes one function at a time, so we adapt our implementation to the compiler modus operandi. We perform our type annotation inference and our safety verification for each function. From this intraprocedural analysis, we derive a summary of the analyzed function that concisely abstracts its behavior. In the function summary, the argument and return types are defined as patterns that are instantiated during the interprocedural analysis. Each time the function is called, we map the argument patterns to their actual values and perform the analysis of the callee function summary. With the definition of function summaries using patterns, the analysis of a whole program is performed through the instantiation of the patterns and the application of function summaries. The analysis starts by applying the summary of the program entry function, which invokes the summaries of the callee functions. A function summary is composed of three parts: function entry, function body, and function return. We describe the content of each part hereafter:

– The function entry defines the function call context based on the following input patterns:

  – The pattern Ein is instantiated with the current type environment.
  – The pattern τin is instantiated with the current argument type.
  – The pattern σin is instantiated with the current effect model.

– The function body defines a concise representation of the effect and type analysis of the function statements. For each statement, the function body analysis encompasses the following information:

  – The static safety checks that need to be performed.
  – The static environment that is generated when the safety checks are successful.
  – The effect model generated when the statements pass the static checks.
  – The function calls applied inside the considered function body. Each function call involves the application of its corresponding summary. The application takes as input the static environment, the argument type, and the effect model inferred at the current statement. The results of the application are given in the function return part of the considered function summary.

– The function return defines the function exit through the following output patterns:

  – The pattern Eout is instantiated with the type environment derived from the function analysis.
  – The pattern τout is instantiated with the current return type.
  – The pattern σout is instantiated with the effect model derived from the function analysis.

To deal with recursion, we store for each summary the function entry of its last invocation. Before a new execution of the summary, if the current environment Ein and argument type τin are identical to the stored ones, a fixed point is reached and the summary is not executed. To enhance the performance of our analysis, function entries are stored for recursive functions only. These functions are identified in the call graph generated by the GCC compiler. Moreover, the recursion is interrupted when a safety error is detected in the recursive function. The summary-based approach has been used in MC [3], where an analysis summary is generated for each basic block, whereas the Saturn framework [2] derives a coarse summary for each function in order to facilitate the interprocedural analysis and to scale to large programs. In our case, we derive coarse summaries for entire functions as well.
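The fixed-point test on function entries can be sketched in Python as follows. The class Summary, its fields, and the convention that a skipped execution returns its inputs unchanged are assumptions of this example and do not describe the GCC prototype; environments are modeled as dictionaries with string values so that two entries can be compared.

```python
class Summary:
    """Hypothetical function summary with a stored entry for recursion."""

    def __init__(self, name, body, recursive=False):
        self.name = name
        self.body = body              # callable(summary, env, arg_type, effect)
        self.recursive = recursive    # assumed to come from the call graph
        self.last_entry = None

    def execute(self, env, arg_type, effect):
        if self.recursive:
            entry = (tuple(sorted(env.items())), arg_type)
            if entry == self.last_entry:
                # Same Ein and tau_in as the last invocation: fixed point,
                # so the summary body is not executed again.
                return env, arg_type, effect
            self.last_entry = entry
        return self.body(self, env, arg_type, effect)
```

A body that calls its own summary with an unchanged environment and argument type is therefore evaluated once; the second invocation is cut off by the stored entry.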

7.2 Example

We illustrate our interprocedural analysis on the sample code given in Fig. 4. The structures Pnt and CPnt are already defined in the sample code of Fig. 2.

First, the analysis evaluates variable and function declarations in order to build an initial type environment. Then, we use the summary of the main function as an entry point for the program analysis. Figure 5 depicts the summaries of function copyCPnt and function main.

Program Declarations

From the rules (Var-decl) and (Func-decl), we evaluate the global variable and function declarations and enrich the initially empty environment E with the following mappings:

cpt → ref ϱ1(CPnt)[wild]
p → ref ϱ2(CPnt)[wild]
cpt1 → ref ϱ3(CPnt)[wild]
cpt2 → ref ϱ4(CPnt)[wild]
pt → struct{(x, int[wild]), (y, int[wild])}
copyCPnt → ∀v1..n. τCPnt −ς→ τ′CPnt

All pointers are initially annotated with unknown region variables and host annotations set to [wild]. We map function copyCPnt to an annotation-polymorphic type where v1..n = ς ∪ fv(τCPnt) ∪ fv(τ′CPnt) and τCPnt = τ′CPnt = ref(CPnt). After building the initial type environment, we execute the summary of the main function in the following manner: execute_summary(main, E, int[&int], ∅). Initially, the effect model is empty and the parameter c is initialized to an integer value.

Fig. 4 Example to illustrate the intraprocedural pass and the interprocedural pass of our analysis


Fig. 5 Graphical representation of the summaries of the functions given in Fig. 4. The entry node of the graph specifies the input patterns for the type environment, the argument type, and the effect model. Each block illustrates the analysis of a statement: the first field of each block shows the considered statement, the second field shows the required safety checks, and the last field shows the updated environment derived from the statement. We omit the field that captures the effect to simplify the graphical representation. Edges in the graph represent the control flow of the function. The graph also has special nodes for call sites. Each call site returns a type environment Eout, a type τout, and an effect model σout that correspond to the exit node of the callee function. In the summary of function copyCPnt, we define τ′in = strTypeOf(τin)

Function Main Summary

To simplify the graphical representation of function summaries, we intentionally omit some of the statements that are not relevant to memory and type safety in the considered sample code. The statements from line 14 to line 17 allocate a memory region ρcpt for pointer cpt and initialize the fields of the structure it refers to. At line 17, the call to directUpd() infers for pointer cpt the type τcpt = ref ρcpt(CPnt)[&CPntxyc], where CPntxyc = struct{(x, int[&int]), (y, int[&int]), (c, int[&int])}. The statements at lines 18 and 19 initialize the structure variable pt. From the (Assign) rule, we infer for variable pt the type τpt = struct{(x, int[&int]), (y, int[&int])}.

From the (Cond) rule, we evaluate the two branches of the condition at line 20 and merge their derived type environments. The true branch sets pointer p as an alias of pointer cpt, then passes it as argument to copyCPnt(). The false branch sets pointer p to the memory location rpt of variable pt, then passes it as argument to copyCPnt(). Thus, the true path invokes execute_summary(copyCPnt, E, ref ρcpt(CPnt)[&CPntxyc], σ22) and the false path invokes execute_summary(copyCPnt, E, ref rpt(CPnt)[&τpt], σ27).

As required by the (Func-call) rule, the argument types used in these two invocations are equal to the declared argument type ref(CPnt) modulo type annotations.

Function copyCPnt Summary

In what follows, we detail the copyCPnt summary through its executions from the main function. The local declaration of pointer z updates environment E with the mapping [z → ref ϱ(CPnt)[wild]]. The allocation operation at line 6 sets pointer z to type ref ρz(CPnt)[&CPnt], where ρz is a fresh memory location at each invocation.

– When following the true branch in the main function, the assignment statements from line 7 to 9 are evaluated with pointer w set to type ref ρcpt(CPnt)[&CPntxyc]. From the (Field) rule and the (Assign) rule, all field accesses through pointer w are safe since w actually refers to a structure of type CPntxyc that contains fields x, y, and c. At line 9, we infer for pointer z the type ref ρz(CPnt)[&CPntxyc]. The free operation at line 10 deallocates the memory region ρcpt of the argument pointer w. The call to directUpd(E, w, updHost(ref ρcpt(CPnt)[&CPntxyc], [dangling])) sets the host annotations of pointer w and its alias pointer cpt to [dangling] as detailed in Section 4. At line 11, function copyCPnt returns pointer z. As required by the (Func-return) rule, the return type ref ρz(CPnt)[&CPntxyc] is equal to the declared return type ref(CPnt) modulo type annotations.

– When following the false branch in the main function, we evaluate the summary with argument pointer w set to ref rpt(CPnt)[&τpt]. At line 6, a fresh memory location ρ′z is assigned to pointer z. The first two assignment statements at lines 7 and 8 pass the static safety checks specified in the (Field) and (Assign) rules. However, at line 9, the dereference of field c through pointer w fails the safety check fldChk(ref rpt(CPnt)[&τpt], c). In fact, pointer w refers to variable pt that holds a Pnt structure with no c field. Moreover, freeing pointer w at line 10 is illegal since it refers to the memory location rpt that is not dynamically allocated: the effect model σ10 does not contain an effect alloc(rpt, _).

This example illustrates how our interprocedural and intraprocedural analysis detects memory and type errors in source code.

7.3 Experimental Results

We prototyped our safety analysis approach as an extension of GCC, which is considered the de facto compiler for the C programming language. Starting from version 4.0, the GCC compiler is based on the Tree-SSA framework for the development of high-level code optimization techniques and static analysis tools [24]. The Tree-SSA framework provides easy access to control-flow, data-flow, and type information, and thus facilitates the implementation of our static analysis. We also took advantage of the GIMPLE intermediate representation language provided by Tree-SSA. GIMPLE preserves source-level information about the code but simplifies complex constructs (e.g., loops are mapped to if and goto statements).

The implementation of our static analysis consists of two parts. Firstly, function summaries of the analyzed program are generated during the intraprocedural phase of the GCC compiler. Secondly, we invoke function summaries during the interprocedural phase of GCC. The function summaries are invoked in the order given by the program call graph generated by the Tree-SSA framework. We do not modify the normal compilation process; we only generate warnings when errors are detected.

Since we integrated our analysis into the GCC compiler, our prototype is able to analyze all programs that GCC can compile. We analyzed a set of real software such as openssh-5.0p1, openssl-0.9.8j, and a part of the Linux-2.6.26.6 filesystem in order to demonstrate the scalability of our prototype. Table 8 gives the overhead on the compilation time imposed by our safety analysis. The measurements were made on a 1 GHz Intel, 1 GB Linux machine, using the GCC-4.2.1 compiler with the -O optimization level. During the experimentation, we activated some of our safety checks in order to detect: free of dangling pointers, free of unallocated pointers, dereference of dangling pointers, and bad cast from integer to pointer.

We detected a bad cast operation from integer to pointer in the Linux kernel function vmsplice_to_user() (fs/splice.c). It corresponds to the well-known vmsplice local root exploit (BID: 27704), which takes advantage of a pointer copied from user space by the kernel function get_user. According to the Linux system call specifications, get_user is used to get an integer value from user space [8]. In the vmsplice_to_user() function, the usage of get_user does not conform to its specification, since it copies a pointer value from user space instead of an integer value. Moreover, the address referred to by the user space pointer is never validated before being used.

Figure 6 provides an extract of the vulnerable Linux code and a relevant snippet of its GIMPLE representation. The latter replaces the call to get_user with its inline assembly code where: __get_user_4 is the invoked assembly function, __ret_gu is the return value of the call, __val_gu is the integer value copied from user space, and D.25866 is a temporary variable generated by GIMPLE corresponding to the user space address to copy from. From the GIMPLE code, we can detect that the integer value __val_gu is cast to a void pointer and assigned to the void pointer base. This cast operation fails our safety check castChk, which requires that only integers previously derived from pointers be cast back to a pointer type. We are not aware of any static analysis tool that discovered this error before it was exploited. This experiment demonstrates the scalability of our prototype and its potential for detecting real errors.
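The distinction drawn by castChk can be illustrated with two small C functions. This is only an illustrative sketch: round_trip and unchecked_cast are hypothetical names, and the code is not taken from the kernel or from the prototype. The first function casts back an integer that was derived from a pointer, which is the accepted pattern; the second casts an integer of unknown origin, which is the pattern a castChk-style rule would report.

```c
#include <stdint.h>

/* Accepted pattern: the integer is derived from a pointer before it is
 * cast back, so its origin is known to the analysis. */
int round_trip(void)
{
    int value = 42;
    uintptr_t as_int = (uintptr_t)(void *)&value;  /* pointer -> integer */
    int *back = (int *)(void *)as_int;             /* integer -> pointer */
    return *back;
}

/* Reported pattern: `raw` is an arbitrary integer (think of a value
 * copied from user space); nothing shows it ever held a valid address. */
void *unchecked_cast(uintptr_t raw)
{
    return (void *)raw;   /* a castChk-style rule would warn here */
}
```

Both functions compile without diagnostics in plain C, which is why a dedicated static check is needed to tell them apart.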

Table 8 Experimental results illustrating the performance of our approach

File                        LOC         Compile without   Compile with   Slowdown
                            (C code)    checks (s)        checks (s)     factor

openssh-5.0p1               46.33K      118               611            5.17
openssl-0.9.8j              187.101K    358               1200           3.35
Filesystem Linux-2.6.26.6   12.153K     90                338            3.75


Fig. 6 Vulnerable Linux kernel function and its GIMPLE representation

8 Related Work

This section presents approaches and tools for detecting type and memory errors in C source code.

MetaCompilation (MC) [3] is a static analysis tool that uses a flow-based analysis approach for detecting temporal security errors in C code. With the MC approach, programmers define their temporal security properties as automata that are executed during the traversal of the control flow graph of the analyzed program. A security violation is reported when a security automaton reaches an error state. The MC analysis is flow-sensitive, but unlike our approach it performs a trivial alias analysis that considers simple variables with no pointer dereferencing.

MOPS [9] is another tool that detects temporal vulnerabilities using pushdown model-checking. MOPS is more appropriate for detecting high-level security properties than safety properties. In fact, it assumes that the analyzed program is memory safe. There are some other model-checking tools based on predicate abstraction for vulnerability detection in source code, such as BLAST [34], SLAM [6], and SAT [11]. The SLAM model-checker is mainly used to verify Windows drivers and has not been used for verifying memory and type errors. The BLAST model checker does not support interprocedural pointer aliasing; however, it has been used with CCured [22] to reduce runtime checks for memory errors [7].

Saturn [2] is an infrastructure for program analysis based on boolean satisfiability. It analyzes each function separately and produces a function summary that is checked against a set of boolean constraints. It has been used to conduct a sound flow- and context-sensitive alias analysis on the Linux kernel. Being summary-based renders Saturn scalable to large programs. Nevertheless, the high performance and scalability of Saturn are achieved by parallelizing the analysis on cluster nodes, which are not commonly available to programmers. Being integrated into the GCC compiler, our approach is more user-friendly and can easily be applied by programmers during compilation with an acceptable overhead.

Type systems have been used to verify security properties in source code. The type-based approach used by the tool CQual [18] consists of extending the type system with type qualifiers that are used to express security properties. It can express properties related to secure information flow based on tainted and untainted qualifiers. It has been used to differentiate pointers to user space from pointers to kernel space, and to verify the absence of unchecked user pointer accesses [19]. To our knowledge, it has not been used to detect memory errors such as double free or use after free as we do.

The literature contains proposals for hybrid approaches that combine static and dynamic analysis, such as CCured [22] and SafeC [4]. These language-based tools extend the C type system in order to detect memory and type errors. They resort to code instrumentation when static analysis is undecidable. As illustrated previously in the motivating example of Fig. 1, CCured does not detect memory errors related to the bad sequencing of memory operations. It focuses on bounds checking and NULL checking of pointer dereferences. For type casts, CCured is more restrictive than our approach as it forbids cast operations from integer type to pointer type. Our analysis allows an integer that has been used to store a pointer address to be cast back to a pointer type. Moreover, the runtime overhead of CCured renders it unsuitable for the analysis of operating systems and low-level software that has performance requirements.

Vault [16] and Cyclone [17] are safe variants of the C programming language. They enforce static and dynamic safety restrictions to prevent runtime errors. For instance, type casts and pointer arithmetic are restricted and null checks are performed at each pointer dereference. Despite the safety provided by these languages, programmers are reluctant to port legacy code to Cyclone or Vault because doing so takes considerable time and effort.

9 Conclusion

In this paper, we have presented a type and effect discipline for detecting memory and type errors in C source code. Our type analysis propagates effect, region, and host annotations that carry safety knowledge regarding the analyzed program. We endow our type system with static safety checks that use the aforementioned annotations to uncover memory and type errors. Our safety analysis is performed in an intraprocedural phase and an interprocedural phase: (1) the intraprocedural phase infers type annotations, taking into consideration control-flow and alias information, and (2) the interprocedural phase defines unification constraints and propagates them across function boundaries. Our type and effect analysis has a number of appealing properties that we describe hereafter:

• Simplicity: The inference algorithm automatically propagates lightweight annotations without requiring programmers to add annotations manually.

• Effectiveness: The flow-sensitivity and alias-sensitivity of our analysis enhance its precision for uncovering insidious errors.

• Flexibility: The type analysis can easily be combined with dynamic verification techniques in order to tackle more safety violations.

• Applicability: Our prototype shows that our approach can easily be integrated into the compilation process.

We prototyped our approach as an extension of the GCC compiler and the experimental results demonstrate its scalability to large software. We are currently finalizing the implementation of our approach in order to evaluate the memory and type safety of real-world C software.

Appendix A

In the following, we prove that our inference algorithm is sound with respect to the static semantics.

Theorem (Soundness) Given a typing environment E, we have:

• For an expression e, if Infer(E, ℓ, e) = (τ, σ), then E, ℓ ⊢ e : τ, σ.

• For a statement s, if Infer(E, ℓ, s) = (θ, E′, σ), then E, ℓ ⊢ s : E′, σ, θ.

Proof of Soundness The proof is done by structural induction on expressions and statements:

• Case of (Var): By hypothesis, we have: Infer(E, ℓ, x) = (τ, ∅).
  By the definition of the algorithm, this requires: x ∈ Dom(E) and E(x) = τ.
  By the definition of the rule (Var), we have: E, ℓ ⊢ x : τ, ∅.

• Case of (Ref): By hypothesis, we have: Infer(E, ℓ, &lv) = (τ′, σ).
  By the definition of the algorithm, this requires: (τ, σ) = Infer(E, ℓ, lv) and τ′ = refTypeTo(lv, τ).
  By induction hypothesis on lv, we have: E, ℓ ⊢ lv : τ, σ.
  By the definition of the rule (Ref), we conclude that: E, ℓ ⊢ &lv : τ′, σ.

• Case of (Deref): By hypothesis, we have: Infer(E, ℓ, ∗e) = (τ′, (σ; read(ρ, τ′, ℓ))).
  From the algorithm, this requires: (τ, σ) = Infer(E, ℓ, e) and τ = ref(_) and ρ = regionOf(τ) and drfChk(τ) and strTypeOf(τ) = τ′.
  By induction hypothesis on e, we have: E, ℓ ⊢ e : τ, σ.
  By the definition of the rule (Deref), we have: E, ℓ ⊢ ∗e : τ′, (σ; read(ρ, τ′, ℓ)).

• Case of (Arith): By hypothesis, we have: Infer(E, ℓ, e op e′) = (ref ρ′(_)[arith], (σ; σ′; arith(ρ, ρ′, ℓ))).
  The algorithm requires: (τ, σ) = Infer(E, ℓ, e) and (intμ, σ′) = Infer(E, ℓ, e′) and τ = ref(_) and ρ = regionOf(τ) and ρ′ fresh.
  By induction hypothesis on e and e′, we have: E, ℓ ⊢ e : τ, σ and E, ℓ ⊢ e′ : intμ, σ′.
  From the rule (Arith), we have: E, ℓ ⊢ e op e′ : ref ρ′(_)[arith], (σ; σ′; arith(ρ, ρ′, ℓ)).

• Case of (Cast): By hypothesis, we have: Infer(E, ℓ, (κ)e) = (τ′, σ).
  This requires: (τ, σ) = Infer(E, ℓ, e) and castChk(τ, κ) and τ′ = castType(τ, κ).
  By induction hypothesis on e, we have: E, ℓ ⊢ e : τ, σ.
  By the definition of the rule (Cast), we conclude that: E, ℓ ⊢ (κ)e : τ′, σ.


• Case of (Field): We suppose that: Infer(E, ℓ, e.ϕ) = (τ′, (σ; read(ρ, τ′, ℓ))).
  By hypothesis, this requires: (τ, σ) = Infer(E, ℓ, e) with τ = struct{_} and fldChk(τ, ϕ) and τ′ = fldType(τ, ϕ) and ρ = addressOf(e.ϕ).
  By induction hypothesis on e, we have: E, ℓ ⊢ e : struct{_}, σ.
  By the definition of the rule (Field), we have: E, ℓ ⊢ e.ϕ : τ′, (σ; read(ρ, τ′, ℓ)).

• Case of (Malloc): We suppose that: Infer(E, ℓ, malloc(e)) = (ref ρ(void)[malloc], (σ; alloc(ρ, ℓ))).
  By hypothesis, this requires: (τ, σ) = Infer(E, ℓ, e) and τ = int and ρ fresh.
  By induction hypothesis on e, we have: E, ℓ ⊢ e : τ, σ.
  From the rule (Malloc), we have: E, ℓ ⊢ malloc(e) : ref ρ(void)[malloc], (σ; alloc(ρ, ℓ)).

• Case of (Free): The assumption is that: Infer(E, ℓ, free(lv)) = (∅, E′, (σ; dealloc(ρ, ℓ))).
  The algorithm requires that: (τ, σ) = Infer(E, ℓ, lv) and τ = ref(_) and freeChk(τ, σ) and E′ = updEnv(E, free(lv), τ) and ρ = regionOf(τ).
  By induction hypothesis on lv, we have: E, ℓ ⊢ lv : τ, σ.
  By definition of the rule (Free), we have: E, ℓ ⊢ free(lv) : E′, (σ; dealloc(ρ, ℓ)), ∅.

• Case of (Assign): By hypothesis, we have: Infer(E, ℓ, lv = e) = (∅, E′, (σ; σ′; assign(ρ, τ′, ℓ))).
  The algorithm requires: (τ, σ) = Infer(E, ℓ, lv) and (τ′, σ′) = Infer(E, ℓ, e) and asgnChk(τ, τ′) and E′ = updEnv(E, lv = e, τ′) and ρ = addressOf(lv).
  By induction hypothesis on e and lv, we have: E, ℓ ⊢ lv : τ, σ and E, ℓ ⊢ e : τ′, σ′.
  From the (Assign) rule, we have: E, ℓ ⊢ lv = e : E′, (σ; σ′; assign(ρ, τ′, ℓ)), ∅.

• Case of (Func-call): We suppose that: Infer(E, ℓ, lv = id(y)) = (θ′′, E′′, (σ; σ′; assign(ρ, θ′′τ2, ℓ))).
  By hypothesis, this requires: (τ, σ) = Infer(E, ℓ, lv) and τ1 −ς→ τ2 = Infer(E, ℓ, callid) and (τ′, ∅) = Infer(E, ℓ, y) and θ = U(τ1, τ′) and body(id(x)) = s and (θ′, E′, σ′) = Infer(E † [x → θτ1], ℓ, s) and θ′′ = θ ∪ θ′ ∪ [ς → σ′] and asgnChk(τ, θ′′τ2) and ρ = addressOf(lv) and E′′ = updEnv(E′, lv = id(y), θ′′τ2).
  By induction hypothesis on lv, y, callid, and s, we have: E, ℓ ⊢ lv : τ, σ and E, ℓ ⊢ y : τ′, ∅ and E † [x → θτ1], ℓ ⊢ s : E′, σ′, θ′.
  By the definition of the rule (Func-call), we conclude that: E, ℓ ⊢ lv = id(y) : E′′, (σ; σ′; assign(ρ, θ′′τ2, ℓ)), θ′′.

• Case of (Func-return): The assumption is that: (θ, E, σ) = Infer(E, ℓ, return e).
  The algorithm requires that: (τ, σ) = Infer(E, ℓ, e) and τ1 −ς→ τ2 = Infer(E, ℓ, callid) and θ = U(τ2, τ).
  By induction on e and callid, we have: E, ℓ ⊢ e : τ, σ and E, ℓ ⊢ callid : τ1 −ς→ τ2, ∅.
  By definition of the rule (Func-return), we conclude that: E, ℓ ⊢ return e : E, σ, θ.

• Case of (Seq): The assumption is that: Infer(E, ℓ, s′; s′′) = (θ′ ∪ θ′′, E′′, (σ′; σ′′)).
  The algorithm requires that: (θ′, E′, σ′) = Infer(E, ℓ, s′) and (θ′′, E′′, σ′′) = Infer(E′, ℓ′, s′′).
  By induction on s′ and s′′, we have: E, ℓ ⊢ s′ : E′, σ′, θ′ and E′, ℓ′ ⊢ s′′ : E′′, σ′′, θ′′.
  By definition of the rule (Seq), we conclude that: E, ℓ ⊢ s′; s′′ : E′′, (σ′; σ′′), θ′ ∪ θ′′.


• Case of (Cond): The assumption is that: Infer(E, ℓ, if e then s′ else s′′) = (θ′ ∪ θ′′, E′ ⊔ E′′, (σ; if(σ′; σ′′))).
  The algorithm requires that: (θ′, E′, σ′) = Infer(E, ℓ′, s′) and (θ′′, E′′, σ′′) = Infer(E, ℓ′′, s′′) and (intμ, σ) = Infer(E, ℓ, e).
  By induction hypothesis on e, s′, and s′′, we have: E, ℓ ⊢ e : intμ, σ and E, ℓ′ ⊢ s′ : E′, σ′, θ′ and E, ℓ′′ ⊢ s′′ : E′′, σ′′, θ′′.
  By definition of the rule (Cond), we conclude that: E, ℓ ⊢ if e then s′ else s′′ : E′ ⊔ E′′, (σ; if(σ′; σ′′)), θ′ ∪ θ′′.

• Case of (Loop): The assumption is that: Infer(E, ℓ, while e do s) = (θ, E ⊔ E′, if(σ, (σ; σ′))).
  The algorithm requires that: (intμ, σ) = Infer(E, ℓ, e) and (θ, E′, σ′) = Infer(E, ℓ′, s).
  By induction hypothesis on s and e, we have: E, ℓ ⊢ e : intμ, σ and E, ℓ′ ⊢ s : E′, σ′, θ.
  From the rule (Loop), we have: E, ℓ ⊢ while e do s : E ⊔ E′, if(σ, (σ; σ′)), θ. □

Appendix B

• The annotation operator maps a declared type κ to a type τ decorated with a fresh region variable ρ, a [wild] host annotation, and, for function types, a fresh effect variable ς:

    int            ↦  int[wild]
    void           ↦  void
    ref(κ)         ↦  ref ρ(κ)[wild]       ρ fresh
    _{(ϕi, κi)}    ↦  _{(ϕi, κi, oi)}      i = 1..n
    κ → κ′         ↦  κ −ς→ κ′             ς fresh

• The operator “ ¯ ” removes type annotations and recovers an un-annotated type:

    intη               ↦  int
    void               ↦  void
    ref ρ(κ)η          ↦  ref(κ)
    _{(ϕi, τi, oi)}    ↦  _{(ϕi, τi)}      i = 1..n
    τ −σ→ τ′           ↦  τ → τ′
    if(τ, τ′)          ↦  τ

• Function refTypeTo takes an lvalue and returns a pointer type to that lvalue:
  refTypeTo : Lval → InfType

  Function refTypeTo(lv) = case lv of
      x | x.ϕ  ⇒ let typeOf(lv) = τ
                     ρ = addressOf(lv)
                 in ref ρ(τ)[&τ] end
      ∗lv      ⇒ typeOf(lv)
  end

• Function regionOf takes a pointer type and returns its region annotations.
  regionOf : τ → Regions

  Function regionOf(τ) = case τ of
      ref ρ(κ)η  ⇒ ρ
      if(τ, τ′)  ⇒ regionOf(τ) ∪ regionOf(τ′)
      else       ⇒ ∅
  end
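The recursion over conditional types in regionOf can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the prototype's data structures: the Type record, the TypeKind tags, and the encoding of region sets as bit masks over small region indices are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical encoding of inferred types; regions are small integers
 * and region sets are bit masks. */
typedef enum { TY_INT, TY_REF, TY_IF } TypeKind;

typedef struct Type {
    TypeKind kind;
    unsigned region;             /* TY_REF: region index (0..31) */
    const struct Type *left;     /* TY_IF: first alternative */
    const struct Type *right;    /* TY_IF: second alternative */
} Type;

/* A reference yields its own region, a conditional type yields the union
 * of the regions of both alternatives, anything else the empty set. */
unsigned region_of(const Type *t)
{
    switch (t->kind) {
    case TY_REF:
        return 1u << t->region;
    case TY_IF:
        return region_of(t->left) | region_of(t->right);
    default:
        return 0u;
    }
}
```

With this encoding, a pointer that may refer to region 1 on one branch and region 3 on the other is mapped to the set {1, 3}, so a later check has to consider both regions.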

• Function addressOf takes an lvalue expression and returns its hosting region.
  addressOf : Lval → Regions

  Function addressOf(lv) = case lv of
      x     ⇒ r_x
      ∗lv   ⇒ regionOf(typeOf(lv))
      lv.ϕ  ⇒ (addressOf(lv).offset(ϕ))
  end

• Function hostOf takes a pointer type and returns its host annotations.
  hostOf : τ → PointerHost

  Function hostOf(τ) = case τ of
      ref ρ(κ)η  ⇒ η
      if(τ, τ′)  ⇒ hostOf(τ) ∪ hostOf(τ′)
      else       ⇒ ∅
  end

• Function fldType(τ, ϕi) extracts the type of field ϕi in a structure type τ:
  fldType : InfType × Field → InfType

  Function fldType(τ, ϕi) = case τ of
      if(τ′, τ′′)            ⇒ if(fldType(τ′, ϕi), fldType(τ′′, ϕi))
      _{(ϕi, τi, oi)}1..n    ⇒ τi
  end
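Field lookup of this kind underlies the fldChk test used by the (Field) rule. The sketch below is a hedged illustration in C: the StructType descriptor, the functions fld_index and fld_chk, and the field names used in the usage note are hypothetical and serve only to show that the access is judged against the structure a pointer actually refers to rather than against its declared type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a structure type: a name and field labels. */
typedef struct {
    const char *name;
    const char *const *fields;
    size_t nfields;
} StructType;

/* Returns the position of `field` in `s`, or -1 when `s` has no such field. */
int fld_index(const StructType *s, const char *field)
{
    for (size_t i = 0; i < s->nfields; i++)
        if (strcmp(s->fields[i], field) == 0)
            return (int)i;
    return -1;
}

/* fldChk-style test: `actual` is the structure the pointer really refers
 * to (taken from its host annotation), not the declared pointee type. */
int fld_chk(const StructType *actual, const char *field)
{
    return fld_index(actual, field) >= 0;
}
```

For instance, if a structure assumed to have fields x and y is accessed through a pointer declared for a larger structure with an extra field c, fld_chk on the actual structure returns 0 for c and the access is reported.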

• Function strTypeOf(τ) extracts the actual type referred to by an expression of type τ:
  strTypeOf : InfType → InfType

  Function strTypeOf(τ) = case τ of
      if(τ′, τ′′)        ⇒ if(strTypeOf(τ′), strTypeOf(τ′′))
      int[&τ′]           ⇒ τ′
      ref ρ(κ)[&τ′]      ⇒ τ′
      ref ρ(κ)[malloc]   ⇒ κ
  end


• Function castType takes an annotated type and a declared type without annotation, and returns an annotated type.
  castType : InfType × DeclType → InfType

  Function castType(τ, κ) = case (τ, κ) of
      (ref ρ(κ′)η, int)               ⇒ int[&ref ρ(κ′)η]
      (int[&τ′], ref(κ′))             ⇒ castType(τ′, ref(κ′))
      (ref ρ(κ′)[malloc], ref(κ′′))   ⇒ ref ρ(κ′′)[&κ′′]
      (ref ρ(κ′)η, ref(κ′′))          ⇒ ref ρ(κ′′)η
      (if(τ, τ′), κ)                  ⇒ if(castType(τ, κ), castType(τ′, κ))
  end

• Function updHost takes a type and a host annotation and returns a type.
  updHost : InfType × Host → InfType

  Function updHost(τ, η) = case τ of
      if(τ′, τ′′)   ⇒ if(updHost(τ′, η), updHost(τ′′, η))
      intη′         ⇒ intη
      ref ρ(_)η′    ⇒ ref ρ(_)η
  end

• Function updRegHost takes a type, a set of regions, and a host annotation, and returns a type.
  updRegHost : InfType × Regions × Host → InfType

  Function updRegHost(τ, ρ, η) = case τ of
      if(τ′, τ′′)    ⇒ if(updRegHost(τ′, ρ, η), updRegHost(τ′′, ρ, η))
      ref ρ′(κ)η′    ⇒ if (ρ ∩ ρ′ ≠ ∅) then ref ρ′(κ)η else τ
  end
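The region-directed update can be sketched in C as shown below. This is an illustrative simplification, not the prototype's code: the Type record, the bit-mask encoding of region sets, and the HOST_DANGLING tag are hypothetical, and the sketch updates the type in place, whereas updRegHost as defined above returns a new type.

```c
#include <stddef.h>

/* Hypothetical host tags and type encoding; region sets are bit masks. */
typedef enum { HOST_WILD, HOST_MALLOC, HOST_DANGLING } Host;
typedef enum { TY_INT, TY_REF, TY_IF } TypeKind;

typedef struct Type {
    TypeKind kind;
    unsigned regions;        /* TY_REF: regions the pointer may refer to */
    Host host;               /* TY_REF: host annotation */
    struct Type *left;       /* TY_IF: first alternative */
    struct Type *right;      /* TY_IF: second alternative */
} Type;

/* In-place variant of the update: every reference whose region set
 * intersects `regions` receives the new host annotation. */
void upd_reg_host(Type *t, unsigned regions, Host host)
{
    switch (t->kind) {
    case TY_IF:
        upd_reg_host(t->left, regions, host);
        upd_reg_host(t->right, regions, host);
        break;
    case TY_REF:
        if (t->regions & regions)
            t->host = host;
        break;
    default:
        break;   /* non-pointer types are left unchanged */
    }
}
```

Applied to every type in an environment, such an update lets all aliases of a given region change host at once, for example when that region is deallocated.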

• Function regHostof takes a type and returns a set of region and host annotation pairs.
  regHostof : InfType → P(Regions × Host)

  Function regHostof(τ) = case τ of
      if(τ′, τ′′)   ⇒ regHostof(τ′) ∪ regHostof(τ′′)
      ref ρ(κ)η′    ⇒ {(ρ, η′)}
  end


• Function updFld takes a type, a field label, and another type, and returns a type.
  updFld : InferredType × Field × Type → Type

  Function updFld(τ, ϕ, τ′) = case τ of
      if(τ1, τ2)             ⇒ if(updFld(τ1, ϕ, τ′), updFld(τ2, ϕ, τ′))
      _{(ϕi, τi, oi)}1..n    ⇒ _{(ϕi, τ′i, oi)}1..n
                                where τ′i = τ′ if ϕi = ϕ, and τ′i = τi otherwise
  end

References

1. Aggarwal, A., Jalote, P.: Integrating static and dynamic analysis for detecting vulnerabilities. In: COMPSAC ’06: Proceedings of the 30th Annual International Computer Software and Applications Conference, pp. 343–350. IEEE Computer Society, Washington, DC (2006)

2. Aiken, A., Bugrara, S., Dillig, I., Dillig, T., Hackett, B., Hawkins, P.: An overview of the Saturn project. In: PASTE ’07: Proceedings of the 7th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, pp. 43–48. ACM, New York (2007)

3. Ashcraft, K., Engler, D.: Using programmer-written compiler extensions to catch security holes. In: SP ’02: Proceedings of the 2002 IEEE Symposium on Security and Privacy, pp. 143–159. IEEE Computer Society, Washington, DC (2002)

4. Austin, T.M., Breach, S.E., Sohi, G.S.: Efficient detection of all pointer and array access errors. In: PLDI ’94: Proceedings of the ACM SIGPLAN 1994 conference on Programming Language Design and Implementation, pp. 290–301. ACM, New York (1994)

5. Avots, D., Dalton, M., Livshits, V.B., Lam, M.S.: Improving software security with a C pointer analysis. In: ICSE ’05: Proceedings of the 27th International Conference on Software Engineering, pp. 332–341. ACM, New York (2005)

6. Ball, T., Majumdar, R., Millstein, T., Rajamani, S.K.: Automatic predicate abstraction of C programs. In: PLDI ’01: Proceedings of the ACM SIGPLAN 2001 conference on Programming Language Design and Implementation, pp. 203–213. ACM, New York (2001)

7. Beyer, D., Henzinger, T.A., Jhala, R., Majumdar, R.: Checking memory safety with BLAST. In: FASE ’05: Proceedings of the 8th International Conference on Fundamental Approaches to Software Engineering. LNCS, vol. 3442, pp. 2–18. Springer, Edinburgh (2005)

8. Bovet, D., Cesati, M.: Understanding the Linux Kernel, 3rd edn. O’Reilly Media, Sebastopol (2005)

9. Chen, H., Wagner, D.A.: MOPS: an infrastructure for examining security properties of software. In: CCS ’02: Proceedings of the 9th ACM Conference on Computer and Communications Security, pp. 235–244. ACM, New York (2002)

10. Choi, J.-D., Burke, M., Carini, P.: Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In: POPL ’93: Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, pp. 232–245. ACM, New York (1993)

11. Clarke, E., Kroening, D., Sharygina, N., Yorav, K.: Predicate abstraction of ANSI-C programs using SAT. Form. Methods Syst. Des. 25(2–3), 105–127 (2004)

12. Corbin, J., Bidoit, M.: A rehabilitation of Robinson’s unification algorithm. In: IFIP Congress, pp. 909–914, Paris, 19–23 September 1983

13. Debbabi, M., Aidoud, Z., Faour, A.: On the inference of structured recursive effects with subtyping. J. Funct. Logic Program. 1997(5), 1–15 (1997)


14. Evans, D.: Static detection of dynamic memory errors. In: PLDI ’96: Proceedings of the ACM SIGPLAN 1996 conference on Programming Language Design and Implementation, pp. 44–53. ACM, New York (1996)

15. Fagan, M.E.: Advances in Software Inspections. IEEE Trans. Softw. Eng. SE-12, 744–751 (1986)

16. Fähndrich, M., DeLine, R.: Adoption and focus: practical linear types for imperative programming. In: PLDI ’02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, pp. 13–24. ACM, New York (2002)

17. Grossman, D., Morrisett, G., Jim, T., Hicks, M., Wang, Y., Cheney, J.: Region-based memory management in Cyclone. In: PLDI ’02: Proceedings of the ACM SIGPLAN 2002 conference on Programming Language Design and Implementation, pp. 282–293. ACM, New York (2002)

18. Foster, J.S., Terauchi, T., Aiken, A.: Flow-sensitive type qualifiers. In: PLDI ’02: Proceedings of the ACM SIGPLAN 2002 conference on Programming Language Design and Implementation, pp. 1–12. ACM, New York (2002)

19. Johnson, R., Wagner, D.: Finding user/kernel pointer bugs with type inference. In: SSYM ’04: Proceedings of the 13th conference on USENIX Security Symposium, pp. 119–134. USENIX, Berkeley (2004)

20. Kfoury, A.J., Ronchi della Rocca, S., Tiuryn, J., Urzyczyn, P.: Alpha-conversion and typability. Inf. Comput. 150(1), 1–21 (1999)

21. Larochelle, D., Evans, D.: Statically detecting likely buffer overflow vulnerabilities. In: SSYM ’01: Proceedings of the 10th conference on USENIX Security Symposium, pp. 14–14. USENIX, Berkeley (2001)

22. Necula, G.C., Condit, J., Harren, M., McPeak, S., Weimer, W.: CCured: type-safe retrofitting of legacy software. ACM Trans. Program. Lang. Syst. 27(3), 477–526 (2005)

23. Nielson, F., Nielson, H.R.: Type and effect systems. In: Correct System Design, Recent Insight and Advances, pp. 114–136. Springer, London (1999)

24. Novillo, D.: Tree-SSA: a new optimization infrastructure for GCC. In: Proceedings of the GCC Developers Summit, pp. 181–193. Ottawa, June 2003

25. Popeea, C., Xu, D.N., Chin, W.-N.: A practical and precise inference and specializer for array bound checks elimination. In: PEPM ’08: Proceedings of the 2008 ACM SIGPLAN symposium on Partial Evaluation and Program Manipulation, pp. 177–187. ACM, New York (2008)

26. Robinson, J.A.: A machine-oriented logic based on the resolution principle. J. ACM 12(1), 23–41 (1965)

27. Rugina, R., Cherem, S.: Region inference for imperative languages. Technical report CS TR2003-1914, Computer Science Department, Cornell University (2003)

28. Sankaranarayanan, S., Ivancic, F., Gupta, A.: Program Analysis Using Symbolic Ranges. In: SAS ’07: Proceedings of the 14th International Static Analysis Symposium, pp. 366–383. Springer, Kongens Lyngby (2007)

29. Siff, M., Chandra, S., Ball, T., Kunchithapadam, K., Reps, T.: Coping with type casts in C. In: ESEC/FSE-7: Proceedings of the 7th European software engineering conference held jointly with the 7th ACM SIGSOFT international symposium on Foundations of software engineering, pp. 180–198. Springer, London (1999)

30. Steensgaard, B.: Points-to analysis in almost linear time. In: POPL ’96: Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, pp. 32–41. ACM, New York (1996)

31. Talpin, J.-P., Jouvelot, P.: Polymorphic type, region and effect inference. J. Funct. Program. 2, 245–271 (1992)

32. Talpin, J.-P., Jouvelot, P.: The type and effect discipline. In: Information and Computation, pp. 162–173. IEEE, Piscataway (1992)

33. Tlili, S., Yang, Z., Ling, H.Z., Debbabi, M.: A hybrid approach for safe memory management in C. In: AMAST ’08: Proceedings of the 12th international conference on Algebraic Methodology and Software Technology, pp. 377–391. Springer, Urbana (2008)

34. Visser, W., Havelund, K., Brat, G., Park, S.: Model checking programs. In: ASE ’00: Proceedings of the 15th IEEE international conference on Automated Software Engineering, pp. 3–12. IEEE Computer Society, Washington, DC (2000)

35. Wagner, D., Foster, J.S., Brewer, E.A., Aiken, A.: A first step towards automated detection of buffer overrun vulnerabilities. In: NDSS ’00: Proceedings of the Network and Distributed System Security Symposium, pp. 3–17. The Internet Society, San Diego (2000)

36. Wilson, R.P., Lam, M.S.: Efficient context-sensitive pointer analysis for C programs. In: PLDI ’95: Proceedings of the ACM SIGPLAN 1995 conference on Programming Language Design and Implementation, pp. 1–12. ACM, New York (1995)