
Type and Effect Annotations for Safe Memory Access in C ∗

Syrine Tlili, Mourad Debbabi

Computer Security Laboratory (CSL)

Concordia Institute for Information Systems Engineering

Concordia University, Montreal, Quebec, Canada

{s_tlili,debbabi}@ece.concordia.ca

Abstract

In this paper, we present a novel type and effect analysis for detecting memory errors in C source code. We extend the standard C type system with effect, region, and host annotations that hold valuable security information. We define static security checks that use these annotations to detect errors. The checks are compliant with the ANSI-C standard while adding security restrictions that prevent runtime errors. The flow-sensitive nature of our analysis enables us to modify type annotations at each program point and to efficiently detect temporal errors. Moreover, we endow our type system with alias information to deal with C aliasing pitfalls and to improve the precision of our analysis. We present an inference algorithm that automatically infers type annotations and applies the security checks without programmer intervention.

1 Introduction

The C programming language is considered the de facto standard for system programming. The main reasons for its success are its performance, flexibility, strong support, and portability. However, security features are either absent or poorly supported in C. Memory management, which is left to the programmer's discretion, is the source of many critical security vulnerabilities. As the number of lines of code grows, manually checking for security violations becomes cumbersome and error-prone. Therefore, automated vulnerability detection tools are very helpful for detecting and fixing errors in source code.

∗This research is the result of a fruitful collaboration between CSL (Computer Security Laboratory) of Concordia University, DRDC (Defense Research and Development Canada) Valcartier and Bell Canada under the NSERC DND Research Partnership Program.

There is a range of static analysis techniques [1, 3, 4] used to detect vulnerabilities at early stages of software development. These techniques cannot cover all memory errors, since they operate on a conservative approximation of the program. They offer different trade-offs between the precision and the complexity of the analysis. Our study of static analysis techniques indicates that flow-sensitive analysis and alias analysis are two key factors for precisely detecting memory errors. Nevertheless, these factors may increase the complexity of the analysis. This leaves room for a new static analysis approach that offers a good complexity-vs-precision trade-off for detecting memory errors.

In this paper, we describe our automated approach, based on flow-sensitive type and alias analysis, for detecting temporal memory errors in C programs. More precisely, we target the following programming errors: double freeing a pointer, using a freed pointer, freeing a pointer that has never been allocated, dereferencing NULL or unallocated pointers, and assigning uninitialized values. The core idea is to instrument the standard C type system with effect, region, and host annotations in order to collect valuable security information about the analyzed program. The host annotation indicates the state and reveals the actual type of a memory location. The flow-sensitive nature of our approach allows type annotations to change from one program statement to another, which improves the precision of the analysis for detecting temporal errors. As in [11], flow-sensitivity is restricted to the type annotations in order not to complicate the inference algorithm. Furthermore, we address the pitfalls of indirect assignments by endowing our approach with a flow-sensitive alias analysis [5]. As such, a modification to the annotations of a variable is propagated to all its aliases.

The Third International Conference on Availability, Reliability and Security

0-7695-3102-4/08 $25.00 © 2008 IEEE, DOI 10.1109/ARES.2008.18

302


Authorized licensed use limited to: CONCORDIA UNIVERSITY LIBRARIES. Downloaded on September 30, 2009 at 17:25 from IEEE Xplore. Restrictions apply.

The main contributions of our paper are the following:

• A new type system based on region, effect, and host annotations for detecting memory errors in C source code. We endow our type system with static security checks that use these annotations to verify and ensure the safe usage of pointers.

• A flow-sensitive inference algorithm that automatically infers annotated types for program expressions and statements and performs the static security checks.

• A prototype GCC extension that statically type-checks C programs for temporal memory errors.

This paper is organized as follows: Section 2 gives an informal overview of the different components of our approach. Section 3 presents the effect, region, and host annotations for security. In Section 4, we describe the typing rules for expressions, statements, and control-flow constructs. Section 5 outlines the static security checks performed during our type analysis. We describe the limitations of our static analysis in Section 6. Section 7 presents our annotation inference algorithm and a proof sketch that the inference algorithm is sound with respect to the type system. We illustrate our prototype with a case study in Section 8. We discuss related work in Section 9. Finally, we draw conclusions in Section 10.

2 Approach Overview

Our type and effect analysis involves different components that interact in order to detect temporal memory errors, as detailed hereafter:

• We extend the standard C type system with annotations in order to collect relevant security information about the analyzed program.

• We define static security checks that use the aforementioned annotations to verify and ensure the safe usage of pointers. Our type system is endowed with these security checks to enforce memory safety properties.

• We define flow-sensitive annotated types that are allowed to change from one program statement to another [11]. Note that a statement that directly changes the annotations of a given location also indirectly changes the annotations of its aliases. As such, we elaborate a recursive algorithm based on flow-sensitive alias information in order to propagate type annotation updates to all locations directly and indirectly involved in a program statement.

3 Extending C Type System

This section presents the effect, region, and host annotations that our type system uses to collect valuable security information about the analyzed program.

• The domain of regions is intended to abstract variables’ memory locations and dynamic memory locations.

$$ r ::= \emptyset \mid \varepsilon \mid \rho \mid \{\rho_1, \dots, \rho_n\} $$

The symbols ρ, ρ′ represent known values from the regions domain. The symbol ε stands for a currently unknown location. We use the notation {ρ1, . . . , ρn} to represent the set of regions a pointer may refer to.

• The domain of declared types defines a subset of the basic types of the C language.

$$ \kappa ::= \mathit{int} \mid \mathit{void} \mid \mathit{ref}(\kappa) $$

It includes the empty type void, the integer type int, and the pointer type ref(κ).

• The domain of inferred types decorates the declared types with top-level effect, region, and host annotations, i.e., annotations inserted at the outermost type constructor.

$$ \tau ::= \mathit{if}_{\ell}(\tau, \tau') \mid \mathit{int}^{\eta} \mid \mathit{void} \mid \mathit{ref}_{\rho}(\kappa)^{\eta} $$

The pointer type $\mathit{ref}_{\rho}(\kappa)^{\eta}$ is annotated with its memory location ρ. The types $\mathit{int}^{\eta}$ and $\mathit{ref}_{\rho}(\kappa)^{\eta}$ carry a host annotation η that indicates the content of their related memory location.

• The host annotation indicates the content of a memory location.

$$ \eta ::= \mathit{malloc} \mid \mathit{free} \mid \mathit{wild} \mid \&\tau $$

The element malloc denotes an allocated pointer. The element free indicates a freed memory location. The value wild denotes an unallocated pointer or an uninitialized integer. The element &τ stands for a region holding a value of type τ.

• The domain of effects [12] captures the side effects of memory operations.

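$$ \sigma ::= \emptyset \mid \sigma; \sigma' \mid \mathit{if}_{\ell}(\sigma, \sigma') \mid \mathit{rec}_{\ell}(\sigma) \mid \mathit{alloc}(\rho, \ell) \mid \mathit{dealloc}(r, \ell) \mid \mathit{arith}(r, \ell) \mid \mathit{read}(r, \tau, \ell) \mid \mathit{assign}(r, \tau, \ell) $$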


We use ∅ to denote the absence of effects. The term σ; σ′ denotes the sequencing of σ and σ′. Each effect records the program point ℓ of the source code where it is produced. The effect $\mathit{if}_{\ell}(\sigma, \sigma')$ refers to a branching condition at program point ℓ, where the effects σ and σ′ are produced in the true branch and in the false branch, respectively. The effect $\mathit{rec}_{\ell}(\sigma)$ stands for a recursively defined effect generated by a loop construct at program point ℓ. The terms alloc(ρ, ℓ) and dealloc(r, ℓ) denote memory allocation and memory deallocation, respectively. The effect arith(r, ℓ) captures arithmetic operations on pointers. The effect read(r, τ, ℓ) describes an access to the regions in r that hold a value of type τ. The effect assign(r, τ, ℓ) represents the assignment of a value of type τ to the regions in r.

The declared types and the inferred types are related through the hat operator ˆ: when applied to a declared type κ, it returns an inferred type τ = κ̂ whose host annotation is set to [wild] and, for pointers, whose region is the unknown location ε. Conversely, the bar operator ¯ suppresses all the annotations of an inferred type. When applied to an $\mathit{if}_{\ell}(\tau, \tau')$ type construct, it requires both alternatives to share the same underlying type:

$$ \overline{\mathit{if}_{\ell}(\tau, \tau')} = \bar{\tau} = \bar{\tau}' $$
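As an illustration only, the domains above can be encoded as plain C data types. The sketch below is a hypothetical encoding, not the authors' GCC implementation: the names DeclType, Host, AnnType, REGION_UNKNOWN, hat, bar, and same_underlying are our own, the host &τ is collapsed into a single tag H_INIT that does not record τ, regions are plain integers, and if-types are not represented. It shows the hat operator (host [wild], unknown region) and the condition that the bar of an if-type imposes on its two alternatives.

```c
#include <stdbool.h>
#include <stddef.h>

/* Declared types: kappa ::= int | void | ref(kappa). */
typedef enum { K_INT, K_VOID, K_REF } DeclKind;

typedef struct DeclType {
    DeclKind kind;
    const struct DeclType *pointee; /* non-NULL only for K_REF */
} DeclType;

/* Host annotations: eta ::= malloc | free | wild | &tau.
   H_INIT approximates &tau; the held type tau is not tracked here. */
typedef enum { H_MALLOC, H_FREE, H_WILD, H_INIT } Host;

/* Regions are small integers; this value stands for the unknown region. */
#define REGION_UNKNOWN (-1)

/* Annotated type: a declared type plus top-level region and host. */
typedef struct {
    DeclType decl;
    int region;
    Host host;
} AnnType;

/* hat: decorate a declared type with host [wild] and an unknown region. */
static AnnType hat(DeclType k) {
    AnnType t = { k, REGION_UNKNOWN, H_WILD };
    return t;
}

/* bar: drop all annotations, keeping only the declared type. */
static DeclType bar(AnnType t) {
    return t.decl;
}

/* Structural equality on declared types. */
static bool decl_equal(const DeclType *a, const DeclType *b) {
    if (a->kind != b->kind)
        return false;
    if (a->kind != K_REF)
        return true;
    if (a->pointee == NULL || b->pointee == NULL)
        return a->pointee == b->pointee;
    return decl_equal(a->pointee, b->pointee);
}

/* bar(if_l(t1, t2)) is defined only when bar(t1) == bar(t2);
   this function checks that condition. */
static bool same_underlying(AnnType t1, AnnType t2) {
    DeclType d1 = bar(t1), d2 = bar(t2);
    return decl_equal(&d1, &d2);
}
```

For example, hat applied to ref(int) yields a pointer type with host [wild] and region REGION_UNKNOWN, which is what the declaration rule assigns to a freshly declared pointer.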

3.1 Syntax

We illustrate our analysis on a core syntax, presented in Figure 1, that captures the essence of the C language. A program π contains variable declarations, denoted by δ, and program statements, referred to by s. Expressions e comprise lvalues lv and rvalues rv. The rvalues include integer scalars n, dereferencing expressions ∗rv, and arithmetic operations on pointers e op e′. The lvalues are access paths to memory locations; they encompass variables and pointer dereferencing operators. The statements s include the control-flow constructs (sequencing, conditionals, and loops), the deallocation operation free(lv), the allocation operation lv = malloc(e), and assignment operations.

$$
\begin{array}{llcl}
\text{Program} & \pi & ::= & \delta\ s \\
\text{Declarations} & \delta & ::= & \kappa\ x;\ \delta \mid \mathit{nil} \\
\text{Expressions} & e & ::= & lv \mid rv \\
\text{Rvalues} & rv & ::= & n \mid {*}rv \mid e\ \mathit{op}\ e' \\
\text{Lvalues} & lv & ::= & x \mid {*}lv \\
\text{Statements} & s & ::= & s_1; s_2 \mid \mathbf{if}\ b\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2 \\
 & & \mid & \mathbf{while}\ b\ \mathbf{do}\ s \mid \mathit{free}(lv) \mid lv = \mathit{malloc}(e) \mid lv = e
\end{array}
$$

Figure 1. Core syntax of an imperative language
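The core syntax maps directly onto a tagged-union abstract syntax tree. The following C sketch is one possible encoding with hypothetical names (Expr, Stmt, mk_expr, is_lvalue); the boolean condition b is left uninterpreted and error handling is reduced to abort(). It makes the lvalue/rvalue split explicit: an expression is an lvalue only if it is a variable or a dereferenced lvalue, so ∗(p op n) is an rvalue.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Expressions: lv ::= x | *lv ;  rv ::= n | *rv | e op e' */
typedef enum { E_VAR, E_INT, E_DEREF, E_ARITH } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *name;       /* E_VAR */
    int value;              /* E_INT */
    struct Expr *lhs, *rhs; /* E_DEREF uses lhs; E_ARITH uses both */
} Expr;

/* Statements: s1; s2 | if b then s1 else s2 | while b do s
               | free(lv) | lv = malloc(e) | lv = e
   The condition b is not represented in this sketch. */
typedef enum { S_SEQ, S_IF, S_WHILE, S_FREE, S_MALLOC, S_ASSIGN } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    struct Stmt *s1, *s2; /* S_SEQ; S_IF then/else; S_WHILE body in s1 */
    Expr *lv;             /* S_FREE, S_MALLOC, S_ASSIGN */
    Expr *e;              /* S_MALLOC size, S_ASSIGN right-hand side */
} Stmt;

/* Allocate and fill an expression node; aborts if memory runs out. */
static Expr *mk_expr(ExprKind k, const char *name, int value, Expr *l, Expr *r) {
    Expr *x = malloc(sizeof *x);
    if (x == NULL)
        abort();
    x->kind = k;
    x->name = name;
    x->value = value;
    x->lhs = l;
    x->rhs = r;
    return x;
}

/* An expression is an lvalue iff it is a variable or a dereferenced lvalue. */
static bool is_lvalue(const Expr *e) {
    switch (e->kind) {
    case E_VAR:
        return true;
    case E_DEREF:
        return is_lvalue(e->lhs);
    default:
        return false;
    }
}
```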

4 Typing Rules

In this section, we detail the typing rules of our static semantics. These rules use the auxiliary functions (1) regionof(lv), which returns the set of regions an lvalue may refer to, and (2) hostof(τ), which returns the host annotation of a type τ. The typing rules for program expressions are presented in Section 4.1, the type judgements for statements in Section 4.2, and the typing rules for control-flow statements in Section 4.3. Our typing rules are based on the type system for imperative languages in [13].

4.1 Typing Rules for Programs and Expressions

The typing rules for expressions, defined in Figure 2, are based on the type annotations and the static security checks defined in Section 5. The sequent ⊢ (δ, s) indicates that the program containing declarations δ and statements s is well typed. The deduction E ⊢ δ : E′ yields a new environment E′ by adding the declarations δ to environment E. The sequent E ⊢ ℓ, e : τ, σ states that, under typing environment E and at program point ℓ, the expression e has type τ and its evaluation yields the effect σ. Throughout this paper, we write E † E′ to denote the overwriting of E by E′, i.e., the domain of E † E′ is Dom(E) ∪ Dom(E′), and (E † E′)(x) = E′(x) if x ∈ Dom(E′) and E(x) otherwise.
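The overwrite operator E † E′ is a right-biased union of two environments. The sketch below assumes, for illustration only, that an environment is a small fixed-capacity array mapping variable names to host annotations rather than to full annotated types; Env, ENV_MAX, env_lookup, env_bind, and env_overwrite are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

#define ENV_MAX 16

typedef enum { H_MALLOC, H_FREE, H_WILD, H_INIT } Host;

/* Environment: parallel arrays of names and their host annotations. */
typedef struct {
    int size;
    const char *names[ENV_MAX];
    Host hosts[ENV_MAX];
} Env;

/* Look up x; returns false when x is not in Dom(env). */
static bool env_lookup(const Env *env, const char *x, Host *out) {
    for (int i = 0; i < env->size; i++) {
        if (strcmp(env->names[i], x) == 0) {
            *out = env->hosts[i];
            return true;
        }
    }
    return false;
}

/* Bind x to h, replacing any existing binding. Sketch only: bindings
   beyond ENV_MAX are silently ignored. */
static void env_bind(Env *env, const char *x, Host h) {
    for (int i = 0; i < env->size; i++) {
        if (strcmp(env->names[i], x) == 0) {
            env->hosts[i] = h;
            return;
        }
    }
    if (env->size < ENV_MAX) {
        env->names[env->size] = x;
        env->hosts[env->size] = h;
        env->size++;
    }
}

/* E dagger E': domain is Dom(E) U Dom(E'); bindings of E' win. */
static Env env_overwrite(const Env *e1, const Env *e2) {
    Env out = *e1;
    for (int i = 0; i < e2->size; i++)
        env_bind(&out, e2->names[i], e2->hosts[i]);
    return out;
}
```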

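Program and Declarations

$$ \text{(program)}\ \frac{\vdash \delta : \mathcal{E} \qquad \mathcal{E} \vdash \ell, s, \mathcal{E}', \sigma}{\vdash (\delta, s)} $$

$$ \text{(nil-decl)}\ \mathcal{E} \vdash \mathit{nil} : \mathcal{E} \qquad \text{(var-decl)}\ \frac{\vdash \delta : \mathcal{E} \qquad \hat{\kappa} = \tau}{\mathcal{E} \vdash \kappa\ x;\ \delta : \mathcal{E} \dagger [x \mapsto \hat{\kappa}]} $$

Expressions

$$ \text{(var)}\ \frac{\mathcal{E}(x) = \tau}{\mathcal{E} \vdash \ell, x : \tau, \emptyset} \qquad \text{(int)}\ \mathcal{E} \vdash \ell, n : \mathit{int}^{[\&\mathit{int}]}, \emptyset $$

$$ \text{(pointer arith)}\ \frac{\mathcal{E} \vdash \ell, e : \tau, \sigma \qquad \mathcal{E} \vdash \ell, e' : \mathit{int}^{\eta'}, \sigma' \qquad \bar{\tau} = \mathit{ref}(\kappa) \qquad r = \mathit{regionof}(\tau)}{\mathcal{E} \vdash \ell, e\ \mathit{op}\ e' : \tau, (\sigma; \sigma'; \mathit{arith}(r, \ell))} $$

$$ \text{(pointer deref)}\ \frac{\mathcal{E} \vdash \ell, e : \tau, \sigma \qquad \mathit{safeRead}(e, \tau, \sigma, \ell) \qquad \bar{\tau} = \mathit{ref}(\kappa) \qquad r = \mathit{regionof}(\tau) \qquad \tau'' = \mathit{storedType}(\tau)}{\mathcal{E} \vdash \ell, {*}e : \tau'', (\sigma; \mathit{read}(r, \tau'', \ell))} $$

Figure 2. Typing rules for program declarations and expressions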

The rule (var-decl) maps a declared variable of type κ to the annotated type κ̂ in E. At variable declarations, region annotations have unknown values and host annotations are set to [wild]. The rule (pointer arith) types pointer arithmetic, which is captured by the effect arith(r, ℓ), where r is the set of possible regions of the pointer. The rule (pointer deref) dereferences a pointer expression e of type τ if the check safeRead(_) succeeds. We call the function storedType(τ) to get the actual type τ″ from the host annotation of the pointer type τ. The dereference operation generates the effect read(r, τ″, ℓ).

4.2 Typing Rules for Statements

This section presents the typing rules for statements, defined in Figure 3. We consider memory allocation, memory deallocation, and assignment statements. These statements modify the type annotations of their related lvalues. To express flow-sensitivity, the statement judgement has the form E ⊢ ℓ, s, E′, σ, which expresses that, under typing environment E, the execution of statement s yields a new environment E′ and produces the effect σ. The recursive algorithm updEnv() used in our typing rules updates the type annotations of an lvalue directly involved in a program statement. It also updates the type annotations of the aliases of that lvalue. Due to space constraints, we do not present the algorithm of the function updEnv() in this paper. The function updEnv() takes as arguments the current environment E, a statement s, and its program point ℓ. It outputs a new environment E′ with updated annotations for the variables directly and indirectly involved in statement s.

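Statements

$$ \text{(free)}\ \frac{\mathcal{E} \vdash \ell, lv : \tau, \sigma \qquad \mathit{safeFree}(lv, \tau, \sigma, \ell) \qquad \bar{\tau} = \mathit{ref}(\kappa) \qquad r = \mathit{regionof}(\tau) \qquad s : \mathit{free}(lv)}{\mathcal{E} \vdash \ell, s, \mathit{updEnv}(\mathcal{E}, s, \ell), (\sigma; \mathit{dealloc}(r, \ell))} $$

$$ \text{(malloc)}\ \frac{\mathcal{E} \vdash \ell, lv : \tau, \sigma \qquad \mathcal{E} \vdash \ell, e : \mathit{int}^{\eta}, \sigma' \qquad s : lv = \mathit{malloc}(e) \qquad \bar{\tau} = \mathit{ref}(\kappa) \qquad \rho\ \text{fresh}}{\mathcal{E} \vdash \ell, s, \mathit{updEnv}(\mathcal{E}, s, \ell), (\sigma; \sigma'; \mathit{alloc}(\rho, \ell))} $$

$$ \text{(assign)}\ \frac{\mathcal{E} \vdash \ell, lv : \tau, \sigma \qquad \mathcal{E} \vdash \ell, e : \tau', \sigma' \qquad s : lv = e \qquad \mathit{safeWrite}(e, \tau', \tau, \ell) \qquad r = \mathit{regionof}(lv)}{\mathcal{E} \vdash \ell, s, \mathit{updEnv}(\mathcal{E}, s, \ell), (\sigma; \sigma'; \mathit{assign}(r, \tau', \ell))} $$

Figure 3. Typing rules for program statements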

The rule (free) conservatively deallocates all memory locations in r that an lvalue lv may refer to. The deallocation is guarded by the safeFree(_) security check and produces the effect dealloc(r, ℓ). The call to updEnv() yields a new environment E′ in which the host annotation of lv and of all its aliases is set to [free]. The rule (malloc) allocates a fresh memory location ρ for the pointer designated by lv and sets its host annotation to [malloc]. We add the effect alloc(ρ, ℓ) for allocating memory region ρ at program point ℓ. For the other assignment statements, we invoke the safeWrite(_) check, detailed in Section 5, which verifies that a value of type τ′ can be assigned to a variable of type τ. If the check succeeds, the function updEnv() propagates the annotations from the right-hand-side operand to the left-hand-side operand. Then, it updates the type annotations of all aliases of the left-hand-side operand accordingly. The effect generated for assigning a value of type τ′ is assign(r, τ′, ℓ), where r is the set of regions that lv may refer to.
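The host updates performed through updEnv() by the (malloc), (free), and (assign) rules can be approximated by a small transition function. Since the paper does not give updEnv(), the sketch below is not that algorithm: it assumes that each pointer variable holds exactly one region and is addressed by an index, and it approximates aliasing by region sharing, so that freeing through one pointer marks every pointer bound to the same region as [free]. The names Var, Env, upd_malloc, upd_free, and upd_assign are hypothetical.

```c
/* Host annotations; H_INIT stands in for &tau. */
typedef enum { H_MALLOC, H_FREE, H_WILD, H_INIT } Host;

#define NVARS 4
#define REGION_UNKNOWN (-1)

/* One pointer variable with its (single) region and host annotation. */
typedef struct {
    const char *name;
    int region;
    Host host;
} Var;

typedef struct {
    Var vars[NVARS];
    int next_region; /* counter used to create fresh regions */
} Env;

/* lv = malloc(e): bind lv to a fresh region rho with host [malloc]. */
static int upd_malloc(Env *env, int lv) {
    int rho = env->next_region++;
    env->vars[lv].region = rho;
    env->vars[lv].host = H_MALLOC;
    return rho;
}

/* free(lv): lv and every variable sharing its region become [free]. */
static void upd_free(Env *env, int lv) {
    int rho = env->vars[lv].region;
    for (int i = 0; i < NVARS; i++) {
        if (i == lv || (rho != REGION_UNKNOWN && env->vars[i].region == rho))
            env->vars[i].host = H_FREE;
    }
}

/* lv = e for a pointer-valued e: lv takes the region and host of e. */
static void upd_assign(Env *env, int lv, int rhs) {
    env->vars[lv].region = env->vars[rhs].region;
    env->vars[lv].host = env->vars[rhs].host;
}
```

Under this approximation, after q = p; free(q); the pointer p is also [free], which is what allows a later dereference of p to be reported as a use of a freed pointer.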

4.3 Typing Rules for Control-Flow Statements

The rules for control-flow statements, defined in Figure 4, thread the flow-sensitive environment through sequences, branches, and loops, which gives our analysis more precision in detecting vulnerabilities.

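Control Flow Statements

$$ \text{(seq)}\ \frac{\mathcal{E} \vdash \ell, s_1, \mathcal{E}', \sigma_1 \qquad \mathcal{E}' \vdash \ell', s_2, \mathcal{E}'', \sigma_2}{\mathcal{E} \vdash \ell, s_1; s_2, \mathcal{E}'', (\sigma_1; \sigma_2)} $$

$$ \text{(cond)}\ \frac{\mathcal{E} \vdash \ell, b : \mathit{bool}, \emptyset \qquad \mathcal{E} \vdash \ell', s_1, \mathcal{E}', \sigma_1 \qquad \mathcal{E} \vdash \ell'', s_2, \mathcal{E}'', \sigma_2}{\mathcal{E} \vdash \ell, \mathbf{if}\ b\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2, \mathcal{E}' \sqcup \mathcal{E}'', \mathit{if}_{\ell}(\sigma_1, \sigma_2)} $$

$$ \text{(loop)}\ \frac{\mathcal{E}' \vdash \ell, b : \mathit{bool}, \emptyset \qquad \mathcal{E}' \vdash \ell, s, \mathcal{E}'', \sigma \qquad \mathcal{E}' = \mathcal{E} \dagger \mathit{rec}(\mathit{updEnv}(\mathcal{E}, s, \ell))}{\mathcal{E} \vdash \ell, \mathbf{while}\ b\ \mathbf{do}\ s, \mathcal{E} \sqcup \mathcal{E}', \mathit{rec}_{\ell}(\sigma)} $$

Figure 4. Typing rules for control-flow constructs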

The rule (seq) defines the sequencing of statements, where the generated effect is the sequencing of the effects of s1 and s2. For the rule (cond), we introduce the merge operator ⊔, defined as follows:

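$$
(\mathcal{E} \sqcup \mathcal{E}')(x) =
\begin{cases}
\mathcal{E}(x) & \text{if } x \notin \mathrm{Dom}(\mathcal{E}'),\\
\mathcal{E}'(x) & \text{if } x \notin \mathrm{Dom}(\mathcal{E}),\\
\mathit{if}_{\ell}(\mathcal{E}(x), \mathcal{E}'(x)) & \text{if } x \in \mathrm{Dom}(\mathcal{E}) \cap \mathrm{Dom}(\mathcal{E}').
\end{cases}
$$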

The type construct $\mathit{if}_{\ell}(\mathcal{E}(x), \mathcal{E}'(x))$ is assigned to variable x at the merge point of the branch condition at program point ℓ. It says that variable x has type E(x) if the true branch is followed, or type E′(x) if the false branch is followed. The rule (loop) captures the repeated execution of the statements inside a loop construct. The recursive call rec(updEnv(E, s, ℓ)) denotes zero or more iterations of the loop, where each iteration yields a new environment, as defined below:

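$$ \mathit{rec}(\mathit{updEnv}(\mathcal{E}, s, \ell)) = \mathbf{let}\ \mathcal{E}' = \mathit{updEnv}(\mathcal{E}, s, \ell)\ \mathbf{in}\ \mathit{rec}(\mathit{updEnv}(\mathcal{E}', s, \ell)) $$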

The effect $\mathit{rec}_{\ell}(\sigma)$ denotes a recursive (possibly infinite) effect generated by a loop at program point ℓ. The environment derived from a loop construct is E ⊔ E′. This reflects that the loop body may never be executed if the condition b does not hold, in which case the initial environment E remains unchanged.
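The merge operator and the loop rule can be prototyped by representing an if-type as the set of host annotations a variable may have, so that if_ℓ([malloc], [free]) becomes {malloc, free}. This drops the program point ℓ and is an assumption of the sketch, not the paper's representation. Likewise, instead of the unbounded unfolding rec(updEnv(E, s, ℓ)), the sketch substitutes a standard terminating fixpoint iteration that keeps joining the current environment with the result of one more pass through the body. Env, HostSet, env_merge, loop_fixpoint, and the example loop body are hypothetical.

```c
#include <stdbool.h>

/* Host annotations as bits; a HostSet approximates nested if-types,
   e.g. {malloc, free} for if_l([malloc], [free]). Program points are dropped. */
typedef enum { H_MALLOC = 1, H_FREE = 2, H_WILD = 4, H_INIT = 8 } Host;
typedef unsigned HostSet;

#define NVARS 3

/* Environment over a fixed set of variables, indexed 0..NVARS-1. */
typedef struct { HostSet h[NVARS]; } Env;

/* Merge with a shared domain: only the third case of the definition
   applies, and the if-type becomes a set union. */
static Env env_merge(Env a, Env b) {
    Env out;
    for (int i = 0; i < NVARS; i++)
        out.h[i] = a.h[i] | b.h[i];
    return out;
}

static bool env_equal(Env a, Env b) {
    for (int i = 0; i < NVARS; i++)
        if (a.h[i] != b.h[i])
            return false;
    return true;
}

/* Hypothetical loop body: free(v0); v1 = malloc(...); v2 untouched. */
static Env body_free_then_malloc(Env e) {
    e.h[0] = H_FREE;
    e.h[1] = H_MALLOC;
    return e;
}

/* while b do s: start from the entry environment (zero iterations) and
   keep merging in one more iteration until nothing changes. Terminates
   because each HostSet only grows and has at most four members. */
static Env loop_fixpoint(Env entry, Env (*body)(Env)) {
    Env acc = entry;
    for (;;) {
        Env next = env_merge(acc, body(acc));
        if (env_equal(next, acc))
            return acc;
        acc = next;
    }
}
```

Because the entry environment is part of the join, a variable that the body frees ends up with both its entry host and [free], mirroring the E ⊔ E′ result of the (loop) rule.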


5 Static Security Checks

We present in Figure 5 the static security checks performed by our analysis before using a pointer of type $\tau = \mathit{ref}_{\rho}(\kappa)^{\eta}$. Some of these checks refer to the effects model σ to verify the presence or the absence of a specific effect. We define the function oneTrace(µ, σ), which returns true if the effect µ is present in at least one possible trace extracted from the effects model σ.

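$$
\begin{aligned}
\mathit{safeRead}(e, \tau, \sigma, \ell) &= (\kappa \neq \mathit{void}) \wedge (\eta \notin \{\mathit{free}, \mathit{wild}\}) \wedge \neg\,\mathit{oneTrace}(\mathit{arith}(\rho, \_), \sigma)\\
\mathit{safeWrite}(e, \tau, \tau', \ell) &= (\mathit{hostof}(\tau) = \&\tau'') \wedge (\bar{\tau} = \bar{\tau}')\\
\mathit{safeFree}(lv, \tau, \sigma, \ell) &= (\eta = \mathit{malloc}) \vee \big((\eta \notin \{\mathit{free}, \mathit{wild}\}) \wedge \mathit{oneTrace}(\mathit{alloc}(\rho, \_), \sigma)\big)\\
\mathit{safe}_{*}(e, \mathit{if}_{\ell}(\tau, \tau'), \_) &= \mathit{safe}_{*}(e, \tau, \_) \wedge \mathit{safe}_{*}(e, \tau', \_)
\end{aligned}
$$

Figure 5. Static security checks for memory access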

Safe Pointer Dereference. The check safeRead(e, τ, σ, ℓ) verifies that pointer e of type τ can be safely dereferenced at program point ℓ. It fails for void, freed, unallocated, and NULL pointers. Statically, we consider the dereference of a pointer obtained by pointer arithmetic as unsafe and issue an error for accessing such pointers. However, we plan to resort to dynamic analysis to verify pointer boundaries, as described in Section 6.

Safe Assignment. The check safeWrite(e, τ, τ′, ℓ) verifies that the right-hand-side expression e of type τ can be assigned to an lvalue of type τ′. It fails if the right-hand-side expression is uninitialized or if its declared type differs from that of the lvalue.

Safe Memory Deallocation. The check safeFree(lv, τ, σ, ℓ) verifies that pointer lv can be safely deallocated at program point ℓ. It fails for unallocated, freed, and non-dynamically allocated pointers. For the last case, we use the collected effects model σ to verify that the region to free has previously been allocated. In other words, we verify that an effect alloc(ρ, _) is present in the effects trace σ of the program before freeing region ρ.
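The checks of Figure 5 can be sketched over a small effect tree. One observation makes oneTrace(µ, σ) cheap in this setting: every trace through σ; σ′ visits both parts, and some trace visits each branch of if_ℓ(σ, σ′), so µ occurs in at least one trace exactly when it occurs somewhere in the tree (assuming both branches are feasible). The sketch relies on that; loops, program points (treated as the wildcard _), and safeWrite are omitted, and Effect, PtrType, one_trace, safe_read, and safe_free are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { H_MALLOC, H_FREE, H_WILD, H_INIT } Host;

/* Effect model: atoms, sequencing, and branching (loops omitted). */
typedef enum { EF_ALLOC, EF_DEALLOC, EF_READ, EF_ASSIGN, EF_ARITH,
               EF_SEQ, EF_IF } EffKind;

typedef struct Effect {
    EffKind kind;
    int region;                        /* atoms only */
    const struct Effect *left, *right; /* EF_SEQ and EF_IF only */
} Effect;

/* oneTrace: does some trace of s contain an atom of kind k on region rho?
   For sequences and branches this reduces to searching both subtrees. */
static bool one_trace(EffKind k, int rho, const Effect *s) {
    if (s == NULL)
        return false;
    if (s->kind == EF_SEQ || s->kind == EF_IF)
        return one_trace(k, rho, s->left) || one_trace(k, rho, s->right);
    return s->kind == k && s->region == rho;
}

/* Pointer type ref_rho(kappa)^eta, reduced to what the checks inspect. */
typedef struct {
    bool pointee_is_void; /* kappa == void */
    int region;           /* rho */
    Host host;            /* eta */
} PtrType;

/* safeRead: non-void pointee, not freed or wild, no pointer arithmetic. */
static bool safe_read(PtrType t, const Effect *sigma) {
    return !t.pointee_is_void
        && t.host != H_FREE && t.host != H_WILD
        && !one_trace(EF_ARITH, t.region, sigma);
}

/* safeFree: [malloc], or not freed/wild and the region was allocated. */
static bool safe_free(PtrType t, const Effect *sigma) {
    return t.host == H_MALLOC
        || (t.host != H_FREE && t.host != H_WILD
            && one_trace(EF_ALLOC, t.region, sigma));
}
```

In the accompanying checks, a pointer whose host is not [malloc] can still be freed if its region was allocated somewhere in σ, while a pointer into a region reached by pointer arithmetic fails safeRead.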

6 Static Analysis Limitations

As with all static techniques, our conservative type analysis generates false positives and faces undecidable cases when runtime information is required. In particular, our static approach has the following limitations:

• We may face undecidability when static security checks are performed on a may-aliased expression that has two possible types depending on the chosen execution path. Since we cannot statically determine the executed branch, we require both types τ1 and τ2 in a given type construct $\mathit{if}_{\ell}(\tau_1, \tau_2)$ to be safe in order to pass the security check. If the check succeeds for one type and fails for the other, we face an undecidable case and the check fails.

• Our analysis performs an exhaustive traversal of all execution paths of the program. However, the analysis is path-insensitive and does not prune infeasible paths. Hence, we may generate false positives by considering paths that are never actually executed.

In future work, we plan to reduce the number of false positives and to resolve undecidable cases by resorting to code instrumentation. The generated effects model is useful for guiding the instrumentation of the code, since it captures the suspected memory operations, their program points, and their execution paths.

7 Inference Algorithm

This section is dedicated to the inference of region, effect, and host annotations. The algorithm Infer is applied recursively on expressions and statements to evaluate their types and annotations. The complete Infer algorithm is presented in Figure 6. For expressions, Infer takes as input a type environment E, a program point ℓ, and an expression e. It returns a type for the expression and the effect generated by evaluating it. For statements, the algorithm takes as input the current environment E, a program point ℓ, and a statement s. The evaluation of statement s yields a new type environment E′, computed by the updEnv() algorithm, and a generated effect. The inference algorithm fails if the static security checks defined in Section 5 fail.
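To see how the recursive structure of Infer produces verdicts such as those reported in Figure 7, the sketch below runs a heavily simplified version over a statement tree for a single pointer. The assumptions are ours: the environment is the set of hosts the pointer may have (so the merge after a branch is a set union), safeRead and safeFree are reduced to host tests (safeFree requires [malloc]), a check whose outcome differs across the merged alternatives is reported as undecidable, and each verdict is recorded before inference continues instead of stopping at the first failure. Stmt, Report, check, and infer are hypothetical names.

```c
#include <stddef.h>

/* Possible hosts of the single tracked pointer, as a bit set. */
enum { H_MALLOC = 1, H_FREE = 2, H_WILD = 4, H_INIT = 8 };
typedef unsigned HostSet;

typedef enum { V_SAFE, V_ERROR, V_UNDECIDABLE } Verdict;
typedef enum { ST_MALLOC, ST_FREE, ST_DEREF, ST_SEQ, ST_IF } StKind;

typedef struct Stmt {
    StKind kind;
    int line;                 /* program point reported in verdicts */
    const struct Stmt *a, *b; /* ST_SEQ: s1; s2  ST_IF: then, else (may be NULL) */
} Stmt;

#define MAX_VERDICTS 16

typedef struct {
    int count;
    int line[MAX_VERDICTS];
    Verdict verdict[MAX_VERDICTS];
} Report;

/* SAFE if every possible host is acceptable, ERROR if none is,
   UNDECIDABLE if the merged alternatives disagree. */
static Verdict check(HostSet h, HostSet ok) {
    if ((h & ~ok) == 0)
        return V_SAFE;
    return (h & ok) ? V_UNDECIDABLE : V_ERROR;
}

static void record(Report *r, int line, Verdict v) {
    if (r->count < MAX_VERDICTS) {
        r->line[r->count] = line;
        r->verdict[r->count] = v;
        r->count++;
    }
}

/* Simplified Infer: statements update the environment, sequences thread
   it, and branches merge the two resulting environments by set union. */
static HostSet infer(HostSet env, const Stmt *s, Report *r) {
    switch (s->kind) {
    case ST_MALLOC: /* lv = malloc(e): host [malloc] */
        return H_MALLOC;
    case ST_FREE:   /* free(lv): guarded by a reduced safeFree */
        record(r, s->line, check(env, H_MALLOC));
        return H_FREE;
    case ST_DEREF:  /* *lv: guarded by a reduced safeRead */
        record(r, s->line, check(env, H_MALLOC | H_INIT));
        return env;
    case ST_SEQ:
        return infer(infer(env, s->a, r), s->b, r);
    case ST_IF: {
        HostSet t = infer(env, s->a, r);
        HostSet f = s->b ? infer(env, s->b, r) : env;
        return t | f;
    }
    }
    return env;
}
```

Encoding lines 8 to 18 of Figure 7 as malloc; deref; free; deref; if (malloc); free yields SAFE at lines 9 and 11, ERROR at line 12, and UNDECIDABLE at line 18, matching the verdicts reported there.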

7.1 Soundness

In the following, we show that our inference algorithm is sound with respect to the static semantics.

Soundness (Theorem). Let E be the typing environment, e an expression, and s a statement:

• If Infer(E, ℓ, e) = (τ, σ), then E ⊢ ℓ, e : τ, σ.

• If Infer(E, ℓ, s) = (E′, unit, σ), then E ⊢ ℓ, s, E′, σ.

Proof of Soundness. The proof is done by structural induction on expressions and statements. Due to space constraints, we present a sketch of the proof for the (pointer deref) rule and the (assign) rule:


```
Infer(δ, E) =
  case δ of
    nil ⇒ E
  | κ x; δ′ ⇒
      let E′ = E † [x ↦ κ̂] in
        let E″ = Infer(δ′, E′) in E″ end
      end
  end

Infer(δ, s) =
  let E = Infer(δ, []) in Infer(E, ℓ, s) end

Infer(E, ℓ, ∗e) =
  let (τ, σ) = Infer(E, ℓ, e)
      safeRead(e, τ, σ, ℓ) = true
      τ″ = storedType(τ)
      r = regionof(τ)
  in (τ″, (σ; read(r, τ″, ℓ)))
  end

Infer(E, ℓ, x) = let τ = E(x) in (τ, ∅) end

Infer(E, ℓ, n) = (int[&int], ∅)

Infer(E, ℓ, e op e′) =
  let (τ, σ) = Infer(E, ℓ, e)
      τ̄ = ref(κ)
      r = regionof(τ)
      (int^η′, σ′) = Infer(E, ℓ, e′)
  in (τ, (σ; σ′; arith(r, ℓ)))
  end

Infer(E, ℓ, free(lv)) =
  let (τ′, σ) = Infer(E, ℓ, lv)
      τ̄′ = ref(κ)
      safeFree(lv, τ′, σ, ℓ) = true
      E′ = updEnv(E, free(lv), ℓ)
      r = regionof(lv)
  in (E′, unit, (σ; dealloc(r, ℓ)))
  end

Infer(E, ℓ, lv = e) =
  let (τ, σ) = Infer(E, ℓ, lv)
      (τ′, σ′) = Infer(E, ℓ, e)
      safeWrite(e, τ′, τ, ℓ) = true
      E′ = updEnv(E, lv = e, ℓ)
      r = regionof(lv)
  in (E′, unit, (σ; σ′; assign(r, τ′, ℓ)))
  end

Infer(E, ℓ, s′; s″) =
  let (E′, unit, σ′) = Infer(E, ℓ, s′)
      (E″, unit, σ″) = Infer(E′, ℓ, s″)
  in (E″, unit, (σ′; σ″))
  end

Infer(E, ℓ, while b do s) =
  let E′ = E † rec(updEnv(E, s, ℓ))
      (E″, unit, σ) = Infer(E′, ℓ, s)
  in (E ⊔ E′, unit, rec_ℓ(σ))
  end

Infer(E, ℓ, lv = malloc(e)) =
  let (τ, σ) = Infer(E, ℓ, lv)
      τ̄ = ref(κ)
      (int^η′, σ′) = Infer(E, ℓ, e)
      ρ′ fresh
      E′ = updEnv(E, lv = malloc(e), ℓ)
  in (E′, unit, (σ; σ′; alloc(ρ′, ℓ)))
  end

Infer(E, ℓ, if b then s′ else s″) =
  let (E′, unit, σ′) = Infer(E, ℓ, s′)
      (E″, unit, σ″) = Infer(E, ℓ, s″)
  in (E′ ⊔ E″, unit, if_ℓ(σ′, σ″))
  end
```

Figure 6. Annotation inference algorithm

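• Case of (pointer deref): By hypothesis, we have

$$ \mathit{Infer}(\mathcal{E}, \ell, {*}e) = (\tau, (\sigma; \mathit{read}(r, \tau, \ell))). $$

By the definition of the algorithm, this requires:

$$ \mathit{Infer}(\mathcal{E}, \ell, e) = (\tau', \sigma),\quad \bar{\tau}' = \mathit{ref}(\kappa),\quad \mathit{safeRead}(e, \tau', \sigma, \ell),\quad \mathit{storedType}(\tau') = \tau,\quad r = \mathit{regionof}(\tau'). $$

By the induction hypothesis, E ⊢ ℓ, e : τ′, σ. Hence, by the definition of the (pointer deref) rule, we have:

$$ \mathcal{E} \vdash \ell, {*}e : \tau, (\sigma; \mathit{read}(r, \tau, \ell)). $$

• Case of (assign): By hypothesis, we have

$$ \mathit{Infer}(\mathcal{E}, \ell, lv = e) = (\mathcal{E}', \mathit{unit}, (\sigma; \sigma'; \mathit{assign}(r, \tau', \ell))). $$

By the definition of the algorithm, this requires:

$$ \mathit{Infer}(\mathcal{E}, \ell, lv) = (\tau, \sigma),\quad \mathit{Infer}(\mathcal{E}, \ell, e) = (\tau', \sigma'),\quad \mathit{safeWrite}(e, \tau', \tau, \ell),\quad r = \mathit{regionof}(lv),\quad \mathcal{E}' = \mathit{updEnv}(\mathcal{E}, lv = e, \ell). $$

By the induction hypothesis, E ⊢ ℓ, lv : τ, σ and E ⊢ ℓ, e : τ′, σ′. Hence, by the definition of the (assign) rule, we have:

$$ \mathcal{E} \vdash \ell, lv = e, \mathcal{E}', (\sigma; \sigma'; \mathit{assign}(r, \tau', \ell)). $$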

8 Implementation and Case Study

We prototyped our approach as an extension to the GCC compiler for the C programming language [8]. Our implementation is based on the GCC core distribution version 4.2.0. We implemented our type analysis as a security pass that infers type annotations and collects effects during the optimization phase of the GCC compiler. To enable our static analysis, we pass the -ftree-type-inference command-line option to the extended GCC compiler.

```
 1: #include <stdio.h>
 2: int main(int argc, char *argv[])
 3: {
 4:     int i;
 5:     const int BUFSIZE ;
 6:     char *buf1;
 7:
 8:     buf1 = (char*)malloc(BUFSIZE);
 9:     *buf1 = 'u';
10:
11:     free(buf1);
12:     *buf1 = 'o';
13:
14:     if (argc % 2 == 0)
15:     {
16:         buf1 = (int*)malloc(BUFSIZE);
17:     }
18:     free(buf1);
19:     return 0;
20: }
```

```
Analyzing function main
L8: MODIFY_EXPR Malloc of buf1 REGION 8:2150 effect(alloc,8:2150,8)
L9: INDIRECT_REF SAFE DEREF OF buf1 REGION 8:2150 effect (read,8:2150,char,9)
L9: MODIFY_EXPR SAFE WRITE to buf1 REGION 8:2150 effect (assign,8:2150,char,9)
L11: CALL_EXPR SAFE FREE OF buf1 REGION 8:2150 effect(dealloc,8:2150,11)
L12: INDIRECT_REF ERROR !!! DEREF FREED buf1 REGION 8:2150 (alloc,8:2150,8)->(read,8:2150,char,9)->(write,8:2150,char,9)->(dealloc,8:2149,11)
L16: MODIFY_EXPR Malloc of buf1 REGION 16:2150 effect(alloc,16:2150,16)
L18: CALL_EXPR UNDECIDABLE DOUBLE FREE buf1 REGION 8:2150 (alloc,8:2150,8)->(dealloc,8:2150,11)->(If,14,false)->(dealloc,8:2150,18)
```

Figure 7. Sample code to illustrate our analysis

The sample code of Figure 7 illustrates our implementation of the type analysis. The output of our static pass shows that we can detect errors and output their execution traces based on the collected effects. Furthermore, our analysis detects undecidable cases arising from branch conditions, such as in line 18 of the sample code, and outputs the exact trace that can lead to a double free of pointer buf1. This suggests that our approach can be extended in future work to cooperate with a dynamic analysis in order to reduce false positives and resolve undecidable cases.

9 Related Work

This section presents approaches and tools for vulnerability detection in C source code. MetaCompilation (MC) [1] is a static analysis tool that uses a flow-based analysis approach for detecting temporal security errors in C code. Its analysis is flow-sensitive; however, unlike our approach, it does not perform alias analysis and relies on heuristics to deal with aliasing issues. MOPS [4] is another tool that detects temporal vulnerabilities using a model-checking technique. MOPS is more appropriate for detecting high-level security properties than safety properties; in fact, it assumes that the analyzed program is memory safe. There are other model-checking tools based on predicate abstraction for vulnerability detection in source code, such as BLAST [14] and SLAM [3]. The SLAM model checker is mainly used to verify Windows drivers and has not been used for verifying memory and type errors. The BLAST model checker does not easily handle C pointers; it has been used with CCured to reduce runtime checks for memory errors.

Type systems have also been used to verify security properties in source code. The type-based approach of the tool CQual [7] consists of extending the type system with type qualifiers that express security properties. To our knowledge, it has not been used to detect memory and type errors as we do. The literature also contains proposals for hybrid approaches that combine static and dynamic analysis: CCured [9], SafeC [2], Vault [6], and Cyclone [10]. These language-based tools extend the C type system in order to detect memory and type errors, and they resort to code instrumentation when static analysis faces undecidable cases. We have not yet defined a dynamic phase for our approach. However, we have illustrated that the effects model we generate can be extended in future work to guide code instrumentation.

10 Conclusion

We presented a novel flow-sensitive type and effect analysis for detecting memory errors in C source code. Our type analysis is based on effect, region, and host annotations. We defined security checks that are compliant with the ANSI-C standard: we do not allow operations forbidden by the standard. However, we added restrictions to ensure memory safety, which the standard C type system does not guarantee. We developed a flow-sensitive inference algorithm to insert annotated types into the program. The advantage of our algorithm is that annotations are inferred automatically, without any external intervention. Furthermore, the flow-sensitivity and alias analysis of our inference algorithm enabled us to detect temporal memory errors more efficiently and to address C aliasing pitfalls. Our effects-based approach can also easily be extended with runtime checks to increase its precision and reduce the number of false positives.

Authorized licensed use limited to: CONCORDIA UNIVERSITY LIBRARIES. Downloaded on September 30, 2009 at 17:25 from IEEE Xplore. Restrictions apply.

References

[1] Ken Ashcraft and Dawson Engler. Using Programmer-Written Compiler Extensions to Catch Security Holes. In SP ’02: Proceedings of the 2002 IEEE Symposium on Security and Privacy, page 143, Washington, DC, USA, 2002. IEEE Computer Society.

[2] Todd M. Austin, Scott E. Breach, and Gurindar S. Sohi. Efficient Detection of all Pointer and Array Access Errors. In PLDI ’94: Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, pages 290–301, 1994.

[3] Thomas Ball, Rupak Majumdar, Todd Millstein, and Sriram K. Rajamani. Automatic Predicate Abstraction of C Programs. In PLDI ’01: Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, pages 203–213, 2001.

[4] Hao Chen and David A. Wagner. MOPS: an Infrastructure for Examining Security Properties of Software. In CCS ’02: Proceedings of the 9th ACM conference on Computer and communications security, pages 235–244, 2002.

[5] Jong-Deok Choi, Michael Burke, and Paul Carini. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In POPL ’93: Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 232–245, New York, NY, USA, 1993. ACM Press.

[6] Manuel Fahndrich and Robert DeLine. Adoption and Focus: Practical Linear Types for Imperative Programming. In PLDI ’02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, pages 13–24, 2002.

[7] Jeffrey S. Foster, Manuel Fahndrich, and Alexander Aiken. A Theory of Type Qualifiers. In SIGPLAN Conference on Programming Language Design and Implementation, pages 192–203, 1999.

[8] Free Software Foundation Inc. GCC Internals. http://gcc.gnu.org/onlinedocs/gccint/.

[9] George C. Necula, Scott McPeak, and Westley Weimer. CCured: Type-Safe Retrofitting of Legacy Code. In Symposium on Principles of Programming Languages, pages 128–139, 2002.

[10] Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and James Cheney. Region-based Memory Management in Cyclone. In PLDI ’02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, pages 282–293, 2002.

[11] Jeffrey S. Foster, Tachio Terauchi, and Alex Aiken. Flow-Sensitive Type Qualifiers. In PLDI ’02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, pages 1–12, 2002.

[12] Flemming Nielson and Hanne Riis Nielson. Type and effect systems. In Correct System Design, Recent Insight and Advances (to Hans Langmaack on the occasion of his retirement from his professorship at the University of Kiel), pages 114–136, London, UK, 1999. Springer-Verlag.

[13] R. Rugina and S. Cherem. Region Inference for Imperative Languages. Technical Report CS TR2003-1914, Computer Science Department, Cornell University, 2003.

[14] Willem Visser, Klaus Havelund, Guillaume Brat, and SeungJoon Park. Model Checking Programs. In ASE ’00: Proceedings of the 15th IEEE international conference on Automated software engineering, page 3, Washington, DC, USA, 2000. IEEE Computer Society.

