
A Formal Modeling Scheme for Analyzing a Software System Design against the GDPR

Evangelia Vanezi, Georgia M. Kapitsaki, Dimitrios Kouzapas and Anna Philippou
Department of Computer Science, University of Cyprus, Nicosia, Cyprus

Keywords: GDPR, Privacy Protection, π-calculus, Static Analysis, Privacy by Design.

Abstract: Since the adoption of the EU General Data Protection Regulation (GDPR) in May 2018, designing software systems that conform to the GDPR principles has become vital. Modeling languages can be a facilitator for this process, following the principles of model-driven development. In this paper, we present our work on the usage of a π-calculus-based language for modeling and reasoning about the GDPR provisions of 1) lawfulness of processing by providing consent, 2) consent withdrawal, and 3) right to erasure. A static analysis method based on type checking is proposed to validate that a model conforms to associated privacy requirements. This is the first step towards a rigorous Privacy-By-Design methodology for analyzing and validating a software system model against the GDPR. A use case is presented to discuss and illustrate the framework.

1 INTRODUCTION

Privacy protection is now more crucial than ever, and this is reflected in the new EU General Data Protection Regulation (GDPR) (European Parliament and Council of the European Union, 2015). The GDPR has applied since May 25th, 2018, not only to organizations established in the EU, but also to non-EU organizations that process the personal data[1] of individuals who are located in the EU. The GDPR defines multiple principles and rights for EU residents with regard to personal data collection, storage and processing. All existing systems must be reviewed and their processes adapted accordingly, and new systems must be built and function in a way that ensures their GDPR compliance.

As this change is recent, there have been only a few attempts to understand the GDPR, mainly from the legal perspective, and even fewer discussing mechanisms to support GDPR compliance (Gjermundrød et al., 2016; Hintze and LaFever, 2017). In the context of this work, our aim is to discuss the GDPR with regard to data collection and processing as this need arises in software systems, and to propose a rigorous methodology for embedding GDPR compliance into a software engineering development methodology, specifically into the design and modeling phase,

[1] As defined in GDPR Article 4(1), personal data is any information relating to an identified or identifiable natural person ('data subject').

using a π-calculus (Milner et al., 1992) based formal modeling scheme exploiting the Privacy calculus (Kouzapas and Philippou, 2017). Additionally, we present an extension to the Privacy calculus integrating the notion of granting and withdrawing consent. The longer-term objective of our work is to provide a rigorous framework for developing GDPR-compliant systems that can support the entire software development methodology, thus pursuing conformance to Privacy by Design (PbD) or, as referred to in the GDPR, Data Protection by Design (Rubinstein, 2011).

Specifically, in this paper we describe the integration of the following principles of the GDPR within a formal framework for reasoning about privacy-related properties: lawfulness of processing by providing consent, consent withdrawal, and right to erasure. In particular, we build on the Privacy calculus, a process calculus based on the π-calculus with groups, extended with constructs for reasoning about private data (Cardelli et al., 2000). We extend the Privacy calculus syntax and semantics in order to include the aforementioned GDPR principles. Furthermore, we associate the calculus with a type system that validates the proper processing of data inside a system, and we prove that a well-typed system satisfies the above-mentioned GDPR requirements. We argue that the resulting framework can be used for providing models of software systems while supporting the static analysis of GDPR compliance during the design

Vanezi, E., Kapitsaki, G., Kouzapas, D. and Philippou, A. A Formal Modeling Scheme for Analyzing a Software System Design against the GDPR. DOI: 10.5220/0007722900680079. In Proceedings of the 14th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2019), pages 68-79. ISBN: 978-989-758-375-9. Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

phase. Furthermore, we discuss possible future additions to the framework, incorporating more GDPR provisions, in order to support an almost fully compliant software model design and verification.

We consider the type system accompanying our framework to be a promising dimension of our approach, since it enhances the methodology with the ability to statically validate models against formally-defined privacy policies. This approach is complementary to verification by model checking and has the advantage that it does not require the construction of a system's state space, which may suffer from the state-space explosion problem. Moreover, this dimension of our proposal can be transferred to the development phase of software systems through proper tools towards the generation of GDPR-compliant code. Thus, as future work we envision the development of the framework into programming semantics and analysis tools for supporting the software engineering development phase and the construction of privacy-respecting code following the Privacy by Design principle.

The remainder of the paper is structured as follows. In Section 2, we discuss some background work: we present the principle of Data Protection by Design and the use of modeling languages during the design phase of the software development cycle. We then provide a quick overview of the π-calculus and the Privacy calculus. In Section 3, we introduce the Privacy calculus as a language to model a software system handling private data, and in Section 4 we propose an extension to the Privacy calculus that incorporates the GDPR principles of lawfulness of processing by providing consent, consent withdrawal and right to erasure into the software's model. Section 5 presents this framework formally and develops a type system for guaranteeing compliance with privacy-related properties pertaining to the GDPR principles under investigation. Finally, Section 6 concludes the paper and discusses possible directions for future work. Due to lack of space, proofs of results are omitted. Furthermore, for the complete exposition of the Privacy calculus the reader is referred to (Kouzapas and Philippou, 2017).

2 BACKGROUND AND RELATED WORK

2.1 Data Protection by Design

Privacy by Design was introduced by the Dutch Data Protection Authority (Data Protection and Privacy Commissioners, 2010) and the former Information and Privacy Commissioner of Ontario, Canada, Ann Cavoukian (Cavoukian, 2008); their joint project resulted in a published report (Hes and Borking, 1998), as explained in (Hustinx, 2010). It advocates that privacy should be incorporated into systems by default and should be a priority from the beginning of a system's design. Similarly, the GDPR (European Parliament and Council of the European Union, 2015) defines Data Protection by Design in Article 25(1) as "implementing the appropriate technical and organizational measures in order to meet the requirements of the Regulation and protect the rights of data subjects". Consequently, the obligation of software developers to build systems designed to be compliant with the GDPR becomes apparent.

By its definition, Data Protection by Design requires no specific, distinct analysis, handling or mechanisms in order to be fulfilled. If all principles and rights are correctly embedded in a system's specification model and are transferred into the succeeding implementation, then Data Protection by Design will also be guaranteed. Moreover, Data Protection by Design advocates the proactive consideration of privacy issues: ensuring a system's compliance should take place from the design phase, and possible pitfalls should be anticipated and prevented before they can materialise, rather than remedied on a reactive basis.

In the literature, different approaches to PbD have been proposed. PriS is a security requirements engineering method that integrates privacy requirements in the early stages of the system development process (Kalloniatis et al., 2008). It focuses on the impact of privacy requirements on the organizational processes and on suggesting implementation techniques for realizing these requirements. privacyTracker considers the GDPR and proposes a framework that supports some of its basic principles, including data traceability and allowing a user to get a cryptographically verifiable snapshot of her data trail (Gjermundrød et al., 2016). PbD for the Internet of Things, and the relevant steps that should be followed for assessing applications, is discussed in (Perera et al., 2016).

2.2 Software Design and Modeling

During the design phase, a software system model is usually developed capturing the requirements specified during the requirements analysis step. Such a model presents a view of the software at a suitable level of abstraction by visualizing, specifying and documenting all artifacts and aspects of software systems. To this effect, modeling languages have been developed and are employed for providing system


models, a notable example being the Unified Modeling Language (UML) (Fowler, 2004). Extensions to UML have been proposed, such as UMLsec, which allows expressing security-relevant information for a system specification (Jurjens, 2002), the Privacy-aware Context Profile (PCP), which can be exploited in context-aware applications (Kapitsaki and Venieris, 2008), and the Privacy UML profile, which can be used to capture the privacy policies of a software system (Basso et al., 2015). These models can be subsequently used to test whether the design meets the system specifications, but also to produce the system source code skeleton following the paradigm of Model Driven Engineering (MDE) (Schmidt, 2006; Kapitsaki et al., 2008). A previous work using modeling to analyze the system design and propose security and privacy controls that can improve it can be found in (Ahmadian et al., 2018). In this framework, a privacy impact assessment (PIA) methodology is introduced. Other works rely on ontologies and modeling in the Web Ontology Language (OWL) or the Resource Description Framework (RDF), such as Linked USDL Privacy (Kapitsaki et al., 2018). However, due to their semi-formal nature and lack of precise semantics, modeling languages such as the above do not support the formal verification of property satisfaction of system models.

Formal methods aim to address this challenge by providing languages and tools for the rigorous specification, development and verification of systems. In this paper, we focus on one such formalism, namely the π-calculus, and we argue that it can provide the basis of a modeling language for software systems that may additionally support verification, including the validation of conformance to privacy requirements from the design phase of a software system, therefore fulfilling the Data Protection by Design GDPR requirement. The π-calculus has been used for formalizing aspects of the UML modeling language (Lam, 2008), as well as for transforming the basic workflow patterns of Business Process Model and Notation (BPMN) into equivalent π-calculus code (Boussetoua et al., 2015). It has also formed the theoretical basis of the Business Process Modeling Language (BPML) (Havey, 2005) and of Microsoft's XLANG (Thatte, 2001).

2.3 Modeling Software in a π-calculus-based Language

The π-calculus is a simple, concise, but very expressive process calculus that can describe concurrent processes whose configuration may evolve during computation, such as mobile and web applications (Milner et al., 1992). It is associated with a formal semantics which precisely defines the behavior of a system, based on which dynamic (e.g. model checking) and static (e.g. type checking) analyses of system properties can be performed (Sangiorgi and Walker, 2003). A software system model expressed in a π-calculus-based language will include representations of all components and entities of the system as processes that are able to interact and communicate with each other through communication channels. Channels can be dynamically created and even sent from one process to another, creating new communication links between processes and thus enabling the dynamic evolution of a system.

Table 1: Privacy Calculus Basic Syntax.

    Functionality            π-Calculus Term
    Concurrency              P | Q                          (1)
    Input                    c?(x).P                        (2)
    Output                   c!〈v〉.P                        (3)
    Restriction              (ν n)P                         (4)
    Replication              ∗P                             (5)
    Matching                 if e then P else Q             (6)
    Nil (stopped) process    0                              (7)
    Store                    r . [ι⊗δ]                      (8)
    Systems                  G[P] | G[S] | S || S | (ν n)S  (9)

The core functionalities provided by the basic π-calculus syntax are summarized in Table 1, lines (1)-(7). Concurrent composition, P | Q, allows the parallel execution of processes, i.e. system entities. Similarly to software systems, concurrent processes may proceed independently, each executing its own code, while communicating with each other on shared channels referred to as names. In its basic form, the π-calculus supports synchronous communication, which occurs through the input and output of messages on channels. The input term, c?(x).P, describes a process waiting to receive a message on channel c, resulting in a variable substitution storing the message in x, before proceeding as P. The output construct, c!〈v〉.P, describes a process sending v as a message on channel c and then proceeding as P. Restriction, or new-name creation, (ν n)P, allows for the allocation of a new name n inside the scope of process P. Replication, ∗P, allows the creation of multiple copies of process P. The matching or conditional construct, if e then P else Q, evolves as P if the condition e evaluates to true, or as Q otherwise. Finally, the terminated process, 0, defines a process that has no interactions.
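As an illustrative aside, the constructs of Table 1 can be captured as a small abstract syntax in a general-purpose language. The following Python sketch is our own modeling aid, not part of the paper's formal development; all class names are hypothetical choices:

```python
from dataclasses import dataclass
from typing import Any

# A toy abstract syntax for the core constructs of Table 1.

@dataclass
class Nil:            # 0: the terminated process (7)
    pass

@dataclass
class Par:            # P | Q: concurrent composition (1)
    left: Any
    right: Any

@dataclass
class In:             # c?(x).P: input on channel c, binding variable x (2)
    chan: str
    var: str
    cont: Any

@dataclass
class Out:            # c!<v>.P: output of value v on channel c (3)
    chan: str
    value: Any
    cont: Any

@dataclass
class New:            # (nu n)P: restriction, creating a fresh name n (4)
    name: str
    body: Any

@dataclass
class Rep:            # *P: replication (5)
    body: Any

@dataclass
class If:             # if e then P else Q: matching (6)
    cond: Any
    then: Any
    els: Any

# The system a!<42>.P | a?(x).Q, with P = Q = 0, as an AST value:
system = Par(Out("a", 42, Nil()), In("a", "x", Nil()))
```

Such a representation could serve as the data structure of a simple analysis or simulation tool for models written in the calculus.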

The operational semantics of the π-calculus consists of a set of rules that describe the behavior of each of the above constructs, thus allowing one to construct the possible executions of a defined system. For


instance, in the case of the parallel composition construct, it specifies that whenever two processes are able to send/receive a message, respectively, on the same channel, then computation proceeds by the exchange of the message in a synchronized step, after which the processes evolve to their subsequent states. As an example, consider the following communication implementing the transmission of value 42 on channel a between two concurrent processes, where −→ captures the evolution and the label τ indicates that the transition is an internal communication.

a!〈42〉.P | a?(x).Q  −τ→  P | Q{42/x}

Note that, on the receiving side, after the interaction the value 42 is substituted for variable x in the process continuation, as described by the notation Q{42/x}.
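The synchronous hand-shake underlying this reduction can be mimicked in an ordinary programming language. The following Python sketch is a rough, illustrative analogue (not the π-calculus semantics itself): a rendezvous channel on which a send blocks until a matching receive, mirroring a!〈42〉.P | a?(x).Q; the `Channel` class and its methods are our own names.

```python
import threading
import queue

class Channel:
    """A synchronous (rendezvous) channel: send blocks until a matching
    receive has taken the value, loosely mirroring pi-calculus communication."""
    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def send(self, value):          # plays the role of c!<v>
        self._data.put(value)
        self._ack.get()             # block until the receiver has the value

    def receive(self):              # plays the role of c?(x)
        value = self._data.get()
        self._ack.put(None)
        return value

a = Channel()
result = []

def sender():                       # models a!<42>.P
    a.send(42)

def receiver():                     # models a?(x).Q; appending realizes Q{42/x}
    x = a.receive()
    result.append(x)

t1 = threading.Thread(target=sender)
t2 = threading.Thread(target=receiver)
t1.start(); t2.start()
t1.join(); t2.join()
# result now holds the transmitted value 42
```

The blocking `send` captures the synchrony of the basic π-calculus: neither side proceeds until the exchange has occurred.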

2.4 The Privacy Calculus

Many different extensions of the π-calculus have been considered in the literature. Among these works, numerous studies have focused on security and privacy properties. One such extension, created with the purpose of describing and analyzing cryptographic protocols, is the spi-calculus (Abadi and Gordon, 1997), whereas in (Cardelli et al., 2000) the π-calculus is extended with the notion of groups as types for channels, which are used to statically prohibit the leakage of secrets. Type systems have also been employed in process calculi to reason about access control, which is closely related to privacy. For instance, the work on the Dπ-calculus has introduced sophisticated type systems for controlling the access to resources advertised at different locations (Hennessy et al., 2005; Hennessy and Riely, 2002; Hennessy, 2007). Furthermore, discretionary access control has been considered in (Bugliesi et al., 2009), which similarly to our work employs the π-calculus with groups, while role-based access control (RBAC) has been considered in (Braghin et al., 2006; Dezani-Ciancaglini et al., 2010; Compagnoni et al., 2008). In addition, authorization policies and their analysis via type checking have been considered in a number of papers including (Fournet et al., 2007; Backes et al., 2008; Bengtson et al., 2011).

In this paper, we extend the work of (Kouzapas and Philippou, 2017), where the authors propose a formal framework, the Privacy calculus, for capturing privacy requirements. This framework is based on (Cardelli et al., 2000). Specifically, as shown in Table 1, lines (8) and (9), the Privacy calculus introduces the notion of personal data, id⊗c, which associates identities, id, with values, c. It also extends process terms with stores of private data, r . [id⊗c], which are processes that store and provide access to private data. These stores are accessed through names, r, called references. Processes are organized in groups, G[P], which associate each entity P with an identifier/role G in the system.

As such, personal data as defined in the GDPR can be represented in the Privacy calculus by associating each piece of plain data with an identifier value, id⊗c. The Privacy calculus also enables a variant of this form that associates data with a hidden identity. The collection of stores in a system can be thought of as a database table, managed by a database-manager process. Other processes can hold a reference to a store, to access, read, and process the data. The type system ensures that data manipulation conforms to a system policy, which specifies how each process/role in the system may process private data.
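To make the store notion concrete, the following Python sketch (our own illustration, with hypothetical names) models a store r . [id⊗c] as an object holding one identity-tagged datum, accessed through its reference; the ownership check on writes is an assumption about how a concrete implementation might guard updates:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PersonalData:
    """An id (x) c pair: a datum tagged with its owner's identity."""
    owner_id: str
    value: Optional[str]      # None plays the role of the placeholder '*'

class Store:
    """A store r . [id (x) c]: one item of personal data, reached via
    its reference r. Writes must come from the matching owner."""
    def __init__(self, ref: str, data: PersonalData):
        self.ref = ref
        self.data = data

    def write(self, datum: PersonalData) -> None:
        # illustrative guard: only the owner may update her own entry
        assert datum.owner_id == self.data.owner_id, "owner mismatch"
        self.data = datum

    def read(self) -> PersonalData:
        return self.data

# r . [id (x) *]  -- an existing store with a placeholder value:
store = Store("r", PersonalData("id", None))
store.write(PersonalData("id", "phone"))   # the owner fills in her datum
```

A collection of such objects keyed by reference would correspond to the "database table" view of the stores described above.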

Furthermore, the Privacy calculus provisions allow the analysis of privacy-related requirements. To begin with, the notion of a group ensures secrecy. This is achieved by associating channels with groups, which are types for channels. No channel of a certain group can be communicated to processes outside the scope of the group, as in (Cardelli et al., 2000). This can be checked statically. The Privacy calculus goes a step further by associating processes with groups, and uses the group memberships of processes to distinguish their roles within systems. The framework is accompanied by a type system for capturing privacy-related notions and a privacy language for expressing privacy policies. The policies make it possible to specify how the various agents, who are referred to by their groups, or roles, may or may not handle the system's sensitive data. In the current work, in contrast to (Kouzapas and Philippou, 2017), we will not be defining privacy policies. Instead, the proposed validation by type checking will specifically analyze conformance to the specific GDPR principles and rights under discussion.

3 A SOFTWARE SYSTEM SPECIFICATION MODEL CASE

In order to present the rationale of our approach, assume that we want to build a part of an online banking system with the following specifications:

1. The system waits to receive the following data from the user: her phone number.

2. The user's phone number is classified as personal data.


3. The system receives and saves the data provided by the user to the system database.

4. When a transaction occurs, a notification system process reads the user's phone number from the database and sends a text message to the number. The system's functionality is then terminated.

These specifications can be modeled in the Privacy calculus as follows. A user component will provide a private data input. The system entities will receive the input, manage the input's storage, read the stored data, and manipulate them. Initial communication between the user and the system is realized through channel c. Groups will be used in order to define the distinct system entities, i.e. the interacting subsystems expressing different functionality. A user entity process, U, of group User, creates the private session communication channel a and shares it with the system. Next, she waits to receive on channel a a store reference and, when it is received, she writes her phone number to it.

The functionalities of writing and reading data directly to or from the store are considered black boxes. This is a typical practice during the modeling phase, when some aspects of a system are abstracted and only become concrete during the implementation phase. As such, the abstractions of writing and reading directly from the store will be implemented during the system development, for instance, by calling a respective RESTful service that offers an interface for communicating with the system data store. In addition, these abstractions enable the Right of Access of the data subject to her private data, as required by the GDPR. The behavior of the user can thus be modeled as follows:

U ::= (ν a)(User[c!〈a〉.a?(x).x!〈id⊗phone〉.P′])

The system entity DBase of group DBaseService is responsible for holding and managing the data storage, i.e. the user's store. For the purposes of the example, we will assume that a store already exists for the specific user, filled with the user identifier and a placeholder for the phone number:

DBase ::= DBaseService[r . [id⊗∗]]

Another system entity, S1 of group InterfaceService, already considered to be holding a reference to the store, waits for an input from the user on the public communication channel c, to be used for establishing a private communication session between the two entities. It then proceeds to send the store reference to the user through that channel:

S1 ::= InterfaceService[c?(x).x!〈r〉.0]

Finally, the system entity S2 of group NotificationService, also considered to be holding the reference to the user's store, takes as input from the store the respective data and uses it to send a text notification to that phone number. This process will be triggered when a new transaction occurs and can be modeled as follows:

S2 ::= NotificationService[r?(x⊗ y).y!〈notification〉.0]

The whole system is defined as the parallel composition of the above processes, comprising a system of group Sys:

System ::= Sys[U |DBase |S1 |S2]

To further understand how Privacy calculus models interact, we proceed to provide an execution of the system.

System  −τ→  Sys[(ν a)(User[a?(x).x!〈id⊗phone〉.P′] | InterfaceService[a!〈r〉.0]) | DBase | S2]

        −τ→  Sys[(ν a)(User[r!〈id⊗phone〉.P′] | InterfaceService[0]) | DBase | S2]

        −τ→  Sys[(ν a)(User[P′] | InterfaceService[0]) | DBaseService[r . [id⊗phone]] | S2]

        −τ→  Sys[(ν a)(User[P′] | InterfaceService[0]) | DBaseService[r . [id⊗phone]] | NotificationService[phone!〈notification〉.0]]

The first step captures the synchronization between the user process and S1, where the former sends the fresh channel a through public channel c, resulting in a substitution within S1 of variable x by name a. In the second step, S1 sends the store reference r to the user process, with reference r substituting the user's variable x; the reference is subsequently used by the user to update her private information on the store managed by the DBaseService. After this interaction we can observe that the previously empty store now contains the phone number of the user. Finally, process S2 receives the phone number of the user from the database and is ready to send a notification to the user via her phone number.
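For readers less familiar with reduction semantics, the four τ-steps above can be replayed informally in a few lines of Python. This is a purely illustrative, sequential paraphrase of the trace: dicts stand in for channels and the store, and all values are symbolic placeholders, not part of the paper's formal model.

```python
# Sequential replay of the four tau-steps of the example system.
store = {"r": ("id", None)}     # DBase: r . [id (x) *]
channels = {}

# Step 1: U sends the fresh session channel a on the public channel c;
# S1 receives it (its variable x is substituted by a).
channels["c"] = "a"
session = channels["c"]

# Step 2: S1 sends the store reference r over the session channel;
# U receives it (her variable x is substituted by r).
channels[session] = "r"
ref = channels[session]

# Step 3: U writes her personal data id (x) phone to the store via r.
store[ref] = ("id", "phone")

# Step 4: S2 reads the store and is ready to notify the user's phone.
owner, phone = store[ref]
notification = f"sms to {phone}"
```

Each assignment corresponds to one synchronized exchange in the trace; the store mutation in step 3 mirrors the transition from r . [id⊗∗] to r . [id⊗phone].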

4 GDPR AND THE PRIVACY CALCULUS

The GDPR defines multiple principles and rights regarding personal data collection, storage, and processing. While these principles were mostly welcomed, various challenges have been recognized in


terms of effectively integrating the new requirements into computing infrastructures. Furthermore, a question that arises is how to verify that a system adheres to privacy requirements as enunciated by the GDPR. In this work, we address this challenge by providing an initial step towards a formal framework for reasoning about privacy-related concepts. On the one hand, such a framework would provide solid foundations towards understanding the notion of privacy while, on the other hand, it would provide the basis for system modeling and analysis tools for privacy-respecting code. Beyond proposing the incorporation of such a framework into a software engineering methodology, we concentrate on GDPR Article 6, Lawfulness of Processing, and the requirements defined therein regarding consent, Article 7(3), Consent Withdrawal, and Article 17, The Right to Erasure. We then evaluate the Privacy calculus as a formalism to reason about the GDPR, we discuss necessary additions to address the above articles, and we demonstrate them with the use of examples. In the next section we formally define these extensions and present a static methodology for assessing the conformance of a system to these requirements.

4.1 Storing and Processing Personal Data

We recall that, according to the first principle of Article 5 ("Lawfulness, fairness and transparency"), Article 6 ("Lawfulness of Processing (1)"), and Recital 40 ("Lawfulness of data processing"), private data may be collected, stored, and processed by a system only if the users (data owners) have given their consent for the intended collection and processing. The consent should be given explicitly, freely and willingly by the user, based on one or more specific purposes stated and described in the privacy policy.

In order to reason about the above concepts, a formalism should contain the notion of private data as a basic entity. This is already the case in the Privacy calculus, where private data are considered to be data structures representing pieces of information related to individuals, along with the identities of the associated individuals. Furthermore, the formalism should provide a clear definition of the notion of consent and ensure that consent is given before any storage or processing of the data is performed. Naturally, this consent should be given by the owner of the data to be processed.

In order to enable the above, we propose the following extension to the Privacy calculus. First, the calculus should enable the provision (and withdrawal) of consent, and the notion of a user's identifier should become a first-class entity, to allow the verification that consent (or withdrawal of consent) for a piece of private data is indeed given by the appropriate party. To achieve this, we extend the Privacy calculus with (i) additional constructs for providing consent, receiving consent and creating an associated store upon receipt, and withdrawing consent and emptying the associated store, and (ii) the introduction of special identifiers and their association with system processes. The implementation of the consent construct should ensure that the provision of consent precedes any storage of personal data in a store, except when the system is handling anonymized data. No processing of private data will be enabled before the consent of the user. As such, we associate the grant of consent by a user with the creation of a store within the software. Once consent is obtained, the data is saved within the store and its reference can be communicated to other entities in the system, which may in turn process the data as needed by the software specification and as specified by its privacy policy. Furthermore, the store reference is shared with the consenting entity, i.e. the user, giving the user direct access to the data associated with her within the system at any point in time.
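The intended life-cycle — consent creates a store, writes require a matching owner, and withdrawal empties and removes the store — can be sketched operationally as follows. This Python fragment is our own illustration of the behaviour described above, not the calculus semantics; all names (DBService, grant_consent, and so on) are hypothetical:

```python
class ConsentError(Exception):
    """Raised when data is written without consent on record."""

class DBService:
    """Sketch of the extended behaviour: a store exists only between
    the grant and the withdrawal of the owner's consent."""
    def __init__(self):
        self.stores = {}            # reference -> (owner_id, value)
        self._next = 0

    def grant_consent(self, owner_id: str) -> str:
        # consent creates a fresh store and returns its reference
        ref = f"r{self._next}"
        self._next += 1
        self.stores[ref] = (owner_id, None)
        return ref

    def write(self, ref: str, owner_id: str, value: str) -> None:
        # storage is lawful only with the matching owner's consent
        if ref not in self.stores or self.stores[ref][0] != owner_id:
            raise ConsentError("no consent on record for this owner")
        self.stores[ref] = (owner_id, value)

    def withdraw_consent(self, ref: str, owner_id: str) -> None:
        # withdrawal empties and removes the store (right to erasure)
        if ref in self.stores and self.stores[ref][0] == owner_id:
            del self.stores[ref]

db = DBService()
ref = db.grant_consent("id")        # store created upon consent
db.write(ref, "id", "phone")        # lawful storage of personal data
db.withdraw_consent(ref, "id")      # data erased upon withdrawal
```

After the withdrawal, any further `write` on the old reference raises `ConsentError`, mirroring the requirement that no processing is enabled without the user's consent.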

To ensure the association between a process entity and its associated data, a user process is annotated with a unique identifier value, {P}id, where id is embedded within all communications requiring a match between a data owner and their data. Note that while the actions of providing consent and writing personal data to the store are implemented by single constructs within the calculus, they should be thought of as black boxes encapsulating functionality that will be implemented in detail at the coding level. We point out that this abstraction will need to be appropriately implemented to safeguard identifiers and channel communication from potentially hostile entities, possibly by embedding a communication protection mechanism, such as a public-private key cryptography scheme, as in (Abadi and Gordon, 1997). We illustrate the above concepts by extending the example of Section 3.
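To make the consent-before-storage discipline concrete, the following is a minimal Python sketch (our own illustration, not part of the calculus; all class, method and value names are hypothetical): a store for a user's private data comes into existence only when the matching identifier grants consent, and the store reference is handed back to the consenting user.

```python
class DataStore:
    """Holds one piece of private data tagged with its owner's identity."""
    def __init__(self, owner_id):
        self.owner_id = owner_id
        self.value = None  # empty until the owner writes her data

    def write(self, owner_id, value):
        # Only the matching identity may write, mirroring the id/data pairing.
        if owner_id != self.owner_id:
            raise PermissionError("identity mismatch")
        self.value = value


class DBInterfaceService:
    """Creates a fresh store only upon receiving consent."""
    def __init__(self):
        self.stores = {}

    def receive_consent(self, owner_id):
        store = DataStore(owner_id)   # the store exists only after consent
        self.stores[owner_id] = store
        return store                  # reference returned to the consenting user


service = DBInterfaceService()
ref = service.receive_consent("id42")   # user grants consent, obtains the reference
ref.write("id42", "+00-0000000")        # user stores her phone number
```

Before `receive_consent` is called no store exists, so no write can occur; this is a coding-level analogue of the requirement that consent precedes storage.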

Example 4.1. We extend the model case of Section 3 with the following additions:

1. The user should provide consent for the storage and usage of her phone number prior to any storage and processing of the number by the system.

2. No store will be associated with the user within the system before the user explicitly provides her consent.

To model these additions, we extend the description of the various entities as follows.


The user entity process U is annotated by its identifier. Furthermore, once a private session is established with the database manager, it provides its consent via channel a while simultaneously receiving a reference to its newly-created store. This is achieved via the construct a ◁ consent(x). As in the previous version of the example, the user proceeds to write her data to the store. This is modeled as follows:

{U}id ::= (ν a)(User[c!〈a〉.a ◁ consent(x).x!〈id⊗phone〉.0])

Moving on to the receipt and storage of the private data of the user by the software system: in the proposed extension a new entity S1 of group DBInterfaceService is defined, combining the database and the interface service. This process dynamically creates a new reference for a store and places it at the disposal of the user as soon as she gives her consent (the actual store is not present in the definition below, since it will be dynamically created once consent is given, as implemented by the semantics of our calculus presented in the next section). The fresh store reference also needs to be communicated to the notification service, which is the purpose of the communication on channel d below. This is modeled as follows:

S1 ::= DBInterfaceService[c?(y).(ν r)(y ▷ consent(r).d!〈r〉.0)]

System process S2 receives the reference to the store and then continues to perform its task as in the previous example:

S2 ::= NotificationService[d?(p).p?(x⊗y).y!〈notification〉.0]

The system definition as a whole comprises the parallel composition of the three processes:

System ::= Sys[{U}id |S1 |S2]

4.2 Withdrawing Consent and Forgetting Personal Data

The Right to Erasure is stated in Article 17 of the GDPR, defining the right of a user to have her personal data erased from a system without undue delay if one of several stated reasons applies, including consent withdrawal. As defined in GDPR Article 7(3): "the data subject shall have the right to withdraw his or her consent at any time". In this case, the individual's personal data should be deleted or become anonymized in such a way that the data subject can no longer be identified by them.

In this work, we examine consent withdrawal as the cause for erasure enforcement, though the machinery can easily be applied to other erasure causes. In this case, we may assume that consent withdrawal originates directly from the user process and, given such a directive, the software is obliged to forget the user's data. Thus, in the Privacy calculus, when a deletion instruction is given, the associated store is emptied, with the consequence that its reference leads to no data. Subsequently, no data will be available to any entity possessing a reference to the store and attempting to read or write to it. The above-mentioned functionality is implemented by appropriate rules added to the Privacy calculus semantics, specifying an abstraction of GDPR-compliant behavior to be appropriately instantiated in the implementation process.
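As a coding-level intuition for this behavior, here is a small Python sketch (ours, with hypothetical names; the string `"_"` stands in for the anonymous identity): withdrawing consent empties the store in place, so every entity holding the same reference subsequently reads no data.

```python
class DataStore:
    """A store whose content can be erased in place upon consent withdrawal."""
    def __init__(self, owner_id, value):
        self.owner_id = owner_id
        self.value = value

    def withdraw(self, owner_id):
        # Only the owning identity may withdraw her consent.
        if owner_id != self.owner_id:
            raise PermissionError("identity mismatch")
        self.owner_id = "_"   # anonymous identity: the subject is no longer identifiable
        self.value = None     # empty value: the data is forgotten

    def read(self):
        return self.value


store = DataStore("id42", "+00-0000000")
notification_ref = store        # another system entity holds the same reference
store.withdraw("id42")          # the user withdraws her consent
```

After the withdrawal, `notification_ref.read()` returns `None`: erasure happens once, at the shared store, rather than per reference, which is why entities such as the notification service need no change.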

Example 4.2. Let us consider Example 4.1, where a user provided consent and sent her phone number to a system. To implement the Right to Erasure, let us assume that the user has the right, at a later stage, to withdraw her consent. As such, the example must be expanded with the following specifications:

1. The user should be able to withdraw her consent.

2. When a user's consent is withdrawn, all her personal data should be deleted.

3. No further storage or processing of the specific user's personal data is allowed unless the user provides her consent again.

The user process {U}id of Example 4.1, who now withdraws her consent at some point after submitting her data, can be modeled as follows:

{U}id ::= (ν a)(User[c!〈a〉.a ◁ consent(x).x!〈id⊗phone〉.x ◁ withdraw.0])

The semantics of the consent withdrawal instruction ensure that such an instruction will synchronize with the associated store, resulting in the automatic deletion of all content within the store and rendering all attempts to process the store futile (again through appropriate enunciation of the semantics rules for store processing). Note that system entities S1 and S2 require no change in their definitions. Furthermore, the entire system's definition remains unaltered.

5 THE EXTENDED PRIVACY CALCULUS

In this section, we formally describe the Extended Privacy calculus and its operational semantics, and we propose a type-checking methodology for ensuring

ENASE 2019 - 14th International Conference on Evaluation of Novel Approaches to Software Engineering


Table 2: Extended Syntax of the Privacy Calculus.

(identity values)   ι ::= id | _ | x
(data values)       δ ::= c | ∗ | x
(private data)      ι⊗δ   where ι ≠ x ⇒ δ = c and ι = x ⇒ δ = y
(identifiers)       u ::= a | r | x
(terms)             t ::= a | r | ι⊗δ | d | x
(constants)         v ::= a | r | id⊗d | d
(placeholders)      k ::= x | x⊗y | _⊗x
(processes)         P ::= 0 | u!〈t〉.P | u?(k).P | (ν n)P | P |P | ∗P
                        | if e then P else P | r ▷ [ι⊗δ] | u ◁ consent(x).P
                        | u ▷ consent(r).P | u ◁ withdraw.P
(systems)           S ::= G[P] | G[{P}id] | G[S] | S ||S | (ν n)S

that well-typed processes respect the GDPR requirements pertaining to consent, as discussed in the previous section. Due to lack of space, we only present the extensions we propose to the Privacy calculus, and we refer the reader to (Kouzapas and Philippou, 2017) for its complete exposition.

5.1 Formal Definition

The extended syntax of the Privacy calculus proposed for our modeling scheme can be found in Table 2. The syntax first defines the values used. Assume a set of constants c ∈ C and a set of variables x, y ∈ V. A special case of values, denoted by ι, refers to identities. Symbol ι ranges over (i) identities, id, (ii) the hidden identity, _, denoting anonymized data, and (iii) variables, x. Data values, ranged over by δ, include constants, c, the empty constant, ∗, and variables. Private data ι⊗δ associate identities id with data c. Variable placeholders for private data are written as x⊗y.

The set of identifiers used in the definition of the Privacy calculus is defined next. Assume a set of names n ∈ N that is partitioned into names, a, and store references, r. We use meta-variable u to range over names or variables. Values, ranged over by v, include names, private data, or data; placeholders, ranged over by k, include the variable structures of the calculus, whereas terms, ranged over by t, include both values and placeholders.

The syntax of processes follows. It includes the Privacy calculus constructs together with constructs for creating private data stores upon consent and emptying private data stores upon consent withdrawal. Term u ◁ consent(x).P defines a process that provides consent for the creation of a new store on channel u. The dual operation is term u ▷ consent(r).P, where the process receives a consent request on channel u and creates a new store on reference r. During such an interaction, the consenting process (u ◁ consent(x).P) receives the new store reference on variable x. Term u ◁ withdraw.P denotes a process ready to withdraw consent on a store reference u.

The syntax of systems is also extended with system G[{P}id], which denotes a system that runs a process with an identity. Processes with identities are used to abstract individual users within a system whose private data are being processed.
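For readers who prefer code to grammars, the new productions of Table 2 can be rendered as an abstract syntax tree. The sketch below is our own illustration (constructor and field names are hypothetical) and covers only the consent-related terms.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Nil:                 # 0 : the inactive process
    pass


@dataclass
class GiveConsent:         # u ◁ consent(x).P
    channel: str
    var: str
    cont: Any


@dataclass
class AcceptConsent:       # u ▷ consent(r).P
    channel: str
    ref: str
    cont: Any


@dataclass
class Withdraw:            # u ◁ withdraw.P
    channel: str
    cont: Any


@dataclass
class StoreTerm:           # r ▷ [ι⊗δ]
    ref: str
    identity: str
    data: Any


# A fragment of the user of Example 4.2: consent on channel a, later withdraw.
user = GiveConsent("a", "x", Withdraw("x", Nil()))
```

Such an AST is the natural starting point for tooling (pretty-printers, simulators, type checkers) over the extended calculus.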

The computation steps of the Privacy calculus are defined using a labeled transition relation. Specifically, we write S —ℓ→ S′ to denote that a system S can take a step and evolve into system S′ by executing the action indicated by label ℓ. Whenever two parallel processes/systems can execute two dual actions, they synchronize with each other and their composition executes a computation interaction, indicated by τ, as follows:

    S1 —ℓ1→ S′1    S2 —ℓ2→ S′2    ℓ1 dual ℓ2
    ─────────────────────────────────────────
             S1 | S2 —τ→ S′1 | S′2

The labels of the Privacy calculus are extendedwith the labels:

a ◁ consent(r)    a ◁ consent(r)@id    a ▷ consent(r)@id
r ◁ withdraw      r ◁ withdraw@id      r ▷ withdraw@id

where

a ◁ consent(r)@id  dual  a ▷ consent(r)@id
r ◁ withdraw@id    dual  r ▷ withdraw@id

Label a ◁ consent(r) denotes the basic action of providing consent on channel a and receiving a store reference r, whereas label a ◁ consent(r)@id is the same action lifted to a user identity, id. Dually, action a ▷ consent(r)@id is the acceptance of consent on channel a and the creation of a new store with reference r and identity id. The withdraw labels are: r ◁ withdraw, which denotes a withdrawal on reference r; r ◁ withdraw@id, which lifts a withdrawal to the user level; and r ▷ withdraw@id, which denotes the receipt of a withdrawal on reference r with identity id.
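The duality relation on these labels can be written down directly. The following Python sketch is our own (the tuple encoding of labels is hypothetical): two labels can synchronize only when their kinds are dual and their channel/reference and identity components agree.

```python
# Pairs of label kinds that are dual to each other.
DUAL_KINDS = {("give_consent", "accept_consent"),
              ("withdraw", "accept_withdraw")}


def dual(l1, l2):
    """l1, l2: labels encoded as (kind, subject, identity) tuples."""
    (k1, s1, i1), (k2, s2, i2) = l1, l2
    kinds_dual = (k1, k2) in DUAL_KINDS or (k2, k1) in DUAL_KINDS
    # Synchronization also requires matching subjects and identities.
    return kinds_dual and (s1, i1) == (s2, i2)


# a ◁ consent(r)@id is dual to a ▷ consent(r)@id:
ok_consent = dual(("give_consent", ("a", "r"), "id"),
                  ("accept_consent", ("a", "r"), "id"))
# r ◁ withdraw@id is dual to r ▷ withdraw@id:
ok_withdraw = dual(("withdraw", "r", "id"),
                   ("accept_withdraw", "r", "id"))
```

Mismatched identities (or references) yield non-dual labels, so a withdrawal by one user can never synchronize with another user's store.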

The labeled transition semantics is given in Table 3. The first rule extends structural congruence with a garbage collection rule that collects stores no longer containing useful private data. Rule [SConsentU1] describes the interaction of a process giving consent on channel a; the process observes the a ◁ consent(r) label. Next, rule [SConsentU2] lifts a consent action to a process associated with an identity, using label a ◁ consent(r)@id. The dual action of receiving consent is performed via label a ▷ consent(r)@id. After the label is observed, a new store, r ▷ [id⊗∗],


Table 3: Additions to Privacy Calculus Labeled Transition Semantics.

[GarbageC]     (ν r)(r ▷ [_⊗∗]) ≡ 0

[SConsentU1]   a ◁ consent(x).P —a◁consent(r)→ P{r/x}

[SConsentU2]   P —a◁consent(r)→ P′  implies  {P}id —a◁consent(r)@id→ {P′}id

[SConsentS1]   a ▷ consent(r).P —a▷consent(r)@id→ P | r ▷ [id⊗∗]

[SWithdraw1]   r ◁ withdraw.P —r◁withdraw→ P

[SWithdraw2]   P —r◁withdraw→ P′  implies  {P}id —r◁withdraw@id→ {P′}id

[SWithdraw3]   r ▷ [id⊗c] —r▷withdraw@id→ r ▷ [_⊗∗]

[PId]          P —ℓ→ P′ and ℓ ≠ a◁consent(r), r◁withdraw  implies  {P}id —ℓ→ {P′}id

[SId]          {P}id —ℓ→ {P′}id  implies  G[{P}id] —ℓ→ G[{P′}id]

[SOut2]        r ▷ [_⊗∗] —r!〈_⊗∗〉→ r ▷ [_⊗∗]

[SInp2]        r ▷ [_⊗∗] —r?(id⊗c)→ r ▷ [_⊗∗]

is created on reference r with identity id. Following the dual definition of labels, a process that gives consent interacts with a process that receives consent, creating a new store and exchanging the corresponding reference. Withdrawal follows a similar pattern. A process with a withdraw prefix observes a withdraw label r ◁ withdraw (rule [SWithdraw1]), which is lifted to a user process via label r ◁ withdraw@id (rule [SWithdraw2]). Finally, rule [SWithdraw3] describes a store receiving a withdrawal request via label r ▷ withdraw@id; as a result, it deletes the corresponding private data and assigns the anonymous identity and the empty value to its memory. The next two rules bridge the gap between the new process {P}id, the system G[{P}id], and the rest of the calculus. Finally, two new rules define the interaction of the empty store.

5.2 Typing GDPR Compliance

As we have already discussed, the objective of this work is to propose a formal framework for modeling software systems together with analysis techniques for guaranteeing that systems comply with privacy-related requirements. In this section, we present such an analysis technique and show that the intended requirements we have discussed regarding the provision and withdrawal of consent can be statically checked by typechecking.

For instance, consider the following system:

User[a!〈id⊗ c〉.P] ||Sys[a?(x⊗ y).Q]

This system describes a situation where private data are sent (resp. received) via a non-reference channel. However, this leads to the dissemination and processing of private data without proper permission and consent; while it is a legal process of the calculus, it clearly fails to satisfy the GDPR requirements. Note that the exchange of private data can instead be achieved by communicating the reference to the associated private data store. A similar error would occur if, upon acquisition of private data from a store, a subcomponent of a system created a new store and thereby copied the data. This would lead to a situation where the withdrawal of consent by the user would not reach the newly-defined store; thus, the system would continue to hold the data, violating the right of the user to have her data forgotten. This suggests that copying of private data should (if carried out) be done with care, ensuring that a link is maintained between the various copies of user data. In the sequel, we propose an approach for recognizing these and other undesirable situations via typechecking, and we prove that well-typed processes do not exhibit errors such as the ones discussed above. We


Table 4: Typing Rules.

[TCons]      Γ, x:G[t[g]]; Λ ⊢ P  and  Γ ⊢ u : G′[G[t[g]]]  imply  Γ; Λ ⊢ u ◁ consent(x).P

[TStore]     Γ, r:G[t[g]]; Λ ⊢ P  and  Γ ⊢ u : G′[G[t[g]]]  imply  Γ; Λ, r ⊢ u ▷ consent(r).P

[TWithdraw]  Γ; Λ ⊢ P  and  Γ ⊢ u : G[t[g]]  imply  Γ; Λ ⊢ u ◁ withdraw.P

[TId]        Γ; Λ ⊢ P  implies  Γ; Λ, id ⊢ {P}id

first define the set of types that are used to characterize the channels and the values that are used by the typing system. A type, T, is of the following form:

g ::= nat | bool | (g1, . . . ,gn) | . . .

T ::= t[g] | G[T ] | g

A type g is a primitive data type, e.g. natural numbers and booleans, or a structure of primitive data (g1, . . . , gn). A type t[g] is used to type private data values id⊗c, where constant c is of type g. Channels are typed with the G[T] type, which denotes a channel that can only send channels or values of type T between participants belonging to group G.

Our type system extends that of the Privacy calculus with the rules in Table 4. It is based on the notion of a typing context, defined as follows:

Γ ::= Γ, t : T | ∅        Λ ::= Λ, r | Λ, id | ∅

A typing context Γ, or shared typing environment, maps values and variables to types. Operator Γ1,Γ2 is defined as Γ1 ∪ Γ2. A typing context Λ, or linear typing environment, includes store references r and identifiers id. It is used to track names in a system and to ensure that store references and identities are unique within a system. Operator Λ1,Λ2 is defined as Λ1 ∪ Λ2 whenever Λ1 ∩ Λ2 = ∅, and is undefined otherwise. Typing judgments are of the following form:

Γ ⊢ t : T        Γ; Λ ⊢ P        Γ; Λ ⊢ S

The first judgment states that a meta-variable t has type T under a typing context. The second judgment states that a process P is well typed in the typing contexts Γ and Λ. Similarly for the typing judgment for systems.

The first typing rule in Table 4 checks for the correct usage of the u ◁ consent(x).P term. It first checks that process P is correctly typed under a typing environment. It then checks whether the type of variable x can carry private data, i.e. whether it is of store reference type. Moreover, it checks that the type of channel u can carry the type of variable x. Variable x is not present in the premise of the rule because x is bound in process P. Rule [TStore] follows a similar logic. However, it additionally checks that name r does not already have a store present in P, by adding r to environment Λ: if r were already present in Λ, it would mean that a store (or a consent for a store) is already present in process P. Rule [TWithdraw] checks that a withdraw can only be performed on a channel that carries private data, i.e. a reference channel. The final rule, [TId], types a process that is equipped with an identity. The rule tracks the identity in the Λ environment. The presence of the identity in the Λ environment ensures that it cannot be used as an identity by another process in the system, because Λ cannot be composed with another linear environment containing the same identity.

A main property of the type system is that the execution of computation steps by a statically-typed system yields systems that are also well typed.

Theorem 5.1 (Type Preservation). Let S be a system such that Γ; Λ ⊢ S for some Γ and Λ. Whenever S −→ S′, then there exist Γ′, Λ′ such that Γ′; Λ′ ⊢ S′.

Type preservation is an important theorem stating the soundness of the typing system. Moreover, as discussed above, the mishandling of private data is not allowed in statically checked processes. This is formally given by a type safety theorem. First, we define the notion of an error process, i.e. a process that mishandles private data.

Definition 5.1 (Error Process). A process P is an error process whenever it is of one of the following forms:

r ▷ [ι⊗δ] | r ▷ [ι′⊗δ′],
a ▷ consent(r).P if r ∈ fn(P),
a!〈id⊗c〉.P,   a?(x⊗y).P,
r!〈v〉.P,   and   r?(x).P

A system is an error system whenever either it is of the form G[P] with P ≡ (ν n)(P1 | . . . | Pn) and there exists i, 1 ≤ i ≤ n, such that Pi is an error process; or it is of the form G[S] and S is an error system; or it is of the form (ν n)(S1 || . . . || Sn) and there exists i, 1 ≤ i ≤ n, such that Si is an error system.

The class of error systems captures cases where a system mishandles private data. The first case is where a system defines more than one store on the same reference. This can lead to situations where the two stores hold inconsistent private data.

The next case defines the situation where one store is created with consent and another without consent. For example, in the system:

Sys[a ▷ consent(r).(P | r ▷ [id⊗c])]

a second store will be created without consent.


The next two cases define the situation where private data are sent (resp. received) via a non-reference channel, with the possibility of leading to data processing without consent from the user.

The final two cases characterize as errors the situations where non-private data are sent (resp. received) via a reference; e.g. the system

User[r!〈a〉.P] || Sys[r ▷ [id⊗c]]

cannot observe any meaningful interaction on refer-ence r.
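The first error case, two stores on the same reference, is also the easiest to detect mechanically. Below is a toy Python check (ours, not the paper's type system) over the parallel store components of a system:

```python
def duplicate_store_refs(stores):
    """stores: (reference, identity, data) triples running in parallel.
    Returns the set of references declared by more than one store."""
    seen, dups = set(), set()
    for ref, _identity, _data in stores:
        if ref in seen:
            dups.add(ref)   # a second store on the same reference: error case 1
        seen.add(ref)
    return dups


# Two stores on reference "r" make the system an error system.
flagged = duplicate_store_refs([("r", "id1", "c"),
                                ("r", "id2", "c2"),
                                ("s", "id3", "c3")])
```

In the actual type system this check is not performed by scanning: the linear environment Λ simply refuses to compose two components that both claim `r`.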

Based on a sound type system, we can statically infer that a system is safe with respect to handling private data, i.e. a statically-checked system will never reduce to an error.

Theorem 5.2 (Type Safety). Let S be a system such that Γ ⊢ S for some Γ. Whenever S −→∗ S′, then S′ is not an error system.

The proof follows from the fact that error systems cannot be successfully type checked, together with Theorem 5.1, which states that reduction preserves the typing property.

6 CONCLUSIONS

In this paper, we have extended the Privacy calculus of (Kouzapas and Philippou, 2017) with features supporting the granting and withdrawal of consent by users, and for checking the associated requirements as enunciated by the GDPR. While we have not considered privacy policies as in (Kouzapas and Philippou, 2017), where type checking is performed to ensure the satisfaction of requirements pertaining to the purpose of usage, as well as the permissions endowed to each user in the system, we believe that the two approaches can easily be integrated, being orthogonal to each other. Our vision is that a π-calculus-based framework can be developed and used by software engineers to model systems during the design phase, as well as to analyze and validate privacy-related properties, such as the GDPR-based provisions of Lawfulness of Processing, the Right to Erasure, and Consent Withdrawal. Analyzing and validating a system's specification design model is a required step of software system creation. The modeling scheme introduced will provide information on conformance or may reveal any infringements found. We point out, however, that this process cannot guarantee that software engineers will implement the system to precisely conform to its specification design, unless MDE techniques relying on the Privacy calculus are employed.

As future work, we intend to embed more GDPR provisions into the Privacy calculus. We will also work towards offering tools for easily expressing a system's specification model in a π-calculus-based formalism. Furthermore, tools for statically checking actual code implementations, and tools for formally translating verified models into verified developed systems, can be created, thus assisting the transition from the design phase to the development phase by exploiting the ability of type-checking techniques at the coding level. Relevant work that does not capture privacy requirements, but was developed in the context of the π-calculus and applies automated static analysis techniques to software code, can be found in (Yoshida et al., 2013; Ng et al., 2015). Moreover, in future work we will focus on the notion of purpose, describing and validating purpose-based properties using a purpose-based, customizable privacy policy language, and providing a mechanism for privacy policy validation as embedded in the Privacy calculus. Our ultimate goal is to provide a holistic approach to privacy management in software system development.

REFERENCES

Abadi, M. and Gordon, A. D. (1997). A calculus for cryptographic protocols: The spi calculus. In Proceedings of the 4th ACM Conference on Computer and Communications Security, pages 36–47. ACM.

Ahmadian, A. S., Struber, D., Riediger, V., and Jurjens, J. (2018). Supporting privacy impact assessment by model-based privacy analysis. In ACM Symposium on Applied Computing, pages 1142–1149.

Backes, M., Hritcu, C., and Maffei, M. (2008). Type-checking zero-knowledge. In Proceedings of the 2008 ACM Conference on Computer and Communications Security, CCS 2008, pages 357–370.

Basso, T., Montecchi, L., Moraes, R., Jino, M., and Bondavalli, A. (2015). Towards a UML profile for privacy-aware applications. In Proceedings of the IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM 2015), pages 371–378. IEEE.

Bengtson, J., Bhargavan, K., Fournet, C., Gordon, A. D., and Maffeis, S. (2011). Refinement types for secure implementations. ACM Transactions on Programming Languages and Systems, 33(2):8.

Boussetoua, R., Bennoui, H., Chaoui, A., Khalfaoui, K., and Kerkouche, E. (2015). An automatic approach to transform BPMN models to pi-calculus. In Proceedings of the International Conference of Computer Systems and Applications (AICCSA 2015), pages 1–8. IEEE.

Braghin, C., Gorla, D., and Sassone, V. (2006). Role-based access control for a distributed calculus. Journal of Computer Security, 14(2):113–155.

Bugliesi, M., Colazzo, D., Crafa, S., and Macedonio, D. (2009). A type system for discretionary access control. Mathematical Structures in Computer Science, 19(4):839–875.

Cardelli, L., Ghelli, G., and Gordon, A. D. (2000). Secrecy and group creation. In International Conference on Concurrency Theory, pages 365–379. Springer.

Cavoukian, A. (2008). Privacy by design. Information Commissioner's Office.

Compagnoni, A. B., Gunter, E. L., and Bidinger, P. (2008). Role-based access control for boxed ambients. Theoretical Computer Science, 398(1-3):203–216.

Data Protection and Privacy Commissioners (2010). Resolution on privacy by design. In Proceedings of ICDPPC'10.

Dezani-Ciancaglini, M., Ghilezan, S., Jaksic, S., and Pantovic, J. (2010). Types for role-based access control of dynamic web data. In Proceedings of WFLP'10, LNCS 6559, pages 1–29. Springer.

European Parliament and Council of the European Union (2015). General data protection regulation. Official Journal of the European Union.

Fournet, C., Gordon, A., and Maffeis, S. (2007). A type discipline for authorization in distributed systems. In 20th IEEE Computer Security Foundations Symposium, CSF 2007, 6-8 July 2007, Venice, Italy, pages 31–48.

Fowler, M. (2004). UML distilled: a brief guide to the standard object modeling language. Addison-Wesley Professional.

Gjermundrød, H., Dionysiou, I., and Costa, K. (2016). privacyTracker: A privacy-by-design GDPR-compliant framework with verifiable data traceability controls. In Proceedings of the International Conference on Web Engineering, pages 3–15. Springer.

Havey, M. (2005). Essential business process modeling. O'Reilly Media, Inc.

Hennessy, M. (2007). A distributed Pi-calculus. Cambridge University Press.

Hennessy, M., Rathke, J., and Yoshida, N. (2005). SafeDpi: a language for controlling mobile code. Acta Informatica, 42(4-5):227–290.

Hennessy, M. and Riely, J. (2002). Resource access control in systems of mobile agents. Information and Computation, 173(1):82–120.

Hes, R. and Borking, J. (1998). Privacy enhancing technologies: the path to anonymity. ISBN, 90(74087):12.

Hintze, M. and LaFever, G. (2017). Meeting upcoming GDPR requirements while maximizing the full value of data analytics.

Hustinx, P. (2010). Privacy by design: delivering the promises. Identity in the Information Society, 3(2):253–255.

Jurjens, J. (2002). UMLsec: Extending UML for secure systems development. In Proceedings of the International Conference on The Unified Modeling Language, pages 412–425. Springer.

Kalloniatis, C., Kavakli, E., and Gritzalis, S. (2008). Addressing privacy requirements in system design: the PriS method. Requirements Engineering, 13(3):241–255.

Kapitsaki, G., Ioannou, J., Cardoso, J., and Pedrinaci, C. (2018). Linked USDL Privacy: Describing privacy policies for services. In 2018 IEEE International Conference on Web Services (ICWS), pages 50–57. IEEE.

Kapitsaki, G. M., Kateros, D. A., Pappas, C. A., Tselikas, N. D., and Venieris, I. S. (2008). Model-driven development of composite web applications. In Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services, pages 399–402. ACM.

Kapitsaki, G. M. and Venieris, I. S. (2008). PCP: privacy-aware context profile towards context-aware application development. In Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services, pages 104–110. ACM.

Kouzapas, D. and Philippou, A. (2017). Privacy by typing in the π-calculus. Logical Methods in Computer Science, 13(4).

Lam, V. S. (2008). On π-calculus semantics as a formal basis for UML activity diagrams. International Journal of Software Engineering and Knowledge Engineering, 18(04):541–567.

Milner, R., Parrow, J., and Walker, D. (1992). A calculus of mobile processes, parts I and II. Information and Computation, 100(1):1–77.

Ng, N., de Figueiredo Coutinho, J. G., and Yoshida, N. (2015). Protocols by default - safe MPI code generation based on session types. In Proceedings of the International Conference on Compiler Construction, CC 2015, pages 212–232.

Perera, C., McCormick, C., Bandara, A. K., Price, B. A., and Nuseibeh, B. (2016). Privacy-by-design framework for assessing internet of things applications and platforms. In Proceedings of the 6th International Conference on the Internet of Things, pages 83–92. ACM.

Rubinstein, I. S. (2011). Regulating privacy by design. Berkeley Technology Law Journal, 26:1409.

Sangiorgi, D. and Walker, D. (2003). The pi-calculus: a Theory of Mobile Processes. Cambridge University Press.

Schmidt, D. C. (2006). Model-driven engineering. IEEE Computer, 39(2):25.

Thatte, S. (2001). XLANG: Web services for business process design. Microsoft Corporation.

Yoshida, N., Hu, R., Neykova, R., and Ng, N. (2013). The Scribble protocol language. In Proceedings of TGC 2013, Revised Selected Papers, pages 22–41.
