Flexibility vs. Security in Linked Enterprise Data Access Control Graphs

11
Journal of Information Assurance and Security. ISSN 1554-1010 Volume 9 (2014) pp. 093-103 c MIR Labs, www.mirlabs.net/jias/index.html Flexibility vs. Security in Linked Enterprise Data Access Control Graphs Markus Graube 1 , Patricia Ortiz 2 , Manuel Carnerero 3 , Oscar L ´ azaro 2 , Mikel Uriarte 3 , Johannes Pfeffer 1 and Leon Urbas 1 1 Chair for Process Control Systems Engineering, Technische Universit¨ at Dresden, Germany [email protected], [email protected], [email protected] 2 R&D, ICT Unit, Innovalia Association S.A., Bilbao, Spain [email protected], [email protected] 3 R&D, Nextel S.A., Bilbao, Spain [email protected], [email protected] Abstract: Linked Data offers easy extensibility and interoper- ability of information spaces. This provides a great potential for industrial companies allowing to share information with part- ners in a virtual enterprise. Hence, together they can become faster and more flexible which results in an advantage in the market. However, there is still the barrier to protect own infor- mation with a fine grain. Access control graphs are an approach for this issue. Information is put into different views by exe- cuting inference mechanisms on role-based policy rules. After- wards queries are automatically rewritten at runtime in order to match the generated views and provide only data from views that should be accessible by the authenticated role. This paper demonstrates the balance between flexibility and security using this approach. The amount and complexity of the policy rules are highly dependent on the information model used. However, a moderate restriction of the huge flexibility in the information modeling allows for few rules, but those are powerful ones. Ad- ditionally, the approach can also be leveraged for consistency checking of Linked Data information structures. Thus, clients can rely on these data invariants and the information provider can rely on the fact that fine grained access is granted. Keywords: Security; Flexibility; Semantic Web; Linked Data; S- PARQL; Access Control; Named Graphs I. Introduction In order to become competitive within the global market, an increase of efficiency is no longer sufficient for companies. They need to improve their cross-links to relevant collabora- tion partners. In fact, inter-enterprise collaboration through intense data sharing has become essential, especially for s- mall and medium-size enterprises, so as to acquire a crit- ical mass of resources and competencies. Communication and exchange of information are no longer instruments for process control but are key drivers of business performance, which can lead to the transformation of isolated individual companies towards an agile virtual enterprise. The explosion of the Semantic Web in recent years [1] has provided the opportunity to develop advanced technology en- ablers to support new inter-organizational collaboration mod- els towards the creation of virtual enterprises. More pre- cisely, the ComVantage 1 project explores the capabilities of Linked Data (LD) as a flexible and fast unifying approach to provide access to the data vaults of all stakeholders of a vir- tual enterprise by means of the creation of a product-centric collaboration space. Linked Data is an interesting candidate for enterprise information integration offering two main ad- vantages in comparison to other approaches [2]: The flexibility of Linked Data makes it possible to em- ploy a model-as-you-use approach starting with a small subset of data and easy extensibility. The semantic lifting of enterprise data towards Linked Enterprise Data (LED) is an enabler for the exploitation of the potential of semantic technologies in production. However, this approach comes with a severe disadvantage: unlimited openness and flexibility are not compatible with rigorous access control (AC) approaches to security. Indeed, the realization of such collaboration space through Linked Data poses significant challenges in terms of providing the right security controls to ensure effective access control man- agement to distributed linked information data sources that create the specific product-centric information spaces shared by the enterprises engaged in collaboration. The focus of this paper is to describe the current progress achieved with the ComVantage project in designing a securi- ty model which will enable agile inter-organization collabo- ration while keeping a balance between security and flexibil- ity using Linked Data in industrial environments. The paper will discuss and present an enhanced multi-domain security access control approach based on intelligent structuring of enterprise data through Linked Data and innovative SPAR- QL query rewriting capabilities. Some proposals in order to increase the flexibility within this security model will be pre- sented and discussed. Section 2 of this paper states the need for access control in virtual companies and provides an overview of access control 1 EU FP7 IP Project “Collaborative Manufacturing Network for Compet- itive Advantage”: www.comvantage.eu MIR Labs, USA

Transcript of Flexibility vs. Security in Linked Enterprise Data Access Control Graphs

Journal of Information Assurance and Security.ISSN 1554-1010 Volume 9 (2014) pp. 093-103c©MIR Labs, www.mirlabs.net/jias/index.html

Flexibility vs. Security in Linked Enterprise DataAccess Control Graphs

Markus Graube1, Patricia Ortiz2, Manuel Carnerero3, Oscar Lazaro2, Mikel Uriarte3, Johannes Pfeffer1 and Leon Urbas1

1Chair for Process Control Systems Engineering, Technische Universitat Dresden, [email protected], [email protected], [email protected]

2R&D, ICT Unit, Innovalia Association S.A., Bilbao, [email protected], [email protected]

3R&D, Nextel S.A., Bilbao, [email protected], [email protected]

Abstract: Linked Data offers easy extensibility and interoper-ability of information spaces. This provides a great potential forindustrial companies allowing to share information with part-ners in a virtual enterprise. Hence, together they can becomefaster and more flexible which results in an advantage in themarket. However, there is still the barrier to protect own infor-mation with a fine grain. Access control graphs are an approachfor this issue. Information is put into different views by exe-cuting inference mechanisms on role-based policy rules. After-wards queries are automatically rewritten at runtime in orderto match the generated views and provide only data from viewsthat should be accessible by the authenticated role. This paperdemonstrates the balance between flexibility and security usingthis approach. The amount and complexity of the policy rulesare highly dependent on the information model used. However,a moderate restriction of the huge flexibility in the informationmodeling allows for few rules, but those are powerful ones. Ad-ditionally, the approach can also be leveraged for consistencychecking of Linked Data information structures. Thus, clientscan rely on these data invariants and the information providercan rely on the fact that fine grained access is granted.Keywords: Security; Flexibility; Semantic Web; Linked Data; S-PARQL; Access Control; Named Graphs

I. Introduction

In order to become competitive within the global market, anincrease of efficiency is no longer sufficient for companies.They need to improve their cross-links to relevant collabora-tion partners. In fact, inter-enterprise collaboration throughintense data sharing has become essential, especially for s-mall and medium-size enterprises, so as to acquire a crit-ical mass of resources and competencies. Communicationand exchange of information are no longer instruments forprocess control but are key drivers of business performance,which can lead to the transformation of isolated individualcompanies towards an agile virtual enterprise.The explosion of the Semantic Web in recent years [1] hasprovided the opportunity to develop advanced technology en-ablers to support new inter-organizational collaboration mod-

els towards the creation of virtual enterprises. More pre-cisely, the ComVantage1 project explores the capabilities ofLinked Data (LD) as a flexible and fast unifying approach toprovide access to the data vaults of all stakeholders of a vir-tual enterprise by means of the creation of a product-centriccollaboration space. Linked Data is an interesting candidatefor enterprise information integration offering two main ad-vantages in comparison to other approaches [2]:

• The flexibility of Linked Data makes it possible to em-ploy a model-as-you-use approach starting with a smallsubset of data and easy extensibility.

• The semantic lifting of enterprise data towards LinkedEnterprise Data (LED) is an enabler for the exploitationof the potential of semantic technologies in production.

However, this approach comes with a severe disadvantage:unlimited openness and flexibility are not compatible withrigorous access control (AC) approaches to security. Indeed,the realization of such collaboration space through LinkedData poses significant challenges in terms of providing theright security controls to ensure effective access control man-agement to distributed linked information data sources thatcreate the specific product-centric information spaces sharedby the enterprises engaged in collaboration.The focus of this paper is to describe the current progressachieved with the ComVantage project in designing a securi-ty model which will enable agile inter-organization collabo-ration while keeping a balance between security and flexibil-ity using Linked Data in industrial environments. The paperwill discuss and present an enhanced multi-domain securityaccess control approach based on intelligent structuring ofenterprise data through Linked Data and innovative SPAR-QL query rewriting capabilities. Some proposals in order toincrease the flexibility within this security model will be pre-sented and discussed.Section 2 of this paper states the need for access control invirtual companies and provides an overview of access control

1EU FP7 IP Project “Collaborative Manufacturing Network for Compet-itive Advantage”: www.comvantage.eu

MIR Labs, USA

Flexibility vs. Security in Linked Enterprise Data Access Control Graphs 94

and the protection of information published as Linked Da-ta. In section 3, the access control framework defined withinComVantage is presented. Section 4 shows a concrete usecase in order to illustrate the situation we are facing and thenintroduces some ways to add more flexibility to the securitymodel. Section 5 handles the consistency checking and ver-ification concept and basic policies for a fast and seamlessintegration of the approach into information systems. Sec-tion 6 shows performance characteristics and section 7 dis-cusses the concept further before the paper is concluded withan outlook of possible enhancements.

II. Motivation

A. Linked Data for inter-organizational collaboration

So far the use of Linked Data has been a very effective mean-s to manage open webs of data [3] with the W3C LinkingOpen Data being the most outstanding one. Linked Data is aset of technologies and best practices for the Semantic Web.It uses RDF (Resource Description Framework) as a seman-tic graph-based information model, URIs as references forthe modeled information entities, HTTP as transport proto-col and SPARQL (SPARQL Protocol And RDF Query Lan-guage) as graph-based query language. However, as statedin [4] current implementations such as Kowari metastore,TAP2 or Redland [5] are mostly focused on scalability andrarely address security and access control issues.

B. A need for access control

Due to the open nature of Linked Data and the Web of Data,collaborative enterprises may consider that their shared con-tent is not safe, thus preventing them from publishing theirdata as RDF in the Linked Data cloud, at the expense of thegrowth of the Web of Data itself [6]. In fact, the creationof private Linked Data clouds is essential when dealing withan inter-organizational collaboration environment, where aset of interlinked data will be shared and used by differentagents. Therefore, the implementation of multi-domain ac-cess control mechanisms is a crucial issue, ensuring that theshared information remains secure and only accessible to au-thorized members.Additionally, the mechanism should also be flexible, so thatinformation security enforcement does not deter companiesfrom engaging in collaboration, and trust is not limited by thenature of the enterprises comprising the collaboration net-work.

C. Related work

Traditional access control solutions based on XACML stan-dards3 are well-established solutions to implement accesscontrol solutions in distributed and federated environments.XACML architectures allow decisions to be made in terms ofpermit/deny based on a set of access control policies, as ad-dressed in [7]. However, coarse-grained XACML decisions(permit/deny) to control SPARQL requests for information

2http://tap.stanford.edu3http://docs.oasis-open.org/xacml/3.0/xacml-3.

0-core-spec-cs-01-en.pdf

access to Graph Stores supporting the Linked Data resourcesmay prove insufficient to deal with the requirements of in-dustrial collaboration.The implementation of effective and flexible access controlsolutions for Linked Data sharing in the context of enterprisedistributed collaboration raises three major challenges: (1)how to define a fine-grained access control model for LinkedData resources, (2) how to allow a controlled mobile con-sumption of such information (3) how to keep the flexibilityof the access control model. Addressing these security chal-lenge should leverage more agile, flexible and effective formsof secure collaboration.Over recent years, several authors have addressed the provi-sioning of access control solutions for the Semantic Web. Tobegin with, Shen and Cheng [8] proposed a context-based ac-cess control model using Semantic Web technologies, wherepolicies are expressed using SWRL. In [4] a set of actions areoutlined that can be performed on an RDF store. The secu-rity policy consists of prohibitions or permissions to performactions on some RDF triples.In [9] the authors assign security labels to RDF triples. So asto limit unauthorized inferences, they consider the entailmentrules and recommendations defined in the W3C RDF Seman-tics [10] and propose some rules for automatically assigningsecurity labels to involved RDF statements. Nevertheless,none of these models include an administration model speci-fying how the security policy can be updated. In fact, both ofthem assume that the definition of the security policy shouldbe carried out by a central authority. This is really a prob-lem in the context of the Internet of Things or even the Web,where metadata come from different sources and should bemanaged in a decentralized way.Besides, the authors in [11] present a Privacy Preference On-tology (PPO) to express fine-grained access control policiesto RDF data. Apart from that, Costabello et al. [12] presenta Linked Data access control framework focused on context-aware control policies for querying Web of Data servers frommobile environments. Nevertheless, neither of these paper-s considers the application of their access control models forthe Web of Data to decentralized collaboration environments,such as the one envisioned within ComVantage. Moreover,the SHI3LD mobile access control framework for LD [12]has been recently proposed. However, to determine the listof accessible graphs, it is necessary to evaluate pre-definedaccess policies against the context of the requester in execu-tion time, which may create scalability issues as the size ofusers and the data space increases, as well as delays in theretrieval of information.Finally, in [13], we envisioned a security model based onViews generation. The main drawback of this model is thatthe size of the information that was generated as a conse-quence of the data views creation, was five times bigger dueto the inference run, due to the reification4 process, whichsplits every statements into four statements. As a result, thissecurity model will be extended in this paper by proposingthe use of Named graphs instead of data views.

4http://www.w3.org/TR/rdf-mt/#Reif

95 Graube et al.

D. Contributions

To overcome such limitations, the security approach present-ed in this paper relies on the concept of the access controlgraph generation (also called view generation), which doesnot require policy re-evaluation at the SPARQL processingstage. This approach is extended compared to [13] by usingNamed Graph for storing the views in order to avoid the hugestorage cost implied by reification. A multi-domain accesscontrol framework is proposed, which allows fine-grainedpolicies to be defined for dynamically restricting (or grant-ing) access to specific sets of RDF data,. This enables secureand ad-hoc collaboration between geographically distribut-ed experts in a decentralized manner. The trade-off betweenflexibility in the models and security of the information canbe reached by using specific model patterns.

III. Access Control Graph Approach

A. General overview of the security framework

The ComVantage mobile collaboration framework (de-scribed in detail in [14]) relies on a Domain Access Server(DAS), which offers an HTTP interface to the mobile appli-cation framework, as shown in Figure 1. The central elementin the DAS is the query interface that receives either HTTPrequests with embedded SPARQL queries to RDF data orrequests for a single dereference-able URI where live data(sensor/machine data) information is located. The DomainSource Layer hosts the triple stores (SPARQL endpoints)where the semantic data is located, the servers where the se-mantic data is located, and the servers where the live datais located. The security architecture relies on following fivemajor elements (authentication modules in orange and autho-rization modules in green):

1. Identity Provider: Module which provides authenti-cation services according with SAML5 standard. It in-cludes a LDAP6 repository where users’ credentials/at-tributes are stored. It is responsible for issuing users’security tokens or credentials once they have been prop-erly authenticated in its domain.

2. Mobile/Application Client: Element that requires ac-cess to information in the target domain, which willcontain a Domain Authenticator module to authenticateproperly and retrieve security tokens and credentials forfurther information requests.

3. Service provider: Module in the DAS responsible forprotecting the resource enforcing basic authorizationpolicies based on credentials gathered after right au-thentication and dealing with reception of resource ac-cess requests according to SAML standard.

4. SPARQL Rewriter: Module that grants fine graineddata access based on the RDF data Named Graphs avail-able through the rewriting of the SPARQL queries re-ceived.

5https://www.oasis-open.org/committees/security/6Lightweight Directory Access Protocol

5. Non-RDF Access Control: Module to enforce accesscontrol policies related to non-RDF data, based on X-CAML standard.

The first step when a user is willing to access some infor-mation from another company is to authenticate towards theuserside domain. The line of research has been based onSAML (Security Assertion MarkUp Language). This OASISstandard defines both an XML schema for security assertionsand protocols to exchange these assertions relying on previ-ous trust establishment between domains, that includes mu-tual acknowledge and key exchange. SAML allows businessentities to make assertions regarding the identity, attributesand entitlements of a subject authenticated in its own domainto other entities in the target domain, such as a partner com-pany or another enterprise application. This is implementedby the blocks colored in orange in Figure 1.After this cross domain authentication process has been com-pleted, the query will be received in the Query Interface andthe authorization process starts. The Query Interface receiveseither HTTPS requests with embedded SPARQL queries orrequests for single dereference-able URIs. Depending on thenature of the requested data, different authorization process-es will be executed. If the requested data are published inthe Linked Data cloud as RDF, the SPARQL query rewrit-ing process will be performed and will provide a SPARQLrewritten query as a result, which will be launched to the cor-responding SPARQL endpoint. However, in those situationswhere live data are demanded, a traditional XACML accesscontrol will take place.The focus of this paper is on the SPARQL rewriting processwhich will guarantee that the information published in RD-F remains accessible only for authorized users. This accesscontrol process is based on an out-of-band policy definitionprocess (Named Graphs generation) and a runtime enforce-ment process (SPARQL query rewriting).

B. View generation

While in Role-based access control (RBAC), policies are ex-pressed by placing users in groups and assigning rights togroups, in the approach suggested here data views are creat-ed (i.e. a set of triples) and access types (canSee and canUse)are established for defined roles over them. This is achievedby policies which describe rules and/or formulas using Nota-tion3 (N3) as a human readable RDF serialization.Views are created by inferring new facts from original dataaccording to the policies and importing them into RDF datastores with the new facts organized in views. So, if an in-formation is not included into any view accessible for one ormore roles, that information will remain private. The organi-zation of the information in views uses reification as a way oftransforming one triple into three single statements (for eachsubject, predicate and object) which makes it possible to addcontext to this triple in this case the access rights for a role.The N3 policy enforcement (or inferring process) could bedone using a general-purpose data processor as CWM7. It isa forward chaining reasoned that includes a set of built-infunctions enabling the writing of more sophisticated rules ifthey are required. To help with the writing and enforcement

7http://infomesh.net/2001/cwm/

Flexibility vs. Security in Linked Enterprise Data Access Control Graphs 96

Figure. 1: Combined Access Control approach in the ComVantage framework

1 @pref ix f o a f : <h t t p : / / xmlns . com / f o a f/ 0 . 1 / > .

2 @pref ix ac : <h t t p : / / www. comvantage . eu /o n t o l o g i e s / ac−schema / > .

3 @pref ix l o g : <h t t p : / / www. w3 . org / 2 0 0 0 / 1 0 /swap / l o g #>.

4 @pref ix i n s : <h t t p : / / www. comvantage . eu /i n s t a n c e s / > .

56 { } l o g : i m p l i e s { i n s : e v e r y o n e a f o a f :

Group } .7 { } l o g : i m p l i e s { i n s : p u b l i c a ac : View } .8 { } l o g : i m p l i e s { i n s : e v e r y o n e ac : canSee

i n s : p u b l i c ; ac : canUse i n s : p u b l i c } .9 @forAll i n s : x , i n s : y .

10 { i n s : x ac : canSee i n s : y } l o g : i m p l i e s {i n s : x ac : canUse i n s : y } .

Listing 1: Policy Rule in N3 logic

of the policies, it is desirable the inclusion of a public view‘public’ accessible for all the users (by a common role ‘ev-eryone’) in which it could be inserted any public data.Another consideration to take into account is a generic rulethat should be applied after the creation of a canSee view,that implies the assignment of canUse rights because all auser can see should be also usable (canUse) (see listing 1).The policy enforcement can also be applied by a SPARQLengine, replacing the N3 notation policies with the corre-sponding SPARQL statements. This approach allows to usethe full power of SPARQL including negation with MINUSand property paths. For example. it is possible to allow only

1 INSERT INTO <u n i t−view> {2 ? p cv : v a l u e ? v a l u e .3 ? p cv : h a s U n i t ? u n i t4 }5 WHERE{6 : c a cv : Company .7 : p a cv : P a r a m e t e r .8 :m a cv : Machine .9 : c cv : m a i n t a i n s :m.

10 :m cv : h a s P a r a m e t e r : p .11 : p cv : v a l u e : v a l u e .12 : p cv : h a s U n i t : u n i t13 }14 INSERT INTO <a c c e s s r u l e s > {15 : manager : canSee <u n i t−view >.16 }

Listing 2: Policy enforcement by SPARQL

circular lists as a policy. Furthermore, as SPARQL is a tech-nology the user already have to know there is no need to learnanother technology. Listing 2 is an example of SPARQL pol-icy enforcement for the named graph solution. However, thiswill also work for the reification approach.The determination of which triples should be on the ACviews (or AC named graphs) could be done in two differ-ent ways. The simplest one is the analysis of the differentSPARQL statements that are expected to be launch to obtainthe required data by the end users (or applications). The oth-er one is the acquisition of the knowledge about the companydata model and the relations between that data model and theroles.

97 Graube et al.

The use of reified statement to store in AC views the triplesto protect has a high cost in storage space. The amount ofnew triples to store after the policy enforcement could befive times bigger. Nowadays, storage space is not a criticissue, but it also implies that the time required to process aSPARQL statement will be increased significantly. To avoidthis issue, named graphs are used for modeling the viewsinstead of reified statements. This new approach comparedto [13] drastically reduces the amount of new triples to store.It could vary from one new triple, if it is reused an exist-ing named graph, to just the same number of new triples asthe number of triples to protect plus one. In the case that isreused an existing named graph, it implies another extra ad-vantage. This extra advantage is related with the avoiding ofthe necessity of rerun the policy enforcement each time thereis an update on the original data.

C. SPARQL Query Rewriting

During runtime, SPARQL queries are rewritten in such amatter that only information associated with views to whichthe requester has access is retrieved. This access depends onits role, which will be determined by a prior authenticationmechanism as described in [14]. Those checks added by therewriting algorithm are composed of triple patterns and con-straints implementing a reification pattern similar to the viewgeneration. During query resolution, they will limit the ac-cess to the information that has been collected in any viewrelated with the roles of the requester. The rewritten querywill be executed on the endpoint where the views had beenpublished out of band. Only the information related to theviews that the requester is able to see or use will be returned.Indeed, as a relevant distinguishing feature from other exist-ing Liked Data access control methods, within the ComVan-tage approach, this SPARQL rewriting time cost does not de-pend on the amount of data to protect and on the number ofpolicies to apply to those data, thus facilitating a time effi-cient security method.The SPARQL rewriting algorithm first of all gathers all thevariables present in the query and determines which of thembelong to the canSee type or to the canUse type. That isdone looking at the SELECT clause of the main query, ex-cluding any other SELECT clause of an existing subquery.All the variables present in the SELECT clause are part ofthe canSee variables because the user needs to have canSeerights over them. The rest of the variables are of the typecanUse. Once all the variables have been cataloged, it anal-yses all the clauses to replace them with a new version con-taining a GRAPH clause if necessary. For example, if it find-s a GRAPH clause, the clause will be eliminated preservingthe content of the GRAPH clause (patterns, filters, ...) beenrewritten properly. For example, if the algorithm finds a FIL-TER clause, the algorithm will preserve the clause intact.Summarizing, each searching pattern will by enclosed intoa GRAPH clause and the variable assigned to that GRAPHclause will be include as a canSee AC rule if any of the ele-ments of the pattern is a canSee variable.The SPARQL rewriting algorithm also takes into account ifthe user has more than one role. The algorithm generates aUNION clause to include the AC rules of the roles grantingthe access to the information requested if any of the roles has

1 INSERT{2 GRAPH <a l l c a n S e e v i e w > {3 ?m a cv : Machine .4 }5 GRAPH <a c c e s s r u l e s > {6 r o l e : anybody : canSee <a l l c a n S e e v i e w

>.7 }8 }9 WHERE {

10 ?m a cv : Machine .11 }

Listing 3: Everybody may see all machines

access rights.

IV. Use Case: Access control graphs for main-tenance data

A. Information Space and Access Policy

An example from the domain of industrial maintenance shallillustrate the approach and its dependency on flexibility. Amanufacturer has sourced the maintenance of its machinesout to several external companies. Figure 2 shows the exem-plary dataset that should be protected. Within that dataset,one might want to implement the following security con-straint:

• Any company may see all machines.

• Every company c shall see only parameters p of thosemachines m’ that are maintained by this company c.

Following prefixes are used within this use case:

: <http://wwww.comvantage.eu/inst/maintenance/>

cv: <http://www.comvantage.eu/ont/maintenance/>

ac: <http://www.comvantage.eu/ontologies/ac-schema/>

role: <http://www.comvantage.eu/ontologies/ac-schema/>

B. Access Control for Linked Enterprise Data

The access policy makes it necessary to create different viewsfor every company and assign the according roles to theviews. This would be expressed as given in listing 3 and 4for the maintenance company :c1. The conditional part of thesecond rule constraints the application to the relevant objects(cv:Machine) (line 9). The action part of the access controlrule (lines 1-8) is modeled as INSERT statement and addsthose elements to the access control graph which is namedwith the URI <c1 canSee view> and give the role the re-quired access rights (lines 5-7). Now they can be matched bythe conditions of the rewritten query. Thus, it is ensured thatthe access constraints will be fulfilled for any query. There-fore, it is important to allow the specific roles (role:anybody,role:c1 staff ) to access the generated named graphs (<al-l canSee view>, <c1 canSee view>).The second part of the access policy requires specific viewsfor every company. Listing 4 shows a rule for company :c1.

Flexibility vs. Security in Linked Enterprise Data Access Control Graphs 98

Figure. 2: Sample Dataset about machines and their parameters

1 INSERT {2 GRAPH <c1 canSee v iew> {3 ? p cv : v a l u e ? v a l u e .4 }5 GRAPH <a c c e s s r u l e s > {6 r o l e : c 1 s t a f f : canSee <c1 canSee v iew

>.7 }8 }9 WHERE {

10 ?m a cv : Machine ;11 cv : h a s P a r a m e t e r ? p .12 ? p a cv : P a r a m e t e r ;13 cv : v a l u e : v a l u e .14 : c1 cv : m a i n t a i n s ?m.15 }

Listing 4: Company :c1 may view parameters of ownmachines

Line 14 limits the application to the company :c1. Then, theaction part (lines 1-8) adds the triples to the named graphwhere they can match the conditions of the rewritten query.Listing 5 illustrates the formula required to include the inter-nal variables in a query for use in the WHERE clause withoutallowing access to the end users (in the SELECT clause). Forthat purpose, it uses the ac:canUse attribute.Once the access views have been inferred, a SPARQL querylike the one in listing 6 has to be rewritten by the SPARQLrewriting engine giving the query shown in listing 7. Thatrewritten query contains the required constraints to ensurethat the response is limited to the data stored on the previous-ly created access control graphs. It is obvious that this query

Figure. 3: Original (left) and inferred triples (right) of policyrule 1

1 INSERT {2 GRAPH <c1 canUse v iew> {3 ? c1 cv : m a i n t a i n s ?m.4 ?m cv : h a s P a r a m e t e r ? p .5 }6 GRAPH <a c c e s s r u l e s > {7 r o l e : c 1 s t a f f : canUse <c1 canUse v iew

>.8 }9 }

10 WHERE {11 : c1 cv : m a i n t a i n s ?m.12 ?m a cv : Machine ;13 cv : h a s P a r a m e t e r ? p .14 ? p a cv : P a r a m e t e r .15 }

Listing 5: Company role c:1 staff may use but not see linkinformation

1 SELECT ?m ? p ? v2 WHERE {3 ?m a cv : Machine ;4 cv : h a s P a r a m e t e r ? p .5 ? p cv : v a l u e ? v .6 }

Listing 6: Original SPARQL query retrieving parameters ofmachines

is much simpler than the same with the approach from [13](see figure 8).

C. Adding Properties to the Linked Enterprise Data Cloud

Whenever data is added to Linked Enterprise Data cloud, forinstance by adding a machine, the access named graphs needto be regenerated. New predicates may be added freely tothe Linked Data cloud. However, the rule base needs to beadapted in order to provide access to them. Without provid-ing an additional access rule the secured access to the LinkedEnterprise Data stays stable. Listing 9 depicts how to addcontrolled access to the new property cv:hasUnit in exten-sion to listing 4.This example clearly illustrates that the security approach ef-

99 Graube et al.

1 SELECT ?m ? p ? v2 WHERE {3 GRAPH <a c c e s s r u l e s > {4 r o l e : c 1 s t a f f ac : canSee ? g1 , ? g2 , ? g3 .5 }6 GRAPH ? g1 { ?m a cv : Machine . }7 GRAPH ? g2 { ?m cv : h a s P a r a m e t e r ? p . }8 GRAPH ? g3 { ? p cv : v a l u e ? v . }9 }

Listing 7: Rewritten SPARQL query for role role:c1 staff

1 SELECT ?m ? v2 WHERE {3 ?m r d f : t y p e cv : Machine .4 r o l e : c 1 s t a f f ac : canSee ? v 0 , ? v 1 , ? v 2

.5 ? v 0 ac : h a s T u p l e ? t 0 .6 ? t 0 ac : s u b j e c t ?m;7 ac : p r e d i c a t e r d f : t y p e ;8 ac : o b j e c t cv : Machine .9 ?m cv : h a s P a r a m e t e r ? p .

10 ? v 1 ac : h a s T u p l e ? t 1 .11 ? t 1 ac : s u b j e c t ?m;12 ac : p r e d i c a t e cv : h a s P a r a m e t e r ;13 ac : o b j e c t ? p .14 ? p cv : v a l u e ? v .15 ? v 2 ac : h a s T u p l e ? t 2 .16 ? t 2 ac : s u b j e c t ? p ;17 ac : p r e d i c a t e cv : v a l u e ;18 ac : o b j e c t ? v .19 }

Listing 8: Rewritten SPARQL query following reificationapproach [13]

1 INSERT{2 GRAPH <c1 param view> {3 ? p cv : v a l u e ? v a l u e ;4 cv : h a s U n i t ? u n i t .5 }6 GRAPH <a c c e s s r u l e s > {7 r o l e : c 1 s t a f f ac : canSee <c1 param view

>.8 }9 }

10 WHERE {11 ? c a cv : Company ;12 cv : m a i n t a i n s ?m.13 ?m a cv : Machine ;14 cv : h a s P a r a m e t e r ? p .15 ? p a cv : P a r a m e t e r ;16 cv : v a l u e ? v a l u e ;17 cv : h a s U n i t ? u n i t .18 }

Listing 9: Adding access to cv:hasUnit attribute in:c1 param view

Figure. 4: Switching from a pure predicate model (left) to anattribute-value pair structure (right)

fectively prevents unsolicited data injection. On the downsid-e, however, we can observe a security triggered loss of flexi-bility in the data access. Any new machinery with probablynew attributes of the parameters (default value, last change)would need an adaptation of the access rules.

D. Adding Flexibility

This disadvantage can be overcome by proper modeling ofthe machine parameters with the attribute-value pair pat-tern. Instead of putting all the semantics into the predi-cate cv:hasUnit, this pattern splits and captures the semanticswith a pair of attribute and value nodes (see figures IV-D andIV-D). In our example the attribute node is connected witha cv:attribute predicate, the actual value of the parameter at-tribute is held in a node which is connected with a cv:valuepredicate.This approach allows general purpose access that makes itpossible to freely explore the parameter structure of all de-vices which are maintained by a specific company withoutthe need to capture all the possible predicates in advance inthe rule base. However, it comes with the disadvantage ofan additional modeling layer. This additional triple for eachparameter can be avoided when modeling the attribute-nodeas type of the parameter as depicted in figure 6. Here, the

Flexibility vs. Security in Linked Enterprise Data Access Control Graphs 100

Figure. 5: Switching from a pure predicate model (left) to anattribute-value pair structure (right)

Figure. 6: Subclassing allows both global and fine grainedpolicy rules

information about the type of the parameter is modeled ascv:unitParameter as a sub class of cv:Parameter in the un-derlying ontology. Thus, the inference rule can match allnodes that are subclasses of cv:Parameter.

V. Enhancements

A. Consistency Checking

The modeling, securing and providing of Linked Data is notan end in itself, but it is an indispensable part of an informa-tion system. These systems usually have applications whichdepend on the available secured information space. Clientsare increasingly implemented as mobile applications withspecific needs for their data input. These needs consist ofa unified information model, semantic descriptions and openaccess methods [15] which Linked Data can serve well.However, apps should be reusable for varied information likea mail client can work with different mail servers. Thisreusability is core requirement for single apps as well as en-hanced concepts like App Orchestration [16]. The conceptselects, adapts and manages generic apps in order to supportthe user in their complex task. Those generic apps can servea single purpose. However, they have to be flexible enoughthat they can be applied in different situations on differentdatasets in order to make this concept successful. Thus, theyrely on a kind of data information invariant for proper func-tionality. Checking for consistency of a dataset with such aninvariant could verify that the specialized client can operateon this specific dataset. This allows the correct selection ofapps in this concept when different generic apps with similarfunctionality are available but differ in the required informa-tion model.

1 INSERT {2 GRAPH <s p e c i f i c−type−view> {3 ? s ? p ? o .4 }5 GRAPH <a c c e s s r u l e s > {6 <s p e c i f i c−r o l e> ac : canSee <s p e c i f i c−

type−view >.7 }8 }9 WHERE{

10 ? s a <s p e c i f i c−type >;11 ? p ? o .12 }

Listing 10: Basic policies for protecting resources with alltheir attributes and connections according to their types

1 : b1 a : Box ;2 r d f s : l a b e l ”Box1 ” ;3 : l o c a t i o n [ : x ” 1 2 ” ; : y ” 1 0 ” ] .

Listing 11: Model with indirectly associated attributes

The approach presented here has the advantage that it of-fers this additional functional verification for free. There isno need to investigate the data structures on the client side.This very complex full functional verification of data struc-tures [17] can be omitted by relying on the access controlgraphs which are strictly defined. The policies can be usedas an invariant verification since they will only match spe-cific types of data structure and data semantics. Hence, wedo not need to follow an ad-hoc approach for checking theconsistency of the datasets at runtime in the apps, since thisis assured a-priori by the access control graphs.

B. Basic Policies

A set of some basic policies that can leverage the applicationof this approach in industrial environments. This is especial-ly needed for small and medium sized companies which donot have a big experience with information modeling. Thesebasic policies should provide a starting point for a smoothintegration of the access control. They should protect infor-mation in coarse-grained way and thus be easily applicable.One of the most simplest solution is the restriction on RDFresources based on their type. Thus, a basic SPARQL policycould look like the one in listing 10.Even without huge experience on security and informationmodeling, this approach leads to a secured environment.However, many are not that easy. Often, there are attributeswhich are connected to the belonging resource via a secondhop as in listing 11 (where the coordinates of a box are spec-ified). To cover those cases, one can extend the policy aspresented in listing 12. Now, additionally to resources of aspecific type, also connected resources with their attributesare put into the same security named graph as long as theydon’t have an own type.

101 Graube et al.

1 INSERT {2 GRAPH <s p e c i f i c−type−view> {3 ? s ? p ? o .4 }5 GRAPH <a c c e s s r u l e s > {6 <s p e c i f i c−r o l e> ac : canSee <s p e c i f i c−

type−view >.7 }8 }9 WHERE {

10 {11 ? s a <s p e c i f i c−type >;12 ? p ? o .13 } UNION {14 ? s t emp a <s p e c i f i c−type >;15 ? p temp ? s .16 ? s ? p ? o .17 MINUS {? s a ? some type .}18 }19 }

Listing 12: Extended basic policiy

VI. Evaluation

A. Response Time

An important metric for evaluating the usability of this con-cept is the additional query time. Therefore, we have mea-sured the time between the request sent by the client and theresponse received by using Apache jMeter8. We evaluatedthe operation time within setup consisting of a 4 GB RAMsystem running a Virtuoso 7 Open-Source Edition 9 as S-PARQL endpoint. The rewriting module was implementedin Java using the Apache Jena library10 for parsing and an-alyzing the query. The module has run while the IdentityProvider was located on a third machine. All of the machinesas well as the client performing the queries were located inthe same network in order to reduce network influence. Weapplied a set of policies on a dataset with 8k for four differen-t roles. After that, we have performed four different queries(Q1-Q4) with increasing complexity of the query onto thedataset. First, we used the Virtuoso endpoint without usingthe security features and afterwards we performed the samequeries including our security environment. That means thatthe captured response time includes the processes of authen-tication (on the IdP), the authorization (SPARQL rewriting)and the actual query (on the Virtuoso SPARQL endpoint).The measurement has been repeated 50 times to capture ran-dom effects such as computing load and network latency.Figure 7 presents a boxplot of the results showing a compari-son of the response time of the two approaches for all queries.The complexity of the original queries reflects only little inthe rewritten queries. The rewritten queries took constantlyabout 100 ms which mainly are due to the authentication andquery rewriting processes.

8http://jmeter.apache.org/9http://virtuoso.openlinksw.com/dataspace/doc/

dav/wiki/Main10http://jena.apache.org/

Figure. 7: Response time of secured access (gray) in com-parison to unsecured access (white) for different queries (Q1-Q4)

B. Storage

The costs of the different approaches measured in numberof new triples can be calculated mathematically, allowinga comparison between the reification approach and the onewith named graphs. Thus, we define following variables:

P Set of triples to be protected

ti Each of the triples in P

vi Number of AC views containing the triple ti

Pg Set of triples to be protected included in any previouslyexisting named graph reused for AC purpose

tj Each of the triples in Pg

vj Number of AC views containing the triple tj

Pf Set of triples to be protected excluded from any previ-ously existing named graph reused for AC purpose

tk Each of the triples in Pf

vk Number of AC views containing the triple tk

s Number of canSee AC rules

u Number of canUse AC rules

P is the union of the disjoint graphs Pg and Pf :

P = Pg ∪ Pf (1)Pg ∩ Pf = ∅ (2)

∀vj ∈ Pg : vj = 0 (3)

This leads to different storage consumptions for the ap-proaches, taking into account the triples that have to be storedadditionally to the existing ones:

• Reified Statements approach

trs = s+ u+

|P |∑i=1

4vi (4)

Flexibility vs. Security in Linked Enterprise Data Access Control Graphs 102

• Named Graphs approach without reuse of existingnamed graphs

tng = s+ u+

|P |∑i=1

vi (5)

• Named Graphs approach reusing some of the existingnamed graphs

trng = s+ u+

|Pg|∑j=1

vj +

|Pf |∑k=1

vk = s+ u+

|Pf |∑k=1

vk (6)

Comparing the formulas, it is possible to conclude that theapproach base on named graphs decreases considerably thenumber a new triples to store, consequently the amount ofdisk space needed. And, if they are reused one or more of thepreviously defined named graphs, it will reduce the storageneeded, apart from the other benefits it provides.

VII. Discussion

The security approach presented here results in some lossesin terms of flexibility as access to parameters can be forbid-den or allowed only in general. But two approaches couldsolve this issue. The necessary negations, which are not sup-ported by the views and the SPARQL query rewriting, canbe handled in the policies for view generation. FortunatelySPARQL queries allow this with the keyword MINUS. Theother possibility is to add additional triples in the data set thatcan be queried positively. This would allow simple rules butinvolve further restrictions to the flexibility of Linked Data.The generation of views can take a long time when appliedto big datasets. The computing time depends heavily on thenumber of potential statements that could affect new state-ments. However, models usually have parts which are onlyloosely coupled. Therefore it is worth putting knowledge andeffort into this system in order to extract specific parts that donot have an influence on other parts regarding specific rules.Now we can run the inferencing engine only on those smallsegmented parts. This can decrease the computation effortdramatically.This approach allows a very powerful access control toLinked Data information using SPARQL queries. Howev-er, the formulating and application of meaningful policies ishard work. For use in a production environment, additionalsupport is required for users. This could be a user-friendlypolicy editor. Furthermore the SPARQL query rewriting en-gine has to provide full SPARQL 1.1 support and easy de-ployment.Inferencing of the access control graphs and checking formodel constraints can go hand in hand. Both should be con-ducted every time the original model changes. The infer-ence run is necessary in order to make the new informationavailable and the model checking is necessary to ensure thatclients can operate on the data in the expected way.Moreover, the inference run could also detect items in thedataset which are not handled by any policy. These itemsare protected by default. However, information that can notbe accessed by anyone is very unlikely and might be a hintthat the policies are not working as expected. Thus, it might

make sense to return this information to an administrator inorder to check and revise the policies and the derived rules.The view generation, creating the specific named graphs,could also generate new statements, when the content of onepath between specific resources should be hidden while thefact that there is a connection should still be visible.The change from the reified statements approach to the ap-proach based on Named Graphs provides advantages both inperformance and storage. The reification approach has oftenlet to timeouts during query evaluation to the of the SPAR-QL endpoint because the rewritten queries had become re-ally complex. In comparison, the new approach with namegraphs shows a good performance. Furthermore, the storageneeded is just less than a fourth of the reified approach.

VIII. Conclusion

Access control is a crucial requirement for the success ofLinked Data for inter-organizational collaboration environ-ments. The approach proposed here for securing SPARQLaccess consists of two parts. Firstly, the original data is orga-nized in different views modeled as Named Graphs accord-ing to a set of access policies. The access to the views whichcontain triples can be restricted to different roles. Second-ly, incoming SPARQL requests need an authenticated role asfurther parameter and are rewritten in a way that the state-ments have to be in accessible views for this role.Meaningful policies require taking the information model in-to account. Thus, the modeling concepts not only affect theway the model looks but also have a big impact on the cre-ation and maintenance of security policies. This fact shouldbe kept in mind when designing information models. Usual-ly thoughts about securing the information are not present atthis step. However, we have shown that policy rules heavilydepend on the assumptions made in the information models.The approach shows a balanced trade-off between security,flexibility and performance. We will perform further investi-gations using the system in real industrial scenarios.

Acknowledgment

The research leading to these results was funded by the Eu-ropean Community’s Seventh Framework Programme undergrant agreement no. FP7-284928 IP ComVantage.

References

[1] J. Murdock, C. Buckner, and C. Allen, “Containing thesemantic explosion,” in Procedings of PhiloWeb, Lyon,2012.

[2] M. Graube, J. Pfeffer, J. Ziegler, and L. Urbas, “Linkeddata as integrating technology for industrial data,” inProceedings of the 14th International Conference onNetwork-Based Information Systems. IEEE, Sep.2011, pp. 162–167.

[3] C. Bizer, T. Heath, and T. Berners-Lee, “Linked data- the story so far,” International Journal on SemanticWeb and Information Systems (IJSWIS), vol. 5, no. 3,pp. 1–22, 2009.

103 Graube et al.

[4] P. Reddivari, “Policy based access control for a rdf s-tore,” in Proceedings of the Policy Management for theWeb Workshop, 2005, p. 78–83.

[5] D. Beckett, “The design and implementation of the red-land RDF application framework,” in Proceedings ofthe 10th international conference on World Wide Web,ser. WWW ’01. New York, NY, USA: ACM, 2001, p.449–456.

[6] T. Heath and C. Bizer, Linked Data : Evolving the Webinto a Global Data Space. Morgan & Claypool, 2011.

[7] H. Shen, “A semantic-aware attribute-based access con-trol model for web services,” in Proceedings of the 9thInternational Conference on Algorithms and Architec-tures for Parallel Processing, ser. ICA3PP ’09. Berlin,Heidelberg: Springer-Verlag, 2009, p. 693–703.

[8] H. Shen and Y. Cheng, “A semantic context-based mod-el for mobile web services access control,” Internation-al Journal of Computer Network and Information Secu-rity (IJCNIS), vol. 3, no. 1, pp. 18–25, Feb. 2011.

[9] A. Jain and C. Farkas, “Secure resource descriptionframework: An access control model,” in Proceedingsof the Eleventh ACM Symposium on Access ControlModels and Technologies, ser. SACMAT ’06. NewYork, NY, USA: ACM, 2006, p. 121–129.

[10] Patrick Hayes, “RDF semantics,” Feb. 2004.[Online]. Available: http://www.w3.org/TR/2004/REC-rdf-mt-20040210/

[11] O. Sacco and A. Passant, “A privacy preference ontolo-gy (PPO) for linked data,” in Proceedings of the LinkedData on the Web Workshop (LDOW2011), 2011.

[12] L. Costabello, S. Villata, N. Delaforge, and F. Gandon,“Linked data access goes mobile: Context-aware autho-rization for graph stores,” in Proc. of 5th WWW Work-shop on Linked Data on the Web, 2012.

[13] M. Graube, P. Ortiz, M. Carnerero, O. Lazaro, M. Uri-arte, and L. Urbas, “Flexibility vs. security in linkedenterprise data access control graphs,” in Proc. of 9thIEEE Int. Conf. on Information Assurance and Securi-ty, 2013.

[14] P. Ortiz, O. Lazaro, M. Uriarte, and M. Carnerero, “En-hanced multi-domain access-control for secure mobilecollaboration through linked data cloud in manufactur-ing,” in Proceedings of IEEE World of Wireless Mo-bile and Multimedia Networks (WoWMoM) conference2013, 2013, pp. 1–9.

[15] M. Graube, J. Ziegler, J. Hladik, and L. Urbas, “Linkeddata as enabler for mobile applications for complextasks in industrial settings,” in Proceedings of IEEE18th Conference on Emerging Technologies & FactoryAutomation (ETFA 2013). IEEE, 2013, pp. 1–8.

[16] J. Ziegler, M. Graube, J. Pfeffer, and L. Urbas, “Be-yond app-chaining: Mobile app orchestration for effi-cient model driven software generation,” in 17th inter-national IEEE Conference on Emerging Technologies& Factory Automation, Krakau, Poland, 2012, pp. 1–8.

[17] K. Zee, V. Kuncak, and M. Rinard, “Full functional ver-ification of linked data structures,” ACM SIGPLAN No-tices, vol. 43, no. 6, p. 349–361, 2008.