
Resource Management for Isolation Enhanced Cloud Services

Himanshu Raj
Microsoft Corporation
1 Microsoft Way
Redmond, WA
[email protected]

Ripal Nathuji
Microsoft Corporation
1 Microsoft Way
Redmond, WA
[email protected]

Abhishek Singh
Microsoft Corporation
1 Microsoft Way
Redmond, WA
[email protected]

Paul England
Microsoft Corporation
1 Microsoft Way
Redmond, WA
[email protected]

ABSTRACT

The cloud infrastructure provider (CIP) in a cloud computing platform must provide security and isolation guarantees to a service provider (SP), who builds the service(s) for such a platform. We identify last level cache (LLC) sharing as one of the impediments to finer grain isolation required by a service, and advocate two resource management approaches to provide performance and security isolation in the shared cloud infrastructure - cache hierarchy aware core assignment and page coloring based cache partitioning. Experimental results demonstrate that these approaches are effective in isolating cache interference impacts a VM may have on another VM. We also incorporate these approaches in the resource management (RM) framework of our example cloud infrastructure, which enables the deployment of VMs with isolation enhanced SLAs.

Categories and Subject Descriptors
D.4.6 [Operating Systems]: Security and Protection

General Terms
Security, Performance, Measurement

Keywords
Isolation Attributes, Cache Isolation, Cache Coloring

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CCSW’09, November 13, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-784-4/09/11 ...$10.00.

1. INTRODUCTION

Cloud computing environments provide an Infrastructure as a Service (IaaS) model to host services provided by independent service providers. This vision of cloud computing enables services running on leased computation platforms, decoupling the service provider (SP) from the platform owner or the cloud infrastructure provider (CIP). Such a model has potential for huge cost savings for service providers as they avoid the significant financial overheads associated with deploying, maintaining, and managing datacenter environments, and instead pay just for the usage of these resources. However, this benefit comes at a price - the separation of service and infrastructure providers implies that the service provider has less control over the service deployment, and must trust the CIP to uphold the guarantees provided in the service level agreement (SLA). An SLA works as a contract between the CIP and the service provider, and provides guarantees either in terms of low level resource provisioning, e.g., X number of CPUs with some computational capacity and Y Mbps network throughput, or in terms of higher-level goodput value(s) for the service, such as Z ops/second.

A service provider must also trust in the infrastructure provider’s ability to properly isolate services from each other. Isolating a service from other services includes both performance and security isolation. This implies that the infrastructure provider must employ mechanisms so that it is not possible for one service to interfere with the execution of another service. A typical method to achieve isolation is to enforce physical isolation, e.g., ensure that different services execute on unique physical machines, and use isolated network infrastructures. Although these mechanisms achieve excellent isolation, strict physical isolation is costly, and in many cases the dedicated resources will be under-utilized. Hence, many cloud computing platforms use virtualization to encapsulate services inside virtual machines (VMs), including Amazon EC2 [1] and Microsoft Azure [2]. This approach allows the CIP to better utilize resources, while still providing adequate isolation. The adoption of virtualization also enables other benefits including the ease of service deployment and the flexibility of VM migration to provide fault tolerance and improved consolidation.

Isolation properties of a virtualized platform, however, are weaker compared to physical isolation. In particular, resources that may be implicitly shared among VMs, such as the last level cache (LLC) on multicore processors and memory bandwidth, present opportunities for security or performance interference. For example, it has been shown that an otherwise isolated process can compromise the confidentiality of another process [14], and similar attacks are possible in a virtualized environment. Similarly, the use of shared cache(s) makes it difficult to properly isolate performance [7], possibly allowing a malicious VM to launch a denial of service attack on another VM, and certainly threatening a service provider’s ability to guarantee an SLA to its clients.

Isolation issues are further complicated by the fact that large scale Internet services will likely be developed by building services on top of existing underlying services, and not just the hosting VMM. In this case, a service might rely on services from multiple different service providers, and complete isolation among dependent services may not be possible (or indeed desirable). However, currently any interdependence among services implies a trust relationship - the dependent service trusts the other service to do the right thing (e.g. securely communicate and store any user related data), with severe repercussions for a service provider that trusts another service that is then compromised.

Isolation attributes for a service defined as part of the SLA between a service provider and the infrastructure provider serve two purposes - 1) to capture the degree of isolation demanded by a service (from both the performance and security points of view), and 2) to allow a service to authoritatively report its isolation characteristics so that service consumers can decide whether to trust it. We call the latter feature “isolation attestation” and it is specifically useful in the case when one SP depends on another SP for some service, or when a cloud service user (client) is deciding whether to use a service. Based on the isolation attestation from the CIP, an SP can choose to trust another SP rather than being forced to trust the SP and the cloud service provider. Similarly, this attestation can be provided to the client as a measure to gain trust in the SP.

Our focus in this work is limited to the first item. In particular, isolation related SLA constraints are used in the CIP’s resource manager to manage various service components encapsulated in the VMs, including their instantiation, migration, and termination. We also present mechanisms to enforce some of the isolation constraints, in particular, focusing on the last-level cache as the shared resource in the multicore environment prevalent in today’s cloud infrastructures. We explore two such mechanisms in the paper: cache hierarchy aware core assignment, and cache-aware memory assignment through page coloring. These techniques prevent cache-based side channel attacks and provide performance isolation against a VM. We also present an example formulation of a constraint satisfaction problem (CSP) for VM placement in the cloud environment based on enhanced SLA constraints. We conclude with future directions on further enhancing the SP’s SLA by integrating trusted computing techniques, such as attestation, in cloud computing platforms.

2. ISOLATION ATTRIBUTES FOR CLOUD SERVICES: EXAMPLE SCENARIOS

In cloud services architecture, the service provider implements the service logic and presents it to clients over the internet (cloud). The service logic itself is typically composed of multiple components. The cloud infrastructure provider uses some container abstraction, e.g. a virtual machine, for service deployment, and it is up to the service provider to package various components of the service into these containers. Several of these VMs, belonging to various independent SPs, are then deployed on the infrastructure based on the decisions of the resource manager (RM) of the CIP. We make a simplifying assumption that an SP will deploy all the components on a single CIP. Moreover, any dependencies an SP might have on other SPs are also limited to the same CIP.

A specific example of such a cloud based service is the Virtual Desktop Experience (VDE) (refer to Figure 1). In this scenario the SP provides a virtual machine for each customer. The customer may access the virtual machine from another PC, or possibly from a dumb terminal or set top box. The service provider adds value by allowing roaming access to the VDE, and possibly providing centralized professional management. To realize this architecture, the VDE cloud service, provided by one or more SPs, forms the middle layer. The SP has some number of session VMs and service VMs. The session VM is specific to a client, and works as her personal computer. These SPs will in turn use cloud infrastructure services from a CIP, such as Microsoft Azure.

This scenario shows an implicit dependency between a client and a single service provider. However, this is scenario specific, and may not be representative of cloud based services in general. Another example is Live Mesh [3], where the client stores data with an SP, which in turn uses resources provided by the CIP. In this example, the SP may implement the service using a pool of VMs, which provide storage to the client via a web interface. These VMs then utilize some name space partitioning mechanism to store data for different clients, such as a distributed file system. The file system may be implemented by the same SP, or it may use another SP that provides the file service. Ultimately, the disk storage required by this SP is provisioned by the CIP.

In all these examples, there exist constraints at each boundary between adjoining layers that form the SLA between these layers. For example, in our sample VDE scenario, a client is usually concerned about experience (including performance and ease of use), security of service, and privacy of user data. Constraints based on these concerns, such as “screen resolution”, “data encryption quality”, and “need for anti-virus agent”, are then presented to an SP as part of the SLA between the client and the SP. Similar security and privacy concerns are also applicable to the Live Mesh scenario. In this case SPs implement appropriate mechanisms in order to meet concerns of their clients, such as encapsulating each client in a separate VM, or by using access control mechanisms provided by file systems. These mechanisms, in turn, create concerns about isolation and resource management for the SP (e.g., whether a VM from another SP can adversely impact the performance of a Session VM, or how much CPU resource should be allocated to a Session VM), which are then passed on as constraints to the CIP as part of the SLA between the SP and the CIP. Table 1 outlines some of the SLA constraints typical of an SP. In general, an SLA specification may use a subset of these at a given time.

The CIP actively manages resources in order to meet the SLA constraints specified by SPs. For our specific cloud infrastructure scenario, this includes assigning physical resources to the VMs. This resource assignment problem can be posed as a Constraint Satisfaction Problem (CSP) formulation for the CIP. However, the constraints of the CSP are different than the SLA constraints, since they need to be stated in terms of resources and mechanisms meaningful to the CIP, such as CPU share, size of RAM, storage capacity, and network bandwidth. We defer to Section 5 to describe one such formulation; first we describe mechanisms employed by the CIP to enforce some of the isolation attributes. In the following section we describe two such mechanisms for cache based isolation, along with a prototype implementation and evaluation.

Figure 1: Example interaction between different entities in virtual desktop experience cloud service

| Constraint | Classification | Type | Comment |
|---|---|---|---|
| Number of processors | QoS attribute | Integer | Service specifies this constraint based on the parallelism desired. |
| Goodput of service | QoS attribute | Float | Service specific QoS metric. This can be used in lieu of specifying low level resource allocation, such as CPU share, RAM size, storage capacity, and network bandwidth [wood 2008, choi 2008]. |
| Replication factor (r) | QoS attribute | Integer | Defines how many replicas of the VM are needed. |
| H/w fault domain (n) | Isolation attribute | Integer | Defines the number of physical nodes across which replicas should be scattered. |
| Cache based DoS attack avoidance | Isolation attribute | Boolean | Defines whether VM must be safeguarded against a malicious VM that might cause cache thrashing. |
| Cache based side channel attack avoidance | Isolation attribute | Boolean | Defines whether VM must be safeguarded against a malicious VM that might try to steal secrets (e.g., encryption keys) based on cache based attacks [6, 14, 13]. |

Table 1: Example SLA Constraints specified by a SP
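To make the SLA constraints of Table 1 concrete, the following is a minimal sketch of how they might be represented before being translated into CIP specific constraints. The class name, field names, and defaults are illustrative assumptions, not part of the paper's resource manager.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlaConstraints:
    """SLA constraints an SP hands to the CIP (hypothetical field names)."""
    num_processors: int                      # QoS attribute
    replication_factor: int = 1              # QoS attribute (r)
    hw_fault_domain: int = 1                 # isolation attribute (n)
    goodput: Optional[float] = None          # QoS attribute, service specific
    avoid_cache_dos: bool = False            # isolation attribute
    avoid_cache_side_channel: bool = False   # isolation attribute

    def __post_init__(self) -> None:
        # Basic sanity checks; these are assumptions, not rules from the paper.
        if self.num_processors < 1:
            raise ValueError("num_processors must be at least 1")
        if self.replication_factor < 1:
            raise ValueError("replication_factor must be at least 1")
        if self.hw_fault_domain < 1:
            raise ValueError("hw_fault_domain must be at least 1")

    def needs_cache_isolation(self) -> bool:
        """True if either cache related isolation attribute is requested."""
        return self.avoid_cache_dos or self.avoid_cache_side_channel


# Illustrative values only.
sla = SlaConstraints(num_processors=2, replication_factor=5, hw_fault_domain=5,
                     avoid_cache_dos=True, avoid_cache_side_channel=True)
```

An SLA that uses only a subset of the attributes simply leaves the remaining fields at their defaults.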

3. ENFORCING CACHE ISOLATION IN MULTICORE SYSTEMS

Multicore systems are prevalent in today’s large scale data centers, which form the core of the cloud infrastructure. This multicore trend is expected to continue in the future. Shared caches are commonly used in such multicore architectures. A drawback to these designs, though, is that it is difficult to guarantee performance to a thread whose active working set spills out of its local non-shared caches into the last level cache (LLC), since other threads can simultaneously access the LLC, resulting in active interference. Hence, memory-bound threads that thrash the cache (maliciously or otherwise) can severely impact performance of other applications sharing the LLC. Moreover, it is possible to impact the confidentiality of another thread using cache as the medium for side channel attacks. Addressing the cache isolation issues is especially important in cloud computing scenarios, since interfering threads may belong to different SPs (more precisely, to VMs owned by different SPs). This can impact the ability of the SP to uphold the SLA to its clients, or make it more expensive for the SP to provision to uphold the SLA. To this end, we present two techniques that we have implemented, cache hierarchy aware core allocation and page coloring based cache partitioning, that provide better isolation.

3.1 Cache hierarchy aware core assignment

Current generation multicore systems usually share the LLC at the package level; however, many server class machines deployed in data centers today are configured with multiple packages. These machines provide opportunities for placing VMs in a manner so as to exclude any cache sharing. Traditionally, details of how processing cores, caches, and memory are organized are exposed to software so that computational thread placement can be optimized. We group cores on a machine based on their LLC organization - all cores sharing the LLC are put in a single group. Next, if a VM’s SLA defines isolation attributes related to cache, the caching hierarchy aware core assignment algorithm tries to satisfy these constraints by choosing a group that is currently not assigned to any other VM. Cores from this group are then assigned to different virtual processors of the VM. Depending on the number of virtual processors required by the VM, one or more groups may be used.

This approach is simple to implement, although the biggest drawback is that it may result in under-utilization of the platform. In particular, if a VM requires cache isolation and uses fewer cores than the sum of all the cores in the groups assigned to it, the remaining unassigned cores in those groups cannot be used by any other VM.
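The grouping and assignment steps described above can be sketched as follows. This is a simplified illustration, not the prototype's code: the function names, the `core_to_llc` map, and the `owner` bookkeeping are assumptions introduced here.

```python
from typing import Dict, List, Optional


def group_cores_by_llc(core_to_llc: Dict[int, int]) -> Dict[int, List[int]]:
    """Put all cores that share the same LLC into a single group."""
    groups: Dict[int, List[int]] = {}
    for core, llc in sorted(core_to_llc.items()):
        groups.setdefault(llc, []).append(core)
    return groups


def assign_isolated_cores(groups: Dict[int, List[int]],
                          owner: Dict[int, str],
                          vm: str,
                          num_vps: int) -> Optional[List[int]]:
    """Give `vm` whole LLC groups that no other VM owns.

    `owner` maps group id -> VM name and is updated on success.
    Returns the cores backing the VM's virtual processors, or None
    if there are not enough free groups.
    """
    chosen: List[int] = []
    cores: List[int] = []
    for group_id in sorted(groups):
        if len(cores) >= num_vps:
            break
        if group_id in owner:
            continue  # group already belongs to another VM
        chosen.append(group_id)
        cores.extend(groups[group_id])
    if len(cores) < num_vps:
        return None
    for group_id in chosen:
        owner[group_id] = vm
    # Cores left over in the chosen groups stay idle: this is the
    # under-utilization drawback noted in the text.
    return cores[:num_vps]


# Hypothetical machine: two quad-core packages, one LLC per package.
groups = group_cores_by_llc({core: core // 4 for core in range(8)})
owner: Dict[int, str] = {}
target_cores = assign_isolated_cores(groups, owner, "target", 1)
perturbing_cores = assign_isolated_cores(groups, owner, "perturbing", 3)
third_vm_cores = assign_isolated_cores(groups, owner, "third", 1)
```

In this example the target VM uses one core of the first group, so the other three cores of that group are unavailable to the third VM, whose request fails.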

3.2 Page-coloring based cache partitioning

Page coloring is a software method to control how the physical memory used by an application maps to cache hardware. The number of colors that a cache can support is determined by its organization, and is obtained by multiplying the cache line size by the number of sets and dividing by the page size. In the case of virtualized systems, the manner in which the hypervisor allocates memory pages to back a VM can influence the cache usage of threads in the VM. We utilize page coloring as a software mechanism for cache isolation by isolating the color sets that are used to back individual VMs running on CPU cores that share the LLC (i.e., belonging to the same group).
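As a worked example of the color count formula, the sketch below plugs in the parameters of the experimental platform from Section 4.1 (8MB LLC, 64-byte lines, 4Kbyte pages). The 16-way associativity is an assumption that is not stated in the text; it is needed only to derive the number of sets, and the function name is illustrative.

```python
def num_page_colors(cache_size: int, line_size: int, ways: int, page_size: int) -> int:
    """colors = (line size * number of sets) / page size."""
    num_sets = cache_size // (line_size * ways)
    return (line_size * num_sets) // page_size


# 8MB LLC, 64-byte lines, assumed 16-way associativity, 4Kbyte pages.
colors = num_page_colors(cache_size=8 * 1024 * 1024, line_size=64,
                         ways=16, page_size=4096)
```

Under these assumptions the result is 128, which agrees with the number of colors reported for the platform in Section 4.1.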

Page coloring has the advantage over the cache hierarchy aware core allocation technique in that it does not result in processor under-utilization. However, it is still possible to under-utilize the memory available on the platform. For example, if the memory required by a VM is not an exact sum of the memory available in the colors that are currently unassigned to any other VM, the remaining pages of the colors given to it may not be assigned to any other VM. Also, since the cache is exclusively partitioned using colors, the performance of a VM may suffer if the VM’s working set does not fit in its cache partition. Such performance degradation may be acceptable, as long as the QoS SLA constraints of the VM are satisfied.

4. EXPERIMENTAL EVALUATION OF CACHE ISOLATION TECHNIQUES

4.1 Implementation Details and Methodology

Our prototype implementation of cache isolation techniques is based on the Hyper-V virtualization infrastructure [5]. Hyper-V consists of a micro-kernel hypervisor that manages CPU and memory resources and a privileged VM, called the primary partition, for the overall management of the platform. In particular, the primary partition manages the life-cycle of other VMs, and also hosts a virtualization stack for I/O virtualization. As part of the VM creation, the memory management component of Hyper-V running in the primary partition allocates physical memory pages to back memory pages of the VM. We modified this component to use a variant of Windows NT kernel’s memory allocation API that allows the caller to specify an address range and stride factor for allocated pages. We used these parameters to limit pages to a set of colors for VMs so that only a percentage of the LLC would be accessible. The actual number of colors supported by a platform is specific to its cache organization, and is described later in the section.
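The effect of restricting a VM's backing pages to a set of colors can be illustrated with a small simulation. This sketch does not use the Windows NT allocation API mentioned above; it assumes that a page's color is its physical frame number modulo the number of colors, which holds for a physically indexed cache without address hashing. All names are hypothetical.

```python
from typing import Iterable, List, Set


def page_color(frame_number: int, num_colors: int) -> int:
    """Color of a physical page (assumed: frame number modulo color count)."""
    return frame_number % num_colors


def frames_for_colors(free_frames: Iterable[int], allowed: Set[int],
                      num_colors: int, count: int) -> List[int]:
    """Pick `count` free page frames whose color is in `allowed`."""
    picked: List[int] = []
    for frame in free_frames:
        if page_color(frame, num_colors) in allowed:
            picked.append(frame)
            if len(picked) == count:
                return picked
    raise MemoryError("not enough free frames in the allowed colors")


# Split 128 colors evenly between two VMs that share an LLC.
NUM_COLORS = 128
target_frames = frames_for_colors(range(1024), set(range(0, 64)), NUM_COLORS, 100)
perturbing_frames = frames_for_colors(range(1024), set(range(64, 128)), NUM_COLORS, 100)
```

Because the two color sets are disjoint, pages of the two VMs never map to the same cache sets under the stated assumption, so neither VM can evict the other's lines from the LLC.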

Next, we enhanced the configuration of every physical machine with two pieces of information - the group information for cores, and the numbers of page colors and their current sizes. The experimental platform we use is an 8-core Intel Nehalem processor based machine, with 6GB RAM. The Nehalem processor cache hierarchy includes a local L1 and L2 per core, as well as an 8MB shared LLC as shown in Figure 2. The machine consists of two such packages, organized as two NUMA nodes. All three levels of the cache have 64-byte cache lines. Hence, there are two core groups, and each group shares an LLC supporting 128 colors (based on 4Kbyte page size). For initial results, we utilized a synthetic application to run inside of VMs. The application allocates an array of a specified working set size, and then accesses it in a regular pattern. The Nehalem processor includes multiple hardware prefetching mechanisms to enable improved performance. For the included measurements, we have disabled some of these mechanisms to better isolate the effects of page coloring. In particular, we have disabled the Data Prefetch Logic (DPL) prefetching mechanism. The impact of different hardware prefetching mechanisms on page coloring is part of our future work.

Figure 2: Nehalem cache hierarchy

The target VM running the synthetic workload is a single virtual processor (VP) VM. To observe the impact on performance, we run a perturbing VM comprising three VPs. For the cache hierarchy aware placement, the perturbing VM is placed on the separate group (i.e., the separate package). For page coloring based cache partitioning, the perturbing VM is assigned cores from the same group as that of the target VM. The perturbing VM runs a memory intensive application with a varying number of threads that repeatedly access memory and cause cache thrashing.

4.2 Experimental Results

Our first set of experiments considers the impact of cache sharing by consolidating the target VM and the perturbing VM on the same core group (quad-core package) with no page coloring. Here we measure the execution time of the benchmark application in the target VM. Figure 3 provides the execution time from our experiments. When the target VM is executing alone, we observe that once the working set fits within the hardware LLC size (8MB), the execution time drops to a baseline value of approximately 40 seconds. Subsequently we introduce threads in the perturbing VM that have a working set of 8MB. As expected, as such threads execute concurrently on the remaining three cores in the group, we see increased performance interference to the measured application, requiring reduced working set sizes before execution time drops to the baseline value. These initial results without any cache isolation technique highlight the fact that there can be significant performance impact of interference from other threads in the shared LLC (up to 400%). We next look at how cache management can impact the isolation for the target VM.

Figure 3: Performance of target VM with varying degrees of perturbation

For the cache hierarchy aware core assignment approach, target and perturbation VMs are placed in different core groups. In particular, VMs are assigned cores and memory from separate NUMA nodes. Hence, the target VM doesn’t share cache resources with the perturbation VM. The performance of the target VM is similar to that in the previous experiment without cache isolation when the target VM executes without the perturbing VM, as shown by the lowest curve in Figure 3. Execution of the perturbation VM does not result in any visible loss of performance for the target VM. For brevity, we have omitted these results from the paper.

We evaluate page coloring by using it to segment the LLC. We assign the target VM a preferential share of 50% (by using half of the total number of colors available, which in our case is 8), and color the perturbing VM with the remaining 50%. Figure 4 depicts the performance data of the measured application inside the target VM with page coloring turned on. We observe that by integrating coloring, the performance of the target application is fairly consistent as additional threads are added to the perturbing VM. We next look at the performance impact of this static isolation when compared to the non colored case.

Figure 4: Performance of target VM with page coloring

Figure 5 compares the execution time of the various scenarios with and without coloring (with Y-axis shown on logarithmic scale). We observe that in the unconsolidated case there is a penalty for coloring (up to 3.6x for 50% coloring). This is expected since the coloring limits the ability of the application in the target VM to make use of the entire LLC where it otherwise would have. Once threads from perturbing VMs are included, however, we observe that the execution time can be cut by up to a factor of three with coloring. These numbers highlight that though page coloring can impact performance, it is an effective means of providing cache isolation. As long as the performance degradation does not violate any QoS SLA constraint, this approach can be used to provide performance and security isolation to a VM.

Figure 5: Comparative performance of target VM with and without page coloring

5. BRINGING IT TOGETHER: AN SLA DRIVEN APPROACH TO RESOURCE MANAGEMENT IN THE CLOUD INFRASTRUCTURE

In this section, we demonstrate how to utilize various resource management techniques to manage resources in the cloud infrastructure in a way so as to satisfy various QoS and isolation SLA constraints put forth by an SP to the CIP. As specified earlier, the SLA constraints are converted into a set of CIP specific constraints - defined in terms of attributes related to resources available at the CIP. The problem, then, reduces to a constraint satisfaction problem (CSP). Formally, a CSP is defined as having a set of constraints C that are defined over a set of variables X. The variables in X can take values in the domain D. The goal is to find value assignments to X such that all the constraints in C are satisfied. For the example cloud infrastructure, the CSP can be informally defined as: given a set of VMs with CIP specific constraints, is it possible to place these VMs on a subset of physical nodes in the infrastructure in a manner that all CIP specific constraints related to these VMs are satisfied? We will present the CSP formulation with the help of a running example.

    FOREACH vm IN VMs
        FOREACH blade IN Blades
            FOREACH D IN blade.ProcessorDomains
                FOREACH P IN D.PageColorDomains
                    vm.Blade = blade
                    vm.ProcessorDomain = D
                    vm.PageColorDomain = P
                    IF all constraints evaluate to true
                        Jump to next vm
                    ELSE
                        vm.Blade = NULL

    IF THERE EXISTS vm IN VMs : vm.Blade == NULL
        PRINT "FAILED"
    ELSE
        PRINT "SUCCEEDED"

Figure 7: Pseudo-code of a greedy algorithm for the CSP formulation

Suppose the SP specifies the following SLA for its service:

Number of processors = 2

Replication factor (r) = 5

H/w fault domain (n) = 5

Cache based DoS attack avoidance = True

Cache based side channel attack avoidance = True

The goal for this specific example is to place 5 VMs (based on replication factor, r = 5) on physical machines in the cloud such that the SLA is satisfied.

In our example cloud infrastructure model, a physical node is identified as a Blade object, and the complete set of these objects is the set Blades. Figure 6 and Table 2 describe various attributes associated with a blade object.

Let VMs be the set of virtual machines, corresponding to the five replicas, vm1, . . . , vm5, that need to be placed on the set Blades. Each VM object has the following decision variables that need to be solved:

∙ Blade, the mapping to a blade object;

∙ ProcessorDomain, the processor domain object; and

∙ PageColorDomain, the page color domain object.

The domain of the Blade decision variable is the set of blades Blades. Similarly, the domain of the ProcessorDomain variable is the set of Processor Domain objects present in the cloud. However, there is an added constraint that a VM’s ProcessorDomain must belong to the same Blade which corresponds to the value of the decision variable Blade. A similar constraint is applicable to the PageColorDomain. The goal of the resource manager, then, is to find a placement that satisfies the constraints presented in Table 3.

A solution to this placement problem would be a valid assignment of objects Blade, ProcessorDomain and PageColorDomain such that all the constraints are satisfied. We currently use a simple greedy approach to find the solution, as described below (refer to Figure 7).

Our current formulation, and the greedy algorithm, do not yet consider multiple ProcessorDomains or multiple PageColorDomains to satisfy a VM’s resource requirements (number of processors and amount of memory, respectively). Also, the greedy approach is not guaranteed to find a solution even when one exists. We are currently investigating a more generic formulation of this problem using Microsoft Solver Foundation (or other CSP solvers).
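To make the incompleteness of a greedy search concrete, the sketch below contrasts a first-fit placement with an exhaustive backtracking search on a deliberately simplified model. Several things here are assumptions for illustration only: blades carry just a fault domain and a processor count, the two VMs have different processor demands (the SLA example above uses identical replicas), and the backtracking routine is a generic stand-in for a CSP solver, not Microsoft Solver Foundation.

```python
def first_fit(demands, blades):
    """Greedy: give each VM the first blade that fits; never revisit a choice."""
    used, placement = set(), {}
    for vm, need in demands.items():
        for i, blade in enumerate(blades):
            if blade["fault_domain"] not in used and blade["available_processors"] >= need:
                placement[vm] = i
                used.add(blade["fault_domain"])
                break
        else:
            return None
    return placement


def backtrack(demands, blades):
    """Exhaustive search: undo a choice when it leads to a dead end."""
    vms = list(demands)

    def solve(k, used, placement):
        if k == len(vms):
            return dict(placement)
        vm = vms[k]
        for i, blade in enumerate(blades):
            if blade["fault_domain"] in used:
                continue
            if blade["available_processors"] < demands[vm]:
                continue
            placement[vm] = i
            result = solve(k + 1, used | {blade["fault_domain"]}, placement)
            if result is not None:
                return result
            del placement[vm]            # dead end: try the next blade for this VM
        return None

    return solve(0, frozenset(), {})


# Hypothetical inventory where the order of greedy choices matters.
blades = [
    {"fault_domain": 0, "available_processors": 4},
    {"fault_domain": 1, "available_processors": 2},
]
demands = {"vm_small": 2, "vm_large": 4}

print(first_fit(demands, blades))   # None
print(backtrack(demands, blades))   # {'vm_small': 1, 'vm_large': 0}
```

First-fit puts vm_small on the 4-processor blade and then has nowhere to place vm_large, while the backtracking search revisits that choice and finds the valid assignment.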

6. RELATED WORK

There is little prior work on security- and isolation-specific SLA constraints. Monahan et al. define example security-related SLA constraints that are applicable in cloud computing scenarios [10]. However, they define isolation among multiple services only broadly. To our knowledge, the work presented here is the first attempt at characterizing specific isolation-related attributes for SLAs between the CIP and SPs. Specifically, we define attributes to thwart cache based side channel attacks in a shared cloud computing infrastructure. We also extend the resource management framework to include these isolation based constraints when deploying and managing services in the infrastructure.

Cache based interference has given rise to many isolation problems in multicore systems, impacting both performance [7] and security [6, 14, 13]. Prior research on cache based isolation includes many software [8, 15] and hardware [16, 9] techniques, with the software techniques focused mostly on performance isolation. In this work, our focus is on using software approaches for both security and performance isolation in virtualized environments. Further hardware support [11] may be necessary to provide better performance isolation guarantees.

7. CONCLUSIONS AND FUTURE WORK

We envision that in future cloud computing environments, service providers (SPs) will also specify security and performance isolation constraints as part of their Service Level Agreement (SLA). One such set of constraints, advocated in this paper, is based on cache sharing in contemporary multicore systems, where a VM may severely impact another VM’s performance and compromise its secrecy and integrity using cache based interference. To this end, we present two approaches to provide cache-based security and performance isolation: cache hierarchy aware core assignment and page-coloring based cache partitioning. Experimental results from our prototype implementation on the Hyper-V virtualization platform demonstrate that both of these techniques are effective in providing the required isolation properties. We also provide a generic Constraint Satisfaction Problem (CSP) formulation that incorporates these approaches in the general resource management framework of our example cloud infrastructure. We are currently in the process of implementing our CSP formulation using the Microsoft Solver Foundation [4], and plan to evaluate the impact of SLA isolation attributes on the overall cost of VM placement in a typical cloud infrastructure.

In the future, we plan to incorporate attestation of an SP’s isolation attributes by the CIP. Such “isolation attestation” will enable clients and other SPs in the cloud services platform to make an informed trust decision regarding whether, and how much, to depend on a particular service.

Another class of isolation issues that we plan to address in the future arises from the fact that cloud administrators or other management related entities can impact a service’s confidentiality and/or integrity. Although we do not consider denial of service attacks by an entity in the CIP - we assume that a


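[Figure 6 diagram: a Blade object with attributes FaultDomain and AvailableProcessors contains ProcDomains (D1, . . .); each processor domain has attributes Capacity, CurrentVMs, and Available and contains PageColorDomains (P1, . . .); each page color domain has attributes Capacity, CurrentVMs, and Available.]

Figure 6: Hierarchical attributes of a Blade object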

| Attribute | Type | Comment |
|---|---|---|
| AvailableProcessors | Integer | The number of processors currently available for reservation. |
| FaultDomain | Integer | Identifies the hardware fault domain number assigned to this blade. Different numbers imply different fault domains. Currently each blade is in its own unique fault domain. |
| ProcessorDomains | Set | Set of ProcessorDomain objects. Each processor domain’s cache is independent of the others. Each ProcessorDomain object in turn has the following attributes: Capacity, CurrentVMs, Available, and PageColorDomains, described next. |
| .Capacity | Integer | Number of processors in this domain |
| .CurrentVMs | Set | Set of VMs assigned to this domain |
| .Available | Integer | Number of processors available in this domain |
| .PageColorDomains | Set | Set of PageColorDomain objects. The memory pages in different page colors do not intersect. Each PageColorDomain object in turn has the following attributes: Capacity, CurrentVMs, and Available, described next. |
| ..Capacity | Integer | Number of pages in this domain |
| ..CurrentVMs | Set | Set of VMs assigned to this domain |
| ..Available | Integer | Number of pages available in this domain |

Table 2: Details of attributes of a Blade object

service provider can simply switch to another CIP if this is the case, the other security problems are more pernicious in that SPs may never discover the compromise. This problem is present in non-virtualized cloud environments, but is perhaps more worrisome in virtualized environments, where the infrastructure provider has quick and easy access to the hosting software. Thus a malicious administrator, who is an integral part of current virtualized environments, can compromise the confidentiality/integrity of a service. Recent attempts at disaggregation in virtualized environments deal with some of these issues [12]. These disaggregation techniques, combined with remote platform attestation of the virtualization software that employs them, will form the root of trust for SPs/clients and a basis for trust in the “isolation attestation” provided by the CIP.

8. REFERENCES

[1] Amazon Elastic Compute Cloud (EC2). http://aws.amazon.com/ec2/.

[2] Microsoft Azure Services Platform. http://www.microsoft.com/azure/default.mspx.

[3] Microsoft Live Mesh. www.mesh.com.

[4] Microsoft Solver Foundation. http://code.msdn.microsoft.com/solverfoundation.

[5] Virtualization with Hyper-V. http://www.microsoft.com/windowsserver2008/en/us/hyperv-main.aspx.

[6] D. J. Bernstein. Cache-timing attacks on AES. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf.

[7] D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In HPCA ’05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, pages 340–351, Washington, DC, USA, 2005. IEEE Computer Society.

[8] A. Fedorova and M. Seltzer. Improving performance isolation on chip multiprocessors via an operating


| SLA Constraint | Value | Translated constraint (evaluates to either true or false) |
|---|---|---|
| Number of processors | 2 | ∀vm ∈ VMs, vm.Blade ∈ Blades : vm.Blade.AvailableProcessors ≥ 2 |
| H/w fault domain (n) | 5 | ∀vm1, vm2 ∈ VMs, vm1 ≠ vm2 : vm1.Blade.FaultDomain ≠ vm2.Blade.FaultDomain |
| Cache based isolation | True | ∀vm ∈ VMs : (vm.ProcessorDomain ∈ vm.Blade.ProcessorDomains AND (vm.ProcessorDomain.CurrentVMs = ∅ OR (vm.PageColorDomain ∈ vm.ProcessorDomain.PageColorDomains AND vm.PageColorDomain.CurrentVMs = ∅))) |

Table 3: CSP Formulation

system scheduler. In Parallel Architecture and Compilation Techniques, 2007. PACT 2007. 16th International Conference on, pages 25–38, Sept. 2007.

[9] S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT ’04: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pages 111–122, Washington, DC, USA, 2004. IEEE Computer Society.

[10] B. Monahan and M. Yearworth. Meaningful security SLAs. Technical report, HP Labs, 2008.

[11] T. Moscibroda and O. Mutlu. Memory performance attacks: denial of memory service in multi-core systems. In SS’07: Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, pages 1–18, 2007.

[12] D. G. Murray, G. Milos, and S. Hand. Improving Xen security through disaggregation. In VEE ’08: Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pages 151–160, 2008.

[13] D. A. Osvik, A. Shamir, and E. Tromer. Cache attacks and countermeasures: the case of AES. In Topics in Cryptology - CT-RSA 2006, The Cryptographers’ Track at the RSA Conference 2006, pages 1–20. Springer-Verlag, 2005.

[14] C. Percival. Cache missing for fun and profit. http://www.daemonology.net/papers/htt.pdf.

[15] D. Tam, R. Azimi, L. Soares, and M. Stumm. Managing shared L2 caches on multicore systems in software. In Workshop on the Interaction between Operating Systems and Computer Architecture, 2007.

[16] Z. Wang and R. B. Lee. New cache designs for thwarting software cache-based side channel attacks. In ISCA ’07: Proceedings of the 34th annual international symposium on Computer architecture, pages 494–505, New York, NY, USA, 2007. ACM.
