A Privacy Leakage Upper-Bound Constraint Based Approach for Cost-Effective Privacy Preserving of Intermediate Datasets in Cloud

CHAPTER-1

INTRODUCTION

Technically cloud computing is regarded as an ingenious

combination of a series of technologies, establishing a novel

business model by offering IT services and using economies

of scale. Participants in the business chain of cloud

computing can benefit from this novel model. Cloud customers

can save huge capital investment of IT infrastructure, and

concentrate on their own core business. Therefore, many

companies or organizations have been migrating or building

their business into cloud. However, numerous potential

customers are still hesitant to take advantage of cloud due

to security and privacy concerns. The privacy concerns caused by retaining intermediate data sets in the cloud are important but have received little attention. Storage and computation

services in cloud are equivalent from an economical

perspective because they are charged in proportion to their

usage. Thus, cloud users can store valuable intermediate data

sets selectively when processing original data sets in data

intensive applications like medical diagnosis, in order to

curtail the overall expenses by avoiding frequent re-computation to obtain these data sets. Such scenarios are

quite common because data users often reanalyze results,

conduct new analysis on intermediate data sets, or share some

intermediate results with others for collaboration. Without

loss of generality, the notion of intermediate data set

herein refers to intermediate and resultant data sets.

However, the storage of intermediate data enlarges attack

surfaces so that privacy requirements of data holders are at

risk of being violated. Usually, intermediate data sets in

cloud are accessed and processed by multiple parties, but

rarely controlled by original data set holders. This enables

an adversary to collect intermediate data sets together and

menace privacy-sensitive information from them, bringing

considerable economic loss or severe social reputation

impairment to data owners. But, little attention has been

paid to such a cloud-specific privacy issue.

1.1. PROCESS FLOW:


1.2. Objective of the Project

In this paper, we propose a novel upper-bound privacy

leakage constraint based approach to identify which

intermediate datasets need to be encrypted and which do not,

so that privacy-preserving cost can be saved while the

privacy requirements of data holders can still be satisfied.

Evaluation results demonstrate that the privacy-preserving

cost of intermediate datasets can be significantly reduced

with our approach over existing ones where all datasets are

encrypted.


CHAPTER-2

LITERATURE SURVEY


View of Cloud Computing

Provided certain obstacles are overcome, we believe

Cloud Computing has the potential to transform a large part

of the IT industry, making software even more attractive as a

service and shaping the way IT hardware is designed and

purchased. Developers with innovative ideas for new

interactive Internet services no longer require the large

capital outlays in hardware to deploy their service or the

human expense to operate it. They need not be concerned about

over-provisioning for a service whose popularity does not

meet their predictions, thus wasting costly resources, or

under-provisioning for one that becomes wildly popular, thus

missing potential customers and revenue. Moreover, companies

with large batch-oriented tasks can get their results as

quickly as their programs can scale, since using 1000 servers

for one hour costs no more than using one server for 1000

hours. This elasticity of resources, without paying a premium

for large scale, is unprecedented in the history of IT. The

economies of scale of very large-scale datacenters combined

with "pay-as-you-go" resource usage has heralded the rise

of Cloud Computing. It is now attractive to deploy an

innovative new Internet service on a third party's Internet

Datacenter rather than your own infrastructure, and to

gracefully scale its resources as it grows or declines in

popularity and revenue. Expanding and shrinking daily in

response to normal diurnal patterns could lower costs even


further. Cloud Computing transfers the risks of over-

provisioning or under-provisioning to the Cloud Computing

provider, who mitigates that risk by statistical multiplexing

over a much larger set of users and who offers relatively low

prices due to better utilization and the economies of

purchasing at a larger scale. We define terms, present an

economic model that quantifies the key buy vs. pay-as-you-go

decision, offer a spectrum to classify Cloud Computing

providers, and give our view of the top 10 obstacles and

opportunities to the growth of Cloud Computing.

Cloud Computing and Emerging IT Platforms: Vision,

Hype, and

Reality for Delivering Computing as the 5th Utility

With the significant advances in Information and

Communications Technology (ICT) over the last half century,

there is an increasingly perceived vision that computing will

one day be the 5th utility (after water, electricity, gas,

and telephony). This computing utility, like all other four

existing utilities, will provide the basic level of computing

service that is considered essential to meet the everyday

needs of the general community. To deliver this vision, a

number of computing paradigms have been proposed, of which

the latest one is known as Cloud computing. Hence, in this

paper, we define Cloud computing and provide the architecture


for creating Clouds with market-oriented resource allocation

by leveraging technologies such as Virtual Machines (VMs). We

also provide insights on market-based resource management

strategies that encompass both customer-driven service

management and computational risk management to sustain

Service Level Agreement (SLA)-oriented resource allocation.

In addition, we reveal our early thoughts on interconnecting

Clouds for dynamically creating global Cloud exchanges and

markets. Then, we present some representative Cloud

platforms, especially those developed in industries along

with our current work towards realizing market-oriented

resource allocation of Clouds as realized in Aneka enterprise

Cloud technology. Furthermore, we highlight the difference

between High Performance Computing (HPC) workload and

Internet-based services workload. We also describe a meta-

negotiation infrastructure to establish global Cloud

exchanges and markets, and illustrate a case study of

harnessing ‘Storage Clouds’ for high performance content

delivery. Finally, we conclude with the need for convergence

of competing IT paradigms to deliver our 21st century vision.

Cloud computing is a new and promising paradigm

delivering IT services as computing utilities. As Clouds are

designed to provide services to external users, providers

need to be compensated for sharing their resources and

capabilities. In this paper, we have proposed architecture

for market-oriented allocation of resources within Clouds.

We have also presented a vision for the creation of global

Cloud exchange for trading services. Moreover, we have

discussed some representative platforms for Cloud computing

covering the state-of-the-art. In particular, we have

presented various Cloud efforts in practice from the market-

oriented perspective to reveal its emerging potential for

the creation of third-party services to enable the successful

adoption of Cloud computing, such as meta-negotiation

infrastructure for global Cloud exchanges and provide high

performance content delivery via ‘Storage Clouds’. The

state-of-the-art Cloud technologies have limited support for

market-oriented resource management and they need to be

extended to support: negotiation of QoS between users and

providers to establish SLAs; mechanisms and algorithms for

allocation of VM resources to meet SLAs; and manage risks

associated with the violation of SLAs. Furthermore,

interaction protocols need to be extended to support interoperability between different Cloud service providers. In addition, we need programming

environments and tools that allow rapid creation of Cloud

applications. Data Centers are known to be expensive to

operate and they consume huge amounts of electric power. For

example, the Google data center consumes power as much as a

city such as San Francisco. As Clouds are emerging as next-

generation data centers and aim to support ubiquitous

service-oriented applications, it is important that they are

designed to be energy efficient to reduce both their power

bill and carbon footprint on the environment. To achieve this

at software systems level, we need to investigate new

techniques for allocation of resources to applications

depending on quality of service expectations of users and

service contracts established between consumers and providers

[67]. As Cloud platforms become ubiquitous, we expect the

need for internetworking them to create market-oriented

global Cloud exchanges for trading services. Several

challenges need to be addressed to realize this vision. They

include: market-maker for bringing together service providers and

consumers; market registry for publishing and discovering

Cloud service providers and their services; clearing houses

and brokers for mapping service requests to providers who can

meet QoS expectations; and payment management and accounting

infrastructure for trading services. Finally, we need to

address regulatory and legal issues, which go beyond

technical issues. Some of these issues are explored in

related paradigms such as Grids and service-oriented

computing systems. Hence, rather than competing, these past

developments need to be leveraged for advancing Cloud

computing. Also, Cloud computing and other related paradigms

need to converge so as to produce unified and interoperable

platforms for delivering IT services as the 5th utility to

individuals, organizations, and corporations.

Scientific Communities Benefit from the Economies of

Scale


The basic idea behind Cloud computing is that resource

providers offer elastic resources to end users. In this

paper, we intend to answer one key question to the success of

Cloud computing: in Cloud, can small or medium-scale

scientific computing communities benefit from the economies

of scale? Our research contributions are three-fold: first,

we propose an enhanced scientific public cloud model (ESP)

that encourages small or medium scale research organizations

to rent elastic resources from a public cloud provider; second,

on a basis of the ESP model, we design and implement the

Dawning Cloud system that can consolidate heterogeneous

scientific workloads on a Cloud site; third, we propose an

innovative emulation methodology and perform a comprehensive

evaluation. We found that for two typical workloads: high

throughput computing (HTC) and many task computing (MTC),

Dawning Cloud saves the resource consumption maximally by

44.5% (HTC) and 72.6% (MTC) for service providers, and saves

the total resource consumption maximally by 47.3% for a

resource provider with respect to the previous two public

Cloud solutions. To this end, we conclude that for typical

workloads: HTC and MTC, Dawning Cloud can enable scientific

communities to benefit from the economies of scale of public

Clouds. Index Terms: Cloud Computing, Scientific Communities, Economies of Scale, Many-Task Computing and High Throughput Computing.

Traditionally, in scientific computing

communities (in short, scientific communities), many small-

or medium-scale organizations tend to purchase and build

dedicated cluster systems to provide computing services for

typical workloads. We call this usage model the dedicated

system model. The dedicated system model prevails in

scientific communities, of which an organization owns a

small- or medium-scale dedicated cluster system and deploys

specific runtime environment software that is responsible for

managing resources and its workloads. A dedicated system is

definitely worthwhile [35] as such a system is under the

complete control of the principal investigators and can be

devoted entirely to the needs of their experiment. However,

there is a prominent shortcoming of the dedicated system

model: for peak loads, a dedicated cluster system cannot

provide enough resources, while lots of resources are idle

for light loads. Recently, as resource providers [2],

several pioneer computing companies are adopting the concept

of infrastructure as a service, among which, Amazon EC2

contributed to popularizing the infrastructure-as-a-service

paradigm [32]. A new term Cloud is used to describe this new

computing paradigm [4] [17] [24] [32]. In this paper, we

adopt the terminology from B. Sotomayor et al.'s paper [32] to

describe different types of clouds. Public clouds offer a

publicly accessible remote interface for the masses to create and manage virtual machine instances within their

proprietary infrastructure.

In this paper, we have answered one key question to the

success of Cloud computing: In scientific communities, can

small- or medium-scale organizations benefit from the

economies of scale? Our contributions are four-fold: first,

we proposed a dynamic service provisioning (ESP) model in

Cloud computing. In the ESP model, a resource provider can

create specific runtime environments on demand for MTC or HTC

service providers, while a service provider can resize

dynamic resources. Second, on a basis of the ESP model, we

designed and implemented an enabling system, Dawning Cloud,

which provides automatic management for heterogeneous MTC and

HTC workloads. Third, our experiments show that for typical

MTC and HTC workloads, MTC and HTC service providers and the

resource service provider can benefit from the economies of

scale on a Cloud platform. Lastly, using an analytical

approach we verify that irrespective of specific workloads,

Dawning Cloud can achieve the economies of scale on Cloud

platforms.

Making Cloud Intermediate Data Fault-Tolerant

Parallel dataflow programs generate enormous amounts of

distributed data that are short-lived, yet are critical for

completion of the job and for good run-time performance. We

call this class of data as intermediate data. This paper is

the first to address intermediate data as a first-class

citizen, specifically targeting and minimizing the effect of

run-time server failures on the availability of intermediate

data, and thus on performance metrics such as job completion

time. We propose new design techniques for a new storage

system called ISS (Intermediate Storage System), implement

these techniques within Hadoop, and experimentally evaluate

the resulting system. Under no failure, the performance of

Hadoop augmented with ISS (i.e., job completion time) turns

out to be comparable to base Hadoop. Under a failure, Hadoop

with ISS outperforms base Hadoop and incurs up to 18%

overhead compared to base no-failure Hadoop, depending on the test setup.

We have shown the need for, presented requirements

towards, and designed a new intermediate storage system (ISS)

that treats intermediate data in dataflow programs as a

first-class citizen in order to tolerate failures. We have

shown experimentally that the existing approaches are

insufficient in satisfying the requirements of data

availability and minimal interference. We have also shown

that our asynchronous rack-level selective replication

mechanism is effective, and masks interference very well. Our

failure injection experiments show that Hadoop with ISS can

speed up the performance by approximately 45% compared to

Hadoop without ISS. Under no failures, ISS incurs very little

overhead compared to Hadoop. Under a failure, ISS incurs only

up to 18% of overhead compared to Hadoop with no failures.

Privacy-Preserving Multi-Keyword Ranked Search over

Encrypted

With the advent of cloud computing, data owners are

motivated to outsource their complex data management systems

from local sites to the commercial public cloud for great

flexibility and economic savings. But for protecting data

privacy, sensitive data has to be encrypted before

outsourcing, which obsoletes traditional data utilization

based on plaintext keyword search. Thus, enabling an

encrypted cloud data search service is of paramount

importance. Considering the large number of data users and

documents in the cloud, it is necessary to allow multiple

keywords in the search request and return documents in the

order of their relevance to these keywords. Related works on

searchable encryption focus on single keyword search or

Boolean keyword search, and rarely sort the search results.

In this paper, for the first time, we define and solve the

challenging problem of privacy-preserving multi-keyword

ranked search over encrypted cloud data (MRSE). We establish

a set of strict privacy requirements for such a secure cloud

data utilization system. Among various multi-keyword

semantics, we choose the efficient similarity measure of

“coordinate matching”, i.e., as many matches as possible, to

capture the relevance of data documents to the search query.

We further use “inner product similarity” to quantitatively

evaluate such similarity measure. We first propose a basic

idea for the MRSE based on secure inner product computation,

and then give two significantly improved MRSE schemes to

achieve various stringent privacy requirements in two

different threat models. Thorough analysis investigating

privacy and efficiency guarantees of proposed schemes is

given. Experiments on the real-world dataset further show

proposed schemes indeed introduce low overhead on computation

and communication.
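To make the idea of "coordinate matching" concrete, the following small Java sketch (illustrative only, not the MRSE scheme itself, with a hypothetical keyword dictionary) scores a document by the inner product of binary keyword vectors, i.e., by how many query keywords it contains:

// Minimal sketch of coordinate matching via inner product similarity.
// Dictionary and data below are hypothetical examples.
import java.util.*;

public class CoordinateMatching {
    // Build a 0/1 vector over a fixed keyword dictionary.
    static int[] toVector(List<String> dictionary, Set<String> keywords) {
        int[] v = new int[dictionary.size()];
        for (int i = 0; i < dictionary.size(); i++) {
            v[i] = keywords.contains(dictionary.get(i)) ? 1 : 0;
        }
        return v;
    }

    // Inner product similarity = number of matched keywords.
    static int innerProduct(int[] a, int[] b) {
        int score = 0;
        for (int i = 0; i < a.length; i++) score += a[i] * b[i];
        return score;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("cloud", "privacy", "encryption", "storage");
        int[] doc = toVector(dict, new HashSet<>(Arrays.asList("cloud", "privacy", "storage")));
        int[] query = toVector(dict, new HashSet<>(Arrays.asList("privacy", "encryption")));
        System.out.println("Relevance score: " + innerProduct(doc, query)); // prints 1
    }
}

In MRSE these vectors are not computed in the clear; the scheme evaluates the inner product securely over encrypted index and trapdoor vectors.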


Authorized Private Keyword Search over Encrypted

Data in Cloud Computing

In cloud computing, clients usually outsource their data

to the cloud storage servers to reduce the management costs.

While those data may contain sensitive personal information,

the cloud servers cannot be fully trusted in protecting them.

Encryption is a promising way to protect the confidentiality

of the outsourced data, but it also introduces much

difficulty to performing effective searches over encrypted

information. Most existing works do not support efficient

searches with complex query conditions, and care needs to be

taken when using them because of the potential privacy

leakages about the data owners to the data users or the cloud

server. In this paper, using online

Personal Health Record (PHR) as a case study, we first show

the necessity of search capability authorization that reduces

the privacy exposure resulting from the search results, and

establish a scalable framework for Authorized Private Keyword

Search (APKS) over encrypted cloud data. We then propose two

novel solutions for APKS based on a recent cryptographic

primitive, Hierarchical Predicate Encryption (HPE). Our

solutions enable efficient multi-dimensional keyword searches

with range query, and allow delegation and revocation of search

capabilities. Moreover, we enhance the query privacy which

hides users’ query keywords against the server. We implement

our scheme on a modern workstation, and experimental results

demonstrate its suitability for practical usage.


In this paper, we address the problem of authorized

private keyword searches over encrypted data in cloud

computing, where multiple data owners encrypt their records

along with a keyword index to allow searches by multiple

users. To limit the exposure of sensitive information due to

unrestricted query capabilities, we propose a scalable, fine-

grained authorization framework where users obtain their

search capabilities from local trusted authorities according

to their attributes. We then propose two novel solutions for

APKS over encrypted data based on HPE, where in the first one

we enhance the search efficiency using attribute hierarchy,

and in the second we enhance the query privacy via the help

of proxy servers. Our solutions also support efficient multi-

dimensional range query, search capability delegation and

revocation. In addition, we implement our solution on a

modern workstation; the results show that APKS achieves

reasonable search performance.

Fully Homomorphic Encryption Using Ideal Lattices

We propose a fully homomorphic encryption scheme – i.e.,

a scheme that allows one to evaluate circuits over encrypted

data without being able to decrypt. Our solution comes

in three steps. First, we provide a general result – that, to

construct an encryption scheme that permits evaluation of

arbitrary circuits, it suffices to construct an encryption

scheme that can evaluate (slightly augmented versions of) its

own decryption circuit; we call a scheme that can evaluate

its (augmented) decryption circuit bootstrappable. Next, we

describe a public key encryption scheme using ideal lattices

that is almost bootstrappable. Lattice-based cryptosystems

typically have decryption algorithms with low circuit

complexity, often dominated by an inner product computation

that is in NC1. Also, ideal lattices provide both additive

and multiplicative homomorphisms (modulo a public-key ideal

in a polynomial ring that is represented as a lattice), as

needed to evaluate general circuits. Unfortunately, our

initial scheme is not quite bootstrappable – i.e., the depth

that the scheme can correctly evaluate can be logarithmic in

the lattice dimension, just like the depth of the decryption

circuit, but the latter is greater than the former. In the

final step, we show how to modify the scheme to reduce the

depth of the decryption circuit, and thereby obtain a

bootstrappable encryption scheme, without reducing the depth

that the scheme can evaluate. Abstractly, we accomplish this

by enabling the encrypter to start the decryption process,

leaving less work for the decrypter, much like the server

leaves less work for the decrypter in a server-aided

cryptosystem.


Privacy-Preserving Data Publishing: A Survey of

Recent Development

The collection of digital information by governments,

corporations, and individuals has created tremendous

opportunities for knowledge- and information-based decision

making. Driven by mutual benefits, or by regulations that

require certain data to be published, there is a demand for

the exchange and publication of data among various parties.

Data in its original form, however, typically contains

sensitive information about individuals, and publishing such

data will violate individual privacy. The current practice in

data publishing relies mainly on policies and guidelines as

to what types of data can be published and on agreements on

the use of published data. This approach alone may lead to

excessive data distortion or insufficient protection.

Privacy-preserving data publishing (PPDP) provides methods

and tools for publishing useful information while preserving

data privacy. Recently, PPDP has received considerable

attention in research communities, and many approaches have

been proposed for different data publishing scenarios. In

this survey, we will systematically summarize and evaluate


different approaches to PPDP, study the challenges in

practical data publishing, clarify the differences and

requirements that distinguish PPDP from other related

problems, and propose future research directions.

An Upper-Bound Control Approach for Cost-Effective

Private Protection of Intermediate Data Set

Storage in Cloud

As more and more data-intensive applications are migrated into cloud environments, storing some valuable intermediate datasets has been adopted in order

to avoid the high cost of re-computing them. However, this

poses a risk on data privacy protection because malicious

parties may deduce the private information of the parent

dataset or original dataset by analyzing some of those stored

intermediate datasets. The traditional way for addressing

this issue is to encrypt all of those stored datasets so that

they can be hidden. We argue that this is neither efficient

nor cost-effective because it is not necessary to encrypt ALL

of those datasets and encryption of all large amounts of

datasets can be very costly. In this paper, we propose a new

approach to identify which stored datasets need to be

encrypted and which do not. Through intensive analysis of

information theory, our approach designs an upper bound on

privacy measure. As long as the overall mixed information

amount of some stored datasets is no more than that upper

bound, those datasets do not need to be encrypted while


privacy can still be protected. A tree model is leveraged to

analyze privacy disclosure of datasets, and privacy

requirements are decomposed and satisfied layer by layer.

With a heuristic implementation of this approach, evaluation

results demonstrate that the cost for encrypting intermediate

datasets decreases significantly compared with the

traditional approach while the privacy protection of parent

or original dataset is guaranteed.

Toward Data Confidentiality in Storage-Intensive

Cloud Applications

By offering high availability and elastic access to

resources, third-party cloud infrastructures such as Amazon

EC2 are revolutionizing the way today's businesses operate.

Unfortunately, taking advantage of their benefits requires

businesses to accept a number of serious risks to data

security. Factors such as software bugs, operator errors and

external attacks can all compromise the confidentiality of

sensitive application data on external clouds, by making them

vulnerable to unauthorized access by malicious parties.

In this paper, we study and seek to improve the

confidentiality of application data stored on third-party

computing clouds. We propose to identify and encrypt all

functionally encryptable data, sensitive data that can be encrypted

without limiting the functionality of the application on the

cloud. Such data would be stored on the cloud only in an


encrypted form, accessible only to users with the correct

keys, thus protecting its confidentiality against

unintentional errors and attacks alike. We describe Silverline, a

set of tools that automatically 1) identify all functionally

encryptable data in a cloud application, 2) assign encryption

keys to specific data subsets to minimize key management

complexity while ensuring robustness to key compromise, and

3) provide transparent data access at the user device while

preventing key compromise even from malicious clouds. Through

experiments with real applications, we find that many web

applications are dominated by storage and data sharing components

that do not require interpreting raw data. Thus, Silverline

can protect the vast majority of data on these applications,

simplify key management, and protect against key compromise.

Together, our techniques provide a substantial first step

towards simplifying the complex process of incorporating data

confidentiality into these storage-intensive cloud

applications.

 


CHAPTER-3

SYSTEM ANALYSIS

3.1. Introduction

The Systems Development Life Cycle (SDLC), or Software

Development Life Cycle in systems engineering, information

systems and software engineering, is the process of creating

or altering systems, and the models and methodologies that

people use to develop these systems. In software engineering

the SDLC concept underpins many kinds of software development

methodologies. These methodologies form the framework for

planning and controlling the creation of an information

system: the software development process.

3.2. Existing System

Existing technical approaches for preserving the privacy of

datasets stored in cloud mainly include encryption and

anonymization. On one hand, encrypting all datasets, a

straightforward and effective approach, is widely adopted in

current research. However, processing on encrypted datasets

efficiently is quite a challenging task, because most existing

applications only run on unencrypted datasets. However, preserving

the privacy of intermediate datasets becomes a challenging problem

because adversaries may recover privacy-sensitive information by

analyzing multiple intermediate datasets. Encrypting ALL datasets

in cloud is widely adopted in existing approaches to address this

challenge. But we argue that encrypting all intermediate datasets

is neither efficient nor cost-effective because it is very time

consuming and costly for data-intensive applications to en/decrypt

datasets frequently while performing any operation on them.

3.3. Proposed System 


In this paper, we propose a novel approach to identify which

intermediate datasets need to be encrypted while others do not, in

order to satisfy privacy requirements given by data holders. A tree

structure is modeled from generation relationships of intermediate

datasets to analyze privacy propagation of datasets. As quantifying

joint privacy leakage of multiple datasets efficiently is

challenging, we exploit an upper-bound constraint to confine

privacy disclosure. Based on such a constraint, we model the

problem of saving privacy-preserving cost as a constrained optimization problem. This problem is then divided into a series

of sub-problems by decomposing privacy leakage constraints.

Finally, we design a practical heuristic algorithm accordingly to

identify the datasets that need to be encrypted. Experimental

results on real-world and extensive datasets demonstrate that

privacy-preserving cost of intermediate datasets can be

significantly reduced with our approach over existing ones where

all datasets are encrypted.
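As a rough illustration of the idea only (a simplified, hypothetical greedy sketch, not the exact heuristic algorithm of this paper; the Dataset fields and numbers are assumed), the following Java code keeps datasets unencrypted only while their accumulated leakage stays below an upper bound epsilon, and marks the rest for encryption:

// Hypothetical sketch: leave unencrypted the datasets whose encryption would
// cost the most per unit of leakage, as long as the total leakage of all
// unencrypted datasets stays within the upper bound epsilon.
import java.util.*;

class Dataset {
    String name;
    double leakage;        // quantified privacy leakage if left unencrypted
    double encryptionCost; // cost of en/decrypting it with charged cloud services
    Dataset(String name, double leakage, double encryptionCost) {
        this.name = name; this.leakage = leakage; this.encryptionCost = encryptionCost;
    }
}

public class EncryptionSelector {
    // Returns the subset of datasets that must be encrypted.
    static List<Dataset> selectForEncryption(List<Dataset> datasets, double epsilon) {
        List<Dataset> sorted = new ArrayList<>(datasets);
        // Highest cost-per-leakage first: best candidates to leave unencrypted.
        sorted.sort((a, b) -> Double.compare(b.encryptionCost / b.leakage,
                                             a.encryptionCost / a.leakage));
        double leaked = 0.0;
        List<Dataset> toEncrypt = new ArrayList<>();
        for (Dataset d : sorted) {
            if (leaked + d.leakage <= epsilon) {
                leaked += d.leakage;      // leave unencrypted, still within the bound
            } else {
                toEncrypt.add(d);         // would violate the bound, so encrypt it
            }
        }
        return toEncrypt;
    }

    public static void main(String[] args) {
        List<Dataset> ds = Arrays.asList(
            new Dataset("D1", 0.30, 5.0),
            new Dataset("D2", 0.10, 8.0),
            new Dataset("D3", 0.25, 2.0));
        for (Dataset d : selectForEncryption(ds, 0.4)) {
            System.out.println("Encrypt: " + d.name); // prints: Encrypt: D3
        }
    }
}

In the approach proposed here, the constraint is not applied to a flat list as above but decomposed over the tree of generation relationships among intermediate datasets, layer by layer.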

System overview of U-Cloud:


3.4. Software Requirement Specification 

 3.4.1. Overall Description

A Software Requirements Specification (SRS), a requirements specification for a software system, is a complete

description of the behavior of a system to be developed. It

includes a set of use cases that describe all the interactions

the users will have with the software. In addition to use

cases, the SRS also contains non-functional requirements.

Nonfunctional requirements are requirements which impose

constraints on the design or implementation (such as

performance engineering requirements, quality standards, or

design constraints).

System requirements specification: A structured

collection of information that embodies the requirements of a


system. A business analyst, sometimes titled system analyst, is

responsible for analyzing the business needs of their clients

and stakeholders to help identify business problems and propose

solutions. Within the systems development lifecycle domain, the

BA typically performs a liaison function between the business

side of an enterprise and the information technology department

or external service providers. Projects are subject to three

sorts of requirements:

Business requirements describe in business terms what

must be delivered or accomplished to provide value.

Product requirements describe properties of a system or

product (which could be one of several ways to accomplish a

set of business requirements.)

Process requirements describe activities performed by

the developing organization. For instance, process requirements could specify particular development activities to be followed. The preliminary investigation examines project feasibility, the likelihood that the system will be useful

to the organization. The main objective of the feasibility

study is to test the Technical, Operational and Economical

feasibility for adding new modules and debugging old running

system. Every system is feasible given unlimited resources and infinite time. There are three aspects in the

feasibility study portion of the preliminary investigation:

ECONOMIC FEASIBILITY

A system that can be developed technically and that will be used if installed must still be a good investment for the organization. In economic feasibility, the development

cost in creating the system is evaluated against the ultimate

benefit derived from the new systems. Financial benefits must

equal or exceed the costs. The system is economically

feasible. It does not require any additional hardware or

software. Since the interface for this system is developed

using the existing resources and technologies available at

NIC, there is nominal expenditure, and economic feasibility is certain.

OPERATIONAL FEASIBILITY

Proposed projects are beneficial only if they can be

turned into information systems that will meet the organization's operating requirements. Operational

feasibility aspects of the project are to be taken as an

important part of the project implementation. This system is

targeted to be in accordance with the above-mentioned issues.

Beforehand, the management issues and user requirements have

been taken into consideration. So there is no question of

resistance from the users that can undermine the possible

application benefits. The well-planned design would ensure

the optimal utilization of the computer resources and would

help in the improvement of performance status.

TECHNICAL FEASIBILITY


Earlier no system existed to cater to the needs of

‘Secure Infrastructure Implementation System’. The current

system developed is technically feasible. It is a web based

user interface for audit workflow at NIC-CSD. Thus it

provides easy access to the users. The database's purpose

is to create, establish and maintain a workflow among various

entities in order to facilitate all concerned users in their

various capacities or roles. Permission to the users would be

granted based on the roles specified. Therefore, it provides

the technical guarantee of accuracy, reliability and

security.

 3.4.2. External Interface Requirements

User Interface

The user interface of this system is a user friendly

Java Graphical User Interface.

Hardware Interfaces

The interaction between the user and the console is

achieved through Java capabilities.

Software Interfaces

The required software is Java 1.6.

Operating Environment

Windows XP, Linux.

HARDWARE REQUIREMENTS:

Hardware : Pentium


Speed : 1.1 GHz

RAM : 1GB

Hard Disk : 20 GB

Floppy Drive : 1.44 MB

Key Board : Standard Windows Keyboard

Mouse : Two or Three Button Mouse

Monitor : SVGA

SOFTWARE REQUIREMENTS:

Operating System : Windows

Technology : J2SE

IDE : My Eclipse

Database : My SQL

Java Version : J2SDK1.6


CHAPTER-4

SYSTEM DESIGN


Unified Modeling Language:

The Unified Modeling Language allows the software

engineer to express an analysis model using the modeling

notation that is governed by a set of syntactic, semantic, and

pragmatic rules.

A UML system is represented using five different views

that describe the system from distinctly different

perspectives. Each view is defined by a set of diagrams, as follows.

User Model View

i. This view represents the system from the user's

perspective.

ii. The analysis representation describes a usage

scenario from the end user's perspective.


Structural model view

i. In this model the data and functionality are

arrived at from inside the system.

ii. This model view models the static structures.

Behavioral Model View

It represents the dynamic (behavioral) parts of the system, depicting the interactions among the various structural elements described in the user model and structural model views.

Implementation Model View

In this view, the structural and behavioral parts of

the system are represented as they are to be built.

Environmental Model View

In this view, the structural and behavioral aspects of

the environment in which the system is to be

implemented are represented.

UML is specifically constructed through two different domains; they are:

UML Analysis modeling, this focuses on the user model

and structural model views of the system.

UML design modeling, which focuses on the behavioral

modeling, implementation modeling and environmental

model views.

Use case Diagrams represent the functionality of the system

from a user’s point of view. Use cases are used during


requirements elicitation and analysis to represent the

functionality of the system. Use cases focus on the behavior

of the system from an external point of view.

Actors are external entities that interact with the system.

Examples of actors include users such as an administrator or a bank customer, or another system such as a central database.

1. UML diagrams

Class diagram:-

The class diagram is the main building block of object

oriented modelling. It is used both for general conceptual

modeling of the systematics of the application, and for

detailed modeling, translating the models into programming code. Class diagrams can also be used for data modeling. The classes in a class diagram represent the main objects, interactions in the application, and the classes to be

programmed.

A class with three sections.

In the diagram, classes are represented with boxes which

contain three parts:

The upper part holds the name of the class

The middle part contains the attributes of the class

The bottom part gives the methods or operations the

class can take or undertake


Class diagram:

PrivacyLeakage: attributes Socket socket, String address, String occupation, String age, String gender; operations uploadDataset(), accessDataset(), privacyPreservingAlgorithm(), logout().

UploadDataset: attributes String age, String gender, String occ, String address; operations generateAlgorithm(), loadToCloud(), clear().

Admin: attributes String username, String password; operations login(), logout(), reset(), register().

User: attributes String username, String password; operations register(), login(), logout(), reset().

AccessDataSet: attributes String occupation, String age, String address, String gender; operations submit(), search().

Use case diagram:-

A use case diagram at its simplest is a representation

of a user's interaction with the system, depicting the

specifications of a use case. A use case diagram can portray

the different types of users of a system and the various ways

that they interact with the system. This type of diagram is

typically used in conjunction with the textual use case and

will often be accompanied by other types of diagrams as well.

Use case diagram for Admin:

The Admin actor participates in the use cases: login, upload dataset, generate algorithm, access dataset, load to cloud, clear, and logout.

Use case diagram for User:

The User actor participates in the use cases: register, login, search, access dataset, and logout.

Sequence Diagram:-

A sequence diagram is a kind of interaction diagram that

shows how processes operate with one another and in what

order. It is a construct of a Message Sequence Chart. A

sequence diagram shows object interactions arranged in time

sequence. It depicts the objects and classes involved in the

scenario and the sequence of messages exchanged between the

objects needed to carry out the functionality of the

scenario. Sequence diagrams are typically associated with use


case realizations in the Logical View of the system under

development. Sequence diagrams are sometimes called event

diagrams, event scenarios, and timing diagrams.

Sequence diagram for Admin:

Objects: Admin, Home, PrivacyLeakage, Cloud, Logout. Message sequence: login, upload dataset, generate algorithm, load to cloud, preserve the intermediate dataset, access the dataset, logout.

Sequence diagram for User:

Objects: User, Home, Cloud, Logout. Message sequence: register, login, search string, view dataset, logout.

Collaborative diagram:-

A collaboration diagram describes interactions

among objects in terms of sequenced messages.

Collaboration diagrams represent a combination of

information taken from class, sequence, and use case

diagrams describing both the static structure and

dynamic behavior of a system.

Collaboration diagram for Admin:

Objects: Admin, Home, PrivacyLeakage, Cloud, Logout. Sequenced messages: 1: login, 2: upload dataset, 3: generate algorithm, 4: load to cloud, 5: preserve the intermediate dataset, 6: access the dataset, 7: logout.

Collaboration diagram for User:

Objects: User, Home, Cloud, Logout. Sequenced messages: 1: register, 2: login, 3: search string, 4: view dataset, 5: logout.

Component Diagram

In the Unified Modeling Language, a component diagram

depicts how components are wired together to form larger

components and/or software systems. They are used to illustrate the structure of arbitrarily complex systems.

Components are wired together by using an assembly connector to

connect the required interface of one component with the

provided interface of another component. This illustrates the

service consumer - service provider relationship between the two

components.

Component diagram for Admin:

Components for the Admin side: upload dataset, generate algorithm, load to cloud, and access dataset.

Component diagram for User:

Components for the User side: register, login, access dataset, and logout.

Deployment Diagram

A deployment diagram in the Unified Modeling Language

models the physical deployment of artifacts on nodes. To

describe a web site, for example, a deployment diagram would

show what hardware components ("nodes") exist (e.g., a web

server, an application server, and a database server), what

software components ("artifacts") run on each node (e.g., web

application, database), and how the different pieces are

connected (e.g. JDBC, REST, RMI).

The nodes appear as boxes, and the artifacts allocated

to each node appear as rectangles within the boxes. Nodes may

have sub nodes, which appear as nested boxes. A single node

in a deployment diagram may conceptually represent multiple

physical nodes, such as a cluster of database servers.

Nodes shown: Admin and User, deployed with the upload dataset, generate algorithm, load to cloud, and access dataset artifacts.

Activity diagram:

Activity diagram is another important diagram in UML to

describe dynamic aspects of the system. It is basically a

flow chart to represent the flow from one activity to another activity. The activity can be described as an operation of the system. So the control flow is drawn from one operation to

another. This flow can be sequential, branched or concurrent.

Activity diagram:

Flow: Login, then a decision "Admin?". If yes: upload dataset, generate algorithm, load to cloud, access data set. If no: search string, access data set.

CHAPTER-5

IMPLEMENTATION


Number of Modules

After careful analysis, the system has been identified to have the following modules:

1. Data Storage Privacy Module.

2. Privacy Preserving Module.

3. Intermediate Dataset Module.

4. Privacy Upper Bound Module.


1. Data Storage Privacy Module:

The privacy concerns caused by retaining intermediate

datasets in the cloud are important but have received little attention. A motivating scenario is illustrated where an on-

line health service provider, e.g., Microsoft HealthVault, has moved its data storage into the cloud for economic benefits.

Original datasets are encrypted for confidentiality. Data

users like governments or research centers access or process

part of original datasets after anonymization. Intermediate

datasets generated during data access or process are retained

for data reuse and cost saving. We propose an approach that combines encryption and data fragmentation to achieve privacy protection for distributed data storage by encrypting only part of the datasets.

2. Privacy Preserving Module :

Privacy-preserving techniques like generalization can

withstand most privacy attacks on one single dataset, while

preserving privacy for multiple datasets is still a

challenging problem. Thus, for preserving privacy of multiple

datasets, it is promising to anonymize all datasets first and

then encrypt them before storing or sharing them in cloud.

Privacy-preserving cost of intermediate datasets stems from

frequent en/decryption with charged cloud services.


3. Intermediate Dataset Module :

An intermediate dataset is assumed to have been anonymized to satisfy certain privacy requirements. However,

putting multiple datasets together may still invoke a high

risk of revealing privacy-sensitive information, resulting in violation of the privacy requirements. Data provenance is

employed to manage intermediate datasets in our research.

Provenance is commonly defined as the origin, source or

history of derivation of some objects and data, which can be

reckoned as the information upon how data was generated.

Reproducibility of data provenance can help to regenerate a

dataset from its nearest existing predecessor datasets rather

than from scratch.
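A minimal, hypothetical sketch of this idea in Java (the Transformation interface and class names are illustrative, not part of the system) records, for each intermediate dataset, its parent and the operation that generated it, so that a discarded dataset can be rebuilt from its nearest stored predecessor rather than from the original data:

// Sketch only: a provenance tree node that can regenerate its data on demand.
import java.util.*;

interface Transformation {
    List<String> apply(List<String> input);   // e.g., anonymization, aggregation
}

class ProvenanceNode {
    final String name;
    final ProvenanceNode parent;              // null for the original dataset (always stored)
    final Transformation howGenerated;        // how this node was derived from its parent
    List<String> data;                        // null if this intermediate dataset was not stored

    ProvenanceNode(String name, ProvenanceNode parent, Transformation howGenerated) {
        this.name = name; this.parent = parent; this.howGenerated = howGenerated;
    }

    // Regenerate from the nearest ancestor that still holds its data.
    List<String> materialize() {
        if (data != null) return data;        // already stored, no re-computation needed
        List<String> parentData = parent.materialize();
        data = howGenerated.apply(parentData);
        return data;
    }
}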

4. Privacy Upper Bound Module:

Privacy quantification of a single dataset is stated first.

We point out the challenge of privacy quantification of

multiple datasets and then derive a privacy leakage upper-

bound constraint correspondingly. We propose an upper-bound

constraint based approach to select the necessary subset of

intermediate datasets that needs to be encrypted for

minimizing privacy-preserving cost. The privacy leakage

upper-bound constraint is decomposed layer by layer.
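The exact privacy measure is not spelled out at this point in the report, so the following Java sketch uses Shannon entropy of a sensitive attribute merely as a stand-in illustration of quantifying a single dataset; the actual approach bounds the joint leakage of multiple datasets under the upper-bound constraint:

// Illustrative only: entropy of a sensitive attribute as a simple privacy-related quantity.
import java.util.*;

public class PrivacyQuantifier {
    // H(X) = -sum p(x) * log2 p(x) over the values of a sensitive attribute.
    static double entropy(List<String> sensitiveValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : sensitiveValues) counts.merge(v, 1, Integer::sum);
        double h = 0.0, n = sensitiveValues.size();
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    public static void main(String[] args) {
        // Prints 1.5 for this toy distribution {flu: 2, hiv: 1, diabetes: 1}.
        System.out.println(entropy(Arrays.asList("flu", "flu", "hiv", "diabetes")));
    }
}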

5.3. Introduction of technologies used


About Java :

Initially the language was called "Oak", but it was renamed "Java" in 1995. The primary motivation of this language was the need for a platform-independent (i.e., architecture-neutral) language that could be used to create

software to be embedded in various consumer electronic

devices.

Java is a programmer’s language

Java is cohesive and consistent

Except for those constraints imposed by the Internet environment, Java gives the programmer full control. Finally, Java is to Internet programming what C was to systems programming.

Importance of Java to the Internet

Java has had a profound effect on the Internet. This is

because Java expands the universe of objects that can move about freely in cyberspace. In a network, two categories of objects are transmitted between the server and the personal computer: passive information and dynamic, active programs. The latter raise concerns in the areas of security and portability. But Java

addresses these concerns and by doing so, has opened the door

to an exciting new form of program called the Applet.

Applications and applets.


An application is a program that runs on our computer under the operating system of that computer. It is more or less like one created using C or C++. Java's ability to create applets makes it important. An applet is an application designed to be transmitted over the Internet and executed by a Java-compatible web browser. An applet is actually a tiny Java program, dynamically downloaded across the network, just like an image. But the difference is that it is an intelligent program, not just a media file. It can react to user input and dynamically change.

Java Architecture

Java architecture provides a portable, robust, high

performing environment for development. Java provides

portability by compiling the byte codes for the Java Virtual

Machine, which is then interpreted on each platform by the

run-time environment. Java is a dynamic system, able to load

code when needed from a machine in the same room or across

the planet.

Figure: Source code is compiled (PC compiler, Macintosh compiler, SPARC compiler) into platform-independent Java byte code, which is then run by a Java interpreter on each platform (PC, Macintosh, SPARC).

Compilation of code

When you compile the code, the Java compiler creates

machine code (called byte code) for a hypothetical machine called the Java Virtual Machine (JVM). The JVM is supposed to execute the byte code. The JVM was created to overcome the issue of portability: the code is written and compiled for one machine and interpreted on all machines. This machine is called the Java Virtual Machine.

Compiling and interpreting java source code.

During run-time the Java interpreter tricks the byte code

file into thinking that it is running on a Java Virtual

Machine. In reality this could be an Intel Pentium running Windows 95, a Sun SPARCstation running Solaris, or an Apple Macintosh running its own system, and all could receive code from any computer through the Internet and run the applets.
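A minimal example of this compile-once, run-anywhere flow (file and class names are arbitrary): compiling Hello.java with javac produces Hello.class byte code, and the same class file can then be run by the JVM on Windows, Solaris, or Mac OS with "java Hello":

// Hello.java: compile with "javac Hello.java", run anywhere with "java Hello".
public class Hello {
    public static void main(String[] args) {
        // The same byte code prints the name of whatever platform interprets it.
        System.out.println("Running on: " + System.getProperty("os.name"));
    }
}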

Simple :

Java was designed to be easy for the Professional

programmer to learn and to use effectively. If you are an

experienced C++ Programmer Learning Java will oriented

features of C++. Most of the confusing concepts from C++ are

either left out of Java or implemented in a cleaner, more

approachable manner. In Java there are a small number of

clearly defined ways to accomplish a given task.

Object oriented

Java was not designed to be source-code compatible with any

other language. This allowed the Java team the freedom to

design with a blank slate. One outcome of this was a clean,

usable, pragmatic approach to objects. The object model in

Java is simple and easy to extend, while simple types, such

as integers, are kept as high-performance non-objects.

Robust

The multi-platform environment of the web places

extraordinary demands on a program, because the program must

execute reliably in a variety of systems. The ability to

create robust programs was given a high priority in the

design of Java. Java is a strictly typed language; it checks your code at compile time and at run time. Java virtually eliminates the problems of memory management and deallocation, which is completely automatic. In a well-written

Java program, all run-time errors can and should be managed

by your program.

Servlets/JSP

A servlet is a generic server extension, a Java class that can be loaded dynamically to expand the functionality of a server. Servlets are commonly used with web servers, where they can take the place of CGI scripts. A servlet is similar to a proprietary server extension, except that it runs inside a Java Virtual Machine (JVM) on the server, so it is safe and portable.

Servlets operate solely within the domain of the server.

Unlike CGI and Fast CGI, which use multiple processes to

handle separate programs or separate requests, separate threads within the web server process handle all servlets. This means that servlets are efficient and scalable. Servlets are portable, both across operating systems and across web servers. Java servlets offer the best possible platform for web application development. Although servlets are often used as a replacement for CGI scripts on a web server, they can extend any sort of server, such as a mail server that allows servlets to extend its functionality, perhaps by performing a virus scan on all attached documents or handling mail filtering tasks.

Servlets provide a Java-based solution used to address the

problems currently associated with doing server-side

programming, including inextensible scripting solutions, platform-specific APIs, and incomplete interfaces.

Servlets are objects that conform to a specific interface

that can be plugged into a Java-based server. Servlets are to the server side what applets are to the client side: object byte codes that can be dynamically loaded off the net. They differ from applets in that they are faceless objects (without graphics or a GUI component). They serve as platform-independent, dynamically

loadable, pluggable helper byte code objects on the server

side that can be used to dynamically extend server-side

functionality.

For example, an HTTP servlet can be used to generate dynamic HTML content. When you use servlets to do dynamic content you get the following advantages:

They're faster and cleaner than CGI scripts.

They use a standard API (the servlet API).

They provide all the advantages of Java (run on a

variety of servers without needing to be rewritten)

Attractiveness of Servlets:

There are many features of servlets that make them easy and attractive to use. These include:

Easily configured using the GUI-based Admin tool.

Can be loaded and invoked from a local disk or remotely across the network.

Can be linked together, or chained, so that one servlet can call another servlet, or several servlets in sequence.

Can be called dynamically from within HTML pages using server-side include tags.

Are secure, even when downloading across the network; the servlet security model and the servlet sandbox protect your system from unfriendly behavior.

Advantages of the servlet API

One of the great advantages of the servlet API is that it is protocol independent. It assumes nothing about:

The protocol being used to transmit on the net

How it is loaded

The server environment it will be running in

These qualities are important, because they allow the servlet API to be embedded in many different kinds of servers. There are other advantages to the servlet API as well. These include:

It's extensible: you can inherit all your functionality from the base classes made available to you.

It's simple, small, and easy to use.

Features of Servlets:

Servlets are persistent. Servlets are loaded only once by

the web server and can maintain services between

requests.


Servlets are fast. Since servlets only need to be

loaded once, they offer much better performance

over their CGI counterparts.

Servlets are platform independent.

Servlets are extensible. Java is a robust, object-

oriented programming language, which easily can be

extended to suit your needs.

Servlets are secure

Servlets can be used with a variety of clients.

Servlets are built from classes and interfaces in two packages, javax.servlet and javax.servlet.http. The javax.servlet package contains classes that support generic, protocol-independent servlets. The classes in the javax.servlet.http package extend these classes to add HTTP-specific functionality.

Every servlet must implement the javax.servlet.Servlet interface. Most servlets implement it by extending one of two classes: javax.servlet.GenericServlet or javax.servlet.http.HttpServlet. A protocol-independent servlet should subclass GenericServlet, while an HTTP servlet should subclass HttpServlet, which is itself a subclass of GenericServlet with added HTTP-specific

functionality.

Unlike a Java program, a servlet does not have a main() method. Instead, the server, in the process of handling requests, invokes certain methods of the servlet. Each time the server dispatches a request to a servlet, it invokes the servlet's service() method.

A generic servlet should override its service() method to

handle requests as appropriate for the servlet. The service() method accepts two parameters: a request object and a response object. The request object tells the servlet about the request, while the response object is used to return a response. In contrast, an HTTP servlet usually does not override the

service() method. Instead it overrides doGet() to handle GET

requests and doPost() to handle POST requests. An HTTP servlet can override either or both of these methods. The service() method of HttpServlet handles the setup and dispatching to all the doXXX() methods, which is why it usually should not be overridden.
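A minimal sketch of this pattern (the class name and output are illustrative): an HTTP servlet extends HttpServlet and overrides doGet() and doPost() instead of service():

// HelloServlet.java: a simple HTTP servlet handling GET and POST requests.
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body>Hello from doGet()</body></html>");
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Handle form submissions here; for this sketch, reuse doGet().
        doGet(req, resp);
    }
}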

The remaining classes in the javax.servlet and javax.servlet.http packages are largely support classes. The ServletRequest and ServletResponse classes in javax.servlet provide access to generic server requests and responses, while HttpServletRequest and HttpServletResponse in javax.servlet.http provide access to HTTP requests and responses. The javax.servlet.http package also contains an HttpSession class that provides built-in session tracking functionality and a Cookie class that allows quick setup and processing of HTTP cookies.

Loading Servlets:

Servlets can be loaded from three places:

From a directory that is on the CLASSPATH. The CLASSPATH of the JavaWebServer includes service root/classes/, which is where the system classes reside.

From the <SERVICE_ROOT>/servlets/ directory. This is not in the server's classpath. A class loader is used to create servlets from this directory. New servlets can be added and existing servlets can be recompiled, and the server will notice these changes.

From a remote location. For this, a code base like http://nine.eng/classes/foo/ is required in addition to the servlet's class name. Refer to the servlet section of the admin GUI docs to see how to set this up.

Loading Remote Servlets

Remote servlets can be loaded by:

Configuring the Admin Tool to set up automatic loading of remote servlets.

Setting up server-side include tags in .html files.

Defining a filter chain configuration.

Invoking Servlets

A servlet invoker is a servlet that invokes the "service" method on a named servlet. If the servlet is not loaded in the server, then the invoker first loads the servlet (either from local disk or from the network) and then invokes the "service" method. Also, like applets, local servlets in the server can be identified by just the class name. In other words, if a servlet name is not absolute, it is treated as local.

A Client can Invoke Servlets in the Following Ways:

The client can ask for a document that is served by the servlet.

The client (browser) can invoke the servlet directly using a URL, once it has been mapped using the SERVLET ALIASES section of the admin GUI.

The servlet can be invoked through server-side include tags.

The servlet can be invoked by placing it in the servlets/ directory.

The servlet can be invoked by using it in a filter chain.

The Servlet Life Cycle:

The servlet life cycle is one of the most exciting features of servlets. This life cycle is a powerful hybrid of the life cycles used in CGI programming and in lower-level NSAPI and ISAPI programming. The servlet life cycle allows servlet engines to address both the performance and resource problems of CGI and the security concerns of low-level server API programming. The servlet life cycle is highly flexible; servers have significant leeway in how they choose to support servlets. The only hard and fast rule is that a servlet engine must conform to the following life cycle contract:

Create and initialize the servlet.

Handle zero or more service requests from clients.

Destroy the servlet and then garbage collect it.

It is perfectly legal for a servlet to be loaded, created, and initialized in its own JVM, only to be destroyed and garbage collected without handling any client request, or after handling just one request. The most common and most sensible life cycle implementation for HTTP servlets is a single Java virtual machine with instance persistence across requests.

Init and Destroy:

Just like applets, servlets can define init() and destroy() methods. A servlet's init(ServletConfig) method is called by the server immediately after the server constructs the servlet's instance. Depending on the server and its configuration, this can be at any of these times:

When the server starts

When the servlet is first requested, just before the service() method is invoked

At the request of the server administrator

In any case, init() is guaranteed to be called before the servlet handles its first request. The init() method is typically used to perform servlet initialization: creating or loading objects that are used by the servlet in the handling of its requests. In order to provide a new servlet with information about itself and its environment, the server calls the servlet's init() method and passes an object that implements the ServletConfig interface. This ServletConfig object supplies the servlet with information about its initialization parameters. These parameters are given to the servlet and are not associated with any single request. They can specify initial values, such as where a counter should begin counting, or default values, perhaps a template to use when not specified by the request.

The server calls a servlet's destroy() method when the servlet is about to be unloaded. In the destroy() method, a servlet should free any resources it has acquired that will not be garbage collected. The destroy() method also gives a servlet a chance to write out its unsaved cached information or any persistent information that should be read during the next call to init().
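A brief sketch of this contract is shown below. It assumes a hypothetical init parameter named "startValue" and an illustrative class name; the point is only where initialization and cleanup belong in the life cycle.

// Life-cycle sketch: init(ServletConfig) runs once before the first request,
// destroy() runs once before unloading. Names here are hypothetical.
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;

public class CounterServlet extends HttpServlet {

    private int count;

    // Called once by the server, after construction and before the first request.
    @Override
    public void init(ServletConfig config) throws ServletException {
        super.init(config);  // call super so getServletConfig() keeps working
        String start = config.getInitParameter("startValue");
        count = (start != null) ? Integer.parseInt(start) : 0;
    }

    // Called once when the servlet is about to be unloaded.
    @Override
    public void destroy() {
        // Free resources and persist unsaved state here, e.g. record the
        // current count somewhere it can be reloaded by the next init().
        log("CounterServlet destroyed, last count = " + count);
    }
}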

Session Tracking:

HTTP is a stateless protocol; it provides no way for a server to recognize that a sequence of requests all come from the same client. This causes a problem for applications such as shopping carts. Even in a chat application, the server cannot know exactly which of several clients is making a request. The solution is for the client to introduce itself as it makes each request: each client needs to provide a unique identifier that lets the server identify it, or it needs to give some information that the server can use to properly handle the request. There are several ways to send this introductory information with each request, such as:

User Authorization:

One way to perform session tracking is to leverage the information that comes with user authorization. A web server can restrict access to some of its resources to only those clients that log in using a recognized username and password. After the client logs in, the username is available to a servlet through getRemoteUser().

We can use the username to track the session. Once a user has logged in, the browser remembers her username and resends the name and password as the user views new pages on the site. A servlet can identify the user through her username and thereby track her session.

The biggest advantage of using user authorization to perform session tracking is that it is easy to implement. Simply tell the server to protect a set of pages, and use getRemoteUser() to identify each client. Another advantage is that the technique works even when the user accesses your site from another machine or exits her browser before coming back.

The biggest disadvantage of user authorization is that it requires each user to register for an account and then log in each time she starts visiting your site. Most users will tolerate registering and logging in as a necessary evil when they are accessing sensitive information, but it is overkill for simple session tracking. Another problem with user authorization is that a user cannot simultaneously maintain more than one session at the same site.
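As a quick sketch of this technique, a servlet on a protected page might do no more than the following; it assumes the container has already authenticated the request, and the class name is illustrative.

// Session tracking via user authorization: the container supplies the username.
import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class WhoAmIServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/plain");
        PrintWriter out = response.getWriter();

        // getRemoteUser() returns the login name of the authenticated user,
        // or null if the page is not protected or the user has not logged in.
        String user = request.getRemoteUser();
        if (user == null) {
            out.println("Not logged in");
        } else {
            out.println("Tracking session for user: " + user);
        }
    }
}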

Hidden Form Fields:

One way to support anonymous session tracking is to use hidden form fields. As the name implies, these are fields added to an HTML form that are not displayed in the client's browser. They are sent back to the server when the form that contains them is submitted. In a sense, hidden form fields define constant variables for a form. To a servlet receiving a submitted form, there is no difference between a hidden field and a visible field.

As more and more information is associated with a client's session, it can become burdensome to pass it all using hidden form fields. In these situations it is possible to pass just a unique session ID that identifies a particular client's session.

That session ID can be associated with complete information about the session that is stored on the server. The advantage of hidden form fields is their ubiquity and support for anonymity. Hidden fields are supported in all the popular browsers, they demand no special server requirements, and they can be used with clients that have not registered or logged in.

The major disadvantage of this technique, however, is that it works only for a sequence of dynamically generated forms. The technique breaks down immediately with static documents, emailed documents, bookmarked documents, and browser shutdowns.
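The following sketch shows the round trip: a servlet writes a form carrying the session ID in a hidden field, and reads it back on submission like any other parameter. The ID generation, form action, and class name here are deliberately simplistic and only illustrative.

// Anonymous session tracking with a hidden form field (illustrative only).
import java.io.IOException;
import java.io.PrintWriter;
import java.util.UUID;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HiddenFieldServlet extends HttpServlet {

    // GET: emit a form that carries the session ID in a hidden field.
    @Override
    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
            throws ServletException, IOException {
        String sessionId = UUID.randomUUID().toString();
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<form method='POST' action='order'>");
        out.println("  <input type='hidden' name='sessionId' value='"
                + sessionId + "'>");
        out.println("  <input type='text' name='item'>");
        out.println("  <input type='submit' value='Add to cart'>");
        out.println("</form>");
    }

    // POST: the hidden field comes back like any other request parameter.
    @Override
    protected void doPost(HttpServletRequest request,
                          HttpServletResponse response)
            throws ServletException, IOException {
        String sessionId = request.getParameter("sessionId");
        String item = request.getParameter("item");
        response.setContentType("text/plain");
        response.getWriter().println(
                "Item '" + item + "' added for session " + sessionId);
    }
}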

URL Rewriting:

URL rewriting is another way to support anonymous session tracking. With URL rewriting, every local URL the user might click on is dynamically modified, or rewritten, to include extra information. The extra information can be in the form of extra path information, added parameters, or some custom, server-specific URL change. Due to the limited space available when rewriting a URL, the extra information is usually limited to a unique session ID.

Each rewriting technique has its own advantages and disadvantages. Using extra path information works on all servers, and it works as a target for forms that use both the GET and POST methods. It does not work well if the servlet has to use the extra path information as true path information. The advantages and disadvantages of URL rewriting closely match those of hidden form fields. The major difference is that URL rewriting works for all dynamically created documents, such as the Help servlet, not just forms. With the right server support, custom URL rewriting can even work for static documents.
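The servlet API itself offers help with this technique: HttpServletResponse.encodeURL() rewrites a link to carry the current session ID when the container decides it is needed (for example, when the browser does not accept cookies). A brief sketch, with a hypothetical /catalog link and class name:

// URL rewriting via the servlet API: encodeURL() appends the session ID
// to the link only when the container cannot rely on a cookie instead.
import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RewritingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
            throws ServletException, IOException {
        request.getSession(true);          // make sure a session exists
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        // Every local link the user might click is passed through encodeURL().
        String link = response.encodeURL("/catalog");
        out.println("<a href='" + link + "'>Browse the catalog</a>");
    }
}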

Persistent Cookies:

A fourth technique to perform session tracking involves persistent cookies. A cookie is a bit of information sent by a web server to a browser that can later be read back from that browser. When a browser receives a cookie, it saves the cookie and thereafter sends the cookie back to the server each time it accesses a page on that server, subject to certain rules. Because a cookie's value can uniquely identify a client, cookies are often used for session tracking.

Persistent cookies offer an elegant, efficient, and easy way to implement session tracking. Cookies provide as automatic an introduction for each request as we could hope for. For each request, a cookie can automatically provide a client's session ID or perhaps a list of the client's preferences. The ability to customize cookies gives them extra power and versatility.

The biggest problem with cookies is that browsers do not always accept cookies. Sometimes this is because the browser does not support cookies; more often it is because the user has specifically configured the browser to refuse cookies.
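A sketch of the cookie mechanics described above is shown below. The cookie name, its one-day lifetime, and the class name are illustrative choices, not part of any standard.

// Cookie-based session tracking: read the cookie back on each request,
// or issue a new persistent cookie on the first visit.
import java.io.IOException;
import java.util.UUID;

import javax.servlet.ServletException;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CookieTrackingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
            throws ServletException, IOException {
        // Look for an existing session cookie sent back by the browser.
        String sessionId = null;
        Cookie[] cookies = request.getCookies();   // may be null
        if (cookies != null) {
            for (Cookie c : cookies) {
                if ("sessionId".equals(c.getName())) {
                    sessionId = c.getValue();
                }
            }
        }

        // First visit (or cookies refused): issue a new persistent cookie.
        if (sessionId == null) {
            sessionId = UUID.randomUUID().toString();
            Cookie cookie = new Cookie("sessionId", sessionId);
            cookie.setMaxAge(60 * 60 * 24);        // persist for one day
            response.addCookie(cookie);
        }

        response.setContentType("text/plain");
        response.getWriter().println("Your session ID is " + sessionId);
    }
}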

The Power of Servlets:

The power of servlets is nothing but the advantages of servlets over other approaches, which include portability, power, efficiency, endurance, safety, elegance, integration, extensibility, and flexibility.

Portability:

As servlets are written in Java and conform to a well-defined and widely accepted API, they are highly portable across operating systems and across server implementations. We can develop a servlet on a Windows NT machine running the Java Web Server and later deploy it effortlessly on a high-end Unix server running Apache. With servlets we can really "write once, serve everywhere". Servlet portability is not the stumbling block it so often is with applets, for two reasons.

First, servlet portability is not mandatory, i.e., servlets have to work only on the server machines that we are using for development and deployment.

Second, servlets avoid the most error-prone and inconsistently implemented portions of the Java language.

Power:

Servlets can harness the full power of the core Java APIs, such as networking and URL access, multithreading, image manipulation, data compression, database connectivity, internationalization, remote method invocation (RMI), CORBA connectivity, and object serialization, among others.

Efficiency and Endurance:

Servlet invocation is highly efficient. Once a servlet is loaded, it generally remains in the server's memory as a single object instance. Thereafter, the server invokes the servlet to handle a request using a simple, lightweight method invocation. Unlike CGI, there is no process to spawn or interpreter to invoke, so the servlet can begin handling the request almost immediately. Multiple, concurrent requests are handled by separate threads, so servlets are highly scalable. Servlets in general are enduring objects. Because a servlet stays in the server's memory as a single object instance, it automatically maintains its state and can hold onto external resources, such as database connections.
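The sketch below illustrates these two points: the single instance keeps state in memory between requests, and because separate threads may run through it concurrently, access to that state is synchronized. The class name and the counter are illustrative.

// One loaded instance serves all requests, so instance state persists;
// concurrent request threads share it, hence the synchronization.
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HitCounterServlet extends HttpServlet {

    private int hits;   // kept in memory between requests

    @Override
    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
            throws ServletException, IOException {
        int current;
        synchronized (this) {   // several request threads may run concurrently
            current = ++hits;
        }
        response.setContentType("text/plain");
        response.getWriter().println("This servlet has been hit "
                + current + " times since it was loaded");
    }
}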

Safety:

Servlets support safe programming practices on a number of levels. As they are written in Java, servlets inherit the strong type safety of the Java language. In addition, the servlet API is implemented to be type safe. Java's automatic garbage collection and lack of pointers mean that servlets are generally safe from memory management problems like dangling pointers, invalid pointer references, and memory leaks. Servlets can handle errors safely, due to Java's exception-handling mechanism. If a servlet divides by zero or performs some other illegal operation, it throws an exception that can be safely caught and handled by the server. A server can further protect itself from servlets through the use of a Java security manager. A server can execute its servlets under the watch of a strict security manager.

Elegance:

The elegance of servlet code is striking. Servlet code is clean, object oriented, modular, and amazingly simple. One reason for this simplicity is the servlet API itself, which includes methods and classes to handle many of the routine chores of servlet development. Even advanced operations like cookie handling and session tracking are abstracted into convenient classes.

Integration:

Servlets are tightly integrated with the server. This integration allows a servlet to cooperate with the server in ways a CGI program cannot. For example, a servlet can use the server to translate file paths, perform logging, check authorization, perform MIME type mapping, and in some cases even add users to the server's user database.

Extensibility and Flexibility:

The servlet API is designed to be easily extensible. As it stands today, the API includes classes that are optimized for HTTP servlets, but later it can be extended and optimized for other types of servlets. It is also possible that its support for HTTP servlets could be further enhanced.

Servlets are also quite flexible. Sun has also introduced JavaServer Pages, which offer a way to write snippets of servlet code directly within a static HTML page, using syntax similar to Microsoft's Active Server Pages (ASP).

JDBC

What is JDBC?

JDBC is a Java API for executing SQL statements. (As a point of interest, JDBC is a trademarked name and is not an acronym; nevertheless, JDBC is often thought of as standing for Java Database Connectivity.) It consists of a set of classes and interfaces written in the Java programming language. JDBC provides a standard API for tool/database developers and makes it possible to write database applications using a pure Java API. Using JDBC, it is easy to send SQL statements to virtually any relational database. One can write a single program using the JDBC API, and the program will be able to send SQL statements to the appropriate database. The combination of Java and JDBC lets a programmer write it once and run it anywhere.

What Does JDBC Do?

Simply put, JDBC makes it possible to do three things:

o Establish a connection with a database

o Send SQL statements

o Process the results

JDBC Driver Types

The JDBC drivers that we are aware of at this time fit into one of four categories:

o JDBC-ODBC Bridge plus ODBC driver

o Native-API partly-Java driver

o JDBC-Net pure Java driver

o Native-protocol pure Java driver

An individual database system is accessed via a specific JDBC driver that implements the java.sql.Driver interface. Drivers exist for nearly all popular RDBMS systems, though few are available for free. Sun bundles a free JDBC-ODBC bridge driver with the JDK to allow access to standard ODBC data sources, such as a Microsoft Access database. Sun advises against using the bridge driver for anything other than development and very limited deployment.
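The three steps listed above (establish a connection, send SQL, process the results) might look like the following sketch. The driver class used here is the JDBC-ODBC bridge; the data source name, credentials, and table are hypothetical.

// The three JDBC steps: connect, send SQL, process results (illustrative names).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcSketch {
    public static void main(String[] args) throws Exception {
        // 1. Establish a connection (here via a hypothetical ODBC data source).
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:odbc:datasetDSN", "user", "password");

        // 2. Send an SQL statement.
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT name, age FROM candidates");

        // 3. Process the results.
        while (rs.next()) {
            System.out.println(rs.getString("name") + " : " + rs.getInt("age"));
        }

        rs.close();
        stmt.close();
        con.close();
    }
}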


JDBC drivers are available for most database platforms, from a number of vendors and in a number of different flavors. There are four driver categories.

Type 01 - JDBC-ODBC Bridge Driver

Type 01 drivers use a bridge technology to connect a Java client to an ODBC database service. Sun's JDBC-ODBC bridge is the most common Type 01 driver. These drivers are implemented using native code.

Type 02 - Native-API Partly-Java Driver

Type 02 drivers wrap a thin layer of Java around database-specific native code libraries. For Oracle databases, the native code libraries might be based on the OCI (Oracle Call Interface) libraries, which were originally designed for C/C++ programmers. Because Type 02 drivers are implemented using native code, in some cases they have better performance than their all-Java counterparts. They add an element of risk, however, because a defect in a driver's native code section can crash the entire server.

Type 03 - Net-Protocol All-Java Driver

Type 03 drivers communicate via a generic network protocol to a piece of custom middleware. The middleware component might use any type of driver to provide the actual database access. These drivers are all Java, which makes them useful for applet deployment and safe for servlet deployment.

Type 04 - Native-Protocol All-Java Driver

Type 04 drivers are the most direct of the lot. Written entirely in Java, Type 04 drivers understand database-specific networking protocols and can access the database directly without any additional software.

JDBC-ODBC Bridge

If possible, use a pure Java JDBC driver instead of the Bridge and an ODBC driver. This completely eliminates the client configuration required by ODBC. It also eliminates the potential that the Java VM could be corrupted by an error in the native code brought in by the Bridge (that is, the Bridge native library, the ODBC driver manager library, the ODBC driver library, and the database client library).

What is the JDBC-ODBC Bridge?

The JDBC-ODBC Bridge is a JDBC driver which implements JDBC operations by translating them into ODBC operations. To ODBC it appears as a normal application program. The Bridge is implemented as the sun.jdbc.odbc Java package and contains a native library used to access ODBC. The Bridge is a joint development of Intersolv and JavaSoft.


CHAPTER-6

TESTING

Implementation and Testing:


Implementation is one of the most important tasks in a project; it is the phase in which one has to be cautious, because the outcome of all the effort undertaken during the project depends on it. Implementation is the most crucial stage in achieving a successful system and in giving the users confidence that the new system is workable and effective. Each program is tested individually at the time of development using sample data, and it has been verified that these programs link together in the way specified in the program specification. The computer system and its environment are tested to the satisfaction of the user.

Implementation

The implementation phase is less creative than system design. It is primarily concerned with user training and file conversion. The system may require extensive user training. The initial parameters of the system may be modified as a result of programming. A simple operating procedure is provided so that the user can understand the different functions clearly and quickly. The different reports can be obtained either on the inkjet or dot-matrix printer, which is available at the disposal of the user. The proposed system is very easy to implement. In general, implementation is used to mean the process of converting a new or revised system design into an operational one.

Testing

Testing is the process in which test data is prepared and used to test the modules individually, after which validation is applied to the fields. Then system testing takes place, which makes sure that all components of the system function properly as a unit. The test data should be chosen such that it passes through all possible conditions. Testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently before actual operation commences. The following is a description of the testing strategies which were carried out during the testing period.

System Testing

Testing has become an integral part of any system or project, especially in the field of information technology. The importance of testing as a method of judging whether one is ready to move further, or whether the system can withstand the rigors of a particular situation, cannot be underplayed, and that is why testing before deployment is so critical. When the software is developed, before it is given to the user it must be tested to check whether it solves the purpose for which it was developed. This testing involves various types of tests through which one can ensure that the software is reliable. The program was tested logically, and the pattern of execution of the program for a set of data was repeated. Thus the code was exhaustively checked for all possible correct data and the outcomes were also checked.


Module Testing

To locate errors, each module is tested individually. This enables us to detect an error and correct it without affecting any other module. Whenever a program does not satisfy the required function, it must be corrected to obtain the required result. Thus all the modules are individually tested, bottom up, starting with the smallest and lowest-level modules and proceeding to the next level. Each module in the system is tested separately. For example, the job classification module is tested separately. This module is tested with different jobs and their approximate execution times, and the result of the test is compared with results prepared manually. The comparison shows that the proposed system works more efficiently than the existing system. In this system, the resource classification and job scheduling modules are tested separately and their corresponding results are obtained, which reduces the process waiting time.

Integration Testing

After module testing, integration testing is applied. When linking the modules there is a chance for errors to occur; these errors are corrected by this testing. In this system all modules are connected and tested. The test results are correct. Thus the mapping of jobs with resources is done correctly by the system.


Acceptance Testing

When the user finds no major problems with its accuracy, the system passes through a final acceptance test. This test confirms that the system meets the original goals, objectives, and requirements established during analysis, thereby eliminating wastage of time and money. Acceptance testing rests on the shoulders of the users and management; once they accept it, the system is ready for operation.


CHAPTER-7

SCREENS

Screen Shots

Cloud Server:

After running the client application:

Login as admin (username: admin, password: admin).

Admin home screen:

Admin will upload the dataset (click on the Upload Dataset button, then select the p1.dat file).

After uploading the dataset:

Click on the Generalize Algorithm button (it will anonymize the required data, here age).

Admin will load this dataset into the cloud.

Click on the Load to Cloud button to upload it into the cloud.

After uploading the dataset into the cloud:

Admin can also access the dataset.

To access the dataset, click on the Access Dataset button.

Access dataset screen:

Search for the data based on our requirement:

Searching for the United-States candidates' data set, so it will generate an intermediate dataset of United-States candidates.

Click on the Privacy Preserving Algorithm button to preserve the intermediate dataset.

Run one more client application, then register as a user.

User registration screen:

Login as the registered user:

User home screen:

The user can access the dataset (click on the Access Dataset button).

User viewing the data set of United-States people.

In the above result, the occupation details of the United-States candidates are encrypted, because the same data has been accessed by many candidates, so its heuristic value crossed the threshold and it was encrypted.


The above table shows details such as the searched string, access frequency, encryption status, and heuristic value.

(In our application we take the heuristic threshold (sigma) as 1,000,000; only if the heuristic value crosses 1,000,000 will the dataset be encrypted.)
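The encryption decision described above can be pictured with the simple sketch below. It is not the project's actual code: the class name, the per-access weight, and the recordAccess/shouldEncrypt helpers are hypothetical, and only the rule "encrypt a dataset once its heuristic value crosses sigma" is taken from the screens.

// Illustrative sketch of the threshold rule shown in the table above:
// a dataset is encrypted only when its heuristic value, driven by access
// frequency, crosses the threshold sigma (1,000,000 in the screens).
import java.util.HashMap;
import java.util.Map;

public class HeuristicEncryptionGuard {

    private static final long SIGMA = 1_000_000L;          // threshold used in the screens
    private static final long WEIGHT_PER_ACCESS = 250_000L; // hypothetical per-access weight

    private final Map<String, Long> heuristicValue = new HashMap<>();

    // Record one access of the intermediate dataset identified by its search string.
    public void recordAccess(String searchString) {
        heuristicValue.merge(searchString, WEIGHT_PER_ACCESS, Long::sum);
    }

    // Encrypt only the datasets whose heuristic value has crossed sigma.
    public boolean shouldEncrypt(String searchString) {
        return heuristicValue.getOrDefault(searchString, 0L) >= SIGMA;
    }
}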


CHAPTER-8

CONCLUSION


In this paper, we have proposed an approach that identifies which parts of intermediate data sets need to be encrypted while the rest do not, in order to save on privacy-preserving cost. A tree structure has been modeled from the generation relationships of intermediate data sets to analyze privacy propagation among data sets. We have modeled the problem of saving privacy-preserving cost as a constrained optimization problem, which is addressed by decomposing the privacy leakage constraints. A practical heuristic algorithm has been designed accordingly. Evaluation results on real-world data sets and larger extensive data sets have demonstrated that the cost of preserving privacy in the cloud can be reduced significantly with our approach compared with existing ones where all data sets are encrypted.
