A Privacy Leakage Upper-Bound Constraint Based Approach for Cost-Effective Privacy Preserving of Intermediate Datasets in Cloud
CHAPTER-1
INTRODUCTION
Technically cloud computing is regarded as an ingenious
combination of a series of technologies, establishing a novel
business model by offering IT services and using economies
of scale. Participants in the business chain of cloud
computing can benefit from this novel model. Cloud customers
can save huge capital investment of IT infrastructure, and
concentrate on their own core business. Therefore, many
companies or organizations have been migrating or building
their business into cloud. However, numerous potential
customers are still hesitant to take advantage of cloud due
to security and privacy concerns. The privacy concerns caused
by retaining intermediate data sets in cloud are important
but have received little attention. Storage and computation
services in cloud are equivalent from an economical
perspective because they are charged in proportion to their
usage. Thus, cloud users can store valuable intermediate data
sets selectively when processing original data sets in data
intensive applications like medical diagnosis, in order to
curtail the overall expenses by avoiding frequent re-computation to obtain these data sets. Such scenarios are
quite common because data users often reanalyze results,
conduct new analysis on intermediate data sets, or share some
intermediate results with others for collaboration. Without
loss of generality, the notion of intermediate data set
herein refers to intermediate and resultant data sets.
However, the storage of intermediate data enlarges attack
surfaces so that privacy requirements of data holders are at
risk of being violated. Usually, intermediate data sets in
cloud are accessed and processed by multiple parties, but
rarely controlled by original data set holders. This enables
an adversary to collect intermediate data sets together and
infer privacy-sensitive information from them, bringing
considerable economic loss or severe social reputation
impairment to data owners. Yet little attention has been
paid to such a cloud-specific privacy issue.
1.1. PROCESS FLOW:
1.2. Objective of the Project
In this paper, we propose a novel upper-bound privacy
leakage constraint based approach to identify which
intermediate datasets need to be encrypted and which do not,
so that privacy-preserving cost can be saved while the
privacy requirements of data holders can still be satisfied.
Evaluation results demonstrate that the privacy-preserving
cost of intermediate datasets can be significantly reduced
with our approach over existing ones where all datasets are
encrypted.
CHAPTER-2
LITERATURE SURVEY
A View of Cloud Computing
Provided certain obstacles are overcome, we believe
Cloud Computing has the potential to transform a large part
of the IT industry, making software even more attractive as a
service and shaping the way IT hardware is designed and
purchased. Developers with innovative ideas for new
interactive Internet services no longer require the large
capital outlays in hardware to deploy their service or the
human expense to operate it. They need not be concerned about
over-provisioning for a service whose popularity does not
meet their predictions, thus wasting costly resources, or
under-provisioning for one that becomes wildly popular, thus
missing potential customers and revenue. Moreover, companies
with large batch-oriented tasks can get their results as
quickly as their programs can scale, since using 1000 servers
for one hour costs no more than using one server for 1000
hours. This elasticity of resources, without paying a premium
for large scale, is unprecedented in the history of IT. The
economies of scale of very large-scale datacenters combined
with ``pay-as-you-go'' resource usage has heralded the rise
of Cloud Computing. It is now attractive to deploy an
innovative new Internet service on a third party's Internet
Datacenter rather than your own infrastructure, and to
gracefully scale its resources as it grows or declines in
popularity and revenue. Expanding and shrinking daily in
response to normal diurnal patterns could lower costs even
further. Cloud Computing transfers the risks of over-
provisioning or under-provisioning to the Cloud Computing
provider, who mitigates that risk by statistical multiplexing
over a much larger set of users and who offers relatively low
prices due to better utilization and the economy of
purchasing at a larger scale. We define terms, present an
economic model that quantifies the key buy vs. pay-as-you-go
decision, offer a spectrum to classify Cloud Computing
providers, and give our view of the top 10 obstacles and
opportunities to the growth of Cloud Computing.
Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility
With the significant advances in Information and
Communications Technology (ICT) over the last half century,
there is an increasingly perceived vision that computing will
one day be the 5th utility (after water, electricity, gas,
and telephony). This computing utility, like all other four
existing utilities, will provide the basic level of computing
service that is considered essential to meet the everyday
needs of the general community. To deliver this vision, a
number of computing paradigms have been proposed, of which
the latest one is known as Cloud computing. Hence, in this
paper, we define Cloud computing and provide the architecture
for creating Clouds with market-oriented resource allocation
by leveraging technologies such as Virtual Machines (VMs). We
also provide insights on market-based resource management
strategies that encompass both customer-driven service
management and computational risk management to sustain
Service Level Agreement (SLA)-oriented resource allocation.
In addition, we reveal our early thoughts on interconnecting
Clouds for dynamically creating global Cloud exchanges and
markets. Then, we present some representative Cloud
platforms, especially those developed in industries along
with our current work towards realizing market-oriented
resource allocation of Clouds as realized in Aneka enterprise
Cloud technology. Furthermore, we highlight the difference
between High Performance Computing (HPC) workload and
Internet-based services workload. We also describe a meta-
negotiation infrastructure to establish global Cloud
exchanges and markets, and illustrate a case study of
harnessing ‘Storage Clouds’ for high performance content
delivery. Finally, we conclude with the need for convergence
of competing IT paradigms to deliver our 21st century vision.
Cloud computing is a new and promising paradigm
delivering IT services as computing utilities. As Clouds are
designed to provide services to external users, providers
need to be compensated for sharing their resources and
capabilities. In this paper, we have proposed architecture
for market-oriented allocation of resources within Clouds.
We have also presented a vision for the creation of global
Cloud exchange for trading services. Moreover, we have
discussed some representative platforms for Cloud computing
covering the state-of-the-art. In particular, we have
presented various Cloud efforts in practice from the market-
oriented perspective to reveal its emerging potential for
the creation of third-party services to enable the successful
adoption of Cloud computing, such as meta-negotiation
infrastructure for global Cloud exchanges and high
performance content delivery via ‘Storage Clouds’. The
state-of-the-art Cloud technologies have limited support for
market-oriented resource management and they need to be
extended to support: negotiation of QoS between users and
providers to establish SLAs; mechanisms and algorithms for
allocation of VM resources to meet SLAs; and manage risks
associated with the violation of SLAs. Furthermore,
interaction protocols need to be extended to support interoperability between different Cloud service providers. In addition, we need programming
environments and tools that allow rapid creation of Cloud
applications. Data Centers are known to be expensive to
operate and they consume huge amounts of electric power. For
example, a Google data center consumes as much power as a city the size of San Francisco. As Clouds are emerging as next-
generation data centers and aim to support ubiquitous
service-oriented applications, it is important that they are
designed to be energy efficient to reduce both their power
bill and carbon footprint on the environment. To achieve this
at software systems level, we need to investigate new
techniques for allocation of resources to applications
depending on quality of service expectations of users and
service contracts established between consumers and providers
[67]. As Cloud platforms become ubiquitous, we expect the
need for internetworking them to create market-oriented
global Cloud exchanges for trading services. Several
challenges need to be addressed to realize this vision. They
include: a market maker for bringing together service providers and
consumers; market registry for publishing and discovering
Cloud service providers and their services; clearing houses
and brokers for mapping service requests to providers who can
meet QoS expectations; and payment management and accounting
infrastructure for trading services. Finally, we need to
address regulatory and legal issues, which go beyond
technical issues. Some of these issues are explored in
related paradigms such as Grids and service-oriented
computing systems. Hence, rather than competing, these past
developments need to be leveraged for advancing Cloud
computing. Also, Cloud computing and other related paradigms
need to converge so as to produce unified and interoperable
platforms for delivering IT services as the 5th utility to
individuals, organizations, and corporations.
Scientific Communities Benefit from the Economies of
Scale
The basic idea behind Cloud computing is that resource
providers offer elastic resources to end users. In this
paper, we intend to answer one key question to the success of
Cloud computing: in Cloud, can small or medium-scale
scientific computing communities benefit from the economies
of scale? Our research contributions are three-fold: first,
we propose an enhanced scientific public cloud model (ESP)
that encourages small- or medium-scale research organizations to rent elastic resources from a public cloud provider; second,
on a basis of the ESP model, we design and implement the
Dawning Cloud system that can consolidate heterogeneous
scientific workloads on a Cloud site; third, we propose an
innovative emulation methodology and perform a comprehensive
evaluation. We found that for two typical workloads: high
throughput computing (HTC) and many task computing (MTC),
Dawning Cloud saves resource consumption by up to 44.5% (HTC) and 72.6% (MTC) for service providers, and saves total resource consumption by up to 47.3% for a
resource provider with respect to the previous two public
Cloud solutions. To this end, we conclude that for typical
workloads: HTC and MTC, Dawning Cloud can enable scientific
communities to benefit from the economies of scale of public
Clouds.
Index Terms: Cloud Computing, Scientific Communities, Economies of Scale, Many-Task Computing, High Throughput Computing.
Traditionally, in scientific computing
communities (in short, scientific communities), many small-
or medium-scale organizations tend to purchase and build
dedicated cluster systems to provide computing services for
typical workloads. We call this usage model the dedicated
system model. The dedicated system model prevails in
scientific communities, of which an organization owns a
small- or medium-scale dedicated cluster system and deploys
specific runtime environment software that is responsible for
managing resources and its workloads. A dedicated system is
definitely worthwhile [35] as such a system is under the
complete control of the principal investigators and can be
devoted entirely to the needs of their experiment. However,
there is a prominent shortcoming of the dedicated system
model: for peak loads, a dedicated cluster system cannot
provide enough resources, while lots of resources are idle
for light loads. Recently, as resource providers [2],
several pioneer computing companies are adopting the concept
of infrastructure as a service, among which, Amazon EC2
contributed to popularizing the infrastructure-as-a-service
paradigm [32]. A new term Cloud is used to describe this new
computing paradigm [4] [17] [24] [32]. In this paper, we
adopt the terminology from B. Sotomayor et al.’s paper [32] to
describe different types of clouds. Public clouds offer a
publicly accessible remote interface for creating
and managing virtual machine instances within their
proprietary infrastructure.
In this paper, we have answered one key question to the
success of Cloud computing: In scientific communities, can
small- or medium-scale organizations benefit from the
economies of scale? Our contributions are four-fold: first,
we proposed a dynamic service provisioning (ESP) model in
Cloud computing. In the ESP model, a resource provider can
create specific runtime environments on demand for MTC or HTC
service providers, while a service provider can resize
dynamic resources. Second, on a basis of the ESP model, we
designed and implemented an enabling system, Dawning Cloud,
which provides automatic management for heterogeneous MTC and
HTC workloads. Third, our experiments show that for typical
MTC and HTC workloads, MTC and HTC service providers and the
resource service provider can benefit from the economies of
scale on a Cloud platform. Lastly, using an analytical
approach we verify that irrespective of specific workloads,
Dawning Cloud can achieve the economies of scale on Cloud
platforms.
Making Cloud Intermediate Data Fault-Tolerant
Parallel dataflow programs generate enormous amounts of
distributed data that are short-lived, yet are critical for
completion of the job and for good run-time performance. We
call this class of data intermediate data. This paper is
the first to address intermediate data as a first-class
citizen, specifically targeting and minimizing the effect of
run-time server failures on the availability of intermediate
data, and thus on performance metrics such as job completion
time. We propose new design techniques for a new storage
system called ISS (Intermediate Storage System), implement
these techniques within Hadoop, and experimentally evaluate
the resulting system. Under no failure, the performance of
Hadoop augmented with ISS (i.e., job completion time) turns
out to be comparable to base Hadoop. Under a failure, Hadoop
with ISS outperforms base Hadoop and incurs up to 18%
overhead compared to base no-failure Hadoop, depending on the test setup.
We have shown the need for, presented requirements
towards, and designed a new intermediate storage system (ISS)
that treats intermediate data in dataflow programs as a
first-class citizen in order to tolerate failures. We have
shown experimentally that the existing approaches are
insufficient in satisfying the requirements of data
availability and minimal interference. We have also shown
that our asynchronous rack-level selective replication
mechanism is effective, and masks interference very well. Our
failure injection experiments show that Hadoop with ISS can
speed up the performance by approximately 45% compared to
Hadoop without ISS. Under no failures, ISS incurs very little
overhead compared to Hadoop. Under a failure, ISS incurs only
up to 18% of overhead compared to Hadoop with no failures.
Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data
With the advent of cloud computing, data owners are
motivated to outsource their complex data management systems
from local sites to the commercial public cloud for great
flexibility and economic savings. But for protecting data
privacy, sensitive data has to be encrypted before
outsourcing, which obsoletes traditional data utilization
based on plaintext keyword search. Thus, enabling an
encrypted cloud data search service is of paramount
importance. Considering the large number of data users and
documents in the cloud, it is necessary to allow multiple
keywords in the search request and return documents in the
order of their relevance to these keywords. Related works on
searchable encryption focus on single keyword search or
Boolean keyword search, and rarely sort the search results.
In this paper, for the first time, we define and solve the
challenging problem of privacy-preserving multi-keyword
ranked search over encrypted cloud data (MRSE). We establish
a set of strict privacy requirements for such a secure cloud
data utilization system. Among various multi-keyword
semantics, we choose the efficient similarity measure of
“coordinate matching”, i.e., as many matches as possible, to
capture the relevance of data documents to the search query.
We further use “inner product similarity” to quantitatively
evaluate such similarity measure. We first propose a basic
idea for the MRSE based on secure inner product computation,
and then give two significantly improved MRSE schemes to
achieve various stringent privacy requirements in two
different threat models. Thorough analysis investigating
privacy and efficiency guarantees of proposed schemes is
given. Experiments on the real-world dataset further show
proposed schemes indeed introduce low overhead on computation
and communication.
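The "coordinate matching" measure described above can be illustrated with a small plaintext example: each document and each query is mapped to a binary vector over a fixed keyword dictionary, and the relevance score is simply the inner product of the two vectors. The sketch below only illustrates the scoring rule; the MRSE schemes compute this product securely over encrypted vectors, which is omitted here, and the dictionary and keywords are made up for illustration.

import java.util.*;

// Plaintext illustration of "coordinate matching" scored as an inner
// product: position i of the vector is 1 if the i-th dictionary
// keyword is present, so the inner product counts matched keywords.
public class CoordinateMatching {

    static int[] toVector(List<String> dictionary, Set<String> keywords) {
        int[] v = new int[dictionary.size()];
        for (int i = 0; i < dictionary.size(); i++) {
            v[i] = keywords.contains(dictionary.get(i)) ? 1 : 0;
        }
        return v;
    }

    // Inner product = number of query keywords the document matches.
    static int score(int[] doc, int[] query) {
        int s = 0;
        for (int i = 0; i < doc.length; i++) s += doc[i] * query[i];
        return s;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("cloud", "privacy", "encryption", "search");
        int[] doc = toVector(dict, new HashSet<>(Arrays.asList("cloud", "privacy")));
        int[] query = toVector(dict, new HashSet<>(Arrays.asList("privacy", "search")));
        System.out.println("relevance score = " + score(doc, query)); // prints 1
    }
}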
Authorized Private Keyword Search over Encrypted
Data in Cloud Computing
In cloud computing, clients usually outsource their data
to the cloud storage servers to reduce the management costs.
While those data may contain sensitive personal information,
the cloud servers cannot be fully trusted in protecting them.
Encryption is a promising way to protect the confidentiality
of the outsourced data, but it also introduces much
difficulty in performing effective searches over encrypted
information. Most existing works do not support efficient
searches with complex query conditions, and care needs to be
taken when using them because of the potential privacy
leakages about the data owners to the data users or the cloud
server. In this paper, using online
Personal Health Record (PHR) as a case study, we first show
the necessity of search capability authorization that reduces
the privacy exposure resulting from the search results, and
establish a scalable framework for Authorized Private Keyword
Search (APKS) over encrypted cloud data. We then propose two
novel solutions for APKS based on a recent cryptographic
primitive, Hierarchical Predicate Encryption (HPE). Our
solutions enable efficient multi-dimensional keyword searches
with range queries, and allow delegation and revocation of search capabilities. Moreover, we enhance query privacy, which
hides users’ query keywords against the server. We implement
our scheme on a modern workstation, and experimental results
demonstrate its suitability for practical usage.
In this paper, we address the problem of authorized
private keyword searches over encrypted data in cloud
computing, where multiple data owners encrypt their records
along with a keyword index to allow searches by multiple
users. To limit the exposure of sensitive information due to
unrestricted query capabilities, we propose a scalable, fine-
grained authorization framework where users obtain their
search capabilities from local trusted authorities according
to their attributes. We then propose two novel solutions for
APKS over encrypted data based on HPE, where in the first one
we enhance the search efficiency using attribute hierarchy,
and in the second we enhance the query privacy via the help
of proxy servers. Our solutions also support efficient multi-
dimensional range query, search capability delegation and
revocation. In addition, we implement our solution on a
modern workstation; the results show that APKS achieves
reasonable search performance.
Fully Homomorphic Encryption Using Ideal Lattices
We propose a fully homomorphic encryption scheme – i.e.,
a scheme that allows one to evaluate circuits over encrypted
data without being able to decrypt. Our solution comes
in three steps. First, we provide a general result – that, to
construct an encryption scheme that permits evaluation of
arbitrary circuits, it suffices to construct an encryption
scheme that can evaluate (slightly augmented versions of) its
own decryption circuit; we call a scheme that can evaluate
its (augmented) decryption circuit bootstrappable. Next, we
describe a public key encryption scheme using ideal lattices
that is almost bootstrappable. Lattice-based cryptosystems
typically have decryption algorithms with low circuit
complexity, often dominated by an inner product computation
that is in NC1. Also, ideal lattices provide both additive
and multiplicative homomorphisms (modulo a public-key ideal
in a polynomial ring that is represented as a lattice), as
needed to evaluate general circuits. Unfortunately, our
initial scheme is not quite bootstrappable – i.e., the depth
that the scheme can correctly evaluate can be logarithmic in
the lattice dimension, just like the depth of the decryption
circuit, but the latter is greater than the former. In the
final step, we show how to modify the scheme to reduce the
depth of the decryption circuit, and thereby obtain a
bootstrappable encryption scheme, without reducing the depth
that the scheme can evaluate. Abstractly, we accomplish this
by enabling the encrypter to start the decryption process,
leaving less work for the decrypter, much like the server
leaves less work for the decrypter in a server-aided
cryptosystem.
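The homomorphic property at the heart of this line of work can be illustrated with a deliberately simple example: textbook (unpadded) RSA is multiplicatively homomorphic, so multiplying two ciphertexts yields a ciphertext of the product. The toy sketch below demonstrates only this single homomorphic operation; it is neither fully homomorphic nor secure, and unpadded RSA must never be used in practice.

import java.math.BigInteger;
import java.security.SecureRandom;

// Toy demonstration of a multiplicative homomorphism with textbook RSA:
// E(a) * E(b) mod n == E(a * b), where E(x) = x^e mod n.
public class HomomorphismDemo {
    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        BigInteger p = BigInteger.probablePrime(512, rnd);
        BigInteger q = BigInteger.probablePrime(512, rnd);
        BigInteger n = p.multiply(q);
        BigInteger e = BigInteger.valueOf(65537); // standard public exponent

        BigInteger a = BigInteger.valueOf(6);
        BigInteger b = BigInteger.valueOf(7);
        BigInteger encA = a.modPow(e, n);
        BigInteger encB = b.modPow(e, n);

        // Multiply the two ciphertexts, then compare against encrypting a*b directly.
        BigInteger product = encA.multiply(encB).mod(n);
        BigInteger encAB = a.multiply(b).modPow(e, n);
        System.out.println(product.equals(encAB)); // prints true
    }
}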
Privacy-Preserving Data Publishing: A Survey of
Recent Developments
The collection of digital information by governments,
corporations, and individuals has created tremendous
opportunities for knowledge- and information-based decision
making. Driven by mutual benefits, or by regulations that
require certain data to be published, there is a demand for
the exchange and publication of data among various parties.
Data in its original form, however, typically contains
sensitive information about individuals, and publishing such
data will violate individual privacy. The current practice in
data publishing relies mainly on policies and guidelines as
to what types of data can be published and on agreements on
the use of published data. This approach alone may lead to
excessive data distortion or insufficient protection.
Privacy-preserving data publishing (PPDP) provides methods
and tools for publishing useful information while preserving
data privacy. Recently, PPDP has received considerable
attention in research communities, and many approaches have
been proposed for different data publishing scenarios. In
this survey, we will systematically summarize and evaluate
different approaches to PPDP, study the challenges in
practical data publishing, clarify the differences and
requirements that distinguish PPDP from other related
problems, and propose future research directions.
An Upper-Bound Control Approach for Cost-Effective
Privacy Protection of Intermediate Data Set
Storage in Cloud
As more and more data-intensive applications are migrated into cloud environments, storing some valuable intermediate datasets has been accommodated in order
to avoid the high cost of re-computing them. However, this
poses a risk to data privacy protection because malicious
parties may deduce the private information of the parent
dataset or original dataset by analyzing some of those stored
intermediate datasets. The traditional way for addressing
this issue is to encrypt all of those stored datasets so that
they can be hidden. We argue that this is neither efficient
nor cost-effective because it is not necessary to encrypt ALL
of those datasets and encryption of all large amounts of
datasets can be very costly. In this paper, we propose a new
approach to identify which stored datasets need to be
encrypted and which do not. Based on information-theoretic analysis, our approach derives an upper bound on a privacy measure. As long as the overall mixed information
amount of some stored datasets is no more than that upper
bound, those datasets do not need to be encrypted while
privacy can still be protected. A tree model is leveraged to
analyze privacy disclosure of datasets, and privacy
requirements are decomposed and satisfied layer by layer.
With a heuristic implementation of this approach, evaluation
results demonstrate that the cost for encrypting intermediate
datasets decreases significantly compared with the
traditional approach while the privacy protection of parent
or original dataset is guaranteed.
Toward Data Confidentiality in Storage-Intensive
Cloud Applications
By offering high availability and elastic access to
resources, third-party cloud infrastructures such as Amazon
EC2 are revolutionizing the way today's businesses operate.
Unfortunately, taking advantage of their benefits requires
businesses to accept a number of serious risks to data
security. Factors such as software bugs, operator errors and
external attacks can all compromise the confidentiality of
sensitive application data on external clouds, by making them
vulnerable to unauthorized access by malicious parties.
In this paper, we study and seek to improve the
confidentiality of application data stored on third-party
computing clouds. We propose to identify and encrypt all
functionally encryptable data, sensitive data that can be encrypted
without limiting the functionality of the application on the
cloud. Such data would be stored on the cloud only in an
encrypted form, accessible only to users with the correct
keys, thus protecting its confidentiality against
unintentional errors and attacks alike. We describe Silverline, a
set of tools that automatically 1) identify all functionally
encryptable data in a cloud application, 2) assign encryption
keys to specific data subsets to minimize key management
complexity while ensuring robustness to key compromise, and
3) provide transparent data access at the user device while
preventing key compromise even from malicious clouds. Through
experiments with real applications, we find that many web
applications are dominated by storage and data sharing components
that do not require interpreting raw data. Thus, Silverline
can protect the vast majority of data on these applications,
simplify key management, and protect against key compromise.
Together, our techniques provide a substantial first step
towards simplifying the complex process of incorporating data
confidentiality into these storage-intensive cloud
applications.
CHAPTER-3
SYSTEM ANALYSIS
3.1. Introduction
The Systems Development Life Cycle (SDLC), or Software
Development Life Cycle in systems engineering, information
systems and software engineering, is the process of creating
or altering systems, and the models and methodologies that
people use to develop these systems. In software engineering
the SDLC concept underpins many kinds of software development
methodologies. These methodologies form the framework for
planning and controlling the creation of an information
system: the software development process.
3.2. Existing System
Existing technical approaches for preserving the privacy of
datasets stored in cloud mainly include encryption and
anonymization. On one hand, encrypting all datasets, a
straightforward and effective approach, is widely adopted in
current research. However, processing on encrypted datasets
efficiently is quite a challenging task, because most existing
applications only run on unencrypted datasets. Moreover, preserving
the privacy of intermediate datasets becomes a challenging problem
because adversaries may recover privacy-sensitive information by
analyzing multiple intermediate datasets. Encrypting ALL datasets
in cloud is widely adopted in existing approaches to address this
challenge. But we argue that encrypting all intermediate datasets is neither efficient nor cost-effective because it is very time
consuming and costly for data-intensive applications to en/decrypt
datasets frequently while performing any operation on them.
3.3. Proposed System
In this paper, we propose a novel approach to identify which
intermediate datasets need to be encrypted while others do not, in
order to satisfy privacy requirements given by data holders. A tree
structure is modeled from generation relationships of intermediate
datasets to analyze privacy propagation of datasets. As quantifying
joint privacy leakage of multiple datasets efficiently is
challenging, we exploit an upper-bound constraint to confine
privacy disclosure. Based on such a constraint, we model the
problem of saving privacy-preserving cost as a constrained optimization problem. This problem is then divided into a series
of sub-problems by decomposing privacy leakage constraints.
Finally, we design a practical heuristic algorithm accordingly to
identify the datasets that need to be encrypted. Experimental
results on real-world and extensive datasets demonstrate that
privacy-preserving cost of intermediate datasets can be
significantly reduced with our approach over existing ones where
all datasets are encrypted.
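As a rough illustration of the idea, the sketch below greedily leaves the costliest-to-encrypt datasets unencrypted as long as their accumulated leakage stays under a given upper bound. The class names, the additive leakage model, and the greedy rule are simplifying assumptions made for illustration; the actual algorithm in this project works on the dataset tree and decomposes the leakage constraint layer by layer.

import java.util.*;

// Illustrative sketch: decide which intermediate datasets must be
// encrypted so that the combined leakage of the plaintext ones stays
// within a given upper bound, preferring to leave the most expensive
// datasets unencrypted to save privacy-preserving cost.
public class UpperBoundHeuristic {

    static class Dataset {
        final String name;
        final double leakage;        // assumed privacy leakage if left in plaintext
        final double encryptionCost; // assumed cost of keeping this dataset encrypted
        Dataset(String name, double leakage, double encryptionCost) {
            this.name = name;
            this.leakage = leakage;
            this.encryptionCost = encryptionCost;
        }
    }

    // Returns the subset of datasets that must be encrypted.
    static List<Dataset> selectDatasetsToEncrypt(List<Dataset> all, double leakageUpperBound) {
        // Greedy rule: consider datasets from costliest to cheapest and keep
        // them in plaintext while the accumulated leakage stays under the bound.
        List<Dataset> sorted = new ArrayList<>(all);
        sorted.sort((a, b) -> Double.compare(b.encryptionCost, a.encryptionCost));
        double accumulatedLeakage = 0.0;
        List<Dataset> toEncrypt = new ArrayList<>();
        for (Dataset d : sorted) {
            if (accumulatedLeakage + d.leakage <= leakageUpperBound) {
                accumulatedLeakage += d.leakage; // d stays in plaintext
            } else {
                toEncrypt.add(d);                // encrypting d removes its leakage
            }
        }
        return toEncrypt;
    }

    public static void main(String[] args) {
        List<Dataset> datasets = Arrays.asList(
            new Dataset("d1", 0.30, 5.0),
            new Dataset("d2", 0.20, 9.0),
            new Dataset("d3", 0.40, 2.0));
        for (Dataset d : selectDatasetsToEncrypt(datasets, 0.5)) {
            System.out.println("encrypt: " + d.name); // prints "encrypt: d3"
        }
    }
}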
System overview of U-Cloud:
3.4. Software Requirement Specification
3.4.1. Overall Description
A Software Requirements Specification (SRS), a requirements specification for a software system, is a complete
description of the behavior of a system to be developed. It
includes a set of use cases that describe all the interactions
the users will have with the software. In addition to use
cases, the SRS also contains non-functional requirements.
Nonfunctional requirements are requirements which impose
constraints on the design or implementation (such as
performance engineering requirements, quality standards, or
design constraints).
System requirements specification: A structured
collection of information that embodies the requirements of a
system. A business analyst, sometimes titled system analyst, is
responsible for analyzing the business needs of their clients
and stakeholders to help identify business problems and propose
solutions. Within the systems development lifecycle domain, the
BA typically performs a liaison function between the business
side of an enterprise and the information technology department
or external service providers. Projects are subject to three
sorts of requirements:
Business requirements describe in business terms what
must be delivered or accomplished to provide value.
Product requirements describe properties of a system or
product (which could be one of several ways to accomplish a
set of business requirements.)
Process requirements describe activities performed by
the developing organization.
Preliminary investigation examines project feasibility, the likelihood the system will be useful to the organization. The main objective of the feasibility study is to test the technical, operational, and economical feasibility of adding new modules and debugging the old running system. Any system is feasible given unlimited resources and infinite time. The following aspects make up the feasibility study portion of the preliminary investigation:
ECONOMIC FEASIBILITY
A system that can be developed technically, and that will be used if installed, must still be a good investment for the organization. In the economical feasibility study, the development cost of the system is evaluated against the ultimate benefit derived from the new system. Financial benefits must equal or exceed the costs. The system is economically feasible: it does not require any additional hardware or software, and since the interface for this system is developed using the existing resources and technologies available at NIC, there is only nominal expenditure and economical feasibility is certain.
OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be
turned into information systems that will meet the organization’s operating requirements. Operational
feasibility aspects of the project are to be taken as an
important part of the project implementation. This system is
targeted to be in accordance with the above-mentioned issues.
Beforehand, the management issues and user requirements have
been taken into consideration. So there is no question of
resistance from the users that can undermine the possible
application benefits. The well-planned design would ensure
the optimal utilization of the computer resources and would
help in the improvement of performance status.
TECHNICAL FEASIBILITY
Earlier, no system existed to cater to the needs of the ‘Secure Infrastructure Implementation System’. The current system developed is technically feasible. It is a web-based user interface for audit workflow at NIC-CSD. Thus it provides easy access to the users. The database’s purpose
is to create, establish and maintain a workflow among various
entities in order to facilitate all concerned users in their
various capacities or roles. Permission to the users would be
granted based on the roles specified. Therefore, it provides
the technical guarantee of accuracy, reliability and
security.
3.4.2. External Interface Requirements
User Interface
The user interface of this system is a user friendly
Java Graphical User Interface.
Hardware Interfaces
The interaction between the user and the console is
achieved through Java capabilities.
Software Interfaces
The required software is Java 1.6.
Operating Environment
Windows XP, Linux.
HARDWARE REQUIREMENTS:
Processor : Pentium
Speed : 1.1 GHz
RAM : 1 GB
Hard Disk : 20 GB
Floppy Drive : 1.44 MB
Keyboard : Standard Windows keyboard
Mouse : Two- or three-button mouse
Monitor : SVGA
SOFTWARE REQUIREMENTS:
Operating System : Windows
Technology : J2SE
IDE : MyEclipse
Database : MySQL
Java Version : J2SDK 1.6
CHAPTER-4
SYSTEM DESIGN
Unified Modeling Language:
The Unified Modeling Language allows the software
engineer to express an analysis model using the modeling
notation that is governed by a set of syntactic, semantic, and
pragmatic rules.
A UML system is represented using five different views
that describe the system from distinctly different perspectives. Each view is defined by a set of diagrams, as follows.
User Model View
i. This view represents the system from the user’s perspective.
ii. The analysis representation describes a usage scenario from the end user’s perspective.
Structural model view
i. In this model the data and functionality are modeled from inside the system.
ii. This model view models the static structures.
Behavioral Model View
It represents the dynamic (behavioral) aspects of the system, depicting the interactions among the structural elements described in the user model and structural model views.
Implementation Model View
In this view the structural and behavioral aspects of the system are represented as they are to be built.
Environmental Model View
In this view the structural and behavioral aspects of
the environment in which the system is to be
implemented are represented.
UML is specifically constructed through two different domains:
UML Analysis modeling, this focuses on the user model
and structural model views of the system.
UML design modeling, which focuses on the behavioral
modeling, implementation modeling and environmental
model views.
Use case Diagrams represent the functionality of the system
from a user’s point of view. Use cases are used during
requirements elicitation and analysis to represent the
functionality of the system. Use cases focus on the behavior
of the system from an external point of view.
Actors are external entities that interact with the system.
Examples of actors include users like an administrator or a bank customer, or another system like a central database.
1. UML diagrams
Class diagram:-
The class diagram is the main building block of object
oriented modelling. It is used both for general conceptual modeling of the systematics of the application, and for detailed modelling, translating the models into programming code. Class diagrams can also be used for data modeling. The classes in a class diagram represent the main objects and interactions in the application, and the classes to be programmed.
A class with three sections.
In the diagram, classes are represented with boxes which
contain three parts:
The upper part holds the name of the class
The middle part contains the attributes of the class
The bottom part gives the methods or operations the
class can take or undertake
Class diagram:
(Class diagram with five classes:
PrivacyLeakage — attributes: socket, address, occupation, age, gender; operations: uploadDataset(), accessDataset(), privacyPreservingAlgorithm(), logout().
UploadDataset — attributes: age, gender, occ, address; operations: generateAlgorithm(), loadToCloud(), clear().
Admin — attributes: username, password; operations: login(), logout(), reset(), register().
User — attributes: username, password; operations: register(), login(), logout(), reset().
AccessDataSet — attributes: occupation, age, address, gender; operations: submit(), search().)
Use case diagram:-
A use case diagram at its simplest is a representation
of a user's interaction with the system and depicting the
specifications of a use case. A use case diagram can portray
the different types of users of a system and the various ways
that they interact with the system. This type of diagram is
typically used in conjunction with the textual use case and
will often be accompanied by other types of diagrams as well.
Use case diagram for Admin:
(Use case diagram: the actor Admin performs login, upload dataset, generate algorithm, access dataset, load to cloud, clear, and logout.)
Use case diagram for User:
(Use case diagram: the actor User performs register, login, search, access dataset, and logout.)
Sequence Diagram:-
A sequence diagram is a kind of interaction diagram that
shows how processes operate with one another and in what
order. It is a construct of a Message Sequence Chart. A
sequence diagram shows object interactions arranged in time
sequence. It depicts the objects and classes involved in the
scenario and the sequence of messages exchanged between the
objects needed to carry out the functionality of the
scenario. Sequence diagrams are typically associated with use
case realizations in the Logical View of the system under
development. Sequence diagrams are sometimes called event
diagrams, event scenarios, and timing diagrams.
Sequence diagram for Admin:
(Sequence diagram: Admin logs in at Home, uploads a dataset, generates the algorithm, and loads it to the Cloud, which preserves the intermediate dataset; Admin then accesses the dataset and logs out.)
Sequence diagram for User:
(Sequence diagram: User registers and logs in at Home, sends a search string to the Cloud, views the returned dataset, and logs out.)
Collaborative diagram:-
A collaboration diagram describes interactions
among objects in terms of sequenced messages.
Collaboration diagrams represent a combination of39
information taken from class, sequence, and use case
diagrams describing both the static structure and
dynamic behavior of a system.
Collaboration diagram for Admin:
(Collaboration diagram: Admin interacts with Home, PrivacyLeakage, Cloud, and Logout via the sequenced messages 1: login, 2: upload dataset, 3: generate algorithm, 4: load to cloud, 5: preserve the intermediate dataset, 6: access the dataset, 7: logout.)
Collaboration diagram for User:
(Collaboration diagram: User interacts with Home, Cloud, and Logout via the sequenced messages 1: register, 2: login, 3: search string, 4: view dataset, 5: logout.)
Component Diagram
In the Unified Modeling Language, a component diagram
depicts how components are wired together to form larger
components and/or software systems. They are used to illustrate the structure of arbitrarily complex systems.
Components are wired together by using an assembly connector to
connect the required interface of one component with the
provided interface of another component. This illustrates the
service consumer - service provider relationship between the two
components.
Component diagram for Admin:
(Component diagram: the Admin component is wired to the upload dataset, generate algorithm, load to cloud, and access dataset components.)
Component diagram for User:

(Component diagram: the User component is wired to the register, login, access dataset, and logout components.)
Deployment Diagram
A deployment diagram in the Unified Modeling Language
models the physical deployment of artifacts on nodes.[1] To
describe a web site, for example, a deployment diagram would
show what hardware components ("nodes") exist (e.g., a web
server, an application server, and a database server), what
software components ("artifacts") run on each node (e.g., web
application, database), and how the different pieces are
connected (e.g. JDBC, REST, RMI).
The nodes appear as boxes, and the artifacts allocated
to each node appear as rectangles within the boxes. Nodes may
have sub nodes, which appear as nested boxes. A single node
in a deployment diagram may conceptually represent multiple
physical nodes, such as a cluster of database servers.
(Deployment diagram: the Admin and User nodes are connected to the upload dataset, generate algorithm, load to cloud, and access dataset artifacts.)
Activity diagram:
Activity diagram is another important diagram in UML to
describe dynamic aspects of the system. It is basically a
flow chart to represent the flow from one activity to another activity. The activity can be described as an operation of the system, so the control flow is drawn from one operation to another. This flow can be sequential, branched, or concurrent.
Activity diagram:
(Activity diagram: after Login, a decision node checks whether the user is an Admin. On Yes, the flow is Upload dataset, Generate algorithm, Load to cloud, Access data set; on No, the flow is Search string, Access data set.)
CHAPTER-5
IMPLEMENTATION
Number of Modules
After careful analysis, the system has been identified to have the following modules:
1. Data Storage Privacy Module
2. Privacy Preserving Module
3. Intermediate Dataset Module
4. Privacy Upper-Bound Module
1. Data Storage Privacy Module:
The privacy concerns caused by retaining intermediate
datasets in cloud are important but have received little attention. A motivating scenario is one where an online health service provider, e.g., Microsoft HealthVault, has moved data storage into cloud for economical benefits.
Original datasets are encrypted for confidentiality. Data
users like governments or research centers access or process
part of original datasets after anonymization. Intermediate
datasets generated during data access or process are retained
for data reuse and cost saving. We proposed an approach that
combines encryption and data fragmentation to achieve privacy
protection for distributed data storage with encrypting only
part of datasets.
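A minimal sketch of encrypting only part of the datasets, using the standard javax.crypto API, is shown below. The dataset contents, the sensitivity flags, and the use of a freshly generated AES key are illustrative assumptions; a real deployment would use managed keys and an authenticated encryption mode such as AES/GCM.

import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Encrypts only the datasets flagged as sensitive; the rest stay in
// plaintext, which is the cost saving this module aims at.
public class SelectiveEncryption {
    public static void main(String[] args) throws Exception {
        // Fresh key for the sketch only; in practice the data holder
        // manages the key, it is not generated per run.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        String[] datasets = {"d1:public-stats", "d2:patient-records"};
        boolean[] sensitive = {false, true}; // only d2 must be hidden

        // Default AES transformation is used here for brevity; pick an
        // authenticated mode (e.g., AES/GCM/NoPadding) in real systems.
        Cipher cipher = Cipher.getInstance("AES");
        for (int i = 0; i < datasets.length; i++) {
            if (sensitive[i]) {
                cipher.init(Cipher.ENCRYPT_MODE, key);
                byte[] ct = cipher.doFinal(datasets[i].getBytes(StandardCharsets.UTF_8));
                System.out.println("stored encrypted, " + ct.length + " bytes");
            } else {
                System.out.println("stored in plaintext: " + datasets[i]);
            }
        }
    }
}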
2. Privacy Preserving Module:
Privacy-preserving techniques like generalization can withstand most privacy attacks on one single dataset, while
preserving privacy for multiple datasets is still a
challenging problem. Thus, for preserving privacy of multiple
datasets, it is promising to anonymize all datasets first and
then encrypt them before storing or sharing them in cloud.
Privacy-preserving cost of intermediate datasets stems from
frequent en/decryption with charged cloud services.
3. Intermediate Dataset Module:
An intermediate dataset is assumed to have been anonymized to satisfy certain privacy requirements. However,
putting multiple datasets together may still invoke a high
risk of revealing privacy-sensitive information, resulting in
violating the privacy requirements. Data provenance is
employed to manage intermediate datasets in our research.
Provenance is commonly defined as the origin, source or
history of derivation of some objects and data, which can be
reckoned as the information upon how data was generated.
Reproducibility of data provenance can help to regenerate a
dataset from its nearest existing predecessor datasets rather
than from scratch.
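The derivation relationship described above can be sketched as a simple parent-linked tree: each intermediate dataset records the dataset it was generated from, and a deleted dataset is regenerated starting from its nearest stored predecessor. The class and field names below are illustrative, not taken from the paper.

// Minimal provenance tree node: each dataset points to its parent, and
// regeneration restarts from the nearest ancestor that is still stored.
public class ProvenanceNode {
    final String name;
    final ProvenanceNode parent; // null for the original dataset
    boolean stored;              // is this dataset currently kept in cloud?

    ProvenanceNode(String name, ProvenanceNode parent, boolean stored) {
        this.name = name;
        this.parent = parent;
        this.stored = stored;
    }

    // Walk toward the root and return the nearest node (possibly this
    // one) that is still stored; regeneration starts from there.
    ProvenanceNode nearestStoredPredecessor() {
        ProvenanceNode n = this;
        while (n != null && !n.stored) n = n.parent;
        return n;
    }

    public static void main(String[] args) {
        ProvenanceNode original = new ProvenanceNode("original", null, true);
        ProvenanceNode d1 = new ProvenanceNode("d1", original, true);
        ProvenanceNode d2 = new ProvenanceNode("d2", d1, false); // deleted
        ProvenanceNode d3 = new ProvenanceNode("d3", d2, false); // deleted
        // d3 is rebuilt from d1, not from the original dataset.
        System.out.println("regenerate d3 from: " + d3.nearestStoredPredecessor().name);
    }
}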
4. Privacy Upper-Bound Module:
Privacy quantification of a single dataset is stated first.
We point out the challenge of privacy quantification of
multiple datasets and then derive a privacy leakage upper-
bound constraint correspondingly. We propose an upper-bound
constraint based approach to select the necessary subset of
intermediate datasets that needs to be encrypted for
minimizing privacy-preserving cost. The privacy leakage
upper-bound constraint is decomposed layer by layer.
5.3. Introduction of technologies used
About Java:
Initially the language was called “Oak”, but it was renamed “Java” in 1995. The primary motivation of this language was the need for a platform-independent (i.e., architecture-neutral) language that could be used to create software to be embedded in various consumer electronic devices.
Java is a programmer’s language.
Java is cohesive and consistent.
Except for constraints imposed by the Internet environment, Java gives the programmer full control.
Finally, Java is to Internet programming what C was to systems programming.
Importance of Java to the Internet
Java has had a profound effect on the Internet. This is because Java expands the universe of objects that can move about freely in cyberspace. In a network, two categories of objects are transmitted between the server and the personal computer: passive information and dynamic, active programs. Dynamic, networked programs raise serious problems in the areas of security and portability, but Java addresses these concerns and, by doing so, has opened the door to an exciting new form of program called the applet.
Applications and Applets
An application is a program that runs on our computer under the operating system of that computer. It is more or less like one created using C or C++. Java’s ability to create applets makes it important. An applet is an application designed to be transmitted over the Internet and executed by a Java-compatible web browser. An applet is actually a tiny Java program, dynamically downloaded across the network, just like an image. But the difference is that it is an intelligent program, not just a media file: it can react to user input and dynamically change.
Java Architecture
Java architecture provides a portable, robust, high-performing environment for development. Java provides
portability by compiling the byte codes for the Java Virtual
Machine, which is then interpreted on each platform by the
run-time environment. Java is a dynamic system, able to load
code when needed from a machine in the same room or across
the planet.
(Figure: source code is compiled by platform-specific compilers (PC, Macintosh, SPARC) into platform-independent Java bytecode, which the Java interpreter on each platform then executes.)
Compilation of code
When you compile the code, the Java compiler creates machine code (called bytecode) for a hypothetical machine called the Java Virtual Machine (JVM). The JVM is supposed to execute the bytecode. The JVM was created to overcome the issue of portability: the code is written and compiled for one machine and interpreted on all machines. This machine is called the Java Virtual Machine.

Compiling and interpreting Java source code:
During run time, the Java interpreter tricks the bytecode file into thinking that it is running on a Java Virtual Machine. In reality this could be an Intel Pentium running Windows 95, a Sun SPARCstation running Solaris, or an Apple Macintosh running its own system, and all could receive code from any computer through the Internet and run the applets.
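A minimal example of this write-once, run-anywhere cycle: the source file below is compiled once with javac into platform-independent bytecode, and the resulting Hello.class can then be executed by the JVM on any platform with java Hello.

// Hello.java
// Compile once:  javac Hello.java   (produces platform-independent Hello.class)
// Run anywhere:  java Hello         (the local JVM executes the bytecode)
public class Hello {
    public static void main(String[] args) {
        System.out.println("The same bytecode runs on any platform with a JVM.");
    }
}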
Simple:
Java was designed to be easy for the professional programmer to learn and to use effectively. If you are an experienced C++ programmer, learning Java requires very little effort, because Java inherits the C/C++ syntax and many of the object-oriented features of C++. Most of the confusing concepts from C++ are either left out of Java or implemented in a cleaner, more approachable manner. In Java there are a small number of clearly defined ways to accomplish a given task.
Object-oriented:
Java was not designed to be source-code compatible with any other language. This allowed the Java team the freedom to design with a blank slate. One outcome of this was a clean, usable, pragmatic approach to objects. The object model in Java is simple and easy to extend, while simple types, such as integers, are kept as high-performance non-objects.
Robust
The multi-platform environment of the web places
extraordinary demands on a program, because the program must
execute reliably in a variety of systems. The ability to
create robust programs was given a high priority in the
design of Java. Java is a strictly typed language; it checks your code at compile time and run time. Java virtually eliminates the problems of memory management and deallocation, which is completely automatic. In a well-written
Java program, all run-time errors can and should be managed
by your program.
Servlets/JSP:
A servlet is a generic server extension: a Java class that can be loaded dynamically to expand the functionality of a server. Servlets are commonly used with web servers, where they can take the place of CGI scripts. A servlet is similar to a proprietary server extension, except that it runs inside a Java Virtual Machine (JVM) on the server, so it is safe and portable. Servlets operate solely within the domain of the server.
Unlike CGI and FastCGI, which use multiple processes to handle separate programs or separate requests, separate threads within the web server process handle all servlets. This means that servlets are efficient and scalable. Servlets
are portable; both across operating systems and also across
web servers. Java Servlets offer the best possible platform
for web application development. While servlets are most often used as replacements for CGI scripts on a web server, they can extend any sort of server; a mail server, for example, could allow servlets to extend its functionality, perhaps by performing a virus scan on all attached documents or handling mail-filtering tasks. Servlets provide a Java-based solution to the problems currently associated with server-side programming, including inextensible scripting solutions, platform-specific APIs, and incomplete interfaces.
Servlets are objects that conform to a specific interface that can be plugged into a Java-based server. Servlets are to the server side what applets are to the client side: object bytecodes that can be dynamically loaded off the net. They differ from applets in that they are faceless objects (without graphics or a GUI component). They serve as platform-independent, dynamically loadable, pluggable helper bytecode objects on the server side that can be used to dynamically extend server-side functionality.
For example, an HTTP servlet can be used to generate dynamic HTML content. When you use servlets to do dynamic content, you get the following advantages:
They’re faster and cleaner than CGI scripts.
They use a standard API (the servlet API).
They provide all the advantages of Java (run on a variety of servers without needing to be rewritten).
Attractiveness of Servlets:
There are many features of servlets that make them easy and attractive to use. These include:
Easily configured using the GUI-based Admin tool.
Can be loaded and invoked from a local disk or remotely across the network.
Can be linked together, or chained, so that one servlet can call another servlet, or several servlets in sequence.
Can be called dynamically from within HTML pages, using server-side include tags.
Are secure: even when downloading across the network, the servlet security model and the servlet sandbox protect your system from unfriendly behavior.
Advantages of the servlet API
One of the great advantages of the servlet API is that it is protocol-independent. It assumes nothing about:
The protocol being used to transmit on the net
How it is loaded
The server environment it will be running in
These qualities are important, because they allow the servlet API to be embedded in many different kinds of servers. There are other advantages to the servlet API as well. These include:
It’s extensible: you can inherit all your functionality from the base classes made available to you.
It’s simple, small, and easy to use.
Features of Servlets:
Servlets are persistent. Servlets are loaded only once by the web server and can maintain services between requests.
Servlets are fast. Since servlets only need to be loaded once, they offer much better performance than their CGI counterparts.
Servlets are platform independent.
Servlets are extensible. Java is a robust, object-oriented programming language, which easily can be extended to suit your needs.
Servlets are secure.
Servlets can be used with a variety of clients.
Servlets are classes and interfaces from two packages, javax.servlet and javax.servlet.http. The javax.servlet package contains classes to support generic, protocol-independent servlets. The classes in the javax.servlet.http package extend these classes to add HTTP-specific functionality.
Every servlet must implement the javax.servlet.Servlet interface. Most servlets implement it by extending one of two classes: javax.servlet.GenericServlet or javax.servlet.http.HttpServlet. A protocol-independent servlet should subclass GenericServlet, while an HTTP servlet should subclass HttpServlet, which is itself a subclass of GenericServlet with added HTTP-specific functionality.
Unlike a Java program, a servlet does not have a main() method. Instead, the server invokes certain methods of a servlet in the process of handling requests. Each time the server dispatches a request to a servlet, it invokes the servlet’s service() method.
A generic servlet should override its service() method to handle requests as appropriate for the servlet. The service() method accepts two parameters: a request object and a response object. The request object tells the servlet about the request, while the response object is used to return a response.
In contrast, an HTTP servlet usually does not override the service() method. Instead, it overrides doGet() to handle GET requests and doPost() to handle POST requests. An HTTP servlet can override either or both of these methods. The service() method of HttpServlet handles the setup and dispatching to all the doXXX() methods, which is why it usually should not be overridden.
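A minimal HTTP servlet following this pattern looks as follows: service() in HttpServlet dispatches GET requests to doGet(), so service() itself is left untouched.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal HTTP servlet: only doGet() is overridden; HttpServlet's
// service() method handles the dispatching.
public class HelloServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>Hello from a servlet</body></html>");
    }
}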
The remaining classes in the javax.servlet and javax.servlet.http packages are largely support classes. The ServletRequest and ServletResponse classes in javax.servlet provide access to generic server requests and responses, while the HttpServletRequest and HttpServletResponse classes in javax.servlet.http provide access to HTTP requests and responses. The javax.servlet.http package also contains an HttpSession class that provides built-in session tracking functionality and a Cookie class that allows quick setup and processing of HTTP cookies.
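A short sketch of the built-in session tracking just mentioned: the servlet below counts a client's visits by storing a counter in the HttpSession (using the getAttribute/setAttribute names of the later servlet API revisions).

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.*;

// Counts a client's visits using the built-in HttpSession support.
public class VisitCounterServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        HttpSession session = request.getSession(true); // create if absent
        Integer count = (Integer) session.getAttribute("visits");
        count = (count == null) ? 1 : count + 1;
        session.setAttribute("visits", count);
        response.setContentType("text/plain");
        PrintWriter out = response.getWriter();
        out.println("Visits in this session: " + count);
    }
}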
Loading Servlets:
Servlets can be loaded from three places:
From a directory that is on the CLASSPATH. The CLASSPATH of the JavaWebServer includes service root/classes/, which is where the system classes reside.
From the <SERVICE_ROOT>/servlets/ directory. This directory is not in the server’s classpath; a class loader is used to create servlets from it, so new servlets can be added and existing servlets recompiled, and the server will notice these changes.
From a remote location. For this, a code base like http://nine.eng/classes/foo/ is required in addition to the servlet’s class name. Refer to the admin GUI docs on the servlet section to see how to set this up.

Loading Remote Servlets:
Remote servlets can be loaded by:
Configuring the Admin Tool to set up automatic loading of remote servlets.
Setting up server-side include tags in .html files.
Defining a filter chain configuration.
Invoking Servlets:
A servlet invoker is a servlet that invokes the “service” method on a named servlet. If the servlet is not loaded in the server, then the invoker first loads the servlet (either from local disk or from the network) and then invokes the “service” method. Also like applets, local servlets in the server can be identified by just the class name. In other words, if a servlet name is not absolute, it is treated as local.
A client can invoke servlets in the following ways:
The client can ask for a document that is served by the servlet.
The client (browser) can invoke the servlet directly using a URL, once it has been mapped using the SERVLET ALIASES section of the admin GUI.
The servlet can be invoked through server-side include tags.
The servlet can be invoked by placing it in the servlets/ directory.
The servlet can be invoked by using it in a filter chain.
The Servlet Life Cycle:-
The servlet life cycle is one of the most exciting features of servlets. This life cycle is a powerful hybrid of the life cycles used in CGI programming and lower-level NSAPI and ISAPI programming. The servlet life cycle allows servlet engines to address both the performance and resource problems of CGI and the security concerns of low-level server API programming. The servlet life cycle is highly flexible; servers have significant leeway in how they choose to support servlets. The only hard and fast rule is that a servlet engine must conform to the following life cycle contract:
Create and initialize the servlet.
Handle zero or more service requests from clients.
Destroy the servlet and then garbage collect it.
60
It’s perfectly legal for a servlet t be loaded, created
an initialized in its own JVM, only to be destroyed and
garbage collected without handling any client request or
after handling just one request The most common and most
sensible life cycle implementations for HTTP servelts are:
Single java virtual machine and astatine persistence.
Init and Destroy:-
Just like applets, servlets can define init() and destroy()
methods. A servlet's init(ServletConfig) method is called by
the server immediately after the server constructs the
servlet's instance. Depending on the server and its
configuration, this can be at any of these times:
When the server starts
When the servlet is first requested, just before the service()
method is invoked
At the request of the server administrator
In any case, init() is guaranteed to be called before the
servlet handles its first request. The init() method is
typically used to perform servlet initialization, creating or
loading objects that are used by the servlet in handling its
requests. In order to provide a new servlet any information
about itself and its environment, a server has to call a
servlet's init() method and pass an object that implements the
ServletConfig interface. This ServletConfig object supplies a
servlet with information about its initialization parameters.
These parameters are given to the servlet and are not
associated with any single request. They can specify initial
values, such as where a counter should begin counting, or
default values, perhaps a template to use when not specified by
the request. The server calls a servlet's destroy() method when
the servlet is about to be unloaded. In the destroy() method, a
servlet should free any resources it has acquired that will not
be garbage collected. The destroy() method also gives a servlet
a chance to write out its unsaved cached information or any
persistent information that should be read during the next call
to init().
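A hedged sketch of this init()/destroy() pattern follows, assuming a hypothetical init parameter named "initial" for a counter servlet; it is an illustration, not this project's code:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.*;

// Illustrative counter servlet: init() reads a configured starting value;
// destroy() is the place to persist it. The parameter name "initial" is assumed.
public class CounterServlet extends HttpServlet {
    private int count;

    public void init(ServletConfig config) throws ServletException {
        super.init(config); // let the superclass store the ServletConfig
        String initial = config.getInitParameter("initial");
        count = (initial != null) ? Integer.parseInt(initial) : 0;
    }

    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        res.setContentType("text/plain");
        PrintWriter out = res.getWriter();
        out.println("Count: " + (++count));
    }

    public void destroy() {
        // A real servlet would write the count to durable storage here,
        // so the next init() can read it back.
    }
}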
Session Tracking:
HTTP is a stateless protocol; it provides no way for a server
to recognize that a sequence of requests all comes from the
same client. This causes a problem for applications such as
shopping cart applications. Even in a chat application, the
server cannot know exactly which of several clients is making a
request. The solution is for the client to introduce itself as
it makes each request. Each client needs to provide a unique
identifier that lets the server identify it, or it needs to
give some information that the server can use to properly
handle the request. There are several ways to send this
introductory information with each request, such as:
User Authorization:
One way to perform session tracking is to leverage the
information that comes with user authorization: a web server
restricts access to some of its resources to only those clients
that log in using a recognized username and password. After the
client logs in, the username is available to a servlet through
getRemoteUser(). We can use the username to track the session.
Once a user has logged in, the browser remembers her username
and resends the name and password as the user views new pages
on the site. A servlet can identify the user through her
username and thereby track her session.
The biggest advantage of using user authorization to perform
session tracking is that it is easy to implement: simply tell
the server to protect a set of pages, and use getRemoteUser()
to identify each client. Another advantage is that the
technique works even when the user accesses your site from
another machine or exits her browser before coming back.
The biggest disadvantage of user authorization is that it
requires each user to register for an account and then log in
each time she starts visiting your site. Most users will
tolerate registering and logging in as a necessary evil when
they are accessing sensitive information, but it is overkill
for simple session tracking. Another problem with user
authorization is that a user cannot simultaneously maintain
more than one session at the same site.
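A short, hedged sketch of this technique, using only the standard getRemoteUser() call; the class name is illustrative:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.*;

// Illustrative sketch: identify the session by the authenticated username.
public class WhoAmIServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        res.setContentType("text/plain");
        PrintWriter out = res.getWriter();
        String user = req.getRemoteUser(); // null unless the page is access-protected
        if (user == null) {
            out.println("No authenticated user; protect this page to enable tracking.");
        } else {
            out.println("Tracking session for user: " + user);
        }
    }
}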
Hidden Form Fields:
One way to support anonymous session tracking is to use hidden
form fields. As the name implies, these are fields added to an
HTML form that are not displayed in the client's browser. They
are sent back to the server when the form that contains them is
submitted. In a sense, hidden form fields define constant
variables for a form; to a servlet receiving a submitted form,
there is no difference between a hidden field and a visible
field.
As more and more information is associated with a client's
session, it can become burdensome to pass it all using hidden
form fields. In these situations it is possible to pass on just
a unique session ID that identifies a particular client's
session.
That session ID can be associated with complete information
about the session that is stored on the server. The advantage
of hidden form fields is their ubiquity and support for
anonymity. Hidden fields are supported in all the popular
browsers, they demand no special server requirements, and they
can be used with clients that have not registered or logged in.
The major disadvantage with this technique, however, is that it
works only for a sequence of dynamically generated forms. The
technique breaks down immediately with static documents,
emailed documents, bookmarked documents, and browser shutdowns.
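A minimal sketch of hidden-field session tracking, assuming a hypothetical form action and parameter name:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.UUID;
import javax.servlet.ServletException;
import javax.servlet.http.*;

// Illustrative sketch: a session ID travels in a hidden form field.
public class HiddenFieldServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        // Reuse the submitted ID if present; otherwise mint a new one.
        String sessionId = req.getParameter("sessionId"); // assumed parameter name
        if (sessionId == null) {
            sessionId = UUID.randomUUID().toString();
        }
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        out.println("<form method=\"POST\" action=\"/hidden\">"); // assumed action
        out.println("  <input type=\"hidden\" name=\"sessionId\" value=\"" + sessionId + "\">");
        out.println("  <input type=\"submit\" value=\"Continue\">");
        out.println("</form>");
    }

    protected void doPost(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        doGet(req, res); // the resubmitted form carries the hidden field back
    }
}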
URL Rewriting:
URL rewriting is another way to support anonymous session
tracking. With URL rewriting, every local URL the user might
click on is dynamically modified, or rewritten, to include
extra information. The extra information can be in the form of
extra path information, added parameters, or some custom,
server-specific URL change. Due to the limited space available
in rewriting a URL, the extra information is usually limited to
a unique session ID. Each rewriting technique has its own
advantages and disadvantages. Using extra path information
works on all servers, and it works as a target for forms that
use both the GET and POST methods. It does not work well if the
servlet has to use the extra path information as true path
information. The advantages and disadvantages of URL rewriting
closely match those of hidden form fields. The major difference
is that URL rewriting works for all dynamically created
documents, such as the Help servlet, not just forms. With the
right server support, custom URL rewriting can even work for
static documents.
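A hedged sketch using the servlet API's standard rewriting support, encodeURL(), which appends the session ID to a link when the container cannot rely on cookies; the link target is an assumption:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.*;

// Illustrative sketch: let the container rewrite local URLs with the session ID.
public class RewriteServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        req.getSession(true); // make sure there is a session ID to encode
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        // encodeURL() appends the session ID only when the client rejects cookies.
        String next = res.encodeURL("/nextPage"); // "/nextPage" is an assumed target
        out.println("<a href=\"" + next + "\">Next page</a>");
    }
}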
Persistent Cookies:
A fourth technique to perform session tracking involves
persistent cookies. A cookie is a bit of information sent by a
web server to a browser that can later be read back from that
browser. When a browser receives a cookie, it saves the cookie
and thereafter sends the cookie back to the server each time it
accesses a page on that server, subject to certain rules.
Because a cookie's value can uniquely identify a client,
cookies are often used for session tracking. Persistent cookies
offer an elegant, efficient, and easy way to implement session
tracking. Cookies provide as automatic an introduction for each
request as we could hope for. For each request, a cookie can
automatically provide a client's session ID or perhaps a list
of the client's preferences. The ability to customize cookies
gives them extra power and versatility.
The biggest problem with cookies is that browsers don't always
accept cookies. Sometimes this is because the browser doesn't
support cookies; more often it's because the user has
specifically configured the browser to refuse cookies.
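A minimal sketch, assuming a hypothetical cookie named "sessionId"; the thirty-day lifetime is an arbitrary illustration:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.UUID;
import javax.servlet.ServletException;
import javax.servlet.http.*;

// Illustrative sketch: session tracking with a persistent cookie.
public class CookieServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        String sessionId = null;
        Cookie[] cookies = req.getCookies(); // null if the browser sent no cookies
        if (cookies != null) {
            for (Cookie c : cookies) {
                if ("sessionId".equals(c.getName())) {
                    sessionId = c.getValue();
                }
            }
        }
        if (sessionId == null) {
            sessionId = UUID.randomUUID().toString();
            Cookie c = new Cookie("sessionId", sessionId); // assumed cookie name
            c.setMaxAge(60 * 60 * 24 * 30); // persist for roughly 30 days
            res.addCookie(c);
        }
        res.setContentType("text/plain");
        PrintWriter out = res.getWriter();
        out.println("Your session ID: " + sessionId);
    }
}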
The Power of Servlets:
The power of servlets is nothing but the advantages of servlets
over other approaches, which include portability, power,
efficiency, endurance, safety, elegance, integration,
extensibility, and flexibility.
Portability:
As servlets are written in Java and conform to a well-defined
and widely accepted API, they are highly portable across
operating systems and across server implementations. We can
develop a servlet on a Windows NT machine running the Java Web
Server and later deploy it effortlessly on a high-end Unix
server running Apache. With servlets we can really "write once,
serve everywhere". Servlet portability is not the stumbling
block it so often is with applets, for two reasons. First,
servlet portability is not mandatory; i.e., servlets have to
work only on the server machines that we are using for
development and deployment. Second, servlets avoid the most
error-prone and inconsistently implemented portions of the Java
language.
Power:
Servlets can harness the full power of the core Java APIs, such
as networking and URL access, multithreading, image
manipulation, data compression, database connectivity,
internationalization, remote method invocation (RMI), CORBA
connectivity, and object serialization, among others.
Efficiency and Endurance:
Servlet invocation is highly efficient. Once a servlet is
loaded, it generally remains in the server's memory as a single
object instance. Thereafter, the server invokes the servlet to
handle a request using a simple, lightweight method invocation.
Unlike CGI, there's no process to spawn or interpreter to
invoke, so the servlet can begin handling the request almost
immediately. Multiple, concurrent requests are handled by
separate threads, so servlets are highly scalable. Servlets in
general are enduring objects. Because a servlet stays in the
server's memory as a single object instance, it automatically
maintains its state and can hold onto external resources, such
as database connections.
Safety:
Servlets support safe programming practices on a number of
levels. As they are written in Java, servlets inherit the
strong type safety of the Java language. In addition, the
servlet API is implemented to be type safe. Java's automatic
garbage collection and lack of pointers mean that servlets are
generally safe from memory management problems like dangling
pointers, invalid pointer references, and memory leaks.
Servlets can handle errors safely, due to Java's
exception-handling mechanism: if a servlet divides by zero or
performs some illegal operation, it throws an exception that
can be safely caught and handled by the server. A server can
further protect itself from servlets through the use of a Java
security manager; a server can execute its servlets under the
watch of a strict security manager.
Elegance:
The elegance of servlet code is striking. Servlet code is
clean, object-oriented, modular, and amazingly simple. One
reason for this simplicity is the servlet API itself, which
includes methods and classes to handle many of the routine
chores of servlet development. Even advanced operations like
cookie handling and session tracking are abstracted into
convenient classes.
Integration:
Servlets are tightly integrated with the server. This
integration allows a servlet to cooperate with the server in
many ways. For example, a servlet can use the server to
translate file paths, perform logging, check authorization,
perform MIME type mapping, and, in some cases, even add users
to the server's user database.
Extensibility and Flexibility:
The servlet API is designed to be easily extensible. As it
stands today, the API includes classes that are optimized for
HTTP servlets, but later it can be extended and optimized for
other types of servlets. It is also possible that its support
for HTTP servlets could be further enhanced. Servlets are also
quite flexible. Sun also introduced JavaServer Pages, which
offer a way to write snippets of servlet code directly within a
static HTML page, using syntax similar to Microsoft's Active
Server Pages (ASP).
JDBC
What is JDBC?
JDBC is a Java API for executing SQL statements. (As a point of
interest, JDBC is a trademarked name and is not an acronym;
nevertheless, JDBC is often thought of as standing for Java
Database Connectivity.) It consists of a set of classes and
interfaces written in the Java programming language. JDBC
provides a standard API for tool/database developers and makes
it possible to write database applications using a pure Java
API. Using JDBC, it is easy to send SQL statements to virtually
any relational database: one can write a single program using
the JDBC API, and the program will be able to send SQL
statements to the appropriate database. The combination of Java
and JDBC lets a programmer write it once and run it anywhere.
What Does JDBC Do?
Simply put, JDBC makes it possible to do three things, as the
sketch following this list illustrates:
o Establish a connection with a database
o Send SQL statements
o Process the results
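A hedged, minimal sketch of those three steps; the DSN, table, and credentials below are placeholders, not this project's configuration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Illustrative JDBC sketch: connect, send SQL, process the results.
public class JdbcDemo {
    public static void main(String[] args) throws Exception {
        // Pre-JDBC-4 style driver loading; the bridge driver class lives in
        // the sun.jdbc.odbc package discussed later in this chapter.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        // 1. Establish a connection (URL, user, and password are placeholders).
        Connection con = DriverManager.getConnection(
                "jdbc:odbc:sampleDsn", "user", "password");
        try {
            // 2. Send an SQL statement ("emp" is a hypothetical table).
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT name, salary FROM emp");
            // 3. Process the results row by row.
            while (rs.next()) {
                System.out.println(rs.getString("name") + "\t" + rs.getDouble("salary"));
            }
            rs.close();
            stmt.close();
        } finally {
            con.close();
        }
    }
}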
JDBC Driver Types
The JDBC drivers that we are aware of at this time fit into one
of four categories:
o JDBC-ODBC Bridge plus ODBC driver
o Native-API partly-Java driver
o JDBC-Net pure Java driver
o Native-protocol pure Java driver
An individual database system is accessed via a specific JDBC
driver that implements the java.sql.Driver interface. Drivers
exist for nearly all popular RDBMS systems, though few are
available for free. Sun bundles a free JDBC-ODBC bridge driver
with the JDK to allow access to standard ODBC data sources,
such as a Microsoft Access database. Sun advises against using
the bridge driver for anything other than development and very
limited deployment.
JDBC drivers are available for most database platforms, from a
number of vendors, and in a number of different flavors. There
are four driver categories:
Type 01 - JDBC-ODBC Bridge Driver
Type 01 drivers use a bridge technology to connect a Java
client to an ODBC database service. Sun's JDBC-ODBC bridge is
the most common type 01 driver. These drivers are implemented
using native code.
Type 02 - Native-API Partly-Java Driver
Type 02 drivers wrap a thin layer of Java around
database-specific native code libraries. For Oracle databases,
the native code libraries might be based on the OCI (Oracle
Call Interface) libraries, which were originally designed for
C/C++ programmers. Because type 02 drivers are implemented
using native code, in some cases they have better performance
than their all-Java counterparts. They add an element of risk,
however, because a defect in a driver's native code section can
crash the entire server.
Type 03 - Net-Protocol All-Java Driver
Type 03 drivers communicate via a generic network protocol to a
piece of custom middleware. The middleware component might use
any type of driver to provide the actual database access. These
drivers are all Java, which makes them useful for applet
deployment and safe for servlet deployment.
Type 04 - Native-Protocol All-Java Driver
Type 04 drivers are the most direct of the lot. Written
entirely in Java, type 04 drivers understand database-specific
networking protocols and can access the database directly
without any additional software.
JDBC-ODBC Bridge
If possible, use a pure Java JDBC driver instead of the Bridge
and an ODBC driver. This completely eliminates the client
configuration required by ODBC. It also eliminates the
potential that the Java VM could be corrupted by an error in
the native code brought in by the Bridge (that is, the Bridge
native library, the ODBC driver manager library, the ODBC
driver library, and the database client library).
What is the JDBC-ODBC Bridge?
The JDBC-ODBC Bridge is a JDBC driver which implements JDBC
operations by translating them into ODBC operations. To ODBC it
appears as a normal application program. The Bridge is
implemented as the sun.jdbc.odbc Java package and contains a
native library used to access ODBC. The Bridge is a joint
development of Intersolv and JavaSoft.
Implementation is one of the most important tasks in a project.
It is the phase in which one has to be cautious, because all
the efforts undertaken during the project come to fruition
here. Implementation is the most crucial stage in achieving a
successful system and giving the users confidence that the new
system is workable and effective. Each program is tested
individually at the time of development using sample data, and
it has been verified that these programs link together in the
way specified in the program specification. The computer system
and its environment are tested to the satisfaction of the user.
Implementation
The implementation phase is less creative than system design.
It is primarily concerned with user training and file
conversion. The system may require extensive user training. The
initial parameters of the system should be modified as a result
of programming. A simple operating procedure is provided so
that the user can understand the different functions clearly
and quickly. The different reports can be obtained either on
the inkjet or dot matrix printer, which is available at the
disposal of the user. The proposed system is very easy to
implement. In general, implementation is used to mean the
process of converting a new or revised system design into an
operational one.
Testing
Testing is the process where the test data is prepared and used
for testing the modules individually, and later validation is
given for the fields. Then system testing takes place, which
makes sure that all components of the system function properly
as a unit. The test data should be chosen such that it passes
through all possible conditions. Testing is the stage of
implementation which is aimed at ensuring that the system works
accurately and efficiently before the actual operation
commences. The following is a description of the testing
strategies which were carried out during the testing period.
System Testing
Testing has become an integral part of any system or project,
especially in the field of information technology. The
importance of testing, as a method of justifying whether one is
ready to move further and of checking whether a system can
withstand the rigors of a particular situation, cannot be
underplayed, and that is why testing before deployment is so
critical. When software is developed, before it is given to the
user it must be tested to check whether it solves the purpose
for which it was developed. This testing involves various types
through which one can ensure the software is reliable. The
program was tested logically, and the pattern of execution of
the program for a set of data was repeated. Thus the code was
exhaustively checked for all possible correct data and the
outcomes were also checked.
Module Testing
To locate errors, each module is tested individually. This
enables us to detect an error and correct it without affecting
any other module. Whenever a program does not satisfy the
required function, it must be corrected to get the required
result. Thus all the modules are individually tested bottom-up,
starting with the smallest and lowest modules and proceeding to
the next level. Each module in the system is tested separately.
For example, the job classification module is tested separately
with different jobs and their approximate execution times, and
the result of the test is compared with results that are
prepared manually. The comparison shows that the proposed
system works more efficiently than the existing system. In this
system, the resource classification and job scheduling modules
are tested separately and their corresponding results are
obtained, which reduces the process waiting time.
Integration Testing
After module testing, integration testing is applied. When
linking the modules there may be a chance for errors to occur;
these errors are corrected by using this testing. In this
system all modules are connected and tested. The testing
results are correct. Thus the mapping of jobs with resources is
done correctly by the system.
Acceptance Testing
When the user finds no major problems with its accuracy, the
system passes through a final acceptance test. This test
confirms that the system meets the original goals, objectives,
and requirements established during analysis, which eliminates
wastage of time and money. With acceptance tests resting on the
shoulders of users and management, the system is finally
acceptable and ready for operation.
Admin will upload the dataset (click on the Upload Dataset
button, then select the p1.dat file).
After uploading the dataset:
Click on the Generalize Algorithm button (it will anonymize the
required data, here age).
Admin will load this dataset into the cloud.
Click on the Load to Cloud button to upload it into the cloud.
After uploading the dataset into the cloud:
Admin can also access the dataset.
To access the dataset, click on the Access Dataset button.
Access dataset screen:
Search for the data based on our requirement:
Searching for the United-States candidates dataset, so it will
generate an intermediate dataset of United-States candidates.
Click on the Privacy Preserving Algorithm button to preserve
the intermediate dataset.
Run one more client application, then register as a user.
User registration screen:
User viewing the dataset of United-States people.
In the above result, the occupation details of United-States
candidates are encrypted because the same data has been
accessed by many candidates, so it crossed a certain heuristic
value and was therefore encrypted.
The above table shows details like the searched string, access
frequency, encryption status, and heuristic values.
(In our application we take the heuristic value (sigma) as
1000000; only if the heuristic value crosses 1000000 will the
dataset be encrypted.)
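As an illustrative sketch of this thresholding step (not the project's exact code; the method names and the way the heuristic value is derived from access frequency are assumptions):

// Illustrative sketch of the sigma threshold check described above.
public class PrivacyPreserver {
    private static final long SIGMA = 1000000L; // heuristic threshold from the text

    // Assume the heuristic value grows with how often the dataset is accessed.
    public static boolean shouldEncrypt(long accessFrequency, long perAccessWeight) {
        long heuristicValue = accessFrequency * perAccessWeight;
        return heuristicValue > SIGMA; // encrypt only above the upper bound
    }

    public static void main(String[] args) {
        System.out.println(shouldEncrypt(5000, 100));  // 500000  -> false
        System.out.println(shouldEncrypt(20000, 100)); // 2000000 -> true
    }
}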
CHAPTER-8
CONCLUSION
In this paper, we have proposed an approach that identifies
which part of intermediate data sets needs to be encrypted
while the rest does not, in order to save the
privacy-preserving cost. A tree structure has been modeled from
the generation relationships of intermediate data sets to
analyze privacy propagation among data sets. We have modeled
the problem of saving privacy-preserving cost as a constrained
optimization problem which is addressed by decomposing the
privacy leakage constraints. A practical heuristic algorithm
has been designed accordingly. Evaluation results on real-world
data sets and larger extensive data sets have demonstrated that
the cost of preserving privacy in cloud can be reduced
significantly with our approach over existing ones where all
data sets are encrypted.