
e-ISSN: 2582-5208 | International Research Journal of Modernization in Engineering Technology and Science (Peer-Reviewed, Open Access, Fully Refereed International Journal) | Volume 04, Issue 05, May 2022 | www.irjmets.com

MULTI SECURE: INTEGRATION OF BIG DATA AND CLOUD FOR EFFECTIVE MULTI SECURE DATA UTILITY WITH ACCESS VERIFICATION USING ABE

Noorul Ameen J*1, Shafeeq S*2, Tamilmaran D*3

*1 Assistant Professor, Department of Computer Science and Engineering, E.G.S. Pillay Engineering College, Nagapattinam, Tamil Nadu, India.

*2,3 UG Student, Department of Computer Science and Engineering, E.G.S. Pillay Engineering College, Nagapattinam, Tamil Nadu, India.

ABSTRACT

The proposed proxy-based multi-cloud computing framework allows dynamic, on-the-fly collaboration and resource sharing among cloud-based services. Security is the major issue in the cloud, so data is principally categorized into three levels: high security, medium security, and low security. Beyond the baseline system, in which data is simply encrypted and stored in the cloud, the proposed system implements high security by encrypting the data, splitting it into two parts, and storing the parts separately. For medium security, data is encrypted and stored in a single cloud. For normal (low) security, data is encrypted and stored as-is.

Keywords: Cloud Computing, Public Cloud, Private Cloud, Hybrid Cloud, Big Data, Hadoop.

I. INTRODUCTION

Recent growth in cloud computing benefits from its ability to provide software, infrastructure, and platform services without requiring large investments or expenses to manage and operate them. Clouds typically involve service providers, infrastructure resource providers, and service users. They include applications delivered as services, as well as the hardware and software systems providing these services. Recently, collaborative cloud computing has gradually attracted the attention of industry and academia. Just as the Internet was an inevitable stage in the development of network technologies, collaborative cloud computing will be an inevitable trend in the development of cloud computing. Collaborative cloud computing successfully delivers information technology as a service over the network and provides end-users with extremely strong computational capability and huge memory space at a low cost. Apart from cost, collaborative cloud computing also addresses growing concerns about carbon emissions and environmental impact, since it advocates better management of resources. Despite the benefits it introduces, this new paradigm still faces several challenges related to trust computing, response speed, and automatic resource matchmaking. Addressing these challenges will require new holistic designs, cooperative strategies, and distribution infrastructures.

II. METHODOLOGY

User Interface

In this module, a user interface is created through which the data owner uploads data to the server. The main objective of this module is to store and share data by uploading files to the remote machine.

Cloud Server

The data owner uploads their data to the cloud server, and requests for particular files are also sent to the cloud server. Both uploads and file requests are handled by the main cloud server. While a file request is being processed, the main server communicates with the data owner, and the file is released only after the data owner gives approval.

Merkle Hash Tree

In this module, data is encrypted and hashed, and then uploaded to the server. Once the data is uploaded, it is chunked into multiple parts using the Merkle hash tree algorithm and stored on separate data servers. In cryptography and computer science, a hash tree or Merkle tree is a tree in which every leaf node is labeled with the hash of a data block, and every non-leaf node is labeled with the cryptographic hash of the


labels of its child nodes. Hash trees allow efficient and secure verification of the contents of large data

structures. Hash trees are a generalization of hash lists and hash chains.
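As an illustration, the following minimal Java sketch builds a Merkle root over a list of data blocks, assuming SHA-256 as the block hash (the paper does not fix a hash function); an unpaired node is promoted by hashing it with itself:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Minimal Merkle-tree sketch: leaves hold SHA-256 hashes of data blocks,
// each parent holds the hash of its children's concatenated hashes.
public class MerkleTree {

    static byte[] sha256(byte[] input) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(input);
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Builds the tree bottom-up and returns the root hash.
    static byte[] merkleRoot(List<byte[]> blocks) throws Exception {
        List<byte[]> level = new ArrayList<>();
        for (byte[] block : blocks) {
            level.add(sha256(block));            // leaf = hash of data block
        }
        while (level.size() > 1) {
            List<byte[]> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // A node without a sibling is paired with itself.
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                parents.add(sha256(concat(left, right)));
            }
            level = parents;
        }
        return level.get(0);
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> blocks = List.of(
                "chunk-1".getBytes(StandardCharsets.UTF_8),
                "chunk-2".getBytes(StandardCharsets.UTF_8),
                "chunk-3".getBytes(StandardCharsets.UTF_8));
        byte[] root = merkleRoot(blocks);
        System.out.printf("Merkle root: %064x%n", new java.math.BigInteger(1, root));
    }
}

In the upload flow described above, each chunk would be placed on a different data server while the owner keeps only the root hash, which suffices to verify any retrieved chunk against a logarithmic-size proof.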

Secure The Data

Users upload their data to the cloud server, and requests for particular files are sent to the cloud server. To deploy our system, we use Dropbox cloud storage to store the details. We store sensitive, normal, and less secure information in three different formats by splitting it into four, three, and one part, respectively. User data is secured at three levels: high, medium, and low. Highly secured data is split into four parts, medium-level data is split into three parts, and low-level data is stored as a single folder.
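The paper does not specify the splitting algorithm, so the following Java sketch makes the simplest assumption: the already-encrypted bytes are cut into N nearly equal parts, where N is 4, 3, or 1 depending on the sensitivity level, and each part would then be uploaded to a separate store:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: split an (already encrypted) byte array into
// N nearly equal parts, one per destination store. The sensitivity levels
// follow the paper: high = 4 parts, medium = 3 parts, low = 1 part.
public class SensitivitySplitter {

    enum Level {
        HIGH(4), MEDIUM(3), LOW(1);
        final int parts;
        Level(int parts) { this.parts = parts; }
    }

    static List<byte[]> split(byte[] ciphertext, Level level) {
        int n = level.parts;
        int base = ciphertext.length / n;
        int remainder = ciphertext.length % n;   // spread leftover bytes
        List<byte[]> out = new ArrayList<>(n);
        int offset = 0;
        for (int i = 0; i < n; i++) {
            int size = base + (i < remainder ? 1 : 0);
            out.add(Arrays.copyOfRange(ciphertext, offset, offset + size));
            offset += size;
        }
        return out;                              // upload each part to a separate store
    }

    public static void main(String[] args) {
        byte[] encrypted = new byte[10];         // stand-in for real ciphertext
        System.out.println("high-level parts: " + split(encrypted, Level.HIGH).size());
    }
}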

III. ARCHITECTURE DIAGRAM

Figure 1: The Uploading File


Figure 2: The Downloading File

IV. MODELING AND ANALYSIS

Cloud Computing

Cloud computing is a fast-growing technology that has established itself in the next generation of the IT

industry and business. Cloud computing promises reliable software, hardware, and IaaS delivered over the

Internet and remote data centers. Cloud services have become a powerful architecture to perform complex

large-scale computing tasks and span a range of IT functions from storage and computation to database and

application services. The need to store, process, and analyze large amounts of datasets has driven many

organizations and individuals to adopt cloud computing. A large number of scientific applications for extensive

experiments are currently deployed in the cloud and may continue to increase because of the lack of available

computing facilities in local servers, reduced capital costs, and increasing volume of data produced and

consumed by the experiments. In addition, cloud service providers have begun to integrate frameworks for

parallel data processing in their services to help users access cloud resources and deploy their programs. Cloud

computing “is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” Cloud computing has several favorable aspects for addressing rapid economic growth and technological barriers. It reduces the total cost of ownership and allows organizations to focus on their core business without worrying about issues such as infrastructure, flexibility, and availability of resources.

The increasing popularity of wireless networks and mobile devices has taken cloud computing to new heights

because of the limited processing capability, storage capacity, and battery lifetime of each device. This


condition has led to the emergence of a mobile cloud computing paradigm. Mobile cloud facilities allow users to

outsource tasks to external service providers. For example, data can be processed and stored outside of a

mobile device. Mobile cloud applications, such as Gmail, iCloud, and Dropbox, have become prevalent recently.

Juniper's research predicts that cloud-based mobile applications will increase to approximately $9.5 billion by

2014. Such applications improve mobile cloud performance and user experience. However, the limitations

associated with wireless networks and the intrinsic nature of mobile devices have imposed computational and

data storage restrictions.

Cloud Setup

Cloud service providers maintain a large amount of data in their data storage. The cloud service provider also maintains all user information, stored in its database, to authenticate users when they log into their accounts. The cloud server redirects each user-requested job to the resource assigning module, which processes the requests of all users. To communicate with the client and with the other modules of the cloud network, the cloud server establishes connections between them; for this purpose, we create a user interface frame. The cloud service provider sends user job requests to the resource assigning module in a first-in, first-out (FIFO) manner.
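A minimal Java sketch of this FIFO hand-off between the cloud server and the resource assigning module is given below; the class and job names are illustrative, not part of the system described above:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the FIFO hand-off: the cloud server enqueues user jobs and the
// resource assigning module consumes them strictly in arrival order.
public class FifoDispatch {

    record Job(String user, String fileName) {}

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Job> queue = new ArrayBlockingQueue<>(100);

        // Resource assigning module: takes jobs first-in, first-out.
        Thread assigner = new Thread(() -> {
            try {
                while (true) {
                    Job job = queue.take();      // blocks until a job arrives
                    System.out.println("assigning resources for " + job);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        assigner.setDaemon(true);
        assigner.start();

        // Cloud server side: forward incoming requests in order of arrival.
        queue.put(new Job("alice", "report.pdf"));
        queue.put(new Job("bob", "photo.png"));
        Thread.sleep(200);                       // let the consumer drain the queue
    }
}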

Private Cloud

Private clouds are dedicated to one organization and do not share physical resources. The resources can be provided in-house or externally. A typical underlying driver of private cloud deployments is security requirements and regulations that demand strict separation of an organization's data storage and processing from accidental or malicious access through shared resources. Private cloud setups are challenging, since the economies of scale are usually not achievable within most projects and organizations despite the utilization of industry standards. The return on investment compared to public cloud offerings is rarely obtained, and the operational overhead and risk of failure are significant.

Public Cloud

Public clouds share physical resources for data transfers, storage, and processing. However, customers have private virtualized computing environments and isolated storage. Security concerns, which entice a few to adopt private clouds or custom deployments, are irrelevant for the vast majority of customers and projects. Virtualization makes access to other customers' data extremely difficult.

Hybrid Cloud

The hybrid cloud architecture merges private and public cloud deployments. This is often an attempt to achieve both security and elasticity, or to provide cheaper baseload and burst capabilities. Some organizations experience short periods of extremely high load, as a result of seasonality (like Black Friday for retail) or marketing events (like sponsoring a popular TV event). These events can have a huge economic impact on organizations if they are serviced poorly. The hybrid cloud provides the opportunity to serve the baseload with in-house services and to rent additional resources for short periods to service the extreme demand. This requires a great deal of operational ability in the organization to seamlessly scale between the private and public cloud. Tools for hybrid or private cloud deployments exist, such as Eucalyptus for Amazon Web Services.

Big Data

Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes

difficult to process using traditional data processing applications. The challenges include analysis, capture,

curation, search, sharing, storage, transfer, visualization, and privacy violations. The trend to larger data sets is

due to the additional information derivable from analysis of a single large set of related data, as compared to

separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, prevent diseases, combat crime and so on." We apply big data in our project because every employee record carries structured information on which analyses can be performed.


Characteristics Of Big Data

Big data is a term utilized to refer to the increase in the volume of data that is difficult to store, process, and

analyze through traditional database technologies. The nature of big data is indistinct and involves

considerable processes to identify and translate the data into new insights. The term “big data” is relatively new

in IT and business. However, several researchers and practitioners have utilized the term in previous literature.

For instance, earlier work referred to big data as a large volume of scientific data for visualization. Several definitions of big data currently exist. Some have defined big data as characterized by three Vs: volume, variety, and velocity. The terms volume, variety, and velocity were originally introduced by Gartner to describe the elements of big data challenges. IDC also defined big data technologies as “a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.” Others have specified that big data is not only characterized by the three Vs mentioned above but may also extend to four Vs, namely, volume, variety, velocity, and value. This 4V definition is widely recognized because it highlights the meaning and necessity of big data.

Figure 3: Four V’s of Big Data

Volume

Refers to the amount of all types of data generated from different sources, which continues to expand. The benefit of gathering large amounts of data includes the discovery of hidden information and patterns through data analysis. Collecting longitudinal data requires considerable effort and underlying investment. Nevertheless, such mobile data challenges have produced interesting results, for example in examining the predictability of human behavior patterns, sharing data based on human mobility, and visualizing complex data.

Variety

Refers to the different types of data collected via sensors, smartphones, or social networks. Such data types

include video, image, text, audio, and data logs, in either structured or unstructured format. Most of the data

generated from mobile applications are in an unstructured format. For example, text messages, online games,

blogs, and social media generate different types of unstructured data through mobile devices and sensors.

Internet users also generate an extremely diverse set of structured and unstructured data.

Velocity

Refers to the speed of data transfer. The contents of data constantly change because of the absorption of

complementary data collections, introduction of previously archived data or legacy collections, and streamed

data arriving from multiple sources.

Value

The most important aspect of big data; it refers to the process of discovering huge hidden value in large datasets of various types and rapid generation.

Classification Of Big Data

Big data are classified into different categories to better understand their characteristics. The classification is important because of the large scale of data in the cloud, and it is based on five aspects: (i) data sources, (ii) content format, (iii) data stores, (iv) data staging, and (v) data processing.


Data sources include Internet data, sensing data, and all stores of transactional information, ranging from unstructured to highly structured and stored in various formats. The most popular is the relational database, which comes in a large number of varieties. As a result of the wide variety of data sources, the captured data differ in size, redundancy, consistency, noise, and so on.

Hadoop Background

Hadoop is an open-source Apache Software Foundation project written in Java that enables the distributed processing of large datasets across clusters of commodity hardware. Hadoop has two primary components, namely, HDFS and the MapReduce programming framework. The most significant feature of Hadoop is that HDFS and MapReduce are closely related to each other; they are co-deployed such that a single cluster is produced. Therefore, the storage system is not physically separated from the processing system.

HDFS is a distributed file system designed to run on top of the local file systems of the cluster nodes and to store extremely large files suitable for streaming data access. HDFS is highly fault tolerant and can scale from a single server to thousands of machines, each offering local computation and storage. HDFS consists of two types of nodes, namely, a name node called the “master” and several data nodes called “slaves.” HDFS can also include secondary name nodes. The name node manages the file system hierarchy and directory namespace (i.e., metadata) and registers attributes such as access time, modification, permission, and disk space quotas. The file content is split into large blocks, and each block of the file is independently replicated across data nodes for redundancy; the data nodes periodically send reports of all existing blocks to the name node.
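For concreteness, the following sketch uses the standard Hadoop FileSystem client API to write and read a file on HDFS; the name node address and file path are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

// Sketch of writing a file through the standard HDFS client API.
// The name node address and file path are placeholders.
public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");   // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/user/demo/sample.txt");

            // The client talks to the name node for metadata; the block
            // contents are streamed to and from the data nodes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }

            System.out.println("replication: " + fs.getFileStatus(path).getReplication());
        }
    }
}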

MapReduce is a simplified programming model, pioneered by Google for data-intensive applications, for processing large datasets. The MapReduce model was developed based on GFS and is adopted through the open-source Hadoop implementation, which was popularized by Yahoo. Apart from the MapReduce framework, several other current open-source Apache projects are related to the Hadoop ecosystem, including Hive, HBase, Mahout, Pig, ZooKeeper, Spark, and Avro. Twister provides support for efficient iterative MapReduce computations. MapReduce allows an inexperienced programmer to develop parallel programs and create a program capable of using computers in a cloud. In most cases, programmers are required to specify only two functions: the map function (mapper) and the reduce function (reducer), as commonly utilized in functional programming. The mapper takes a key/value pair as input and generates intermediate key/value pairs. The reducer merges all the pairs associated with the same (intermediate) key and then generates an output.
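The canonical example of this two-function model is word count, shown below using the standard Hadoop Java API: the mapper emits (word, 1) pairs and the reducer sums the counts for each word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The canonical word-count job: the mapper emits (word, 1) pairs and the
// reducer sums the counts for each word.
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);        // intermediate (word, 1) pair
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();                // merge all values for one key
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}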

MapReduce in Clouds

MapReduce accelerates the processing of large amounts of data in a cloud; thus, MapReduce is the preferred computation model of cloud providers. MapReduce is a popular cloud computing framework that automatically performs scalable distributed computation and provides an interface that allows for parallelization and distributed computing in a cluster of servers. The approach is to map scientific computing problems onto the MapReduce framework, where scientists can efficiently utilize existing resources in the cloud to process computationally demanding large-scale scientific data.

Currently, many alternative solutions are available to deploy MapReduce in cloud environments; these solutions include using cloud MapReduce runtimes that maximize cloud infrastructure services, using MapReduce as a service, or setting up one's own MapReduce cluster in cloud instances. Several strategies have been proposed to improve the performance of big data processing. Moreover, effort has been exerted to develop SQL interfaces in the MapReduce framework to assist programmers who prefer to use SQL as a high-level language to express their tasks while leaving all of the execution optimization details to the backend.
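As a sketch of this SQL-on-MapReduce idea, the following snippet submits a query through Hive's standard JDBC interface (the Hive JDBC driver must be on the classpath); the server address, table, and query are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of the SQL-on-MapReduce idea using Hive's JDBC interface: the user
// writes SQL and the engine plans the distributed execution. Host, table,
// and query are placeholders.
public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://hive-server:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT word, COUNT(*) AS freq FROM words GROUP BY word")) {
            while (rs.next()) {
                System.out.println(rs.getString("word") + " -> " + rs.getLong("freq"));
            }
        }
    }
}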

Research Challenges

Although cloud computing has been broadly accepted by many organizations, research on big data in the cloud

remains in its early stages. Several existing issues have not been fully addressed. Moreover, new challenges

continue to emerge from applications by organizations. In the subsequent sections, some of the key research

challenges, such as scalability, availability, data integrity, data transformation, data quality, data heterogeneity,

privacy, legal issues, and regulatory governance, are discussed.


Scalability

Scalability is the ability of the storage to handle increasing amounts of data appropriately. Scalable distributed data storage systems have been a critical part of cloud computing infrastructures. The lack of cloud computing features to support RDBMSs associated with enterprise solutions has made RDBMSs less attractive for the deployment of large-scale applications in the cloud. This drawback has resulted in the popularity of NoSQL. A NoSQL database provides the mechanism to store and retrieve large volumes of distributed data. The features of NoSQL databases include schema-free design, easy replication support, a simple API, and consistent and flexible modes. Different types of NoSQL databases, such as key-value, column-oriented, and document-oriented stores, provide support for big data and large datasets.
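To illustrate the schema-free, simple-API style of such stores, the following sketch uses the standard HBase Java client (HBase is a column-oriented store from the Hadoop ecosystem); the table and column names are placeholders, and the table is assumed to already exist with column family "cf":

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of the schema-free, simple-API style of a column-oriented NoSQL
// store, using the standard HBase client. Table and column names are
// placeholders; the table is assumed to exist with column family "cf".
public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {

            // Write: no fixed schema, columns are created on the fly.
            Put put = new Put(Bytes.toBytes("user-42"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);

            // Read back by row key.
            Result result = table.get(new Get(Bytes.toBytes("user-42")));
            byte[] name = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"));
            System.out.println("name = " + Bytes.toString(name));
        }
    }
}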

Availability

Availability refers to the resources of the system being accessible on demand by an authorized individual. In a cloud environment, one of the main issues concerning cloud service providers is the availability of data stored in the cloud. For example, one of the pressing demands on cloud service providers is to effectively serve mobile users who require one or more data items within a short amount of time. Therefore, services must remain operational even in the case of a security breach. In addition, with the increasing number of cloud users, cloud service providers must make the requested data available to users in order to deliver high-quality services. Lee et al. introduced a multi-cloud model called “rain clouds” to support big data exploitation, in which single clouds cooperate to provide accessible resources in an emergency. Schroeck et al. predicted that the demand for more real-time access to data may continue to increase as business models evolve and organizations invest in technologies required for streaming data and smartphones.

Data Integrity

A key aspect of big data security is integrity. Integrity means that data can be modified only by authorized

parties or the data owner to prevent misuse. The proliferation of cloud-based applications provides users the

opportunity to store and manage their data in cloud data centers. Such applications must ensure data integrity.

However, one of the main challenges that must be addressed is to ensure the correctness of user data in the

cloud.
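A basic form of such a correctness check, sketched below in Java, compares a digest recorded before upload with the digest of the data retrieved from the cloud; the file name and expected digest are placeholders:

import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch of a basic integrity check: the digest recorded before upload is
// compared with the digest of the data retrieved from the cloud. The file
// name and the expected digest are placeholders.
public class IntegrityCheck {

    static String sha256Hex(byte[] data) throws Exception {
        return HexFormat.of().formatHex(
                MessageDigest.getInstance("SHA-256").digest(data));
    }

    public static void main(String[] args) throws Exception {
        byte[] retrieved = Files.readAllBytes(Path.of("downloaded.bin"));
        String expected = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08";

        // Any modification by an unauthorized party changes the digest.
        if (sha256Hex(retrieved).equals(expected)) {
            System.out.println("integrity verified");
        } else {
            System.out.println("data was modified or corrupted");
        }
    }
}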

Transformation

Transforming data into a form suitable for analysis is an obstacle to the adoption of big data. Owing to the variety of data formats, big data can be transformed for an analysis workflow in two ways, as shown in Figure 4.

Figure 4: Transformation Process

Data Quality

In the past, data processing was typically performed on clean datasets from well-known and limited sources.

Therefore, the results were accurate. However, with the emergence of big data, data originate from many

different sources; not all of these sources are well-known or verifiable. Poor data quality has become a serious

problem for many cloud service providers because data are often collected from different sources. For example,

huge amounts of data are generated from smartphones, where inconsistent data formats can be produced as a

result of heterogeneous sources.

Heterogeneity

Variety, one of the major aspects of big data characterization, is the result of the growth of virtually unlimited

different sources of data. This growth leads to the heterogeneous nature of big data. Data from multiple sources


are generally of different types and representation forms and are significantly interconnected; they have

incompatible formats and are inconsistently represented. In a cloud environment, users can store data in a

structured, semi-structured, or unstructured format. Structured data formats are appropriate for today’s

database systems, whereas semi-structured data formats are appropriate only to some extent. Unstructured

data are inappropriate because they have a complex format that is difficult to represent in rows and columns.

Privacy

Privacy concerns continue to hamper users who outsource their private data into cloud storage. This concern

has become serious with the development of big data mining and analytics, which require personal

information to produce relevant results, such as personalized and location-based services. Information on

individuals is exposed to scrutiny, a condition that gives rise to concerns about profiling, stealing, and loss of

control.

Legal/Regulatory Issues

Specific laws and regulations must be established to preserve the personal and sensitive information of users.

Different countries have different laws and regulations to achieve data privacy and protection. In several

countries, monitoring of company staff communications is not allowed. However, electronic monitoring is

permitted under special circumstances. Therefore, the question is whether such laws and regulations offer adequate protection for individuals' data while enjoying the many benefits of big data in society at large.

Governance

Data governance embodies the exercise of control and authority over data-related rules of law, transparency,

and accountabilities of individuals and information systems to achieve business objectives. The key issues of

big data in cloud governance pertain to applications that consume massive amounts of data streamed from

external sources. Therefore, a clear and acceptable data policy about the type of data that needs to be stored,

how quickly an individual needs to access the data, and how to access the data must be defined.

Big data governance involves leveraging information by aligning the objectives of multiple functions, such as

telecommunication carriers having access to vast troves of customer information in the form of call detail

records and marketing seeking to monetize this information by selling it to third parties.

Moreover, big data provides significant opportunities to service providers by making information more

valuable. However, policies, principles, and frameworks that strike a balance between risk and value in the face

of increasing data size and delivering better and faster data management technology can create huge

challenges.

Cloud governance recommends the use of various policies together with different models of constraints that

limit access to underlying resources. Therefore, adopting governance practices that maintain a balance

between risk exposure and value creation is a new organizational imperative to unlock competitive advantages

and maximize value from the application of big data in the cloud.

Open Research Issues

Numerous studies have addressed several significant problems and issues about the storage and processing of

big data in clouds. The amount of data continues to increase at an exponential rate, but the improvement in the

processing mechanisms is relatively slow. Only a few tools are available to address the issues of big data

processing in cloud environments. State-of-the-art techniques and technologies in many important big data

applications (i.e., Map Reduce, Dryad, Pregel, PigLatin, MongoDB, Hbase, SimpleDB, and Cassandra) cannot

solve the actual problems of storing and querying big data. For example, Hadoop and Map Reduce lack query

processing strategies and have low-level infrastructures concerning data processing and management. Despite

the plethora of work performed to address the problem of storing and processing big data in cloud computing

environments, certain important aspects of storing and processing big data in cloud computing are yet to be

solved. Some of these issues are discussed in the subsequent subsections.

Data Staging

The most important open research issue regarding data staging is related to the heterogeneous nature of data.

Data gathered from different sources do not have a structured format. For instance, mobile cloud-based applications, blogs, and social networks produce inadequately structured data, such as text messages, videos, and images. Transforming and cleaning such unstructured data before loading it into the warehouse


for analysis is a challenging task. Efforts have been exerted to simplify the transformation process by adopting technologies such as Hadoop and MapReduce to support the distributed processing of unstructured data formats. However, understanding the context of unstructured data is necessary, particularly when meaningful information is required. The MapReduce programming model is the most common model that operates on clusters of computers; it has been utilized to process and distribute large amounts of data.

Distributed Storage Systems

Numerous solutions have been proposed to store and retrieve massive amounts of data. Some of these

solutions have been applied in a cloud computing environment. However, several issues hinder the successful

implementation of such solutions, including the capability of current cloud technologies to provide the

necessary capacity and high performance to address massive amounts of data, optimization of existing file

systems for the volumes demanded by data mining applications, and how data can be stored in such a manner

that they can be easily retrieved and migrated between servers.

Data Analysis

The selection of an appropriate model for large-scale data analysis is critical. Talia pointed out that obtaining

useful information from large amounts of data requires scalable analysis algorithms to produce timely results.

However, current algorithms are inefficient in terms of big data analysis. Therefore, efficient data analysis tools

and technologies are required to process such data. The performance of an algorithm ceases to increase linearly with increasing computational resources. As researchers continue to probe the issues of big data in cloud computing, new problems arise that traditional data analysis techniques cannot handle. Stream data arriving at high speed from different sources must be processed and compared with historical information within a certain period, and such sources may produce different formats, which makes the integration of multiple sources for analysis a complex task.

Data Security

Although cloud computing has transformed modern ICT technology, several unresolved security threats exist in

cloud computing. These security threats are magnified by the volume, velocity, and variety of big data.

Moreover, several threats and issues, such as privacy, confidentiality, integrity, and availability of data, exist in

big data on cloud computing platforms. Therefore, data security must be ensured once data are outsourced to cloud service providers. The cloud must also be assessed at regular intervals to protect it against threats. Cloud vendors must ensure that all service level agreements are met. Recently, some controversies have revealed how some security agencies use data generated by individuals for their benefit without permission. Therefore, policies that cover all user privacy concerns should be developed. Traditionally, the most common

technique for privacy and data control is to protect the systems utilized to manage data rather than the data

itself; however, such systems have proven to be vulnerable. Utilizing strong cryptography to encapsulate

sensitive data in a cloud computing environment and developing a novel algorithm that efficiently allows for

key management and secure key exchange is important to manage access to big data, particularly as they exist

in the cloud independent of any platform. Moreover, regarding integrity, previously developed hashing schemes are no longer applicable to such large amounts of data. Integrity verification is also difficult because of the lack of support for remote data access and the lack of information on internal storage.
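As a minimal sketch of encapsulating sensitive data with strong cryptography, the following snippet encrypts a record with AES-256 in GCM mode (which also authenticates the ciphertext); key storage and exchange, the very key-management problem noted above, are deliberately left out:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Sketch of encapsulating sensitive data with strong symmetric cryptography
// (AES-256 in GCM mode, which also authenticates the ciphertext). How the
// key is stored and exchanged is exactly the key-management problem noted
// above and is not addressed here.
public class EncryptBeforeUpload {
    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        byte[] iv = new byte[12];                 // 96-bit nonce, unique per message
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("sensitive record".getBytes(StandardCharsets.UTF_8));

        // Decrypt with the same key and nonce; tampering raises an exception.
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        System.out.println(new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8));
    }
}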


Figure 5: Data Flow Diagram for Uploading the Data

Figure 6: Data Flow Diagram for Downloading the Data

V. RESULTS AND DISCUSSION

The main focus of the project is to upgrade cloud security for sensitive data, for example in the military cyber domain. The core of this project is to categorize data by attributes as low, medium, or high, so that it can be handled according to its severity. This project addresses cloud security issues by providing multi-level security based on each file's needs. This project is very useful for


single end-users, enterprise-level users, and government sectors such as defense in the cyber world. The main future development contribution of this project is to enhance cloud security with the help of multi-clouding. The three main beneficiaries of this project are the government cyber world, the information technology industry, and single end-users. Enterprise-level implementation is also possible if enterprises adopt this environment, because they maintain their own authorization for every employee.

VI. CONCLUSION

In this project, we implement multi-level cloud security in Google Drive and Dropbox; thus, the paper infers that users' information will be stored in the cloud based on the sensitivity of the data. Users maintain ever more data in the cloud, but the security of that data is weak, so we secure the data at three different levels.

VII. REFERENCES

[1] M. Singhal et al., “Collaboration in multi-cloud computing environments: Framework and security issues,” Computer, vol. 46, no. 2, pp. 76–84, 2013.

[2] D. Niyato, Z. Kun, and P. Wang, “Cooperative virtual machine management for multi-organization cloud computing environment,” in Proc. Int. Workshop Game Theory in Communication Networks (GameComm), Paris, France, May 2011, pp. 528–537.

[3] Y. Demchenko, M. X. Makkes, R. Strijkers, and C. de Laat, “Intercloud architecture for interoperability and integration,” in Proc. IEEE 4th Int. Conf. Cloud Computing Technology and Science (CloudCom), vol. 1, Dec. 2012, pp. 666–674.

[4] M. Shojafar, N. Cordeschi, and E. Baccarelli, “Energy-efficient adaptive resource management for real-time vehicular cloud services,” IEEE Trans. Cloud Computing, DOI: 10.1109/TCC.2016.2551747.

[5] I. Butun, M. Erol-Kantarci, B. Kantarci, and H. Song, “Cloud-centric multi-level authentication as a service for secure public safety device networks,” IEEE Communications Magazine, vol. 54, no. 4, pp. 47–53, Apr. 2016.

[6] H. F. Mohammadi, R. Prodan, and T. Fahringer, “A truthful dynamic workflow scheduling mechanism for commercial multi-cloud environments,” IEEE Trans. Parallel and Distributed Systems, vol. 24, no. 6, pp. 1203–1212, Jun. 2013.

[7] F. Paraiso, N. Haderer, P. Merle, R. Rouvoy, and L. Seinturier, “A federated multi-cloud PaaS infrastructure,” in Proc. 5th IEEE Int. Conf. Cloud Computing (CLOUD), Jun. 2012, pp. 392–399.

[8] H. Shen and G. Liu, “An efficient and trustworthy resource sharing platform for collaborative cloud computing,” IEEE Trans. Parallel and Distributed Systems, vol. 25, no. 4, pp. 862–875, Apr. 2014.

[9] X. Li, H. Ma, F. Zhou, and W. Yao, “T-broker: A trust-aware service brokering scheme for multiple cloud collaborative services,” IEEE Trans. Information Forensics and Security, vol. 10, no. 7, pp. 1402–1415, Jul. 2015.

[10] K. M. Khan and Q. Malluhi, “Establishing trust in cloud computing,” IT Professional, vol. 12, no. 5, pp. 20–27, 2010.

[11] X. Li, H. Ma, F. Zhou, and X. Gui, “Service operator-aware trust scheme for resource matchmaking across multiple clouds,” IEEE Trans. Parallel and Distributed Systems, vol. 26, no. 4, pp. 1419–1429, May 2014, DOI: 10.1109/TPDS.2014.2321750.

[12] W. Fan and H. Perros, “A novel trust management framework for multi-cloud environments based on trust service providers,” Knowledge-Based Systems, vol. 70, pp. 392–406, Nov. 2014.

[13] K. Hwang and D. Li, “Trusted cloud computing with secure resources and data coloring,” IEEE Internet Computing, vol. 14, no. 5, pp. 14–22, Sep./Oct. 2010.

[14] H. Kim, H. Lee, W. Kim, and Y. Kim, “A trust evaluation model for QoS guarantee in cloud systems,” Int. J. Grid and Distributed Computing, vol. 3, no. 1, pp. 1–10, Mar. 2010.

[15] P. D. Manuel, S. T. Selvi, and M. I. A. El Barr, “Trust management system for grid and cloud resources,” in Proc. 1st Int. Conf. Advanced Computing (ICAC), Dec. 2009, pp. 176–181.