
Big Data: Query Processing

Authors:

Dr. M. Shanmukhi1, Dr. Attili Venkata Ramana2, Dr. Annaluri Sreenivasa Rao3, B. Madhuravani4, N. Chandra Sekhar

Abstract

Big data refers to collections of data sets that are very large in size as well as complex; typically the size of the data is in the petabyte and exabyte range. Traditional database systems are not able to capture, store and analyse this volume of data, and as networks grow, the amount of data continues to grow. Big data analytics provides new ways for organisations to analyse unstructured data. Big data is one of the most discussed topics in the IT sector and will play a vital role in the future, since it changes the way data is managed and used. Because Big Data demands high computational power and large storage, distributed systems are used, and as several parties are involved in these systems, the risk of privacy violation increases. A number of privacy-preserving mechanisms have been developed for privacy protection at the various stages of the big data life cycle (e.g., data generation, data storage, and data processing). The objective of this paper is to provide a thorough overview of big data and of the privacy-preservation mechanisms used in big data.

Keywords: Big data, petabyte, exabyte, database, velocity, volume, variety

I. Introduction
Big Data refers to collections of datasets whose size is beyond the ability of commonly used software tools to capture, manage, and analyse within a tolerable time. The amount of data to be analysed is expected to double every two years (IDC, 2012). These data are very often unstructured and come from numerous sources such as social networks, sensors, scientific applications, surveillance, video and image archives, Web search indexing, medical records, business transactions and system logs. Big data is gaining attention because the number of devices connected to the so-called "Internet of Things" (IoT) is still rising to unexpected levels, producing large volumes of data that must be transformed into useful information. In addition, it has become very common to acquire on-demand additional computing power and storage from public cloud providers to perform intensive, parallel data processing. In this setting, security and privacy concerns can potentially be magnified by the volume, variety, and wide-area deployment of the system infrastructure that supports Big Data applications.

Data is generated in large quantities all around us; every digital process and social media exchange produces it [1]. Systems such as sensors and mobile devices generate even more. With advances in technology, this data is being recorded and meaningful value is being extracted from it. Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information.

The 3Vs that characterize Big Data are Volume, Variety and Velocity, as depicted in Fig. 1.

1) Volume: There has been an exponential growth in the volume of data being handled. Data arrives not just in the form of text, but also as videos, music and large image files. Data is now stored in terabytes or even petabytes in many enterprises. With this growth in data volume, the architecture and the applications built to handle the data must be re-evaluated.

2) Variety: Today, data comes in all types of formats: structured, numeric data in traditional databases; information created by line-of-business applications; and unstructured text documents, email, video, audio, stock-ticker data and financial transactions. Ways must be found to govern, merge and manage these diverse forms of data.

3) Velocity: Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to handle torrents of data in near-real time. Reacting quickly enough to cope with data velocity is a challenge for most organizations.

Fig. 1: The 3Vs of Big Data (volume, variety, velocity)

There are two further dimensions used to characterize Big Data:

1) Variability: In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent, with periodic peaks. Daily, event-triggered and seasonal peak data loads can be challenging to manage, even more so when unstructured data is involved [2].

2) Complexity: Today's data originates from multiple sources, and it remains a substantial task to link, match, cleanse and transform data across systems. It is necessary to connect and correlate relationships, hierarchies and multiple data linkages, or the data can quickly spiral out of control. A data environment can sit at the extreme of any one of these parameters, at a combination of them, or even at all of them together.

To ensure big data privacy, numerous mechanisms have been developed in recent years. These mechanisms can be grouped by the stage of the big data life cycle in which they apply, i.e., data generation, data storage, and data processing. In the data generation phase, access-restriction and data-falsification techniques are used for privacy protection: while access-restriction techniques try to limit access to an individual's private data, falsification techniques modify the original data before it is released. The approaches to privacy protection in the data storage phase are mainly based on encryption. Encryption-based techniques can be further divided into attribute-based encryption (ABE), identity-based encryption (IBE), and storage-path encryption. Additionally, to protect sensitive information, hybrid clouds are used in which sensitive data is kept in a private cloud. The data processing phase covers privacy-preserving data publishing (PPDP) and knowledge extraction from the data. In PPDP, anonymization techniques such as generalization and suppression are used to protect the privacy of the data; preserving the utility of the data while protecting privacy is the central challenge in PPDP.
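To make generalization and suppression concrete, the following minimal Python sketch (ours, not from the paper; the record fields and generalization rules are illustrative assumptions) suppresses a direct identifier and generalizes the quasi-identifiers of a toy record set:

# Illustrative sketch of suppression and generalization for PPDP.
# The record layout and rules are hypothetical, not from the paper.

records = [
    {"name": "Alice", "age": 34, "zip": "500081", "disease": "flu"},
    {"name": "Bob",   "age": 37, "zip": "500083", "disease": "asthma"},
]

def anonymize(record):
    out = dict(record)
    del out["name"]                          # suppression: drop the direct identifier
    lo = (out["age"] // 10) * 10             # generalization: exact age -> 10-year band
    out["age"] = f"{lo}-{lo + 9}"
    out["zip"] = out["zip"][:3] + "***"      # generalization: truncate the ZIP code
    return out

for r in records:
    print(anonymize(r))                      # e.g. {'age': '30-39', 'zip': '500***', 'disease': 'flu'}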

II. Stages in Big Data


Data Acquisition: The first step in Big Data is acquiring the data itself. With the growing number of devices, the rate of data generation is increasing significantly; smart devices equipped with a wide range of sensors continuously produce data, and the Large Hadron Collider in Switzerland, for example, generates petabytes of data. Most of this data is not useful and could be discarded, but because of its unstructured form, selectively discarding it poses a challenge. When combined and overlaid with other useful data, this data becomes far more powerful. Owing to the interconnectedness of devices over the Internet, data is increasingly being collected and stored in the cloud.

Data Extraction: This stage concerns storing and managing massive data sets. A data storage system consists of two parts, i.e., hardware infrastructure and data management [3]. Hardware infrastructure refers to the use of information and communications technology (ICT) resources for various tasks (such as distributed storage). Data management refers to the set of software deployed on top of the hardware infrastructure to manage and query large-scale data collections.

Data Collation: Data from a single source is frequently not enough for analysis or prediction, so more than one data source is usually combined to provide a bigger picture to analyse. A health-monitoring application typically collects data from the heart-rate sensor, pedometer, and so on, to summarize the health information of the user; weather-forecasting software consumes data from many sources that report the daily humidity, temperature, rainfall, etc. In Big Data systems, the merging of data to form a larger picture is generally considered an essential part of processing.

Data Structuring: Once all the data is collected, it is important to store and present it in a structured format for further use. Structuring is necessary so that queries can be made on the data; data structuring applies methods of organizing the data into a particular schema. Several newer systems, such as NoSQL stores, can query even unstructured data and are increasingly being used for Big Data analysis. A major concern with big data is delivering results in real time, and therefore the structuring of aggregated data has to be done at high speed.
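As a rough illustration of querying schema-less data, the following pure-Python sketch (ours; the records and predicate are invented) filters JSON-like documents in the style of a NoSQL document store, tolerating missing fields:

# Hypothetical document-store-style query over schema-less records.
docs = [
    {"device": "sensor-1", "temp": 31.2, "tags": ["outdoor"]},
    {"device": "sensor-2", "temp": 24.8},            # note: no "tags" field
    {"device": "cam-1", "frames": 120, "tags": ["indoor"]},
]

def find(collection, predicate):
    """Return all documents matching the predicate, tolerating missing fields."""
    return [d for d in collection if predicate(d)]

hot = find(docs, lambda d: d.get("temp", float("-inf")) > 25)
print(hot)   # [{'device': 'sensor-1', 'temp': 31.2, 'tags': ['outdoor']}]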

Data Visualization: Once the data is structured, queries are made on the data and the results are presented in a visual format. Data analysis involves targeting areas of interest and providing results based on the data that has been structured; for example, data containing average temperatures can be shown alongside water-consumption rates to compute a correlation between them. This analysis and presentation of the data makes it ready for use by consumers. Raw data cannot be used to gain insights or to analyse trends, so "humanizing" the data becomes all the more important.

Data Interpretation: The final step in Big Data processing consists of analysing and deriving valuable information from the data that has been processed. The information obtained can be of two kinds. Retrospective analysis consists of gaining insights about events and actions that have already occurred; data about the television viewership of a programme in different regions can help us assess the popularity of the programme in those regions. Prospective analysis consists of analysing patterns and discerning future trends from data that has already been generated; weather forecasting using big data analysis is an example of prospective analysis. Problems arising from such analyses concern misleading and fallacious trends being predicted, which is especially dangerous given the increasing reliance on data for critical decisions. If a particular symptom is plotted against the probability of being diagnosed with a particular illness, it can lead to misinformation about the symptom being caused by the disease itself. Insights gained from data analysis are therefore crucial, and are the main reason for processing big data in the first place.

III. Big Data and Privacy


Storing high volumes of data is no longer a great challenge, thanks to advances in data storage technologies such as the boom in cloud computing; protecting the data, however, is extremely challenging. If a big data storage system is compromised, the consequences can be severe, as individuals' personal information may be disclosed, so we need to make sure that stored data is protected against such threats. In modern information systems, data centers play the crucial role of performing complex computations and holding large volumes of data. In a distributed environment, an application may need several datasets from different data centers and therefore faces the challenge of privacy protection. Traditional security mechanisms for protecting data can be divided into four categories: file-level data security schemes, database-level data security schemes, media-level security schemes, and application-level security schemes [4]. The conventional mechanisms for securing storage architectures (i.e., direct-attached storage, network-attached storage and storage-area networks) [5] have been a very active research area, but may not be directly applicable to big data analytics platforms. In response to the 3V nature of big data analytics, the storage infrastructure must be scalable and must be configurable dynamically to accommodate diverse applications. One promising technology for addressing these requirements is storage virtualization, enabled by the emerging cloud computing paradigm [6]. Storage virtualization is a process in which multiple network storage devices are combined into what appears to be a single storage device. However, using a cloud service means that the organization's data will be outsourced to a third party such as a cloud provider, and this can affect the privacy of the data.

Data security has three main dimensions: confidentiality, integrity and availability [7]. The first two relate directly to the privacy of the data; i.e., if data confidentiality or integrity is breached, there is a direct effect on users' privacy. We therefore also discuss privacy issues related to the confidentiality and integrity of data in this section. A basic requirement for a big data storage system is to protect the privacy of individuals, and there are existing mechanisms to satisfy that requirement: for example, a sender can encrypt his data using public-key encryption (PKE) in such a way that only the legitimate recipient can decrypt it.
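As a concrete illustration of this PKE property, here is a minimal Python sketch using RSA-OAEP from the widely used cryptography package; the key size and message are our own illustrative choices, and a real system would wrap a symmetric key rather than the data itself:

# Minimal public-key encryption sketch using RSA-OAEP
# (pip install cryptography). Key size and message are illustrative.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# The recipient generates a key pair and publishes the public key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# The sender encrypts with the public key ...
ciphertext = public_key.encrypt(b"patient record #42", oaep)

# ... and only the holder of the private key can decrypt.
assert private_key.decrypt(ciphertext, oaep) == b"patient record #42"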

The advent of Big Data has presented key challenges in terms of data security. There is an increasing need for research into technologies that can handle large volumes of data and secure them efficiently, because current technologies for securing data are slow when applied to huge amounts of data. Traditional data encryption models ensure secure transmission at the cost of high computational resources, which can significantly decrease communication performance. Devices collect large amounts of data from sources such as the accelerometer, clock, microphone, light sensor, thermometer, and compass. Virtually all of this data is time-stamped, enabling it to be paired with data values from other devices, and computations on such data can make processing user queries on the devices complex. To ensure data security, many encryption models have been implemented to encrypt data in large databases; some of the traditional data security schemes are described below.

The attribute-based encryption (ABE) scheme was first suggested by Sahai and Waters in 2005. The scheme was proposed in order to gain stronger security and finer access control, which is the main objective of the algorithm. Users' attributes act as the basic component for the generation of the secret key and the ciphertext. Decryption is possible if and only if the attribute sets of the secret key and the ciphertext overlap in at least d attributes, for a threshold d. The algorithm is collusion-resistant. The major flaw of this approach is that users' public keys are needed by the data owner for the encryption process, and because only monotonic attributes are supported, the approach is restricted in real-world scenarios.
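The threshold condition itself is easy to state in code. The sketch below (our illustration; no actual cryptography is performed) checks only the decryption condition of the Sahai-Waters construction: decryption succeeds when the key's attribute set and the ciphertext's attribute set share at least d elements.

# Illustration of the fuzzy-ABE decryption condition only; no real
# cryptography is performed here.
def can_decrypt(key_attrs: set, ct_attrs: set, d: int) -> bool:
    """True when the key and ciphertext share at least d attributes."""
    return len(key_attrs & ct_attrs) >= d

key = {"cardiology", "doctor", "hospital-A"}
ct  = {"doctor", "hospital-A", "on-call"}
print(can_decrypt(key, ct, d=2))   # True  (overlap = {doctor, hospital-A})
print(can_decrypt(key, ct, d=3))   # False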

The conventional ABE scheme was extended to give rise to the key-policy attribute-based encryption (KP-ABE) scheme. Access-tree structures are used to represent this model, and each user is associated with an access tree: the internal nodes of the tree are threshold gates, whereas the leaf nodes are labelled with attributes. In KP-ABE, both the ciphertext and the secret key are associated with attributes, but in different ways: the ciphertext is labelled with a set of attributes, while the secret key embeds a monotonic access tree. Together, these components control which ciphertexts a given key can decrypt. KP-ABE can be applied successfully in one-to-many communication channels. The main lacuna of this approach is that the data owner has no control over who holds decryption rights.
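The access-tree idea can be sketched as follows (our illustration of the access structure only, not of the underlying pairing-based cryptography): internal nodes are threshold gates, leaves are attributes, and an attribute set satisfies the tree when enough children of every required gate are satisfied.

# Access-tree evaluation for KP-ABE-style policies (structure only).
# A leaf is a string attribute; an internal node is (threshold, [children]).
def satisfies(node, attrs: set) -> bool:
    if isinstance(node, str):                 # leaf: attribute must be present
        return node in attrs
    t, children = node                        # gate: at least t children satisfied
    return sum(satisfies(c, attrs) for c in children) >= t

# Policy: ("doctor" AND "hospital-A") OR "auditor".
# AND = 2-of-2 gate, OR = 1-of-2 gate.
tree = (1, [(2, ["doctor", "hospital-A"]), "auditor"])

print(satisfies(tree, {"doctor", "hospital-A"}))  # True
print(satisfies(tree, {"doctor"}))                # False
print(satisfies(tree, {"auditor"}))               # True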

The ciphertext-policy attribute-based encryption (CP-ABE) scheme was introduced by Bethencourt, Sahai, and Waters [9]. This scheme follows the reverse arrangement: the ciphertext is merged with an access policy, while the secret keys are associated with sets of attributes. Users are able to decrypt a ciphertext if and only if their attributes satisfy the ciphertext's access policy; the idea of CP-ABE is thus the exact mirror of KP-ABE. Because of its flexible nature, CP-ABE serves as the basic unit for many other schemes, and it resolves KP-ABE's lack of control over decryption rights; it has also been implemented in real-world scenarios. The scheme nevertheless faces a severe problem: it cannot be deployed in an enterprise setting, owing to low flexibility and poor efficiency. The entire decryption process requires attributes drawn from a single set, so users may select a single attribute or a combination of attributes only from that specified set. This disadvantage of CP-ABE was later addressed and overcome by a newly developed approach, ciphertext-policy attribute-set-based encryption (CP-ASBE), in which key attributes can be selected from multiple distinct attribute sets. CP-ASBE, however, is not effective at combining attributes across multiple keys.
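The duality between KP-ABE and CP-ABE can be illustrated with the same kind of evaluator (repeated here so the sketch is self-contained): only the placement of the policy and of the attribute set is exchanged, and again only the access structure is modelled, not the cryptography.

# Duality of the two schemes, modelling the access structure only.
def satisfies(node, attrs: set) -> bool:
    if isinstance(node, str):
        return node in attrs
    t, children = node
    return sum(satisfies(c, attrs) for c in children) >= t

policy = (2, ["finance", "manager"])          # 2-of-2 gate: finance AND manager

# KP-ABE: the *key* holds the policy, the *ciphertext* holds attributes.
print(satisfies(policy, {"finance", "manager", "2018"}))  # True: key opens ct

# CP-ABE: the *ciphertext* holds the policy, the *key* holds attributes.
print(satisfies(policy, {"finance"}))         # False: user lacks "manager"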

The hierarchical attribute-based encryption (HABE) model was developed by Wang et al. The whole model is represented by a hierarchical structure: key generation is carried out by a root master, which interacts with several domain masters, and every domain master in turn serves a number of enterprise users. The model has applications in enterprise cloud environments and in proxy re-encryption. The scheme is, however, largely theoretical and too expensive to implement in practice. It also assumes that all attributes in one conjunctive clause are administered by the same domain authority, whereas the same attribute may be administered by multiple domain authorities.

All of the previously described schemes fall under monotonic access structures, which cannot express negative constraints. A modified version of attribute-based encryption with a non-monotonic access structure was therefore introduced; negative constraints can be expressed only in non-monotonic access structures and are absent from monotonic ones. The main drawback of this method is that the data overhead increases greatly because of these negative constraints, which add to the overhead without carrying any data themselves.

IV. Methodology

A central difficulty in Big Data is query processing over encrypted data. Currently, queries over both unstructured and structured encrypted data require the data to be decrypted first. Given the huge volumes of data involved, this can take a substantial amount of time, and query processing becomes very slow. We therefore look into an alternative scheme of data encryption.

ABE [8], [9], [12] is an encryption technique that ensures end-to-end big data privacy in a cloud storage system. In ABE, access policies are defined by the data owner, and data is encrypted under those policies; the data can only be decrypted by users whose attributes satisfy the access policies defined by the data owner. When dealing with big data, one may often need to change data access policies, as the data owner may have to share the data with different organizations, yet the current attribute-based access control schemes [10], [11] do not consider policy updating. The proposed system is depicted in Fig. 2.


In the proposed system shown in Fig. 2, the user generates the key using the CP-ABE technique, which is used for encryption and decryption. The generated query is passed to the MD5 algorithm, and the resulting hash code is given to the DES encryption algorithm to encrypt the data. The processed results are given to the decryption algorithm and sent back to the user in decrypted form.
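A rough sketch of this hash-then-encrypt step is given below, assuming the pycryptodome package for DES; the key, query string and ECB mode are illustrative placeholders, since the paper does not specify these details:

# Sketch of the query pipeline described above: MD5-hash the query,
# then DES-encrypt the digest. Requires pycryptodome
# (pip install pycryptodome); the 8-byte key below is a placeholder.
import hashlib
from Crypto.Cipher import DES
from Crypto.Util.Padding import pad, unpad

query = b"SELECT NAME FROM EMPLOYEE WHERE EID = 123;"
digest = hashlib.md5(query).hexdigest().encode()   # hash code of the query

key = b"8bytekey"                                  # DES requires an 8-byte key
cipher = DES.new(key, DES.MODE_ECB)                # ECB kept only for brevity
ciphertext = cipher.encrypt(pad(digest, DES.block_size))

# The receiving side decrypts and recovers the hash code.
recovered = unpad(DES.new(key, DES.MODE_ECB).decrypt(ciphertext), DES.block_size)
assert recovered == digest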

V. Results

Consider the following table:

NAME     EID   DEPT
SAMUEL   123   MECHANICAL

Query: SELECT NAME FROM EMPLOYEE WHERE EID = 123;

The generated hash code:

e5bcbe015bcfd4fee438bd851be9323b6006e7c4dff95adb2443b656d3a7a92ba8d2a0944789bb8b4172763f769da

Encrypted data:

!e$&$!abe$&#(d#!!ddl$(leb$!qdcd$gaag!i&d(##qe@($cddd$geg(d@i@qc$@l(c@aqddilq$$l$dbicigd#igq(@
Query (bytes)   MD5+ABE   MD5+KPABE   MD5+CPABE   Proposed Model
#500              9938       6938        6892         5122
#1000            11443      10698        9993         8935
#1500            16386      18758       17967        12975
#2000            21684      21276       20176        15758
#2500            32333      30177       31232        20258

TABLE 1: Performance Analysis


Fig 3: Performance of average authentication and encryption time



Table 1 and Fig. 3 describe the efficiency of the integrated hash-and-encryption model compared with the traditional models, in terms of average authentication-verification and encryption time. From the table, it is clearly observed that the proposed model has a lower average authentication time than the traditional models.

VI. Conclusions
A massive amount of data is generated every day, and it is impossible to imagine next-generation applications without producing and executing data-driven algorithms. In this paper, we have conducted a comprehensive survey of the privacy issues that arise when dealing with big data and have investigated the privacy challenges of big data. We have also studied and compared different algorithms with the proposed method; the analysis helps us understand which algorithm provides maximum security for big data. A great deal of work has been done to protect the privacy of individuals from data generation through data processing, but a number of open issues and challenges remain.

References
[1] Dona Sarkar, Asoke Nath, "Big Data – A Pilot Study on Scope and Challenges," International Journal of Advance Research in Computer Science and Management Studies (IJARCSMS, ISSN: 2371-7782), vol. 2, no. 12, pp. 9–19, Dec. 2014.

[2] http://www.cra.org/ccc/files/docs/init/bigdatawhitepaper.pdf

[3] H. Hu, Y. Wen, T.-S. Chua, and X. Li, "Toward scalable systems for big data analytics: A technology tutorial," IEEE Access, vol. 2, pp. 652–687, Jul. 2014.

[4] C. Hongbing, R. Chunming, H. Kai, W. Weihong, and L. Yanyan, "Secure big data storage and sharing scheme for cloud tenants," China Commun., vol. 12, no. 6, pp. 106–115, Jun. 2015.

[5] U. Troppens, R. Erkens, W. Muller-Friedt, R. Wolafka, and N. Haustein, Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI, InfiniBand and FCoE. New York, NY, USA: Wiley, 2011.

[6] P. Mell and T. Grance, "The NIST definition of cloud computing," Nat. Inst. Standards Technol., 2011.

[7] Z. Xiao and Y. Xiao, "Security and privacy in cloud computing," IEEE Commun. Surveys Tuts., vol. 15, no. 2, pp. 843–859, May 2013.

[8] V. Goyal, O. Pandey, A. Sahai, and B. Waters, "Attribute-based encryption for fine-grained access control of encrypted data," in Proc. ACM Conf. Comput. Commun. Secur., Oct. 2006, pp. 89–98.

[9] J. Bethencourt, A. Sahai, and B. Waters, "Ciphertext-policy attribute-based encryption," in Proc. IEEE Symp. Secur. Privacy, May 2007, pp. 321–334.

[10] K. Yang, X. Jia, K. Ren, B. Zhang, and R. Xie, "DAC-MACS: Effective data access control for multiauthority cloud storage systems," IEEE Trans. Inf. Forensics Security, vol. 8, no. 11, pp. 1790–1801, Nov. 2013.

[11] K. Yang and X. Jia, "Expressive, efficient, and revocable data access control for multi-authority cloud storage," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 7, pp. 1735–1744, Jul. 2014.

[12] B. Madhuravani, D. S. R. Murthy, S. V. Raju, "An improved wireless node neighbor integrity verification and encryption using additive and multiplicative homomorphic model," Journal of Fundamental and Applied Sciences, vol. 10, no. 6S, pp. 2911–2932, 2018.
