
Review Paper On Big Data Analytics in Cloud Computing: July 2017

This document provides a review of big data analytics in cloud computing. It discusses how big data is defined using the 5 V's of volume, variety, velocity, veracity, and value. Common techniques for big data analytics are also summarized, including data mining, machine learning, text analysis, and visualization. The paper explores how big data is different from traditional data and challenges of analyzing large, complex datasets. It also examines how cloud computing enables cost-effective big data analytics at large scale.

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/325011725

Review Paper on Big Data Analytics in Cloud Computing

Article · July 2017

2 authors:

Saneh Lata Yadav, Guru Gobind Singh Indraprastha University
Asha Sohal, K.R. Mangalam University

All content following this page was uploaded by Saneh Lata Yadav on 08 May 2018.


International Journal of Computer Trends and Technology (IJCTT) – Volume 49 Number 3 July 2017

Review Paper on Big Data Analytics in Cloud Computing

Saneh Lata Yadav¹, Asha Sohal²
¹ Assistant Professor, Department of Computer Science and Engineering, K. R. Mangalam University, India
² Associate Professor, Department of Computer Science and Engineering, K. R. Mangalam University, India

ISSN: 2231-2803 http://www.ijcttjournal.org

Abstract: Cloud computing is a powerful technology for massive-scale, complex computing. It removes the need to maintain costly computing hardware, dedicated space, and related software. Cloud computing has also driven massive growth in the scale of data, or big data. Processing big data is a challenging and time-demanding task that requires large computational capacity to ensure successful data processing and analysis. This paper introduces the definition, characteristics, and classification of big data, along with some discussion of cloud computing. The similarities between big data and cloud computing, big data storage systems, several big data processing techniques, and Hadoop technology are also discussed. The term 'Big Data' describes innovative techniques and technologies to capture, store, distribute, manage, and analyze petabyte-sized or larger datasets with high velocity and varied structures. Big data may be structured, unstructured, or semi-structured, which conventional data management methods cannot handle. Data can be generated from various sources and can arrive in the system at various rates. To analyze these large amounts of data inexpensively and efficiently, parallelism is used. By 2015, Big Data had become something that a majority of organizations were either doing or at the very least actively considering. The growth of cloud-based Big Data services has made Big Data analytics a feasible reality for organizations of all sizes.

Keywords: Big Data, Big Data Analytics, Map Reduce, Hadoop.

I. INTRODUCTION

Big data is a term for massive amounts of data that are structured, semi-structured, or unstructured. Data that traditional databases and software technologies cannot handle is classified as big data. The term originated with web companies that had to handle loosely structured data (numerical data, figures, transaction data, etc.) or unstructured data (email attachments, images, comments on social networking sites) [1]. Big data is defined using five V's. Volume: many factors contribute to the increase in volume, such as data storage and live streaming. Variety: various types of data must be supported. Velocity: the speed at which files are created and processes are carried out. Veracity: the reliability of the data with respect to big data exploitation. Value: the worth of the data with respect to big data exploitation.

Because big data is not only large but also varied and fast-growing, analytical techniques are required to extract relevant information from it. This section gives a broad overview of some of the most commonly used techniques and technologies to help the reader better understand the tools for big data analytics. Many analytic techniques could be employed in a big data project. Which ones are used depends on the type of data being analyzed, the technology available, and the research questions to be answered. Some of the tools that came up frequently in the reviewed material are summarized here.

One such technique is often used in data mining and, according to Chen, Chiang, and Storey (2012), lends support to recommender systems like those employed by Netflix and Amazon. Data Mining: Manyika et al. (2011) describe data mining as "combining methods from statistics and machine learning with database management" in order to pinpoint patterns in large datasets [2]. Picciano (2012) lists it as one of the most important terms related to data-driven decision making and describes it as "searching or 'digging into' a data file for information to understand better a particular phenomenon."

Crowdsourcing collects data from a large group of people through an open call, usually via a Web 2.0 tool. It is used more for collecting data than for analyzing it.

Traditionally, computers only know what we tell them, but in machine learning, a subspecialty of computer science, we try to craft "algorithms that allow computers to evolve based on empirical data. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data" (Manyika et al. 2011). Miller (2011/2012) gives the example of the U.S. Department of Homeland Security, which uses machine learning to identify patterns in cell phone and email traffic, as well as credit card purchases and other sources surrounding security threats. It uses these patterns to try to identify future threats so that they can be handled before they become large problems.

A large portion of generated data is in text form. Emails, internet searches, web page content, corporate documents, etc. are all largely text based and can be good sources of information. Text analysis can be used to extract information from large amounts of textual data, for example to model topics, mine opinions, or answer questions.

II. RELATED WORK

Alongside these analytical techniques, several software products and technologies facilitate big data analytics. Some of the most common are described in this paper. Enterprise data warehouses are databases used in data analysis [3]. Russom (2011) writes that for many businesses taking steps to start handling big data, the big question is whether the present or planned enterprise data warehouse (EDW) can handle big data and advanced data analytics without degrading the performance of other workloads for reporting and online analytic processing. Some institutions manage their analytic data in the EDW itself, while others use a different platform, which helps relieve some of the burden on the server that results from managing data on the EDW [4].

Many new visualization products aim to fill this need, devising methods for representing data points numbering up into the millions. Russom (2011) identifies this field as one of those with the most potential and says it is poised for aggressive adoption. Beyond simple representation, visualization can also assist in the search for information. Hansen, Johnson, Pascucci, and Silva wrote an article on visualization in data-intensive science, included in Hey, Tansley, and Tolle's collection The Fourth Paradigm (2009), in which they state that visualization products allow us to compare models and datasets and enable quantitative and qualitative decision-making. Their article focuses on scalability in visualization technologies and their ability to track provenance in real time [5].

A. HADOOP

Hadoop is a widely available Java-based programming framework that supports the processing of large amounts of data in a distributed computing environment. With Hadoop, large data sets can be analyzed over a cluster of servers, and applications can run on systems with thousands of nodes involving terabytes of information, as shown in Fig. 1.

Fig. 1: Hadoop Structure

This decreases the risk of system failure even when a large number of nodes fail. Hadoop provides a scalable, flexible, fault-tolerant computing solution. HDFS defines a file system spanning all nodes in a Hadoop cluster for data storage. It connects the file systems on local nodes into one very large file system, thus improving reliability [6]. Task trackers are responsible for executing the tasks that the job tracker assigns them. Job trackers have two major responsibilities: managing and controlling the cluster resources, and scheduling all user jobs. The data engine holds all the information about processing the data. The fetch manager protects and fetches the data while a particular task is running.

B. MAP REDUCE

The Map Reduce [7] framework is used to write applications that analyze large amounts of data in a reliable and fault-tolerant manner. The input data is first divided into individual chunks, which are processed by individual map jobs in parallel. The framework sorts the output of the maps and then sends it to the reduce tasks. Supervision of the tasks is handled by the framework. In short, the framework splits the data into smaller chunks that are processed in parallel on a cluster of machines by programs called mappers. The results from the mappers are then consolidated by reducers into the desired result, as shown in Fig. 2.
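The split, map, sort, and reduce flow described for Map Reduce can be illustrated with a small single-machine sketch. This is not Hadoop's API: the function names (`split_into_chunks`, `mapper`, `shuffle`, `reducer`, `word_count`) are hypothetical, word counting is chosen only as a familiar example, and a thread pool stands in for the cluster of machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Split the input into fixed-size chunks, one per map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def mapper(chunk):
    """Map task: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group intermediate values by key and return the groups sorted by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce task: consolidate all values for one key into a single count."""
    return key, sum(values)


def word_count(lines, chunk_size=2, workers=4):
    chunks = split_into_chunks(lines, chunk_size)
    # Mappers share no state, so the chunks can be processed in parallel.
    # Assumption: a local thread pool stands in for the cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, chunks))
    return dict(reducer(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "big cloud", "data data"]))
```

Because each mapper touches only its own chunk and each reducer only its own key, the same structure carries over to many machines; in a real framework the splitting, sorting, and task supervision are done by the framework rather than by user code.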


Fig. 2: Map Reduce

The shared-nothing architecture of mappers and reducers makes them highly parallel [9]. Over the last several years, many researchers have published work on big data. Hundreds of articles have appeared in the general business press (for example Forbes, Fortune, Bloomberg, Businessweek, The Wall Street Journal, The Economist) [15]. The National Institute of Standards and Technology (NIST) describes Big Data as data whose volume, velocity, or representation limits the ability to perform effective analysis using traditional relational approaches [16]. In March 2012, the Obama Administration announced that the US would spend 200 million dollars to launch a big data research plan [17]. An IDC report predicts that from 2005 to 2020 the global data volume will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, roughly doubling every two years [18]. IBM estimates that 2.5 quintillion bytes of data are generated every day and that 90% of the data in the world today was generated in the last two years. Social networking sites such as Facebook are reported to have 850 million users, LinkedIn 110 million users, and Twitter 350 million users [19].

Big Data has led to an emerging research field that has attracted tremendous interest from industry, government, and the research community. This interest is first exemplified by coverage in both industrial reports and public media [20]. For example, mobile phones are becoming the best way to get data about people in different aspects, and the large amount of data that mobile carriers can process can improve our daily life [21]. Fig. 3 shows that the amount of data has increased substantially since 2005. Considering this exponential growth from 2005 onward, enterprise system and user-level data was flooding into data warehouses [22]. Data was in structured form when it was created by many organizations. Data is characterized by three properties: volume, variety, and velocity. Many companies struggled with how to expand the capacity of the data warehouse to accept new data and meet new requirements.

Fig. 3: Data Growth

S. Vikram Phaneendra and E. Madhusudhan Reddy illustrated that in earlier days data was smaller and easily handled by RDBMS, but it has recently become difficult to handle huge data through RDBMS tools; such data is referred to as "big data". They stated that big data differs from other data in five dimensions: volume, velocity, variety, value, and complexity. They illustrated the Hadoop architecture, consisting of name node, data node, edge node, and HDFS, for handling big data systems. The Hadoop architecture handles large data sets, and a scalable algorithm performs log management. Applications of big data can be found in the financial, retail, health-care, mobility, and insurance industries. The authors also focused on the challenges that enterprises face when handling big data, such as data privacy and search analysis [10].

Kiran kumara Reddi and Dnvsl Indira explained that Big Data is a combination of structured, semi-structured, and unstructured, homogeneous and heterogeneous data. The authors suggested using the Nice model to handle the transfer of huge amounts of data over the network. Under this model, these transfers are relegated to low-demand periods when ample idle bandwidth is available. This bandwidth can then be repurposed for big data transmission without impacting other users in the system. The Nice model uses a store-and-forward approach by utilizing staging servers. The model is able to accommodate differences in time zones and variations in bandwidth. They suggested that new algorithms are required to transfer big data and to solve issues such as security, compression, and routing [11].
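The idea of relegating bulk transfers to low-demand periods can be sketched as a simple greedy schedule. This is only an illustration under stated assumptions, not the Nice model itself: the function name `pick_transfer_hours`, the hourly utilization profile, and the link capacity are all hypothetical, and staging servers and time-zone handling are not modeled.

```python
def pick_transfer_hours(utilization, capacity_gb_per_hour, payload_gb):
    """Fill the least-loaded hours first until the payload is scheduled.

    utilization: hypothetical link load per hour, each value in [0, 1].
    Returns a list of (hour, gigabytes_to_send) pairs sorted by hour.
    """
    plan = []
    remaining = payload_gb
    # Visit hours from least to most loaded so bulk traffic uses idle bandwidth first.
    for hour in sorted(range(len(utilization)), key=lambda h: utilization[h]):
        if remaining <= 0:
            break
        spare = capacity_gb_per_hour * (1.0 - utilization[hour])
        send = min(spare, remaining)
        if send > 0:
            plan.append((hour, send))
            remaining -= send
    if remaining > 1e-9:
        raise ValueError("payload does not fit into the spare capacity of the profile")
    return sorted(plan)


if __name__ == "__main__":
    # Hypothetical profile: the link is saturated except at 02:00 and 03:00.
    load = [1.0] * 24
    load[2], load[3] = 0.25, 0.5
    print(pick_transfer_hours(load, capacity_gb_per_hour=100, payload_gb=100))
```

Only the spare share of each hour is used, which mirrors the stated goal of not impacting other users; a store-and-forward deployment would additionally hold the data on intermediate staging servers between such windows.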


Wei Fan and Albert Bifet introduced Big Data mining as the capability of extracting useful information from large datasets or streams of data, which was not possible before because of their volume, variability, and velocity. The authors also stated that there are certain controversies about Big Data, that there are certain tools for processing it, and that certain challenges need to be dealt with, such as compression and visualization [12].

Albert Bifet stated that streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge, allowing organizations to react quickly when problems appear or to improve their performance. A huge amount of data, termed "big data", is created every day. The tools used for mining big data include Apache Hadoop, Apache Pig, Cascading, Scribe, Storm, Apache HBase, Apache Mahout, MOA, R, etc. Thus, he concluded that our ability to handle many exabytes of data depends mainly on the existence of a rich variety of datasets, techniques, and software frameworks [23].

Bernice Purcell stated that Big Data comprises large data sets that cannot be handled by traditional systems. Big data includes structured, semi-structured, and unstructured data. The data storage techniques used for big data include multiple clustered network-attached storage (NAS) and object-based storage. The Hadoop architecture is used to process unstructured and semi-structured data, using Map Reduce to locate all relevant data and then select only the data directly answering the query. The advent of Big Data has posed opportunities as well as challenges to business [24].

III. CHALLENGES AND OPPORTUNITIES IN BIG DATA

We live in the era of big data, in which we can collect more and more information from people's daily lives. So far, researchers have failed to agree on the features that are most essential to big data. Many think that big data is something that cannot be processed or analyzed using existing technology, theory, or any other such method. However, the world is struggling to cope, since an enormous amount of data is being generated by science, business, social sites, and even society. Big data has posed many challenges to the IT industry [8].

IV. CONCLUSION

This paper presented a systematic survey of big data in the cloud computing environment. Big data consists of large and complex datasets created from various sources, such as social media likes and comments, video games, and email attachments. The complexity of big data lies in its velocity, variety, and volume, and these three properties make it challenging. We have also seen some technologies and techniques. Since big data is not only large but also varied and fast-growing, many technologies and analytical techniques are needed to extract relevant information. The benefits are many and varied, ranging from higher-quality education to cutting-edge medical research. While further research is required for things like ensuring that people's information is protected from exploitation, there are many exciting and innovative discoveries waiting to be uncovered through big data analytics. Computer scholars and IT professionals need to cooperate to make successful, long-term use of cloud computing and to explore new ideas for using big data in the cloud environment.

REFERENCES

[1] A. Abouzeid, K. B. Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. PVLDB, 2(1):922–933, 2015.

[2] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive - A Warehousing Solution Over a Map-Reduce Framework. PVLDB, 2(2):1626–1629, 2009.

[3] A. Katal, Wazid M, and Goudar R.H. "Big data: Issues, challenges, tools and Good practices." Noida: 2013, pp. 404–409, 8-10 Aug. 2013.

[4] K. Chitharanjan and Kala Karun A. "A review on hadoop, HDFS infrastructure extensions." JeJu Island: 2013, pp. 132-137, 11-12 Apr. 2013.

[5] Wie Jiang, Ravi V.T, and Agrawal G. "A Map-Reduce System with an Alternate API for Multi-core Environments." Melbourne, VIC: 2010, pp. 84-93, 17-20 May 2010.

[6] F.C.P. Muhtaroglu, Demir S, Obali M, and Girgin C. "Business on big data applications." Big Data, 2013 IEEE International Conference, Silicon Valley, CA, Oct 6-9, 2013, pp. 32–37.

[7] Xu-bin LI, JIANG Wen-rui, JIANG Yi, ZOU Quan. "Hadoop Applications in Bioinformatics." Open Cirrus Summit (OCS), 2012 Seventh, Beijing, Jun 19-20, 2012, pp. 48–52.

[8] Venkata Narasimha Inukollu, Sailaja Arsi and Srinivasa Rao Ravuri. "Security issues associated with big data in cloud computing." International Journal of Network Security & Its Applications (IJNSA), Vol. 6, No. 3, May 2014.

[9] Elragal, A. (2014). ERP and Big Data: The Inept Couple. Procedia Technology, 16, 242-249.

[10] S. Vikram Phaneendra & E. Madhusudhan Reddy. "Big Data - solutions for RDBMS problems - A survey." In 12th IEEE/IFIP Network Operations & Management Symposium (NOMS 2010) (Osaka, Japan, Apr 19–23, 2013).

[11] Kiran kumara Reddi & Dnvsl Indira. "Different Technique to Transfer Big Data: survey." IEEE Transactions on 52(8) (Aug. 2013) 2348–2355.

[12] Umasri M.L, Shyamalagowri D, Suresh Kumar S. "Mining Big Data: Current status and forecast to the future." Volume 4, Issue 1, January 2014, ISSN: 2277 128X.

[13] Albert Bifet. "Mining Big Data in Real Time." Informatica, 2013.

[14] James Manyika, Michael Chui, Brad Brown, Jacques Bhuhin, Richard Dobbs, Charles Roxburgh, Angela Hungh Byers. "Big Data: The next frontier for innovation, competition and productivity." June 2011.

[15] Sameera Siddiqui, Deepa Gupta. "Big Data Process and Analytics: A Survey." International Journal Of Emerging Research in Management & Technology, ISSN: 2278-9359, Volume 3, Issue 7, July 2014.

[16] M. Cooper, P. Mell (2012). Tackling big Data (Online). Http://csrc.nist.gov/groups/SMA/Forum/document/June2012Presentation/f%CSM_june2012_cooper_Neul.pdf.

[17] Han Hu, YongyangNen, Tat Seng Chua, Xuelong Li. "Towards Scalable System for Big Data Analytics: A Technology Tutorial." IEEE Access, Volume 2, Page No 653, June 2014.

[18] J. Gantz, D. Reinset. "The Digital Universe in 2020: Big Data, Bigger digital shadow, and biggest growth in the far east." In Proc: IDC iview, IDC Anal, Future, 2012.

[19] www.ebizmba.com/articles/social-networking-websites.

[20] Neil Raden. "Big Data Analytics Architecture." Hired Brains Inc, 2012.

[21] James Manyika, Michael Chui, Brad Brown, Jacques Bhuhin, Richard Dobbs, Charles Roxburgh, Angela Hungh Byers. "Big Data: The next frontier for innovation, competition and productivity." June 2011.

[22] Wei Fan, Albert Bifet. "Mining Big Data: Current Status and Forecast to the Future." SIGKDD Explorations, Volume 14, Issue 2.

[23] Albert Bifet. "Mining Big Data In Real Time." Informatics 37 (2013) 15–20, Dec 2012.

[24] Bernice Purcell. "The emergence of "big data" technology and analytics." Journal of Technology Research, 2013.

[25] Ritu Katara, Hareram Shah. "A Novel Integrated Approach for Big Data Mining." International Journal For Computer Trends and Technology, Volume 18, Number 5, Dec 2014.

[26] M. Saranya, A. Prema. "Survey on Big Data Analytics Using Hadoop ETL." International Journal For Computer Trends and Technology, Volume 48, Number 5, June 2017.

[27] V. Harsha Shastri, V. sreeprada, T. Kavitha. "A Survey on Big Data Technologies, Challenges and Impact on Internet of things." International Journal For Computer Trends and Technology, Volume 35, Number 3, May 2016.

[28] Fayaz Ahmad Lone, Dr. Amit Kumar Chaturvedi. "Proposing a Novel Model on Security Challenges in Cloud Computing especially Social Media and social Sites." International Journal For Computer Trends and Technology, Volume 47, Number 1, May 2017.
