Review Paper On Big Data Analytics in Cloud Computing: July 2017
All content following this page was uploaded by Saneh Lata Yadav on 08 May 2018.
Abstract: Cloud computing is a powerful technology that performs massive-scale and complex computing. It eliminates the need to maintain costly computing hardware, dedicated space, and related software. Massive growth in the scale of data, or big data, generated through cloud computing has been observed. Handling big data is a challenging and time-demanding task that requires a large computational space to ensure successful data processing and analysis. This paper introduces the definition, characteristics, and classification of big data, along with some discussion of cloud computing. The similarities between big data and cloud computing, big data storage systems, several big data processing techniques, and Hadoop technology are also discussed. The term ‘Big Data’ refers to innovative techniques and technologies to capture, store, distribute, manage, and analyze petabyte- or larger-sized datasets with high velocity and different structures. Big data may be structured, unstructured, or semi-structured, which renders conventional data management methods inadequate. Data can be generated from various sources and can arrive in the system at various rates. To analyze these large amounts of data in an inexpensive and efficient way, parallelism is used. By 2015, Big Data had become something that a majority of organizations were either doing or at the very least actively considering. The growth of cloud-based Big Data services has made Big Data analytics a feasible reality for organizations of all sizes.

Keywords: Big Data, Big Data Analytics, Map Reduce, Hadoop.

I. INTRODUCTION

Big data is a term for massive amounts of data that are structured, semi-structured, or unstructured. Data that cannot be handled by traditional databases and software technologies is classified as big data. The term originated with web companies that had to handle loosely structured data (numerical form, figures, transaction data, etc.) or unstructured data (email attachments, images, comments on social networking sites) [1]. Big data is defined using five V's. Volume: many factors contribute to the increase in volume, such as storage of data and live streaming. Variety: various types of data must be supported. Velocity: the speed at which files are created and processes are carried out. Veracity: the reliability of the data with respect to big data exploitation. Value: the worth of the data with respect to big data exploitation. Since big data is not only large but also varied and fast-growing, analytical techniques are required in order to attempt to extract relevant information. This paper gives a broad overview of some of the most commonly used techniques and technologies to help the reader better understand the tools based on big data analytics.

There are many analytic techniques that could be employed when considering a big data project. Which ones are used depends on the type of data being analyzed, the technology available to you, and the research questions you are trying to answer. Some of the tools that came up frequently in the reviewed material are summarized here. One such technique is often used in data mining and, according to Chen, Chiang, and Storey (2012), lends support to recommender systems like those employed by Netflix and Amazon. Data mining: Manyika et al. (2011) describe data mining as “combining methods from statistics and machine learning with database management” in order to pinpoint patterns in large datasets [2]. Picciano (2012) lists it as one of the most important terms related to data-driven decision making and describes it as “searching or ‘digging into’ a data file for information to understand better a particular phenomenon.” Crowdsourcing collects data from a large group of people through an open call, usually via a Web 2.0 tool. This tool is used more for collecting data than for analyzing it. Machine learning: traditionally, computers only know what we tell them, but in machine learning, a subspecialty of computer science, we try to craft “algorithms that allow computers to evolve based on empirical data. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data” (Manyika et al.
2011). Miller (2011/2012) gives the example of the U.S. Department of Homeland Security, which uses machine learning to identify patterns in cell phone and email traffic, as well as credit card purchases and other sources surrounding security threats. They use these patterns to try to identify future threats so they can handle them before they become large problems. A large portion of generated data is in text form. Emails, internet searches, web page content, corporate documents, etc. are all largely text based and can be good sources of information. Text analysis can be used to extract information from large amounts of textual data. This can be done to model topics, mine opinions, answer questions, and pursue other goals.

II. RELATED WORK

Alongside analytical techniques, there are several software products and many technologies to facilitate big data analytics. Some of the most common are described in this paper. Enterprise data warehouses are databases used in data analysis [3]. Russom (2011) writes that for many businesses that are taking steps to start handling big data, the big question is whether the present or planned enterprise data warehouse (EDW) can handle big data and advanced data analytics without degrading the performance of other workloads for reporting and online analytic processing. Some institutions manage their analytic data in the EDW itself, while others use a different platform, which helps relieve some of the burden on the server that results from managing data on the EDW [4]. Many new visualization products aim to fill this need, devising methods for representing data points numbering up into the millions. Russom (2011) identifies this field as one of those having the most potential and says it is poised for aggressive adoption. Beyond simple representation, visualization can also help in the search for information. Hansen, Johnson, Pascucci, and Silva wrote an article on visualization in data-intensive science, included in Hey, Tansley, and Tolle's collection The Fourth Paradigm (2009), in which they explain that visualization products allow us to compare models and datasets and enable quantitative and qualitative decision-making. Their article focuses on scalability in visualization technologies and their ability to track provenance in real time [5].
Fig. 1: Hadoop Structure

This decreases the risk of system failure even when a large number of nodes fail. It includes a scalable, flexible, fault-tolerant computing solution. HDFS

B. MAP REDUCE

The Map Reduce [7] framework is used to write applications that analyze large amounts of data in a reliable and fault-tolerant manner. Initially the input data is divided into individual chunks, which are analyzed by individual map jobs in parallel. The output of the map jobs is sorted by the framework and then sent to the reduce tasks. Supervision is taken care of by the framework. The framework splits the data into smaller chunks that are processed in parallel on a cluster of machines by programs called mappers. The results from the mappers are then consolidated by reducers into the desired result, as shown in Fig. 2.
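The Map Reduce flow described in this section (split the input into chunks, run mappers, let the framework sort and group the intermediate output, then run reducers) can be illustrated with a small word-count sketch. This is a single-process illustration in plain Python, not Hadoop code: the function names `split_into_chunks`, `mapper`, `shuffle_and_sort`, `reducer`, and `word_count`, as well as the sample input, are hypothetical and chosen only for this example. On a real cluster the chunks would be processed in parallel on separate machines, whereas here they are processed one after another.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(lines, chunk_size):
    """Split the input into smaller chunks, one per (hypothetical) map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def mapper(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_and_sort(mapped_pairs):
    """Framework step: group the values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce step: consolidate all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, chunk_size=2):
    """Run the whole pipeline sequentially (assumption: no real parallelism)."""
    chunks = split_into_chunks(lines, chunk_size)
    # Each chunk would go to its own mapper on a cluster; here we loop over them.
    mapped = chain.from_iterable(mapper(chunk) for chunk in chunks)
    grouped = shuffle_and_sort(mapped)
    return dict(reducer(key, values) for key, values in grouped)


if __name__ == "__main__":
    sample = ["big data in the cloud", "big data analytics", "cloud computing"]
    print(word_count(sample))
```

Keeping the mapper and reducer free of shared state is what allows a framework to run many copies of them in parallel and to rerun a failed task without affecting the others.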
Wei Fan and Albert Bifet introduced Big Data mining as the capability of extracting useful information from large datasets or streams of data, which was not previously possible because of their volume, variability, and velocity. The authors also stated that there are certain controversies about Big Data, that there are certain tools for processing it, and that certain challenges, such as compression and visualization, need to be dealt with [12].

Albert Bifet stated that streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge, allowing organizations to react quickly when problems appear or to detect ways to improve performance. The huge amount of data created every day is termed “big data”. The tools used for mining big data include Apache Hadoop, Apache Pig, Cascading, Scribe, Storm, Apache HBase, Apache Mahout, MOA, R, etc. Thus, he argued that our ability to handle many exabytes of data depends mainly on the existence of a rich variety of datasets, techniques, and software frameworks [23].

Bernice Purcell stated that Big Data comprises large datasets that cannot be handled by traditional systems. Big data includes structured, semi-structured, and unstructured data. The data storage techniques used for big data include multiple clustered network attached storage (NAS) and object-based storage. The Hadoop architecture is used to process unstructured and semi-structured data, using Map Reduce to locate all relevant data and then select only the data directly answering the query. The advent of Big Data has posed opportunities as well as challenges to business [24].

III. CHALLENGES AND OPPORTUNITIES IN BIG DATA

We live in the era of big data, in which we can collect more and more information from people's daily lives. So far, researchers have failed to agree on the features that are most essential to big data; many think that big data is something that we cannot process or analyze using existing technology, theory, or any other method of that kind. However, the world has been left struggling to cope, since an enormous amount of data is being generated by science, business, social sites, and even society. Big data has posed many challenges to the IT industry [8].

IV. CONCLUSION

This paper presented a systematic survey of big data in the cloud computing environment. Big data consists of large and complex datasets created from various sources, such as social media likes and comments, video game play, and email attachments. The complexity of big data lies in its velocity, variety, and volume; these three properties are what make big data challenging. We have also seen some technologies and techniques. Since big data is not only large but also varied and fast-growing, many technologies and analytical techniques are needed in order to attempt to extract relevant information. The benefits are many and varied, ranging from higher quality education to cutting-edge medical research, and while further research is required for things like ensuring people's information is protected from exploitation, there are many exciting and innovative discoveries waiting to be uncovered through big data analytics. Computer scholars and IT professionals need to cooperate to make successful, long-term use of cloud computing and to explore new ideas for the use of big data in the cloud environment.

REFERENCES

[1] A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz. “HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.” PVLDB, 2(1):922–933, 2009.

[2] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. “Hive - A Warehousing Solution Over a Map-Reduce Framework.” PVLDB, 2(2):1626–1629, 2009.

[3] A. Katal, M. Wazid, and R. H. Goudar. “Big data: Issues, challenges, tools and good practices.” Noida, 8-10 Aug. 2013, pp. 404–409.

[4] K. Chitharanjan and A. Kala Karun. “A review on Hadoop, HDFS infrastructure extensions.” Jeju Island, 11-12 Apr. 2013, pp. 132–137.

[5] Wei Jiang, V. T. Ravi, and G. Agrawal. “A Map-Reduce System with an Alternate API for Multi-core Environments.” Melbourne, VIC, 17-20 May 2010, pp. 84–93.

[6] F. C. P. Muhtaroglu, S. Demir, M. Obali, and C. Girgin. “Busines on big data applications.” Big Data, 2013 IEEE International Conference, Silicon Valley, CA, Oct 6-9, 2013, pp. 32–37.

[7] Xu-bin Li, Wen-rui Jiang, Yi Jiang, and Quan Zou. “Hadoop Applications in Bioinformatics.” Open Cirrus Summit (OCS), 2012 Seventh, Beijing, Jun 19-20, 2012, pp. 48–52.

[8] Venkata Narasimha Inukollu, Sailaja Arsi, and Srinivasa Rao Ravuri. “Security issues associated with big data in cloud computing.” International Journal of Network Security & Its Applications (IJNSA), Vol. 6, No. 3, May 2014.

[9] A. Elragal. “ERP and Big Data: The Inept Couple.” Procedia Technology, 16, 242–249, 2014.

[10] S. Vikram Phaneendra and E. Madhusudhan Reddy. “Big Data - Solutions for RDBMS Problems - A Survey.” In 12th IEEE/IFIP
Network Operations & Management Symposium (NOMS 2010) (Osaka, Japan, Apr 19-23 2013).

[11] Kiran Kumara Reddi and Dnvsl Indira. “Different Technique to Transfer Big Data: Survey.” IEEE Transactions on 52(8) (Aug. 2013) 2348–2355.

[12] M. L. Umasri, D. Shyamalagowri, and S. Suresh Kumar. “Mining Big Data: Current Status and Forecast to the Future.” Volume 4, Issue 1, January 2014, ISSN: 2277 128X.

[13] Albert Bifet. “Mining Big Data in Real Time.” Informatica, 2013.

[14] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. “Big Data: The next frontier for innovation, competition and productivity.” June 2011.

[15] Sameera Siddiqui and Deepa Gupta. “Big Data Process and Analytics: A Survey.” International Journal of Emerging Research in Management & Technology, ISSN: 2278-9359, Volume 3, Issue 7, July 2014.

[16] M. Cooper and P. Mell (2012). Tackling Big Data (Online). Http://csrc.nist.gov/groups/SMA/Forum/document/June2012Presentation/f%CSM_june2012_cooper_Neul.pdf.

[17] Han Hu, Yonggang Wen, Tat Seng Chua, and Xuelong Li. “Towards Scalable System for Big Data Analytics: A Technology Tutorial.” IEEE Access, Volume 2, Page No 653, June 2014.

[18] J. Gantz and D. Reinsel. “The Digital Universe in 2020: Big Data, Bigger Digital Shadow, and Biggest Growth in the Far East.” In Proc. IDC iView, IDC Anal. Future, 2012.

[19] www.ebizmba.com/articles/social-networking-websites.

[20] Neil Raden. “Big Data Analytics Architecture.” Hired Brains Inc, 2012.

[21] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. “Big Data: The next frontier for innovation, competition and productivity.” June 2011.

[22] Wei Fan and Albert Bifet. “Mining Big Data: Current Status and Forecast to the Future.” SIGKDD Explorations, Volume 14, Issue 2.

[23] Albert Bifet. “Mining Big Data in Real Time.” Informatica 37 (2013) 15–20, Dec 2012.

[24] Bernice Purcell. “The emergence of ‘big data’ technology and analytics.” Journal of Technology Research, 2013.

[25] Ritu Katara and Hareram Shah. “A Novel Integrated Approach for Big Data Mining.” International Journal of Computer Trends and Technology, Volume 18, Number 5, Dec 2014.

[26] M. Saranya and A. Prema. “Survey on Big Data Analytics Using Hadoop ETL.” International Journal of Computer Trends and Technology, Volume 48, Number 5, June 2017.

[27] V. Harsha Shastri, V. Sreeprada, and T. Kavitha. “A Survey on Big Data Technologies, Challenges and Impact on Internet of Things.” International Journal of Computer Trends and Technology, Volume 35, Number 3, May 2016.

[28] Fayaz Ahmad Lone and Dr. Amit Kumar Chaturvedi. “Proposing a Novel Model on Security Challenges in Cloud Computing especially Social Media and Social Sites.” International Journal of Computer Trends and Technology, Volume 47, Number 1, May 2017.