Big Data Analytics

The document provides an overview of big data analytics and Hadoop. It discusses how big data is characterized by high volume, velocity, and variety of data that makes it challenging to manage and analyze using traditional methods. It then describes how Hadoop uses MapReduce and HDFS to parallel process and store large amounts of data across clusters of computers. Key components of Hadoop include the HDFS for reliable storage and MapReduce for scalable processing of datasets. The document also reviews literature on big data approaches using Hadoop and Spark and discusses challenges in big data management.

Uploaded by

Gautam Prajapati

BIG DATA ANALYTICS: A SURVEY

Gautam
Research Scholar
UIET, Rohtak

Abstract
Storing, managing and processing huge amounts of data is very difficult. The term ‘Big
Data’ describes the techniques and technologies used to store, distribute, manage and analyze
huge amounts of data with different structures. Big data may be structured, semi-structured or
unstructured, and conventional data management methods are unable to handle it. Parallelism is
used to process these huge amounts of data inexpensively and efficiently. Big Data is data that
is both large in volume and complex, and this complexity requires new architectures,
techniques, algorithms and analytics to manage it and extract knowledge from it. Hadoop is a
framework for processing large amounts of data: it provides storage capacity for large datasets
and processes big data in parallel, which gives more computational power to every task. It works
in batch processing mode and is the core platform for structuring Big Data, which also makes
the data usable for analytics. In this paper, we provide a brief overview of Big Data
management with Hadoop and highlight research efforts and the challenges of big data.

Index Terms: Big Data, Hadoop, MapReduce, HDFS, Hadoop Components.

1. Introduction:
1.1. Big Data: Definition

Big data is a term used to describe the exponential growth and availability of structured,
semi-structured and unstructured data whose size (volume), complexity (variability) and rate of
growth (velocity) make it difficult or even impossible to manage and analyze using conventional
software tools and technologies. As the amount of data to be processed increases, the time
needed to produce results also increases. Retrieving information from big data is still a
complex and time-consuming task. Big data provides tremendous opportunities for enterprise
information management and decision making. Recent studies show that big data is not limited
to business needs but also helps with research and scientific problems.

The Big Data problem is characterized by the 3V features:

Volume- a huge amount of data. The volume of big data can be measured in terms of
megabytes, gigabytes, terabytes or petabytes.
Velocity- a high data ingestion rate, or the speed with which the data can be analyzed.
Variety- a mix of structured data, semi-structured data, and unstructured data.
These 3V features challenge data processing systems, because such systems either cannot
scale to the huge data volume in a cost-effective way or fail to handle data of varied types.
The solutions to the Big Data problem are largely based on the MapReduce framework[9]
and its open source implementation Hadoop, which handles the data volume challenge
successfully. Hadoop is open source software developed under the Apache Software Foundation
and is typically run on Linux. MapReduce was introduced by Google, and Hadoop has been used by
well-known companies such as Yahoo, Facebook and Amazon, among many others. Hadoop is a
framework for processing large amounts of data: it provides storage capacity for large datasets
and processes big data in parallel, which gives more computational power to every task. It
works in batch processing mode and has two major components: HDFS (Hadoop Distributed File
System)[12] for storing huge amounts of data and MapReduce for processing huge datasets. As
data size grows, existing algorithms struggle to manage it, so the main problem is storing and
processing that huge amount of data. Hadoop addresses this problem because it can store and
process huge amounts of data in less time.

1.2. Hadoop:

Hadoop is an open-source software framework used for distributed storage and processing of big
data using the MapReduce programming model. The modules in Hadoop are designed with the
fundamental assumption that hardware failures are common occurrences and should be handled
automatically by the framework. The core of Hadoop consists of two parts, the storage part and
the processing part.
a) Storage part: The storage part of Hadoop is HDFS (Hadoop Distributed File System), which
stores huge amounts of data with high throughput. The data is split into blocks that are stored
across the nodes of a cluster.
b) Processing part: The processing part of Hadoop is MapReduce, a software framework that
processes large amounts of data across the nodes of a cluster.
Hadoop distributes the data blocks to the nodes so that they are processed in parallel. This
approach takes advantage of data locality: each node works on the data it already stores. This
allows the dataset to be processed faster and more efficiently than in a more conventional
supercomputer architecture that relies on a parallel file system, where computation and data
are distributed via high-speed networking.
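
The data-locality idea described above can be illustrated with a toy scheduler. This is only a minimal sketch and not Hadoop's actual scheduling code: the function name assign_tasks, the node names and the slot counts are hypothetical and chosen for illustration. Each block is preferably handed to a node that already stores a replica of it, and only falls back to a remote node when no local slot is free.

```python
def assign_tasks(block_locations, free_slots):
    """Assign each block to a node, preferring nodes that already hold it.

    block_locations: dict block_id -> list of node names storing a replica
    free_slots: dict node name -> number of tasks the node can still accept
    Returns dict block_id -> (node, is_local).
    All names here are hypothetical; this is not Hadoop's scheduler.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block_id, holders in block_locations.items():
        # First choice: a node that stores the block and has a free slot.
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node = max(local, key=lambda n: slots[n])
            is_local = True
        else:
            # Fallback: any node with capacity; the block would have to be
            # read over the network, which is what data locality avoids.
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                raise RuntimeError("no free task slots left")
            node = max(candidates, key=lambda n: slots[n])
            is_local = False
        slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment


blocks = {"b0": ["node1", "node2"], "b1": ["node2", "node3"], "b2": ["node1", "node3"]}
slots = {"node1": 1, "node2": 1, "node3": 1}
plan = assign_tasks(blocks, slots)
for block_id, (node, is_local) in plan.items():
    print(block_id, "->", node, "(local)" if is_local else "(remote)")
```

In this small example every block can be processed on a node that already stores it, so no data has to move across the network before computation starts.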

Fig.1.1. Hadoop architecture

A small Hadoop cluster has a single master node and multiple worker nodes, also called slave
nodes, as shown in Fig. 1.1. The master node consists of a TaskTracker, JobTracker, NameNode and
DataNode [14], whereas a slave or worker node acts as both a DataNode and a TaskTracker.

1.3. HDFS:

Hadoop Distributed File System (HDFS) is the storage component of Hadoop, which stores huge
amounts of structured, semi-structured and unstructured data. HDFS is a Java-based file system
that is reliable and manageable. Its features include high availability, load balancing,
security, flexible access, fault tolerance, easy management and high data throughput. It
supports parallel processing of data. HDFS has a master/slave architecture.[23]
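
To make the fault-tolerance and master/slave ideas more concrete, the following toy model shows how a file could be split into blocks and how each block could be replicated on several DataNodes, with the mapping kept as NameNode-style metadata. It is a sketch under stated assumptions: the block size of 128 MB and the replication factor of 3 are assumed values (they are common HDFS defaults), the function names are hypothetical, and the round-robin placement is a simplification, since real HDFS placement is rack-aware.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size (a common HDFS default)
REPLICATION = 3      # assumed replication factor (a common HDFS default)


def place_blocks(file_size_mb, datanodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return NameNode-style metadata: block index -> list of DataNodes."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    block_map = {}
    for i in range(num_blocks):
        # Simplified round-robin placement: the replicas of one block
        # always land on distinct nodes.
        block_map[i] = [datanodes[(i + r) % len(datanodes)]
                        for r in range(replication)]
    return block_map


def readable_after_failure(block_map, failed_node):
    """A file stays readable if every block keeps at least one live replica."""
    return all(any(n != failed_node for n in nodes)
               for nodes in block_map.values())


nodes = ["dn1", "dn2", "dn3", "dn4"]
block_map = place_blocks(500, nodes)  # a hypothetical 500 MB file
print(block_map)
print(readable_after_failure(block_map, "dn2"))
```

Because every block exists on three different nodes in this model, losing any single DataNode still leaves the whole file readable, which is the essence of the fault tolerance mentioned above.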

Fig. 1.2. HDFS Architecture

1.4. Hadoop MapReduce:

MapReduce is a Java-based programming paradigm for processing huge amounts of data stored in
HDFS. MapReduce is the heart of the Hadoop framework and provides scalability across
thousands of nodes in a Hadoop cluster. Every MapReduce job performs two kinds of task: the Map
task and the Reduce task. The Map task takes a set of data, processes it at node level and
generates intermediate output. The Reduce task takes the output of the Map task as its input
and combines it into a smaller set of tuples (reducing the large dataset to a smaller one)
according to the transformation logic of the job. The advantage of MapReduce is that it is easy
to scale data processing over multiple computing nodes.
Fig. 1.3. MapReduce Architecture

Map stage: The job of the map stage is to process the input data, as shown in Fig. 1.3.
Generally the input data is a file or directory stored in the Hadoop Distributed File System
(HDFS). The input is passed to the map function, which processes the data and creates several
small chunks of data.
Reduce stage: The Reducer’s job is to process the data that comes from the map stage. After
processing, it produces a new set of output, which is stored in HDFS.
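
The map and reduce stages can be illustrated with the classic word-count example. The sketch below runs in a single Python process and only imitates the programming model; it does not use Hadoop, and the function names map_phase, shuffle, reduce_phase and run_job are hypothetical names chosen for this illustration. In a real cluster the map calls would run in parallel on the nodes that hold the input blocks, and the grouping step would be carried out by the framework between the two stages.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single smaller result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce sequentially over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


lines = ["big data needs big storage", "hadoop stores big data"]
counts = run_job(lines)
print(counts)
```

The map output here is a long list of small (word, 1) pairs, and the reduce step collapses it into one count per word, which matches the description of reducing a large dataset to a smaller set of tuples.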

2. Literature Survey:
This paper provides a detailed review of the different approaches used in Big Data in recent
years. The table below provides an extensive survey of this research, listing the authors, the
year of publication (in descending order) and the proposed work and approaches used:

Authors | Publication Year | Proposed Work

Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, Luca Venturini. [33] | 2017 | Reviews Hadoop- and Spark-based scalable algorithms for mining problems in the Big Data domain, with both theoretical and experimental comparative analyses.

Dinesh J. Prajapati, Sanjay Garg, N.C. Chauhan. [34] | 2017 | The proposed method initially extracts multilevel association rules, including level-crossing rules, for each zone using DMFPM. Both multilevel consistent and inconsistent rules are then evaluated and compared based on different experimental results that lead to the final conclusions.

Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Nathalie Villa-Vialaneix. [35] | 2017 | Proposed a selective review of approaches for scaling random forests to Big Data problems, and also describes how the out-of-bag error is addressed.

M. Bakratsas, P. Basaras, D. Katsaros, L. Tassiulas. [36] | 2017 | Investigates the relative performance and benefits of SSDs versus hard disk drives (HDDs) when they are used as storage for Hadoop's MapReduce.

Ziliang Zong, Rong Ge, Qijun Gu. [37] | 2017 | Presented the design of the Marcher system and demonstrated its measurement tools for obtaining power consumption data in different research.

Guangchen Ruan and Hui Zhang. [38] | 2017 | Proposed a framework that integrates information visualization, scalable computing and user interfaces to explore large-scale multi-modal data streams, which combine to provide an effective and efficient way to perform closed-loop big data analysis with visualization and scalable computing.

Navroop Kaur, Sandeep K. Sood. [39] | 2017 | Presented a resource management system that addresses the problem of selecting and allocating appropriate resources to big data, using the 4 V's of big data.

Feras A. Batarseh, Eyad Abdel Latif. [40] | 2016 | Studies healthcare data collected from various sources, so that the quality and best practices of the field can be assessed using big data tools.

Dawei Jiang, Sai Wu, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Jun Xu. [2] | 2016 | Presents epiC, an extensible system to tackle Big Data’s data variety challenge. They also present the design and implementation of epiC’s concurrent programming model and two customized data processing models.

Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A.S. Netto, Rajkumar Buyya. [3] | 2015 | Discusses environments for carrying out analytics on Clouds for Big Data applications. Through a survey they identify possible gaps in technology and provide future directions for Cloud-supported Big Data computing.

Sreedhar C.N, Kasiviswanath, P. Chenna Reddy. [1] | 2015 | The primary purpose of their work is to provide a comprehensive survey of Big Data management and an overview of various algorithms related to job scheduling in Hadoop.

Chao Wang, Xi Li, Peng Chen, Aili Wang, Xuehai Zhou, and Hong Yu. [19] | 2015 | Proposed an FPGA-based acceleration solution with the MapReduce framework. The combination of hardware acceleration and the MapReduce execution flow can enhance the task of aligning short reads to a known reference genome.

Tao Xu, Dongsheng Wang and Guodong Liu. [20] | 2015 | Presented Banian, an efficient system for managing PB-level structured data that overcomes the storage problem.

Qinghua Lu, Zheng Li, Maria Kihl, Liming Zhu and Weishan Zhang. [26] | 2015 | Presented the conceptual framework CF4BDA for analyzing existing work on BDA applications, covering the lifecycle of BDA applications and the objects involved in BDA applications in the cloud.

Claudio A. Ardagna, Ernesto Damiani, Fulvio Frati, Davide Rebeccani. [25] | 2015 | Presented a score-based benchmark for NoSQL databases, which supports adopters. The proposed benchmark is independent of the specific configurations of the database and deployment environment.

Hongbing Wang, Chao Yu, Lei Wan and Qi Yu. [24] | 2015 | Proposed heterogeneous and trust-based service selection by developing a novel multi-objective optimization approach that makes a trade-off decision between a service’s trust value and the user’s QoS preference in order to rank candidates.

Simon Fong, Raymond Wong, and Athanasios V. Vasilakos. [23] | 2015 | Presented algorithms for handling high-dimensional big data and evaluated their performance using an accelerated particle swarm optimization (APSO) type of swarm search, which enhanced analytical accuracy within reasonable processing time.

Yanhao Huang and Xiaoxin Zhou. [22] | 2015 | Proposed the structure, elements, basic calculations and multi-dimensional reasoning method of a new knowledge model. The research shows that the model is more powerful and adapts to the various knowledge requirements of electric power big data.

Marco Viceconti, Peter Hunter, and Rod Hose. [21] | 2015 | Proposed that big data analytics can be successfully combined with VPH technology to give desirable medical solutions.

Alun Evans, Javi Agenjo, Josep Blat. [28] | 2015 | Presented a web-based application for analytic visualization of on-set media data and metadata, which combines research from several fields of image processing and 3D graphics.

Syed Akhter Hossain. [29] | 2015 | Described the nascent field of big data analytics in education, with a discussion of prospects, challenges and the way forward. Also focuses on research and development issues for educationists and practitioners of big data analytics.

Xue-Wen Chen and Xiaotong Lin. [30] | 2014 | Presented an overview of deep learning and highlighted current research efforts and the challenges to big data, as well as future trends.

Zhi-Hua Zhou, Nitesh V. Chawla, Yaochu Jin, Graham J. Williams. [31] | 2014 | Shares the authors' data analytics opinions regarding new opportunities and challenges of the big data movement. The aim of this paper is to evoke discussion rather than to provide a comprehensive survey of big data research.

Matturdi Bardi, Zhou Xianwei, Li Shuai, Lin Fuhong. [32] | 2014 | Reviewed the various benefits and challenges of security and privacy in Big Data and also presented some possible methods and techniques to ensure Big Data security and privacy.

Suman Storage, Dr. Madhu Goel. [7] | 2014 | Studied and analyzed various scheduling techniques that enhance performance when using Hadoop.

Chang Liu, Jinjun Chen, Chi Yang, Rajiv Ranjan, and Ramamohanarao Kotagiri. [16] | 2014 | Presented types of fine-grained data updates and a scheme that can fully support authorized auditing and fine-grained update requests. Also proposes an enhancement that can reduce communication overheads for verifying small updates.

Shifeng Fang, Li Da Xu, Yunqiang Zhu, Jiaerheng Ahati, Huan Pei, Jianwu Yan, and Zhihui Liu. [17] | 2014 | Introduces a novel IIS that combines Internet of Things (IoT), Cloud Computing, Geoinformatics, geographical information system (GIS) and e-Science for environmental monitoring and management, with a case study on climate change and its ecological effects in a particular region.

Daisuke Takaishi, Hiroki Nishiyama, Nei Kato and Ryu Miura. [18] | 2014 | Proposed a new mobile sink routing and data gathering method with the help of network clustering based on a modified expectation maximization technique.

Andrea Marinoni, Arianna Dagliati, Riccardo Bellazzi, Paolo Gamba. [27] | 2013 | Provided a study of the connection between air pollution and clinical records, so that correlations among black particulate concentration and micro- and macro-vascular disease can be drawn properly.

Xiongpai Qin and Xiaoyun Zhou. [4] | 2013 | Reviewed big data benchmark work from the last several years and analyzed its characteristics.

Rakesh Varma. [6] | 2013 | The objective of the research is to study MapReduce and the various scheduling algorithms that enhance scheduling performance.

Daniel Warneke. [15] | 2011 | Discusses the opportunities and challenges for parallel data processing in clouds and presents Nephele. Also evaluates MapReduce-style processing and compares the results with the data processing framework Hadoop.

Jasmin Azemovic, Denis Music. [13] | 2010 | Presented research on using different data types for storing unstructured data within a database; this research is inspired by the current situation of the information society.

Mengjie Zhou, Haoji Hu and Minqi Zhou. [14] | 2010 | Proposed an SLCA (Smallest Lowest Common Ancestor) based keyword search implementation for large-scale XML data sets on a MapReduce cluster.

Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari. [5] | 2010 | Outlines the S4 architecture and describes applications from real-life deployments. These include large-scale applications for data mining and machine learning.

BI Shuoben, Xu Yin, Jiao Feng, Lü Guonian, PEI Anping. [12] | 2009 | Introduces the single-dimensional Boolean association rule based on the Apriori algorithm, and the data mining algorithm of the multi-dimensional association rule based on the BUC algorithm.

Hui Fang, Ming Yang, Ruqing Yang. [11] | 2007 | Proposed an approach to localize the vehicle position with respect to a global map, based on the texture of the ground over which the vehicle moves.

Seema Metikurke, Vijay K. Vaishnavi. [10] | 2006 | Describes a grid-enabled approach for automatic web page classification that applies the vector space model information retrieval strategy.

John. H. Phan, Chang. F. Quo, and May D. Wang. [9] | 2004 | Reports the results of the first phase of development of a novel system that uses unsupervised clustering methods to discover relationships among genes and knowledge-based supervised classification to obtain accurate predictions in cancer diagnosis.

Sushant Goel, Hema Sharda, David Tanid. [8] | 2003 | Distributes the scheduling responsibilities to the nodes where the data is actually located and also proposes a new serializability criterion, Parallel Database Quasi-Serializability.

3. Conclusion:
A survey of the different Big Data approaches of recent years has been presented. It is found
that solutions to the Big Data problem are largely based on the MapReduce framework and its
open source implementation Hadoop, which handles the data volume challenge successfully. Big
data management includes different tools, techniques and various algorithms for job scheduling
in Hadoop. This paper should help novices who want to pursue a career in the field of big data.

4. References:

[1] Sreedhar C.N, Kasiviswanath, P. Chenna Reddy, “A Survey on Big Data Management and
Job Scheduling" International Journal of Computer Applications (0975 – 8887)
Volume 130 – No.13, November 2015.
[2] Dawei Jiang, Sai Wu, Gang Chen, Beng Chin Ooi, Kian-Lee Tan and Jun Xu, "epiC: an
extensible and scalable system for processing Big Data" The VLDB Journal (2016) 25:3–26
DOI 10.1007/s00778-015-0393-2.
[3] Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A.S. Netto, Rajkumar
Buyya, "Big Data computing and clouds: Trends and future directions" J. Parallel Distrib.
Comput. 79–80 (2015) 3–15.
[4] Xiongpai Qin, and Xiaoyun Zhou, "A Survey on Benchmarks for Big Data and Some More
Considerations" H. Yin et al. (Eds.): IDEAL 2013, LNCS 8206, pp. 619–627, 2013. Springer-
Verlag Berlin Heidelberg 2013.
[5] Leonardo Neumeyer,Bruce Robbins,Anish Nair,Anand Kesari, "S4: Distributed Stream
Computing Platform" 2010 IEEE International Conference on Data Mining Workshops.
[6] Rakesh Varma,"Survey on MapReduce and Scheduling Algorithms in Hadoop" International
Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value
(2013): 6.14 | Impact Factor (2013): 4.438.
[7] Suman Storage and Dr.Madhu Goel, "Survey Paper on Scheduling in Hadoop" Volume 4,
Issue 5, May 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer
Science and Software Engineering.
[8] Sushant Goel, Hema Sharda and David Tanid, "Distributed Scheduler for High Performance
Data-Centric Systems" 2003 IEEE.
[9] John. H. Phan, Chang. F. Quo, and May D. Wang, "Comparative Study of Microarray Data
for Cancer Research" proceedings of the 26th Annual International Conference of IEEE EMBS
San Francisco, CA, USA * September 1-5, 2004.
[10] Seema Metikurke and Vijay K. Vaishnavi, "Grid-Enabled Automatic Web Page
Classification" 2006 IEEE International Conference on Fuzzy Systems Sheraton Vancouver Wall
Centre Hotel, Vancouver, BC, Canada July 16-21, 2006.
[11] Hui Fang, Ming Yang and Ruqing Yang, "Ground Texture Matching based Global
Localization for Intelligent Vehicles in Urban Environment" Proceedings of the 2007 IEEE
Intelligent Vehicles Symposium Istanbul, Turkey, June 13-15, 2007.
[12] BI Shuoben, XU Yin, JIAO Feng, Lü Guonian, PEI Anping, "Study on Data Mining in First
Period of Jiangzhai Site Based on the Association Algorithms" 2009 International Conference on
Artificial Intelligence and Computational Intelligence, 2009 IEEE DOI 10.1109/AICI.2009.
[13] Jasmin Azemovic, Denis Music, "Comparative analysis of efficient methods for storing
unstructured data into database with accent on performance" 2010 IEEE 2nd International
Conference on Education Technology and Computer (ICETC).
[14] Mengjie Zhou, Haoji Hu and Minqi Zhou, "Searching XML Data by SLCA on a MapReduce
Cluster" 2010 IEEE.
[15] Daniel Warneke, "Exploiting Dynamic Resource Allocation for Efficient Parallel Data
Processing in the Cloud" IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED
SYSTEMS, VOL. 22, NO. 6, JUNE 2011.
[16] Chang Liu, Jinjun Chen, Chi Yang, Rajiv Ranjan and Ramamohanarao Kotagiri,
"Authorized Public Auditing of Dynamic Big Data Storage on Cloud with Efficient Verifiable
Fine-Grained Updates" IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED
SYSTEMS, VOL. 25, NO. 9, SEPTEMBER 2014.
[17] Shifeng Fang, Li Da Xu, Yunqiang Zhu, Jiaerheng Ahati, Huan Pei, Jianwu Yan, and Zhihui
Liu, "An Integrated System for Regional Environmental Monitoring and Management Based on
Internet of Things" IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 10,
NO. 2, MAY 2014.
[18] Daisuke Takaishi, Hiroki Nishiyama, Nei Kato and Ryu Miura, "Toward Energy Efficient
Big Data Gathering in Densely Distributed Sensor Networks" 2014 IEEE.
[19] Chao Wang, Xi Li, Peng Chen, Aili Wang, Xuehai Zhou and Hong Yu, "Heterogeneous
Cloud Framework for Big Data Genome Sequencing" IEEE/ACM TRANSACTIONS ON
COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 12, NO. 1,
JANUARY/FEBRUARY 2015.
[20] Tao Xu, Dongsheng Wang and Guodong Liu, "Banian: A Cross-Platform Interactive Query
System for Structured Big Data" TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214
07/11 pp. 62-71 Volume 20, Number 1, February 2015.
[21] Marco Viceconti, Peter Hunter and Rod Hose, "Big Data, Big Knowledge: Big Data
for Personalized Healthcare" IEEE JOURNAL OF BIOMEDICAL AND HEALTH
INFORMATICS, VOL. 19, NO. 4, JULY 2015.
[22] Yanhao Huang and Xiaoxin Zhou, "Knowledge Model for Electric Power Big Data
Based on Ontology and Semantic Web" CSEE JOURNAL OF POWER AND ENERGY
SYSTEMS, VOL. 1, NO. 1, MARCH 2015.
[23] Simon Fong, Raymond Wong, and Athanasios V. Vasilakos, "Accelerated PSO Swarm
Search Feature Selection for Data Stream Mining Big Data" IEEE TRANSACTIONS ON
JOURNAL NAME, 2015.
[24] Hongbing Wang, Chao Yu, Lei Wan and Qi Yu, "Effective BigData-Space Service
Selection over Trust and Heterogeneous QoS Preferences" IEEE, 2015.
[25] Claudio A. Ardagna, Ernesto Damiani, Fulvio Frati, Davide Rebeccani,"A Configuration-
Independent Score-Based Benchmark for Distributed Databases" DOI
10.1109/TSC.2015.2485985, IEEE Transactions on Services Computing.
[26] Qinghua Lu, Zheng Li, Maria Kihl, Liming Zhu and Weishan Zhang, "CF4BDA: A
Conceptual Framework for Big Data Analytics Applications in the Cloud" IEEE
October 27, 2015.
[27] Andrea Marinoni, Arianna Dagliati, Riccardo Bellazzi, Paolo Gamba, "INFERRING AIR
QUALITY MAPS FROM REMOTELY SENSED DATA TO EXPLOIT GEOREFERENCED
CLINICAL ONSETS: THE PAVIA 2013 CASE" IEEE, 2015.
[28] Alun Evans, Javi Agenjo, Josep Blat, "COMBINED 2D AND 3D WEB-BASED
VISUALISATION OF ON-SET BIG MEDIA DATA" 978-1-4799-8339-1/15 2015 IEEE.
[29] Syed Akhter Hossain, "Big Data Analytics in Education: Prospects and Challenges"
978-1-4673-7231-2/15/ 2015 IEEE.
[30] Xue-Wen Chen and Xiaotong Lin, "Big Data Deep Learning: Challenges and
Perspectives" May 16, 2014, IEEE.
[31] Zhi-Hua Zhou,Nitesh V. Chawla,Yaochu Jin,Graham J. Williams, "Big Data Opportunities
and Challenges: Discussions from Data Analytics Perspectives" IEEE Computational
intelligence magazine | November 2014.
[32] MATTURDI Bardi, ZHOU Xianwei, LI Shuai, LIN Fuhong, "Big Data security and
privacy: A review” China Communications Supplement No.2 2014.
[33] Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, Luca
Venturini , "Frequent itemsets mining for big data: A Comparative Analysis" IEEE ,Aug 2017.
[34] Dinesh J. Prajapati,Sanjay Garg, N.C. Chauhan, "MapReduce Based Multilevel Consistent
and Inconsistent Association Rule Detection from Big Data Using Interestingness Measures"
vol-9 September 2017,IEEE.
[35] Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Nathalie Villa-Vialaneix,
"Random Forests for Big Data" Vol-23,IEEE 2017.
[36] M. Bakratsas, P. Basaras, D. Katsaros, L. Tassiulas, "Hadoop MapReduce Performance on
SSDs for Analyzing Social Networks" IEEE 2017.
[37] Ziliang Zong, Rong Ge, Qijun Gu, "Marcher: A Heterogeneous System Supporting Energy-
Aware High Performance Computing and Big Data Analytics" Volume 8, July 2017.
[38] Guangchen Ruan and Hui Zhang, "Closed-loop Big Data Analysis with Visualization and
Scalable Computing ". Volume 8, July 2017.
[39] Navroop Kaur , Sandeep K. Sood, "Efficient Resource Management System Based on 4Vs
of Big Data Streams " Volume 13,April 2017.
[40] Feras A. Batarseh, Eyad Abdel Latif , "Assessing the Quality of Service Using Big Data
Analytics: With Application to Healthcare" Volume4, June 2016.
[41] Gantz J, Reinsel D. The digital universe in 2020: Big data, bigger digital shadows, and
biggest growth in the Far East [J]. IDC iView: IDC Analyze the Future, 2012.
[42] Weiss R, Zgorski L. Obama Administration Unveils “Big Data” Initiative: Announces $200
Million in New R&D Investments [J]. Office of Science and Technology Policy, Washington,
DC, 2012. Data P. The Emergence of a New Asset Class[C]//World Economic Forum Report.
2011.
[43] Anderson C. The end of theory: the data deluge makes the scientific method obsolete. Wired
Magazine 16.07[J]. 2008.
[44] Mayer-Schönberger V, Cukier K. Big data: A revolution that will transform how we live,
work, and think [M]. Houghton Mifflin Harcourt, 2013.
[45] Ardagna C A, Damiani E. Business Intelligence meets Big Data: An Overview on Security
and Privacy [J].
[46] Manyika J, Chui M, Brown B, et al. Big data: The next frontier for innovation, competition,
and productivity [J]. 2011.
[47] Laney D. 3-D Data Management: Controlling Data Volume [J]. Velocity and Variety,
META Group Original Research Note, 2001.
[48] Beyer M. Gartner says solving big data challenge involves more than just managing
volumes of data. Gartner [J]. 2011.
[49] Beyer M A, Laney D. The importance of 'big data': a definition [J]. Stamford, CT: Gartner,
2012.
[50] Lefevre C. LHC: the guide (English version) [R]. 2009.[14] Brumfiel G. Down the petabyte
highway[J]. Nature, 2011, 469(20): 282-283.
[51] Mangelsdorf J. Supercomputing the climate: Nasa’s big data mission[J]. Accessed online,
2013: 11-27.
[52] Kalil T. Big data is a big deal [J]. The White House, 2012.
[53] Sheet F. Big Data Across the Federal Government [J]. 2012 03-29)[2013-03-06].
http://www. whitehouse, gov/sites/default/ files/microsites/ostp/big_ data fact sheet final. pdf,
2012.
[54] Lampitt A. ’The real story of how Big Data analytics helped Obama win’ [J]. Info World,
2013, 14.

Amazon FinSpace Documentation
4 pages
SQLBase Connecting. Guide To Connecting To SQLBase 20-6245-0001. Connecting To Sqlbase Page 1
No ratings yet
SQLBase Connecting. Guide To Connecting To SQLBase 20-6245-0001. Connecting To Sqlbase Page 1
123 pages
Create An Application First Steps
No ratings yet
Create An Application First Steps
10 pages
Analytical CRM
No ratings yet
Analytical CRM
97 pages
WinterWeekend 21jan2024 V3 Finalised
No ratings yet
WinterWeekend 21jan2024 V3 Finalised
4 pages
IoTDBS 2024 Presentation
No ratings yet
IoTDBS 2024 Presentation
18 pages
Actuate Auto Archive and Deletion Policy
No ratings yet
Actuate Auto Archive and Deletion Policy
3 pages
Week3-Name Ranges (Useful)
No ratings yet
Week3-Name Ranges (Useful)
1 page
DB Bank Answers
No ratings yet
DB Bank Answers
20 pages
Advanced Normalization Transparencies
No ratings yet
Advanced Normalization Transparencies
30 pages
Da 100 5
No ratings yet
Da 100 5
3 pages
UNIT IV-Programming On Amazon AWS
No ratings yet
UNIT IV-Programming On Amazon AWS
3 pages
Create Table Tblnotes (
No ratings yet
Create Table Tblnotes (
3 pages
Cs301 Midterm Mcqs
No ratings yet
Cs301 Midterm Mcqs
23 pages
TVDM Digital 5 - Dimentional Modeling
No ratings yet
TVDM Digital 5 - Dimentional Modeling
48 pages
TL Bts Eventum Howto
No ratings yet
TL Bts Eventum Howto
4 pages
DAA-C01 Dumps - Snowflake Certified SnowPro Advanced - Data Analyst
No ratings yet
DAA-C01 Dumps - Snowflake Certified SnowPro Advanced - Data Analyst
11 pages
21 03 2012 M3u
No ratings yet
21 03 2012 M3u
6 pages
Structured Query Language
No ratings yet
Structured Query Language
34 pages
Resume-Ankit Jain
No ratings yet
Resume-Ankit Jain
3 pages
DBMS Using MS-Access
No ratings yet
DBMS Using MS-Access
34 pages
Oracle Partitioning
No ratings yet
Oracle Partitioning
180 pages
NitroView ESM ELM Datasheet
No ratings yet
NitroView ESM ELM Datasheet
7 pages
IECV2 - CFSV2 (1) (1) .0 - MT2 - V2.0.mdb (3) Send by Ajay
No ratings yet
IECV2 - CFSV2 (1) (1) .0 - MT2 - V2.0.mdb (3) Send by Ajay
28 pages
English Congo and Congo English Dictiona
No ratings yet
English Congo and Congo English Dictiona
287 pages
Memory Storage 77
No ratings yet
Memory Storage 77
5 pages
Data Structures and Algorithm
No ratings yet
Data Structures and Algorithm
3 pages
Bim & VCD
No ratings yet
Bim & VCD
2 pages