100% found this document useful (3 votes)

997 views79 pages

Big Data Analytics

The document provides an overview of big data analytics and related topics. It discusses the evolution of technology, defines what big data is and common issues. It also covers big data analytics vs data science analytics, the six V's of big data, sources of big data, predictive analysis processing, popular big data tools in 2020, Hadoop introduction and architecture, and big data job opportunities.

Uploaded by

sania2011

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

100% found this document useful (3 votes)

997 views79 pages

Big Data Analytics

Uploaded by

sania2011

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 79

FACULTY

DEVELOPMENT
PROGRAM

ANALYTICS

JAI PRATAP DIXIT

HEAD-INFORMATION TECHNOLOGY
Ambalika Institute of Management and Technology Lucknow
• Evolution of Technology
• What is Big data , Issues , Issues with existing
System, Challenges
• Use Case of Big Data
• Benefits of Big Data
• Big Data Analytics Vs Data Science Analytics
• 6V’s of Big Data
• The Sources of Big Data
• Big Data Predictive Analysis Processing
• Big Data Tools in 2020
• Best Big Data Analytics Tools in 2020
• BIG Data Hadoop Introduction, Architecture
• How Does Hadoop Work? and Its Advantages
• Hadoop Environmental Setup
• Hadoop HDFS, MAPREDUCE, YARN Operations
• Big Data JOB Opportunities
Units of data
• The bit
• Nibble
• The Byte
• Kilobyte (1024 Bytes)
• Megabyte (1024 Kilobytes) Then there is the hypothetical
• Gigabyte (1,024 Megabytes) "Googolbyte" which would be a
number of bytes equal to a 10
• Terabyte (1,024 Gigabytes)** followed by 100 zeroes.
• Petabyte (1,024 Terabytes)
• Exabyte (1,024 Petabytes)
• Zettabyte (1,024 Exabytes)
• Yottabyte (1,204 Zettabytes)
What is Big Data
( Large amount of Data)

“Big Data are high volume, high velocity, or high-variety information assets that
require new forms of processing to enable enhanced decision making, insight
discovery, and process optimization.”
Big data analytics is the often complex process of examining large and
varied data sets, or big data, to uncover information -- such as hidden patterns,
unknown correlations, market trends and customer preferences that can help
organizations make informed business decisions
Huge Volume of Data which cannot be processed
with traditional approach with a given time frame.
Big Data Issues

Technology moves too fast

Inaccurate data
Lack of skilled workers
Complexity of Storage Size
Backup And Recovery
Challenge of Search &Optimizing Time
Variations of The Tools & Technologies
Dynamism of Change & Scalability
Resistance in Building The Man Power
Speed of Research in Applying
Variety of Data Formats That Arise
Composition of Demanded Multiple Skill sets
Etc….
Issues with Existing System
Capturing data
Storage
Searching
Big Data Sharing
Challenges Transfer
Analysis
Use Case of Big Data
 Recommendation Engines
 Analyzing Call Details Records
 Fraud Detection
 Market Basket Analysis
 Sentimental Analysis
Big Data Analytics Vs Data Science Analytics

…………………Dealing with unstructured and structured that

comprises everything that related to data cleansing, preparation,
and analysis.

Deals with the combination of statistics, mathematics,

programming, problem-solving, capturing data in ingenious
ways, the ability to look at things differently, and the activity of
cleansing, preparing, and aligning the data.
6V’s of Big Data
Quantity of data
Data sets too large to store and analyze using traditional databases
Different types of data that we can use

Structured
Semi structured
Unstructured
Speed at which data is generated
Speed at which data is moving around and analyzed
Analyze data while it is being generated without even putting it into
databases
Getting value out of Big Data!!!
Data in doubt
Variability
Variability refers to how spread out a group of data
is. The common measures of variability are the
range, variance, and standard deviation.

Data sets with similar values are said to have

little variability while data sets that have values
that are spread out have high variability
The Sources of Big Data
 Black Box Data
 Social Media Data
 Stock Exchange Data
 Power Grid Data
 Transport Data
 Search Engine Data
 Banking Data
 Website Data
 Retails Data
 Etc….
APPLICATION OF BIG DATA
 Banking and Securities
 Communications, Media and Entertainment
 Healthcare Providers
 Education
 Manufacturing and Natural Resources
 Government
 Insurance
 Retail and Wholesale trade
 Transportation
 Energy and Utilities
Companies using data and analytics to gain a
competitive edge
Big data is big business
Amazon American Express
Netflix Next Big Sound
Starbucks IBM
Enterprise SAP
Microsoft Amazon Web Services
Google TIBCO
BigPanda Splice Machine
Etc…
Big Data Analysis

SMART METER
Top Big Data Tools
The Apache Hadoop software library is a big data framework. It allows
distributed processing of large data sets across clusters of computers. It is
designed to scale up from single servers to thousands of machines.
Features:
•Authentication improvements when using HTTP proxy server
•Specification for Hadoop Compatible Filesystem effort
•Support for POSIX-style filesystem extended attributes
•It offers robust ecosystem that is well suited to meet the analytical needs
of developer
•It brings Flexibility In Data Processing
•It allows for faster data Processing
HPCC is a big data tool developed by LexisNexis Risk Solution.
It delivers on a single platform, a single architecture and a single programming
language for data processing.

3 LQ Labs, Theart Metrix, Elsevier

Features:

•Highly efficient accomplish big data tasks with far less code.

•Offers high redundancy and availability

•It can be used both for complex data processing on a Thor cluster
Storm is a free and open source big data computation system. It offers
distributed real-time, fault-tolerant processing system. With real-time
computation capabilities.

Full Contact Inc, Lookout

Features:

•It benchmarked as processing one million 100 byte messages per second per
node
•It uses parallel calculations that run across a cluster of machines
Qubole Data is Autonomous Big data management platform. It is self-
managed, self-optimizing tool which allows the data team to focus on
business outcomes.
AMAZON
Features:
•Single Platform for every use case
•Open-source Engines, optimized for the Cloud
•Comprehensive Security, Governance, and Compliance
The Apache Cassandra database is widely used today to provide an
effective management of large amounts of data.
NETFLIX, WALMART LABS, CERN
Features:
•Support for replicating across multiple data centers by providing lower
latency for users
•Data is automatically replicated to multiple nodes for fault-tolerance
Statwing is an easy-to-use statistical tool. It was built by and
for big data analysts. Its modern interface chooses statistical
tests automatically.
Features:
•Explore any data in seconds
•Statwing helps to clean data, explore relationships, and create
charts in minutes
CouchDB stores data in JSON documents that can be accessed web or query
using JavaScript. It offers distributed scaling with fault-tolerant storage. It
allows accessing data by defining the Couch Replication Protocol.
ADP, REFINITIV , APPLE
Features:
•CouchDB is a single-node database that works like any other database
•It allows running a single logical database server on any number of servers
Pentaho provides big data tools to extract, prepare and blend data. It offers
visualizations and analytics that change the way to run any business. This
Big data tool allows turning big data into big insights.
RELIANCE JIO, JPMORGAN CHASE
Features:
•Data access and integration for effective data visualization
•It empowers users to architect big data at the source and stream them for
accurate analytics
Apache Flink is an open-source stream processing Big data tool. It is distributed,
high-performing, always-available, and accurate data streaming applications.
XING, SALECYCLE
Features:
•Provides results that are accurate, even for out-of-order or late-arriving data
•This big data tool supports stream processing and windowing with event time
semantics
•It supports flexible windowing based on time, count, or sessions to data-driven
windows
Cloudera is the fastest, easiest and highly secure modern big data platform. It allows
anyone to get any data across any environment within single, scalable platform.
GLOBAL SYSTEM , EXCEL GLOBAL SYSTEM

Features:
•High-performance analytics
•It offers provision for multi-cloud
•Deploy and manage Cloudera Enterprise across AWS, Microsoft Azure and Google
Cloud Platform
Open Refine is a powerful big data tool. It helps to work with messy data, cleaning it
and transforming it from one format into another. It also allows extending it with
web services and external data.
FASTLIX , APICRAFTER
Features:
•Open Refine tool help you explore large data sets with ease
•It can be used to link and extend your dataset with various web services
•Import data in various formats
RapidMiner is an open source big data tool. It is used for data prep,
machine learning, and model deployment. It offers a suite of products to
build new data mining processes and setup predictive analysis.

Features:

•Allow multiple data management methods

•GUI or batch processing
•Integrates with in-house databases
DataCleaner is a data quality analysis application and a solution platform. It
has strong data profiling engine. It is extensible and thereby adds data
cleansing, transformations, matching, and merging.
Feature:
•Interactive and explorative data profiling
•Fuzzy duplicate record detection
•Data transformation and standardization
Hive is an open source-software big data too. It allows programmers
analyze large data sets on Hadoop. It helps with querying and managing
large datasets real fast.
MICROSOFT , APPLE
Features:
•It Supports SQL like query language for interaction and Data modeling
•It compiles language with two main tasks map, and reducer
Kaggle is the world's largest big data community. It helps
organizations and researchers to post their data & statistics. It
is the best place to analyze data seamlessly.
Features:
•The best place to discover and seamlessly analyze open data
SAP
SAP's main Big Data tool is its HANA in-memory relational
database that works with Hadoop. HANA is a traditional row-
and-column database, but it can perform advanced analytics,
like predictive analytics, spatial data processing, text analytics,
text search, streaming analytics, and graph data processing and
has ETL (Extract, Transform, and Load) capabilities. SAP also
offers data warehousing to manage all of your data from a
single platform, cloud services, as well as data management
tools for governance, orchestration, cleansing, and storage.
Best Big Data Analytics Tools in 2020
Xplenty is a cloud-based ETL solution providing simple visualized data pipelines for
automated data flows across a wide range of sources and destinations. Xplenty's
powerful on-platform transformation tools allow you to clean, normalize, and
transform data while also adhering to compliance best practices.
Features:
•Powerful, code-free, on-platform data transformation offering
•Rest API connector - pull in data from any source that has a Rest API
Azure HDInsight is a Spark and Hadoop service in the cloud. It provides big
data cloud offerings in two categories, Standard and Premium. It provides
an enterprise-scale cluster for the organization to run their big data
workloads.
Features:
•Reliable analytics with an industry-leading SLA
•It offers enterprise-grade security and monitoring
•Protect data assets and extend on-premises security and governance
controls to the cloud
Skytree is a big data analytics tool that empowers data scientists to build
more accurate models faster. It offers accurate predictive machine learning
models that are easy to use.
Features:
•Highly Scalable Algorithms
•Artificial Intelligence for Data Scientists
•It allows data scientists to visualize and understand the logic behind ML
decisions
Talend is a big data tool that simplifies and automates big data integration.
Its graphical wizard generates native code. It also allows big data
integration, master data management and checks data quality.
Features:
•Accelerate time to value for big data projects
•Simplify ETL & ELT for big data
•Talend Big Data Platform simplifies using MapReduce and Spark by
generating native code
Splice Machine is a big data analytic tool. Their architecture is portable
across public clouds such as AWS, Azure, and Google.
Features:
•It can dynamically scale from a few to thousands of nodes to enable
applications at every scale
•The Splice Machine optimizer automatically evaluates every query to the
distributed H-Base regions
Apache Spark is a powerful open source big data analytics tool.
It offers over 80 high-level operators that make it easy to build
parallel apps. It is used at a wide range of organizations to
process large datasets.
Features:
•It helps to run an application in Hadoop cluster, up to 100
times faster in memory, and ten times faster on disk
•It offers lighting Fast Processing
Plotly is an analytics tool that lets users create charts and dashboards to share
online.
Features:
•Easily turn any data into eye-catching and informative graphics
•It provides audited industries with fine-grained information on data provenance

Apache SAMOA:

Apache SAMOA is a big data analytics tool. It enables development of new ML

algorithms. It provides a collection of distributed algorithms for common data
mining and machine learning tasks.
Lumify is a big data fusion, analysis, and visualization platform. It helps
users to discover connections and explore relationships in their data via a
suite of analytic options.
Features:
•It provides both 2D and 3D graph visualizations with a variety of automatic
layouts
•It provides a variety of options for analyzing the links between entities on
the graph
Elastic search is a JSON-based Big data search and analytics engine. It is a
distributed, RESTful search and analytics engine for solving numbers of use
cases. It offers horizontal scalability, maximum reliability, and easy
management.
Features:
•It allows combine many types of searches such as structured, unstructured,
geo, metric, etc
•Intuitive APIs for monitoring and management give complete visibility and
control
R is a language for statistical computing and graphics. It also
used for big data analysis. It provides a wide variety of
statistical tests.
Features:
•Effective data handling and storage facility,
•It provides a suite of operators for calculations on arrays, in
particular, matrices,
•It provides coherent, integrated collection of big data tools for
data analysis
Kafka
Kafka is often used in real-time streaming data architectures
to provide real-time analytics. Since Kafka is a fast, scalable,
durable, and fault-tolerant publish-subscribe messaging
system, Kafka is used in use cases where JMS, Rabbit MQ,
and AMQP may not even be considered due to volume and
responsiveness.
Kafka has higher throughput, reliability, and replication
characteristics, which makes it applicable for things like
tracking service calls (tracks every call) or tracking IoT sensor
data where a traditional MOM might not be considered.
What is Hadoop?
Using the solution provided by Google, Doug Cutting and his
team developed an Open Source frame work Project called
HADOOP.
4 September, 2007: release 0.14.1
31 May 2018: Release 3.0.3
Hadoop runs applications using the MapReduce algorithm,
where the data is processed in parallel with others. In short,
Hadoop is used to develop applications that could perform
complete statistical analysis on huge amounts of data.
Conti..
 Hadoop Distributed File System (HDFS)
 MapReduce
An Open Source framework that allows distributed processing of large data-sets across the
cluster of commodity hardware
An open source framework that allows distributed processing of large data-sets across
the Cluster of commodity hardware
• Every day we generate 2.3 trillion GBs of data
• Hadoop handles huge volumes of data efficiently
• Hadoop uses the power of distributed computing
• HDFS & Yarn are two main components of Hadoop
• It is highly fault tolerant, reliable & available
Hadoop Characteristics
Hadoop Components
Hadoop Nodes
Nodes

SLAVE NODES
MASTER NODES
Basic Hadoop Architecture
At its core, Hadoop has two major layers namely:
(a)Processing/Computation layer (Map Reduce)
(b)Storage layer (Hadoop Distributed File System).
How Does Hadoop Work?
 Data is initially divided into directories and files. Files
are divided into uniform sized blocks
 These files are then distributed across various cluster
nodes for further processing.
 HDFS, being on top of the local file system,
supervises the processing.
 Blocks are replicated for handling hardware failure.
 Checking that the code was executed successfully.
 Performing the sort that takes place between the map
and reduce stages.
Sending the sorted data to a certain computer.
 Writing the debugging logs for each job.
Advantages of Hadoop
 Hadoop framework allows the user to quickly write and test
distributed systems. It is efficient, and it automatic
distributes the data and work across the machines and in
turn, utilizes the underlying parallelism of the CPU cores.
 Hadoop does not rely on hardware to provide fault-tolerance
and high availability (FTHA), rather Hadoop library itself has been
designed to detect and handle failures at the application layer.
 Servers can be added or removed from the cluster
dynamically and Hadoop continues to operate without
interruption.
 Another big advantage of Hadoop is that apart from being open
source, it is compatible on all the platforms since it is Java based.
HADOOP ─ ENVIRONMENT SETUP
Prerequisites
VIRTUAL BOX: it is used for installing the operating system on it.
OPERATING SYSTEM: You can install Hadoop on Linux based
operating systems, Ubuntu and CentOS are very commonly used
JAVA: You need to install the Java 8 package on your system.
HADOOP: You require Hadoop 2.7.3 package.

Install Hadoop
Step 1: Download the Java 8 Package. Save this file in your home
directory.
Step 2: Extract the Java Tar File.
Command: tar -xvf jdk-8u101-linux-i586.tar.gz
Step 3: Download the Hadoop 2.7.3 Package.
Command: wget
https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.t
ar.gz
Step 4: Extract the Hadoop tar File.
Command: tar -xvf hadoop-2.7.3.tar.gz
HADOOP ─ ENVIRONMENT SETUP
Step 5: Add the Hadoop and Java paths in the bash file (.bashrc).
Open. bashrc file. Now, add Hadoop and Java Path as shown below.
Command: vi .bashrc
Then, save the bash file and close it.
For applying all these changes to the current Terminal,
execute the source command.
Command: source .bashrc
To make sure that Java and Hadoop have been properly
installed on your system and can be accessed through the
Terminal, execute the java -version and hadoop version commands.
Command: java -version
Step 6: Edit the Hadoop Configuration files.
Command: cd hadoop-2.7.3/etc/hadoop/
Command: ls
All the Hadoop configuration files are located in hadoop-
3.0.3/etc/hadoop
HADOOP ─ ENVIRONMENT SETUP
Step 7: Open core-site.xml and edit the property mentioned below inside
configuration tag:
core-site.xml informs Hadoop daemon where NameNode runs in the
cluster. It contains configuration settings of Hadoop core such as I/O
settings that are common to HDFS & MapReduce.
Command: vi core-site.xml

Step 8: Edit hdfs-site.xml and edit the property mentioned below inside

configuration tag:
hdfs-site.xml contains configuration settings of HDFS daemons (i.e.
NameNode, DataNode, Secondary NameNode). It also includes the
replication factor and block size of HDFS.
Command: vi hdfs-site.xml
HADOOP ─ ENVIRONMENT SETUP
Step 9: Edit the mapred-site.xml file and edit the property mentioned
below inside configuration tag:
mapred-site.xml contains configuration settings of MapReduce
application like number of JVM that can run in parallel, the size of the
mapper and the reducer process, CPU cores available for a process,
etc.
In some cases, mapred-site.xml file is not available. So, we have to
create the mapred-site.xml file using mapred-site.xml template.
Command: cp mapred-site.xml.template mapred-site.xml
Command: vi mapred-site.xml.
Step 10: Edit yarn-site.xml and edit the property mentioned below inside
configuration tag:
yarn-site.xml contains configuration settings of ResourceManager and
NodeManager like application memory management size, the operation
needed on program & algorithm, etc.
Command: vi yarn-site.xml
Job Opportunities
•Big Data Manager
•Big Data Analyst
•Big Data Developer
•Big Data Consultants
•Big Data Architect
•Big Data Programmer
……..

Big Data Black Book
16% (25)
Big Data Black Book
2 pages
Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD. Introduction To Pig
67% (3)
Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD. Introduction To Pig
34 pages
CompTIA Data+ Practice Test
100% (2)
CompTIA Data+ Practice Test
149 pages
Data Analytics Notes
No ratings yet
Data Analytics Notes
1 page
Flipkart and Big Data
No ratings yet
Flipkart and Big Data
5 pages
Big Data PPT 55b0fc01e7543
No ratings yet
Big Data PPT 55b0fc01e7543
31 pages
UNIT-3 Hadoop and MapReduce Programming
100% (1)
UNIT-3 Hadoop and MapReduce Programming
84 pages
Introduction To Data Mining With Case Studies Author: G. K. Gupta Prentice Hall India, 2006
100% (2)
Introduction To Data Mining With Case Studies Author: G. K. Gupta Prentice Hall India, 2006
78 pages
Supervised Vs Unsupervised Learning
100% (1)
Supervised Vs Unsupervised Learning
7 pages
Big Data Analytics Unit 1 MCQ
90% (10)
Big Data Analytics Unit 1 MCQ
10 pages
Synopsis of ML Project
100% (1)
Synopsis of ML Project
6 pages
MC5502 Bigdata Unit 2 Notes
100% (2)
MC5502 Bigdata Unit 2 Notes
20 pages
Unit 1 - Machine Learning - WWW - Rgpvnotes.in
No ratings yet
Unit 1 - Machine Learning - WWW - Rgpvnotes.in
23 pages
Chapter+9+ HIVE
No ratings yet
Chapter+9+ HIVE
50 pages
Big Data Notes
No ratings yet
Big Data Notes
68 pages
CHAPTER - 1 - Introduction - 1
No ratings yet
CHAPTER - 1 - Introduction - 1
33 pages
Data Analytics
100% (3)
Data Analytics
11 pages
Unit 1-Big Data Analytics & Lifecycle
No ratings yet
Unit 1-Big Data Analytics & Lifecycle
130 pages
Big Data Analytics PPT-2 (Section-A)
No ratings yet
Big Data Analytics PPT-2 (Section-A)
10 pages
Chapter 6
100% (1)
Chapter 6
51 pages
CS8091-BIG DATA ANALYTICS UNIT V Notes
100% (4)
CS8091-BIG DATA ANALYTICS UNIT V Notes
31 pages
Big Data
No ratings yet
Big Data
30 pages
Data Warehousing & Mining: Unit - V
100% (2)
Data Warehousing & Mining: Unit - V
13 pages
Hive PPT
No ratings yet
Hive PPT
61 pages
Updated Unit-2
0% (1)
Updated Unit-2
55 pages
Big Data Unit 1 AKTU Notes
No ratings yet
Big Data Unit 1 AKTU Notes
87 pages
Big Data: Submitted By-Rajashree Rashmita Reg - No-1825209016 Mca 4 Sem
No ratings yet
Big Data: Submitted By-Rajashree Rashmita Reg - No-1825209016 Mca 4 Sem
27 pages
Knowledge Representation in Data Mining
No ratings yet
Knowledge Representation in Data Mining
22 pages
CS8091 Big Data Analytics MCQ
100% (2)
CS8091 Big Data Analytics MCQ
22 pages
Sentiment Analysis
100% (1)
Sentiment Analysis
19 pages
Data Analytics Unit-3 Notes
No ratings yet
Data Analytics Unit-3 Notes
21 pages
Chapter 5
No ratings yet
Chapter 5
45 pages
Cs2032 Data Warehousing and Data Mining Notes (Unit III) .PDF - Www.chennaiuniversity - Net.notes
No ratings yet
Cs2032 Data Warehousing and Data Mining Notes (Unit III) .PDF - Www.chennaiuniversity - Net.notes
54 pages
Chapter - 1 Introduction
No ratings yet
Chapter - 1 Introduction
22 pages
Difference Between ROLAP, MOLAP and HOLAP
No ratings yet
Difference Between ROLAP, MOLAP and HOLAP
3 pages
Big Data Analytics Notess
No ratings yet
Big Data Analytics Notess
69 pages
Data WareHouse Previous Year Question Paper
100% (1)
Data WareHouse Previous Year Question Paper
10 pages
BD - Unit - III - MapReduce
100% (1)
BD - Unit - III - MapReduce
31 pages
Unit 5 Notes
100% (3)
Unit 5 Notes
66 pages
01a Hadoop Spark 1spp
No ratings yet
01a Hadoop Spark 1spp
68 pages
BigData Nptel
No ratings yet
BigData Nptel
813 pages
r18 - Big Data Analytics - Cse (DS)
0% (1)
r18 - Big Data Analytics - Cse (DS)
1 page
Unit 5
100% (1)
Unit 5
109 pages
Big Data Analytics Unit-2
No ratings yet
Big Data Analytics Unit-2
11 pages
Business Analytics Anna University
No ratings yet
Business Analytics Anna University
40 pages
MCQ - Bda
33% (3)
MCQ - Bda
3 pages
Flipkart Reviews Sentiment Analysis
100% (1)
Flipkart Reviews Sentiment Analysis
22 pages
Final Year Project
No ratings yet
Final Year Project
41 pages
Big Data
No ratings yet
Big Data
21 pages
Super Important Questions For BDA
100% (1)
Super Important Questions For BDA
26 pages
Analytical Learning
No ratings yet
Analytical Learning
42 pages
Data Mining
No ratings yet
Data Mining
84 pages
DATA SCIENCE Internship
100% (1)
DATA SCIENCE Internship
16 pages
Question Bank - Big Data Analytics - Final1
100% (1)
Question Bank - Big Data Analytics - Final1
6 pages
DWDM Unit-5
No ratings yet
DWDM Unit-5
52 pages
Unit 7: Data Mining For Business Intelligence Applications: A) Balanced Scorecard
33% (3)
Unit 7: Data Mining For Business Intelligence Applications: A) Balanced Scorecard
11 pages
Unit 1 - Data Mining - WWW - Rgpvnotes.in PDF
100% (1)
Unit 1 - Data Mining - WWW - Rgpvnotes.in PDF
13 pages
UNIT-1 Introduction To Data Mining
No ratings yet
UNIT-1 Introduction To Data Mining
29 pages
BDA Unit-5
No ratings yet
BDA Unit-5
25 pages
Big Data - Midsem
No ratings yet
Big Data - Midsem
526 pages
Trackpad Ver. 2.0 Class 8
From Everand
Trackpad Ver. 2.0 Class 8
Nidhi Arora
No ratings yet
MODULE 1 - ST
No ratings yet
MODULE 1 - ST
13 pages
Agile Devops
No ratings yet
Agile Devops
22 pages
Agile Devops
No ratings yet
Agile Devops
22 pages
BCS101 - Focp PDF
No ratings yet
BCS101 - Focp PDF
125 pages
Advanced Forensic Research by Face Recognization Technology
No ratings yet
Advanced Forensic Research by Face Recognization Technology
21 pages
Course: Ict in Teaching and Learning: Module-1: Education in Information Age
No ratings yet
Course: Ict in Teaching and Learning: Module-1: Education in Information Age
6 pages
Algebra Math: Algebra Is One Among The Oldest Branches in The History of Mathematics Dealing With The Number
No ratings yet
Algebra Math: Algebra Is One Among The Oldest Branches in The History of Mathematics Dealing With The Number
6 pages
Ugc Net-2018 (Brochure) PDF
No ratings yet
Ugc Net-2018 (Brochure) PDF
76 pages
Best WPF Objective Questions and Answers
No ratings yet
Best WPF Objective Questions and Answers
8 pages
Compiler Design Notes
No ratings yet
Compiler Design Notes
196 pages
Control WPF
No ratings yet
Control WPF
741 pages
APA Quick Guide 6th PDF
No ratings yet
APA Quick Guide 6th PDF
2 pages
Department of Computer Science (PG) List of Softwares S.No. Year Softwares Used
No ratings yet
Department of Computer Science (PG) List of Softwares S.No. Year Softwares Used
1 page
The Basics of Wireless Sensor Networking and Application
No ratings yet
The Basics of Wireless Sensor Networking and Application
33 pages
Chapter 6 (A) : Synchronization
No ratings yet
Chapter 6 (A) : Synchronization
41 pages
Using 9.3 Functionality and Scripts: Calculating Transportation Network Slope and Travel Parameters
No ratings yet
Using 9.3 Functionality and Scripts: Calculating Transportation Network Slope and Travel Parameters
3 pages
Sqlite PHP: Installation
No ratings yet
Sqlite PHP: Installation
8 pages
Notes For DMML
No ratings yet
Notes For DMML
27 pages
Node 5
No ratings yet
Node 5
4 pages
Class12 Cs Set 1
No ratings yet
Class12 Cs Set 1
11 pages
BDA Experiment1
No ratings yet
BDA Experiment1
8 pages
File System Vs DBMS
100% (2)
File System Vs DBMS
13 pages
Tasks 1
No ratings yet
Tasks 1
31 pages
REPORT
No ratings yet
REPORT
5 pages
Viii - CH 2 - Lesson Plan
No ratings yet
Viii - CH 2 - Lesson Plan
8 pages
Session-10-Data Loading in Snowflake
No ratings yet
Session-10-Data Loading in Snowflake
5 pages
MySQL Guide
No ratings yet
MySQL Guide
8 pages
Inf1343 2011W Assignment 1
No ratings yet
Inf1343 2011W Assignment 1
4 pages
Datawarehouse PPT
No ratings yet
Datawarehouse PPT
39 pages
IT Elective I Advance Database System
No ratings yet
IT Elective I Advance Database System
70 pages
Syllabus Information Retrieval Techniques
No ratings yet
Syllabus Information Retrieval Techniques
2 pages
Bases de Données MySQL Triggers Corrigé
No ratings yet
Bases de Données MySQL Triggers Corrigé
9 pages
Database Management System SET 1 Lab Practicals
No ratings yet
Database Management System SET 1 Lab Practicals
7 pages
Electricity Billing System
No ratings yet
Electricity Billing System
14 pages
Eztrieve Presentation
No ratings yet
Eztrieve Presentation
60 pages
Sasmita Behera
No ratings yet
Sasmita Behera
2 pages
Praktikum Struktur Data: Source Code
No ratings yet
Praktikum Struktur Data: Source Code
4 pages
Information Retrieval: Adt-V Unit
No ratings yet
Information Retrieval: Adt-V Unit
106 pages
DA-100 Exam - Free Actual Q&as, Page 12 - ExamTopics
No ratings yet
DA-100 Exam - Free Actual Q&as, Page 12 - ExamTopics
5 pages
Algorithm
No ratings yet
Algorithm
5 pages
Learn Oracle 8i Compress
No ratings yet
Learn Oracle 8i Compress
495 pages
05 ERFlow
No ratings yet
05 ERFlow
66 pages
C++ DBMS Lab
No ratings yet
C++ DBMS Lab
10 pages

Big Data Analytics

Uploaded by

Big Data Analytics

Uploaded by

FACULTY

JAI PRATAP DIXIT

Technology moves too fast

…………………Dealing with unstructured and structured that

Deals with the combination of statistics, mathematics,

Data sets with similar values are said to have

3 LQ Labs, Theart Metrix, Elsevier

•Offers high redundancy and availability

Full Contact Inc, Lookout

•Allow multiple data management methods

Apache SAMOA is a big data analytics tool. It enables development of new ML

Step 8: Edit hdfs-site.xml and edit the property mentioned below inside

You might also like