Introduction
ANANYA CHAKRABORTY
BIG DATA ANALYTICS (CSPE-432)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, NIT JALANDHAR
BIG DATA
Big data describes a situation in which data sets have grown to such enormous
sizes that traditional data management tools can no longer effectively handle
either the size of the data set or its scale and rate of growth.
Big data has an intrinsic value that can be extracted using analytics,
algorithms, and other techniques.
Insights include, for example, drug-testing outcomes and understanding customer behaviour.
Big data needs to be handled and stored appropriately whether it is structured or
unstructured.
VALUES
Characteristics of big data: Volume, Velocity, Variety, Veracity, and Value.
Variety: Structured data, semi structured data and unstructured data
Structured data: This is data which is in an organized form (e.g., in rows
and columns) and can be easily used by a computer program. Relationships
exist between data entities, such as between classes and their objects. Data stored in
traditional databases is an example of structured data; it can be organized
into a table.
Semi-structured data: This is data which does not conform to a data model but
has some structure. Metadata for this data is available but not sufficient, e.g.,
XML. It cannot be stored in tables, but it has tags and other markers to
separate the elements.
Unstructured data: This data does not conform to any data model. Unstructured data
is stored in non-relational databases, e.g., email, tweets. It cannot be stored in
tables. (A short sketch after this list illustrates all three forms.)
Veracity: It refers to the quality or trustworthiness of the data, so that it does not
lead to errors or misinterpretation of the big data.
Value: The usefulness that big data ultimately provides to the organization or ecosystem; extracting this value is the underlying requirement.
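To make the Variety categories concrete, here is a minimal Python sketch (not part of the original slides; the field names and sample values are hypothetical) showing how structured, semi-structured, and unstructured data differ when a program consumes them:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: rows and columns, directly usable by a program (hypothetical fields).
structured = "id,name,amount\n1,Asha,250\n2,Ravi,480\n"
rows = list(csv.DictReader(io.StringIO(structured)))
print(rows[0]["name"])          # columns are addressable by name, like a table

# Semi-structured: no fixed schema, but tags and markers separate the elements (XML).
semi = "<order id='1'><item>book</item><qty>2</qty></order>"
order = ET.fromstring(semi)
print(order.find("qty").text)   # structure must be navigated rather than queried as a table

# Unstructured: free text with no data model; it needs parsing or NLP to extract meaning.
unstructured = "Loved the quick delivery, but the packaging was damaged."
print("delivery" in unstructured.lower())
```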
BIG DATA ANALYSIS
IT's collaboration with business users enables better, faster decisions in real time.
CHALLENGES IN BIG DATA ANALYTICS
Capturing and storing data: velocity and volume; computational limitations.
Data quality: inaccurate, incomplete, and unstructured data.
Security and privacy
Knowledge gaps
CASE FOR BIG DATA
How to tie big data analytics to a business process.
•Background of the project
•Options
•Scope and costs
•Risk analysis
TEAM CHALLENGES
Step 1: Bringing talented workers together.
Step 2: Organizing the team (IT and BI groups)
Step 3: Placing BDA teams in the departments whose aims align with theirs.
BIG DATA SOURCES
1. Transportation, retail, logistics and telecommunications
2. Healthcare
3. Government
4. Entertainment media
5. Life Sciences
6. Video surveillance
7. Social Media Data
8. Transactional Data
ACQUISITION
Businesses move to include big data analytics teams when they realise the size of
the data they collect.
To start with, the IT team identifies the problems that align with the business goals.
It then identifies the tools that will be useful to gather the data and carry out the analysis, e.g.,
Hadoop.
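As a loose illustration of the kind of analysis such tools support, here is a minimal map/reduce word count written in plain Python (a sketch of the processing model that Hadoop popularized, not Hadoop's own API; the input chunks are hypothetical):

```python
from collections import Counter
from itertools import chain

# Hypothetical input: each string stands in for one chunk of a much larger dataset.
chunks = [
    "big data needs new tools",
    "hadoop distributes big data processing",
]

def map_phase(chunk):
    """Emit (word, 1) pairs, in the style of a Hadoop mapper."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Sum the counts per word, in the style of a Hadoop reducer."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals

mapped = chain.from_iterable(map_phase(c) for c in chunks)
print(reduce_phase(mapped)["data"])  # -> 2
```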
BIG DATA EVOLUTION
1940s-1989: Data warehousing and personal computer (mainframe)
1989-1999: World wide web (HTML, data explosion because of the internet, RDBMS).
Structured and semi-structured data.
2000-2010s: Cloud computing and social media data (launch of social media
platforms, entertainment sites hosted on the cloud). Unstructured data. Led to the creation
of Hadoop, an open-source framework created specifically to manage big data sets,
and the adoption of NoSQL databases, which made it possible to manage
unstructured data.
2010s: Internet of things, fog computing, edge computing and mobile devices. New
types of data (sensor, social data, transactional data, health related data)
BEST PRACTICES FOR BIG DATA ANALYSIS
1. Establish Big Data business objective
2. Start with small data
3. Data Governance
4. Infrastructure around goals
5. Maintenance plan
6. The value of anomalies
7. In-memory processing
SECURITY, COMPLIANCE, AUDITING AND PROTECTING
Problems of using a big data repository:
1. Access: Allowing access to everyone reduces security, but restricting access to everyone is
impractical. Access should therefore be granted only to selected users.
2. Availability: Controlling where the data is stored and how it is distributed among the
various departments, e.g., sensitive data should be available for processing only where it is
required.
3. Performance: Stronger encryption and additional security layers improve security
but affect performance.
4. Liability: Accessible data carries liability with it, e.g., the sensitivity of the data, privacy
issues, etc.
The aim is to balance all of these.
Pragmatic steps to securing Big Data:
Get rid of data that is no longer required; otherwise it is a risk to store.
If legally required, data can be archived and stored, but not in a system connected to
the internet.
Classifying Data:
Data is easier to protect and manage if it is classified or categorized,
e.g., financial data, HR data, sales, inventory, etc. Each category may have a different
sensitivity and require different security protocols.
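A minimal sketch of category-based classification, assuming hypothetical categories, roles, and protocols (none of these names come from the slides):

```python
from dataclasses import dataclass

@dataclass
class Category:
    name: str
    sensitivity: str         # e.g. "high", "medium", "low"
    protocols: list          # controls required for this category
    allowed_roles: set       # who may access it

# Hypothetical catalogue of data categories and their security protocols.
CATALOG = {
    "financial": Category("financial", "high", ["encrypt-at-rest", "audit-log"], {"finance", "auditor"}),
    "hr":        Category("hr",        "high", ["encrypt-at-rest"],              {"hr"}),
    "inventory": Category("inventory", "low",  ["audit-log"],                    {"sales", "ops"}),
}

def can_access(role, category):
    """Grant access only to the selected roles defined for each category."""
    return role in CATALOG[category].allowed_roles

print(can_access("sales", "inventory"))   # True
print(can_access("sales", "financial"))   # False
```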
Protecting the big data:
1. Some data is unique to the moment (e.g., traffic, movement, weather); once lost,
it cannot be recreated.
If the data is redundant (duplicate copies of no additional value), its removal is called deduplication. This
is good for storage but can conflict with encrypted data (a hash-based sketch follows below).
How to back up data files of different sizes and types (Oracle, NoSQL, Hadoop)
Big data and compliance
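The hash-based deduplication sketch mentioned above, in Python; the file names and contents are hypothetical:

```python
import hashlib

# Hypothetical "files": in practice these would be blocks or whole files on disk.
files = {
    "report_v1.txt": b"quarterly numbers ...",
    "report_copy.txt": b"quarterly numbers ...",   # an exact duplicate
    "notes.txt": b"meeting notes ...",
}

def deduplicate(blobs):
    """Keep one copy per unique content hash (content-addressed storage)."""
    kept = {}
    for name, data in blobs.items():
        digest = hashlib.sha256(data).hexdigest()
        kept.setdefault(digest, name)   # the first name wins; later duplicates are dropped
    return kept

unique = deduplicate(files)
print(len(files), "->", len(unique))    # 3 -> 2

# If each copy were encrypted with its own key or IV, identical plaintexts would
# produce different ciphertexts and hashing could no longer detect the duplicates,
# which is why deduplication and encryption pull in opposite directions.
```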
BIG DATA ARCHITECTURE
A big data architecture is designed to handle the ingestion, processing, and analysis
of data that is too large or complex for traditional database systems.
Big data solutions typically involve one or more of the following types of workload:
•Batch processing of big data sources at rest.
•Real-time processing of big data in motion.
•Interactive exploration of big data.
•Predictive analytics and machine learning.
BIG DATA ARCHITECTURE
Types of big data architecture:
Lambda architecture and Kappa architecture.
Feature              | Lambda                      | Kappa
Processing pipeline  | Separate layers             | Single stream
Data storage         | Batch store and speed store | Append-only log
Consistency          | Potential inconsistencies   | Consistent view
Complexity           | More                        | Less
Cost                 | High                        | Low
Historical advantage | Strong                      | Limited
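A minimal Python sketch contrasting the two (the event stream and views are hypothetical; real systems would use a message log and a stream processor): Lambda merges a periodically recomputed batch view with an incrementally updated speed view, while Kappa recomputes everything from a single append-only log.

```python
# Hypothetical click events; in a real system these would arrive on a message log.
events = [("user1", 1), ("user2", 1), ("user1", 1)]

# Lambda: batch layer (all but the most recent data) + speed layer (recent data),
# merged at query time by the serving layer.
batch_view = {}
for user, n in events[:-1]:                 # periodic batch recomputation
    batch_view[user] = batch_view.get(user, 0) + n

speed_view = {}
for user, n in events[-1:]:                 # incremental, low-latency updates
    speed_view[user] = speed_view.get(user, 0) + n

lambda_counts = {u: batch_view.get(u, 0) + speed_view.get(u, 0)
                 for u in set(batch_view) | set(speed_view)}

# Kappa: one streaming pipeline over the append-only log; reprocessing means
# replaying the log through the same code.
kappa_counts = {}
for user, n in events:
    kappa_counts[user] = kappa_counts.get(user, 0) + n

print(lambda_counts == kappa_counts)        # True: same answer, different pipelines
```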
DATA ANALYTICS
Computer cluster: a collection of resources from multiple machines that work together.
Batch processing
Real-time processing
Distributed computing: increased speed, power, and efficiency by spreading work across machines.
Parallel computing: multiple processors working against shared memory within one machine.
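A minimal sketch of single-machine parallelism in Python, assuming a hypothetical workload (multiprocessing uses separate worker processes rather than literal shared memory; the point is only to contrast one-machine parallel computing with cluster-based distributed computing):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    data = list(range(1_000))
    # Four worker processes on the same machine split the batch between them.
    with Pool(processes=4) as pool:
        results = pool.map(square, data)
    print(sum(results))
    # Distributed computing applies the same idea across many machines in a
    # cluster (e.g., Hadoop), trading network coordination for far more capacity.
```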