BDA - Introduction

Big Data Analytics

ES – 332
HELLO!
I am Babar Yaqoob Khan
You can find me at:
[email protected]
1. BIG DATA ANALYTICS
AN INTRODUCTION
“TORTURE THE DATA, AND IT WILL
CONFESS TO ANYTHING.”
- RONALD COASE, NOBEL PRIZE LAUREATE IN ECONOMICS
WHAT IS IN IT FOR YOU?

» WHAT IS BIG DATA ANALYTICS

» MOTIVATION OF DATA ANALYTICS

» DATA ANALYTICS IS JUST A HYPE OR AN IMPERATIVE
WHAT IS BIG DATA ANALYTICS
Big data analytics refers to the methods, tools, and applications
used to collect, process, and derive insights from varied, high-
volume, high-velocity data sets.
These data sets may come from a variety of sources, such as
web, mobile, email, social media, and networked smart devices.
What is Data Analytics
Data-driven decisions
Data-driven products

[Figure: data analytics pipeline. Stages shown: Data Collection, Data Pre-Processing, Exploratory Data Analysis, Data Processing, Insights/Policy Decision]
Assignment # 1
Reading & Viva Voce: O’Reilly Data Science Document
MOTIVATION OF BIG DATA ANALYTICS
Why all the excitement..!!
Data Makes Everything Clearer
Seven Countries Study (Ancel Keys)
13,000 subjects total, 5-40 years follow-up.
http://www.sevencountriesstudy.com/about-thestudy/investigators/ancel-keys/
Data Science: Why all the Excitement?
Google Flu Trends:
Detecting outbreaks two weeks ahead of CDC (Centers for Disease Control and Prevention) data

New models are estimating which cities are most at risk for spread of the Ebola virus.
Nate Silver
Prediction: 349 to 189, 6.1% difference

Actual: 365 to 173, 7.2% difference

https://fivethirtyeight.com
“Nate Silver won the election”
– Harvard Business Review

FiveThirtyEight blogger predicted the outcome of 50 states, assuming Barack Obama’s Florida victory is confirmed.

https://fivethirtyeight.com
https://www.bbc.com/news/election/us2020/results
❑ US-based internet entertainment service
❑ DVD rental-by-mail company
❑ Approx. 278 million users
❑ $29.70 bn – 2021 revenue
❑ 1 bn+ hours/week – 2017
❑ $130 bn – company value
❑ $5.116 bn – 2021 annual income
❑ 190 – countries reached
http://www.netflixprize.com
The unreasonable effectiveness of Deep Learning (CNNs)
2012 ImageNet challenge: Classify 1 million images into 1000 classes.
DATA ANALYTICS IS JUST A HYPE OR AN IMPERATIVE
Imagine

Do you know how much data we have in the world right now?
The Rise of Big Data

http://www.datacenterjournal.com/birth-death-big-data/blancco-fig-1-quantity-of-global-digital-data
Internet Minute

Data Sources 2020


Internet Minute

Data Sources 2021


Name Searching Results

111,000
The Rise of Big Data
“Between the dawn of civilization and 2003, we only created five
exabytes of information; now we’re creating that amount every
two days”
- Eric Schmidt, Google

23.5m
YouTube selfie videos results
844,000,000
Whoa! That’s a big number.
PageRank: The Web as a Behavioural Dataset
Google has 30 server farms
2.5 million machines (est)
Microsoft Chicago - The Million Server Data Center
Doing Big Data Analytics
A. The views of three Data Science/Analytics Experts

1. Jim Gray (Turing Award winning database researcher)


2. Jeff Hammerbacher (Former Facebook Chief Scientist, Cloudera co-founder)
3. Ben Fry (Data visualization expert)

B. Cloud Computing: Data Science/Analytics Enabler


Jeff Hammerbacher’s Model (Facebook, Cloudera)
Data Science/Analytics Process

1. Identify problem
2. Instrument data sources
3. Collect data
4. Prepare data (integrate, transform, filter, aggregate, clean)
5. Build model
6. Evaluate model
7. Communicate results
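
As a rough illustration of steps 4 to 7 of the process above, here is a minimal Python sketch. The records, the field names `ad_spend` and `sales`, and the choice of a least-squares line as the "model" are all assumptions made for illustration; they do not come from the slides or from Hammerbacher's own work.

```python
from statistics import mean

# Hypothetical raw records standing in for step 3 ("collect data").
# All values are made up for illustration.
raw_records = [
    {"ad_spend": 10.0, "sales": 25.0},
    {"ad_spend": 20.0, "sales": 45.0},
    {"ad_spend": None, "sales": 30.0},  # incomplete record, dropped in step 4
    {"ad_spend": 30.0, "sales": 65.0},
    {"ad_spend": 40.0, "sales": 85.0},
]


def prepare(records):
    """Step 4: filter out incomplete records and return (x, y) pairs."""
    return [
        (r["ad_spend"], r["sales"])
        for r in records
        if r["ad_spend"] is not None and r["sales"] is not None
    ]


def build_model(pairs):
    """Step 5: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def evaluate(model, pairs):
    """Step 6: mean absolute error of the model on the given pairs.

    For brevity this scores the training data itself; a real evaluation
    would use held-out data.
    """
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in pairs)


def communicate(model, error):
    """Step 7: summarize the result in one human-readable line."""
    slope, intercept = model
    return f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f} (MAE {error:.2f})"


clean = prepare(raw_records)
model = build_model(clean)
error = evaluate(model, clean)
summary = communicate(model, error)
print(summary)
```

Steps 1 to 3 (identifying the problem, instrumenting sources, collecting data) happen outside the code, which is why the sketch starts from an already collected list.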
Ben Fry’s Model
Data Science/Analytics Process – Emphasized Visualization

1. Acquire
2. Parse
3. Filter
4. Mine
5. Represent
6. Refine
7. Interact
(Ben Fry: data visualization expert)

https://www.dashingd3js.com/the-data-visualization-process
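
The first five stages of Ben Fry's model can be sketched in a few lines of Python. This is only an illustrative sketch: the inline CSV, the column names, and the text bar chart are hypothetical stand-ins, and the Refine and Interact stages, which concern visual polish and user interaction, are left out.

```python
import csv
import io
from collections import Counter

# Stage 1 (Acquire): a hypothetical inline CSV stands in for a real data source.
RAW_CSV = """city,category
Lahore,retail
Karachi,retail
Lahore,food
Lahore,retail
,food
Karachi,food
"""


def parse(text):
    """Stage 2 (Parse): turn raw text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def keep_complete(rows):
    """Stage 3 (Filter): drop rows with a missing city."""
    return [row for row in rows if row["city"]]


def mine(rows):
    """Stage 4 (Mine): count records per city."""
    return Counter(row["city"] for row in rows)


def represent(counts):
    """Stage 5 (Represent): render counts as a text bar chart, largest first."""
    return [f"{city:<8}{'#' * n} {n}" for city, n in counts.most_common()]


rows = keep_complete(parse(RAW_CSV))
counts = mine(rows)
chart = represent(counts)
print("\n".join(chart))
```

Keeping each stage as its own function mirrors the model's emphasis on separate, revisitable steps: a change to the filter or the representation does not require touching the others.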
Data Science/Analytics Process
[Figure: process diagram. Stages shown: Business Problem, Instrument Data Sources, Raw Data Collection, Data Pre-Processing, Clean Dataset, Exploratory Data Analysis, Data Processing, Visualization/Communicate Results, Make Decisions, Data Product, Reality. Also listed: Decision Support, Business Intelligence, Recommender Systems, Business Forecasting (Prediction)]
Key Data Science /Analytics Enabler: Cloud Computing

❑ Cloud computing reduces computing operating costs

❑ Cloud computing enables data science on massive numbers of inexpensive computers
Applications of Data Analytics
Recommended Books
THANKS!
Questions?
Comments!
Suggestions!!

You can find me at


[email protected]
