Big Data & Data Mining


BIG DATA & DATA MINING
WHAT IS BIG DATA?

BIG DATA is a relative term: data today are big by reference to the past.

also known as Predictive Analytics


SOURCES OF BIG DATA
HOW BIG IS BIG DATA?
2.9 million emails sent every second
30 hours of video uploaded to YouTube every minute
24 petabytes of data processed by Google every day
Isn't 'BIG DATA' just another way of saying 'ANALYTICS'?

NUMBER OF DATA SETS AND FILES
DIFFERENT TYPES OF DATA BEING GENERATED
HIGH SPEED DATA GENERATION
THE FACT THAT DATA IS BEING GENERATED
IT CONTRIBUTION : DATA COLLECTION
IT CONTRIBUTION : BIG DATA COLLECTION
- Internet of Things (IoT) devices
- Mobile devices
- Aerial (remote sensing)
- Software logs
- Cameras
- Microphones
- Radio-frequency identification (RFID) readers
- Wireless sensor networks
IT CONTRIBUTION : DATA STORAGE
BIG DATA STORAGE
DATA MINING
THE PROCESS OF DISCOVERING ANOMALIES, PATTERNS AND
CORRELATIONS WITHIN LARGE RAW DATA SETS TO PREDICT
OUTCOMES IN VARIOUS SPHERES OF HUMAN ACTIVITY
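
To make "anomalies" and "correlations" concrete, here is a minimal sketch in Python. It is not taken from the slides: the figures are invented, and the z-score rule with a threshold of 2.0 is just one simple, assumed way to flag unusual values.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def zscore_anomalies(values, threshold=2.0):
    """Return indices whose z-score magnitude exceeds the (assumed) threshold."""
    mu, sigma = mean(values), pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily figures, invented for illustration only.
ad_spend = [10, 12, 11, 13, 12, 14, 13, 15]
sales = [100, 118, 111, 131, 119, 142, 128, 400]  # the last day looks unusual

# Anomaly: which days have sales far from the average?
outliers = zscore_anomalies(sales)

# Correlation: how closely do sales follow ad spend on the first seven days?
r = pearson(ad_spend[:7], sales[:7])
```

Here `outliers` flags only the last day, and `r` is close to 1, meaning sales rise with ad spend over the first seven days.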
DATA MINING

Data mining allows you to:
- Sift through all the chaotic and repetitive noise in your data
- Understand what is relevant and then make good use of that information
- Accelerate the pace of making informed decisions
Isn't DATA MINING just another way of saying STATISTICS?
Statistics | Data mining
Macro-decisioning | Micro-decisioning
Explain or describe population relationships | Predict values of new records at the individual level
Small sample and few variables | Large sample, many variables
Find a good-fitting statistical model | Models/algorithms with high predictive power
Confidence intervals, hypothesis tests, p-values | Predictive power metrics and costs

(Shmueli, Bruce, & Patel, 2016)
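
As a rough illustration of "predictive power metrics and costs", the sketch below scores predictions on a holdout set in two ways: by plain accuracy, and by the total cost of the errors. The labels, the predictions, and the cost values are all hypothetical, chosen only to show that two error types need not cost the same.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Share of records whose predicted label matches the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def total_cost(actual, predicted, cost):
    """Sum the cost of every error; pairs missing from `cost` cost nothing."""
    counts = confusion_counts(actual, predicted)
    return sum(n * cost.get(pair, 0) for pair, n in counts.items())

# Hypothetical holdout labels: 1 = customer churned, 0 = customer stayed.
actual    = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

# Assumed costs: missing a churner (actual 1, predicted 0) is far more
# expensive than a false alarm (actual 0, predicted 1).
cost = {(1, 0): 50, (0, 1): 5}

holdout_accuracy = accuracy(actual, predicted)
holdout_cost = total_cost(actual, predicted, cost)
```

With these invented numbers the model is right on 8 of 10 records, yet the single missed churner accounts for most of the total cost of 55. This is why cost can rank models differently than accuracy does.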


DATA MINING
Refers to business analytics methods that go beyond
counts, descriptive techniques, reporting, and methods based on
business rules.

Data mining includes statistical and machine learning methods.

Data mining stands at the confluence of the fields of
statistics and machine learning. How is this confluence
known?
DATA MINING PROCESS

1) Define Purpose
2) Obtain Data
3) Explore and Clean Data
4) Determine Data Mining Task
5) Choose Data Mining Methods
6) Apply Methods and Select Final Model
7) Evaluate Performance
8) Deploy

(Shmueli, Bruce, & Patel, 2016)
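
Steps 5 to 7 of the process (choose methods, apply them and select a final model, evaluate performance) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure from the cited book: the data are synthetic, the two candidate methods (`fit_mean`, `fit_line`) are hypothetical stand-ins, and the 60/20/20 split is arbitrary.

```python
import random

def fit_mean(train):
    """Baseline method: always predict the training mean of y."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def fit_line(train):
    """Least-squares line y = a + b*x fitted to (x, y) pairs."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def rmse(model, rows):
    """Root mean squared prediction error of `model` on (x, y) rows."""
    return (sum((model(x) - y) ** 2 for x, y in rows) / len(rows)) ** 0.5

# Synthetic records (assumption): y is roughly 3 + 2x plus random noise.
rng = random.Random(0)
data = [(x, 3 + 2 * x + rng.gauss(0, 1)) for x in range(60)]
rng.shuffle(data)
train, valid, test = data[:36], data[36:48], data[48:]

# Step 5: choose candidate methods.
candidates = {"mean baseline": fit_mean, "linear fit": fit_line}

# Step 6: apply each method to the training data, then select the final
# model by its error on the validation data.
fitted = {name: fit(train) for name, fit in candidates.items()}
best_name = min(fitted, key=lambda name: rmse(fitted[name], valid))

# Step 7: evaluate the selected model on records it has never seen.
test_rmse = rmse(fitted[best_name], test)
```

The test set is kept apart from both fitting and selection, so `test_rmse` estimates how the chosen model would perform on new records.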
Going from Step 1 to 4
Classification / Prediction / Clustering / Association Rules and Recommendation Systems

Convert the question posed in step 1 into a more specific data mining question
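
As one example of such a conversion, a business question like "what do customers buy together?" can become an association-rules question: how often do two items co-occur (support), and how often does one follow from the other (confidence)? The sketch below computes both measures; the baskets and the function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "jam"},
]

# Rule under study: bread -> butter
rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

In this toy data, bread and butter appear together in three of five baskets (support 0.6), and about three quarters of the baskets with bread also contain butter.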
DATA MINING TOOLS & SOFTWARE
Data mining represents a variety of methods or techniques:
- Descriptive Modeling
- Predictive Modeling
- Prescriptive Modeling
DATA MINING TOOLS

- Analysis Tools
- Data Visualization


ANALYSIS TOOLS
DATA VISUALIZATION
Benefits
PROVIDES NEW KNOWLEDGE FROM EXISTING DATA:
- Public databases
- Government sources
- Company databases
OLD DATA CAN BE USED TO DEVELOP NEW KNOWLEDGE
NEW KNOWLEDGE CAN BE USED TO IMPROVE SERVICES OR
PRODUCTS
IMPROVEMENTS LEAD TO:
- BIGGER PROFITS
- MORE EFFICIENT SERVICES
Data Mining Challenges

Data Security and Privacy


(Weapons of Mass Destruction (WMD) on Dark Web)
Mining Complex Knowledge from Complex Data
Distributed Data Mining Algorithms
Lack of diffusion of data mining techniques
Time series and process data
Big Data Risk

- Security and Protection
- Big Data, Big Cost
- Trust or not? Bad Data
