Introduction To Data Mining Unit 1
TODAY’S AGENDA
Course management
Brief overview of Data Mining and allied fields
Summary of a few impactful articles and recent trends
9/22/2020
COURSE MANAGEMENT
LEARNING OBJECTIVES
Learn the art of modeling and interpreting large complicated data sets
via predictive and descriptive data mining methods.
Get to know several online data repositories and how to participate in
data analytics competitions held at Kaggle.com and other sites
Develop advanced expertise in data analytics software and languages
such as KNIME and Python.
COURSE OVERVIEW
Data Preparation
Classification Techniques
Clustering
Text Analytics
Regression Analysis
Principal Component Analysis
Association Rule Mining
KNIME
Python
Data on Kaggle Website
http://www.kaggle.com/
BOOKS
ACKNOWLEDGEMENT
Although I do not follow the two books below closely, their slides
remain very popular in academia, and I will use them
occasionally:
Data Mining: Concepts and Techniques (2011)
Introduction to Data Mining (2018)
Final 40
Project 15
Assignments + Quizzes 45
MEETING HOURS
Office Hours:
Monday/Wednesday: noon – 1 PM and 4 – 5 PM
or by appointment (by e-mailing me at [email protected]).
Traffic Predictions
Google Maps
Online Transportation Networks
Uber/Careem for price prediction
Video Surveillance
Crime detection
Fraud Detection
Financial institutions
SPRING 2020 Sajjad Haider
MACHINE LEARNING
A SIMPLIFIED TAXONOMY
Data Science > Data Analytics > Data Mining > Machine Learning
Data Analytics also deals with Visualization
Data Science also deals with data acquisition and management of data
Besides machine learning, data mining also makes use of statistical models
DATA MINING
1. Statistical Models
2. Machine learning
Data scientists are the people who understand how to fish out answers
to important business questions from today’s tsunami of unstructured
information.
As companies rush to capitalize on the potential of big data, the largest
constraint many face is the scarcity of this special talent.
Big Data refers to datasets whose size is beyond the ability of typical
database software tools to capture, store, manage and analyze.
The demand for deep analytical positions in a big data world could exceed the
supply being produced on current trends by 140K to 190K positions.
There is also a need for 1.5 million additional managers and analysts in the US
who can ask the right questions and consume the results of the analysis
of big data effectively.
“Big data refers to data sets whose size is beyond the ability of
typical database software tools to capture, store, manage and analyze.”
- The McKinsey Global Institute, 2011
3 V’S