Basic Concepts Data Mining (Lecture 02) - 1

Data mining

Uploaded by

Muhammad Hammad

Introduction to Data Mining

Introduction
 Why Data Mining?
 What Is Data Mining?
 What Kind of Data Can Be Mined?
 What Kinds of Patterns Can Be Mined?
 What Technologies Are Used?
 What Kind of Applications Are Targeted?
 Summary

Why Data Mining?
 Data vs. Information:
 Data: recorded facts
 Information: patterns underlying the data
 The Explosive Growth of Data:
 Data collection and data availability
   Automated data collection tools, database systems, Web
 Major sources of abundant data
   Business: Web, e-commerce, transactions, stocks, …
   Science: bioinformatics, …
   Society and everyone: news, digital cameras, YouTube

Why Data Mining?

 We are drowning in data, but starving for knowledge!

 We are data rich, but information poor.

What is Data Mining?
 “Necessity is the mother of invention”: data mining is the
automated analysis of massive data sets.
 Data mining: searching for knowledge (interesting
patterns) in your data.

What is Data Mining?
 Data mining (knowledge discovery from data)
 Refers to extracting or “mining” knowledge from large amounts of
data.
 Extraction of interesting (implicit, previously unknown, and
potentially useful) patterns or knowledge from huge amounts of
data.

 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.

Knowledge Discovery Process
 Data mining can be viewed as simply an essential step
in the process of knowledge discovery.

 This is a view from typical database systems and data
warehousing communities.
 Data mining plays an essential role in the knowledge discovery
process.

Knowledge Discovery Process
 Steps of the knowledge discovery process:
 1. Data cleaning (to remove noise and inconsistent data)
 2. Data integration (where multiple data sources may be combined)
 3. Data selection (where data relevant to the analysis task are retrieved
from the database)
 4. Data transformation (where data are transformed and consolidated into
forms appropriate for mining by performing summary or aggregation
operations)
 5. Data mining (an essential process where intelligent methods are
applied to extract data patterns)
 6. Pattern evaluation (to identify the truly interesting patterns representing
knowledge, based on interestingness measures)
 7. Knowledge presentation (where visualization and knowledge
representation techniques are used to present mined knowledge to
users)
 Steps 1 through 4 are different forms of data preprocessing.
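
The seven steps can be sketched as a small pipeline. This is only a minimal illustration in plain Python, not a reference implementation: the two record sources, the field names, and the min_count threshold are all hypothetical, and the "mining" step is a simple item count.

```python
from collections import Counter

# Hypothetical raw data from two sources (names and fields are illustrative).
store_sales = [
    {"customer": "c1", "item": "milk", "qty": 2},
    {"customer": "c2", "item": "bread", "qty": None},  # noisy record
    {"customer": "c1", "item": "bread", "qty": 1},
]
web_sales = [
    {"customer": "c3", "item": "milk", "qty": 1},
    {"customer": "c3", "item": "bread", "qty": 3},
]

def clean(records):
    """Step 1: drop records with a missing quantity."""
    return [r for r in records if r["qty"] is not None]

def integrate(*sources):
    """Step 2: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Step 3: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Step 4: aggregate items per customer into baskets."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets

def mine(baskets):
    """Step 5: count in how many baskets each item occurs."""
    return Counter(item for items in baskets.values() for item in items)

def evaluate(counts, min_count):
    """Step 6: keep only patterns that meet the threshold."""
    return {item: n for item, n in counts.items() if n >= min_count}

def present(patterns):
    """Step 7: format the patterns for the user."""
    return [f"{item}: appears in {n} baskets" for item, n in sorted(patterns.items())]

data = integrate(clean(store_sales), clean(web_sales))
baskets = transform(select(data, ["customer", "item"]))
patterns = evaluate(mine(baskets), min_count=2)
report = present(patterns)
```

Each function maps to one step, so the preprocessing stages (steps 1 through 4) can be changed independently of the mining step.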

Evolution of Database Technology
 1960s:
 Data collection, database creation, IMS and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:
 RDBMS, advanced data models (extended-relational, OO, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web
databases
 2000s:
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems
Modern GIS applications include address matching, location analysis or
site selection, development of evacuation plans, weather forecasting,
environmental studies, and natural hazard studies.
What Kind of Data Can be Mined?
 Database-oriented data sets and applications
 Relational database, data warehouse, transactional database
 Advanced data sets and advanced applications
 Data streams and sensor data
 Time-series data, temporal data (data that vary over time), sequence data
(including bio-sequences such as DNA sequences)
 Structure data, graphs, social networks
 Heterogeneous databases and legacy databases
 Spatial data (geographic) and spatiotemporal data
 Multimedia database
 Text databases
 The World-Wide Web

Database Data
 Database data: A database management system
(DBMS) consists of a collection of interrelated data,
known as a database, and a set of software programs to
manage and access the data.
 The software programs provide mechanisms
 for defining database structures and data storage;
 for specifying and managing shared, or distributed data access;
 for ensuring consistency and security of the information stored
despite system crashes or attempts at unauthorized access.
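
As a small illustration of a DBMS defining a structure, storing data, and answering a query, the sketch below uses SQLite through Python's standard sqlite3 module. The customer and purchase tables and their columns are hypothetical; they are not the AllElectronics schema.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE purchase ("
    "trans_id TEXT, cust_id INTEGER REFERENCES customer(cust_id), amount REAL)"
)

# Insert inside a transaction: either all rows are stored or none are.
with conn:
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany(
        "INSERT INTO purchase VALUES (?, ?, ?)",
        [("T1", 1, 120.0), ("T2", 1, 80.0), ("T3", 2, 40.0)],
    )

# Query: total amount spent per customer, joining the two relations.
totals = conn.execute(
    "SELECT c.name, SUM(p.amount) FROM customer c "
    "JOIN purchase p ON p.cust_id = c.cust_id "
    "GROUP BY c.cust_id, c.name ORDER BY c.name"
).fetchall()
conn.close()
```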

Database Data

 An example AllElectronics relational database

Data Warehouse
 Data warehouse: A data warehouse is a repository of
information collected from multiple sources, stored under
a unified schema, and usually residing at a single site.
 Data in a data warehouse are organized around major
subjects (e.g., customer, item, supplier, and activity).
 The data are stored to provide information from a
historical perspective, such as in the past 6 to 12 months.

Data Warehouse
 Data warehouses are constructed via a process of data
cleaning, data integration, data transformation, data
loading, and periodic data refreshing.

Transactional Data
 Transactional Data: Each record in a transactional
database captures a transaction, such as a customer’s
purchase, a flight booking, or a user’s clicks on a web
page.
 A transaction typically includes a unique transaction identity
number (trans ID) and a list of the items making up the
transaction, such as the items purchased in the transaction.
 Transactions can be stored in a table, with one record
per transaction.
 Because most relational database systems do not support
nested relational structures, the transactional database is usually
stored in a flat file.
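
A minimal sketch of reading such a flat file, assuming one transaction per line in the form "trans_ID: item, item, ...". The file layout, the IDs, and the item codes are hypothetical. The unfold step shows one way to turn the nested item lists into flat (trans_ID, item) rows that a relational table could hold.

```python
# Hypothetical flat-file contents: one transaction per line,
# "trans_ID: item, item, ...". The IDs and items are made up.
flat_file = """T100: I1, I3, I8, I16
T200: I2, I8
"""

def parse_transactions(text):
    """Return a dict mapping each trans ID to its list of items."""
    transactions = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        trans_id, items = line.split(":", 1)
        transactions[trans_id.strip()] = [i.strip() for i in items.split(",")]
    return transactions

def unfold(transactions):
    """Flatten the nested item lists into (trans_ID, item) rows."""
    return [(tid, item) for tid, items in transactions.items() for item in items]

transactions = parse_transactions(flat_file)
rows = unfold(transactions)
```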

Transactional Data

Fragment of a transactional database for sales at AllElectronics.

What Kind of Patterns Can Be Mined?
 Data mining functionalities.
 Characterization and discrimination
 Mining of frequent patterns, associations, and correlations
 Classification and regression
 Cluster analysis
 Outlier analysis
 Data mining functionalities are used to specify the kinds
of patterns to be found in data mining tasks.

Concept/Class Description
 Characterization: summarization of the general
characteristics or features of a target class of data.
 The output of data characterization can be presented in
various forms.
 E.g., pie charts, bar charts, curves, multidimensional data cubes,
etc.

 Example:
 A customer relationship manager at AllElectronics may order the
following data mining task: Summarize the characteristics of
customers who spend more than $5000 a year at AllElectronics.
The result is a general profile of these customers, such as that
they are 40 to 50 years old, employed, and have excellent credit
ratings.
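
The characterization task in this example can be sketched in a few lines of Python. The customer records, field names, and values below are invented for illustration; only the $5000 threshold comes from the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records; all values are invented for illustration.
customers = [
    {"age": 45, "employed": True,  "credit": "excellent", "spend": 6200},
    {"age": 48, "employed": True,  "credit": "excellent", "spend": 7100},
    {"age": 42, "employed": True,  "credit": "good",      "spend": 5400},
    {"age": 23, "employed": False, "credit": "fair",      "spend": 900},
    {"age": 67, "employed": False, "credit": "good",      "spend": 1500},
]

def characterize(records, threshold=5000):
    """Summarize the target class: customers spending more than threshold."""
    target = [r for r in records if r["spend"] > threshold]
    ages = [r["age"] for r in target]
    return {
        "count": len(target),
        "age_range": (min(ages), max(ages)),
        "mean_age": mean(ages),
        "employed_share": sum(r["employed"] for r in target) / len(target),
        "most_common_credit": Counter(r["credit"] for r in target).most_common(1)[0][0],
    }

profile = characterize(customers)
```

The resulting profile is the kind of summary that would then be shown as a chart or table.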

Concept/Class Description
 Discrimination: Comparison of the general features of
the target class data objects against the general features
of objects from one or multiple contrasting classes.
 The forms of output presentation are similar to those for
characteristic descriptions.
 Example:
 A customer relationship manager at AllElectronics may want to compare
two groups of customers—those who shop for computer products
regularly (e.g., more than twice a month) and those who rarely shop for
such products (e.g., less than three times a year). The resulting
description provides a general comparative profile of these customers,
such as that 80% of the customers who frequently purchase computer
products are between 20 and 40 years old and have a university
education, whereas 60% of the customers who infrequently buy such
products are either seniors or youths, and have no university degree.
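
A minimal sketch of this kind of comparison, assuming hypothetical customer records. The two class definitions follow the example (more than twice a month, fewer than three times a year); the data and the young_graduate feature are invented.

```python
# Hypothetical customer records; values are invented for illustration.
customers = [
    {"age": 25, "degree": True,  "purchases_per_year": 30},
    {"age": 34, "degree": True,  "purchases_per_year": 26},
    {"age": 38, "degree": True,  "purchases_per_year": 40},
    {"age": 52, "degree": True,  "purchases_per_year": 28},
    {"age": 70, "degree": False, "purchases_per_year": 1},
    {"age": 17, "degree": False, "purchases_per_year": 2},
    {"age": 45, "degree": True,  "purchases_per_year": 1},
]

def share(records, predicate):
    """Fraction of records for which the predicate holds."""
    return sum(1 for r in records if predicate(r)) / len(records)

# Target class: more than twice a month (over 24 purchases a year).
frequent = [r for r in customers if r["purchases_per_year"] > 24]
# Contrasting class: fewer than three purchases a year.
rare = [r for r in customers if r["purchases_per_year"] < 3]

def young_graduate(r):
    """Feature to compare: aged 20 to 40 with a university degree."""
    return 20 <= r["age"] <= 40 and r["degree"]

comparison = {
    "frequent": share(frequent, young_graduate),
    "rare": share(rare, young_graduate),
}
```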

Frequent Patterns, Association and Correlation Analysis

 Mining Frequent Patterns: Frequent patterns are
patterns that occur frequently in data.
 The kinds of frequent patterns
 Frequent itemsets: sets of items that
frequently appear together in a transactional data set, such as
milk and bread.
 Frequent sequential patterns: for example, the pattern that
customers tend to purchase first a PC, then a scanner, and
then a printer.
 Mining frequent patterns leads to the discovery of
interesting associations and correlations within data.
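
Frequent itemsets can be found, for small data, by counting how often each combination of items occurs. The sketch below is a brute-force illustration with hypothetical transactions and a hypothetical min_support threshold. It is not an efficient algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

def frequent_itemsets(transactions, size, min_support):
    """Return itemsets of the given size whose support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    Every combination is enumerated, so this only suits small data.
    """
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), size):
            counts[combo] += 1
    n = len(transactions)
    return {c: k / n for c, k in counts.items() if k / n >= min_support}

pairs = frequent_itemsets(transactions, size=2, min_support=0.5)
```

With this data, bread and milk appear together in three of the four transactions, so that pair has the highest support.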

Frequent Patterns, Association and Correlation Analysis

 An example of an association rule:

 where X is a variable representing a customer.

 This association rule involves a single attribute or
predicate (i.e., buys) that repeats, so it is referred to as a
single-dimensional association rule.
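
A single-dimensional rule is usually reported with its support and confidence. The sketch below computes both for a hypothetical rule, buys(X, "computer") => buys(X, "software"). The rule and the transactions are illustrative assumptions, not the rule from the slide.

```python
# Hypothetical transactions: the set of items each customer bought.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
]

def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent => consequent.

    support    = fraction of transactions containing both sides
    confidence = fraction of antecedent transactions that also
                 contain the consequent
    """
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence

support, confidence = rule_metrics(transactions, {"computer"}, {"software"})
```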

Frequent Patterns, Association and Correlation Analysis

 We may find association rules like:

 This is an association between more than one attribute
(i.e., age, income, and buys).
 This is a multidimensional association rule.
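
A multidimensional rule can be checked against customer records in the same way. The sketch below assumes a hypothetical rule of the form age(X, 20..29) AND income(X, 40K..49K) => buys(X, "laptop"). The ranges, the item, and the records are invented for illustration.

```python
# Hypothetical customer records with three predicates: age, income, buys.
customers = [
    {"age": 24, "income": 45_000, "buys": "laptop"},
    {"age": 27, "income": 42_000, "buys": "laptop"},
    {"age": 29, "income": 48_000, "buys": "phone"},
    {"age": 51, "income": 90_000, "buys": "laptop"},
    {"age": 35, "income": 44_000, "buys": "tablet"},
]

def antecedent(c):
    """age(X, 20..29) AND income(X, 40K..49K); the ranges are illustrative."""
    return 20 <= c["age"] <= 29 and 40_000 <= c["income"] <= 49_999

def consequent(c):
    """buys(X, "laptop")"""
    return c["buys"] == "laptop"

matching = [c for c in customers if antecedent(c)]
both = [c for c in matching if consequent(c)]

support = len(both) / len(customers)    # fraction of all customers
confidence = len(both) / len(matching)  # fraction of matching customers
```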

Classification
 Classification and label prediction
 Construct models (functions) based on some training examples
 Describe and distinguish classes or concepts for future
prediction
   E.g., classify countries based on climate, or classify cars
based on gas mileage
 Predict some unknown class labels
 Typical methods
 Decision trees, naïve Bayesian classification, support vector
machines, neural networks, rule-based classification, pattern-
based classification, logistic regression, …
 Typical applications:
 Credit card fraud detection, direct marketing, classifying stars,
diseases, web-pages, …
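
To make "construct a model from training examples, then predict unknown labels" concrete, here is a minimal sketch of a decision stump, a decision tree with a single split. It assumes exactly two classes and one numeric feature; the gas-mileage values and the labels are hypothetical.

```python
# Hypothetical training examples: (gas mileage in mpg, class label).
training = [(12, "low"), (15, "low"), (18, "low"), (28, "high"), (33, "high"), (40, "high")]

def train_stump(examples):
    """Learn a one-level decision tree: a single threshold on one feature.

    Tries each midpoint between consecutive sorted values and keeps the
    threshold and label orientation with the fewest training errors.
    Assumes exactly two distinct class labels.
    """
    xs = sorted(x for x, _ in examples)
    labels = sorted({y for _, y in examples})
    best = None
    for a, b in zip(xs, xs[1:]):
        threshold = (a + b) / 2
        for below, above in ((labels[0], labels[1]), (labels[1], labels[0])):
            errors = sum(
                1 for x, y in examples
                if (below if x <= threshold else above) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return lambda x: below if x <= threshold else above

classify = train_stump(training)
predictions = [classify(14), classify(35)]
```

The learned model is equivalent to one IF-THEN rule: IF mileage <= threshold THEN one class, ELSE the other.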

Classification

A classification model can be represented in various forms: (a)
IF-THEN rules, (b) a decision tree, or (c) a neural network.

Clustering
 Unsupervised learning (i.e., the class label is unknown)
 Group data to form new categories (i.e., clusters), e.g.,
cluster houses to find distribution patterns
 Principle: maximize intra-class similarity and minimize
inter-class similarity
 Many methods and applications
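
One common clustering method is k-means, which the slides do not name. It is used here only as an illustration of grouping similar points together. The points and the hand-picked initial centroids are hypothetical.

```python
import math

# Hypothetical 2-D points (e.g., house locations); values are invented.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Initial centroids are picked by hand here; real use would choose them
# randomly or with a seeding heuristic.
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Points within a cluster end up close to their own centroid and far from the other one, which is the similarity principle stated above.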

Clustering

A 2-D plot of customer data with respect to customer locations
in a city, showing three data clusters.

Clustering
 The output takes the form of a diagram that shows how
the instances fall into clusters.
 Different cases:
 Simple 2D representation: involves associating a cluster
number with each instance
 Venn diagram: allows one instance to belong to more than one
cluster
 Probabilistic assignment: associates instances with clusters
probabilistically
 Dendrogram: produces a hierarchical structure of clusters
(dendron is the Greek word for tree)

Outlier Analysis
 Outlier analysis
 Outlier: a data object that does not comply with the general
behavior of the data
 Noise or exception?
 Methods: clustering or regression analysis, …
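
A minimal sketch of the regression approach: fit a least-squares line and flag the points whose residuals are unusually large. The data and the z threshold of 2 standard deviations are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical (x, y) observations following y = 2x, except at x = 5.
data = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 25), (6, 12), (7, 14), (8, 16), (9, 18)]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def residual_outliers(points, z=2.0):
    """Flag points whose residual lies more than z standard deviations
    from the mean residual."""
    slope, intercept = fit_line(points)
    residuals = [y - (slope * x + intercept) for x, y in points]
    mu, sigma = mean(residuals), pstdev(residuals)
    return [p for p, r in zip(points, residuals) if sigma and abs(r - mu) > z * sigma]

outliers = residual_outliers(data)
```

Whether a flagged point is noise to discard or an exception worth studying still has to be decided by the analyst.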

Are All Patterns Interesting?
 Data mining may generate thousands of patterns, but not all
of them are interesting
 What makes a pattern interesting?
 Easily understood by humans
 Valid on new or test data
 Novel and potentially useful
 Validates some hypothesis that a user seeks to confirm
 Objective vs. subjective interestingness measures
 Objective: based on statistics and structures of patterns, e.g.,
support, confidence, etc.
 Subjective: based on the user’s beliefs about the data, e.g.,
unexpectedness, novelty, etc.

What Technologies Are Used?

What Kind of Applications Are Targeted?
 Web page analysis: from web page classification and clustering to
the PageRank and HITS algorithms
 Recommender systems
 Basket data analysis
 Biological and medical data analysis: classification, cluster analysis,
biological sequence analysis, biological network analysis

Summary
 Data mining: Discovering interesting patterns and
knowledge from massive amounts of data
 A natural evolution of database technology, in great
demand, with wide applications
 A KDD process includes data cleaning, data integration,
data selection, data transformation, data mining, pattern
evaluation, and knowledge presentation
 Mining can be performed on a variety of data
 Data mining functionalities: characterization, discrimination,
association, classification, clustering, outlier and trend
analysis, etc.
