Module-2-Data Mining
(CSA4003)
2/28/2022
Motivation: Why data mining?
Evolution of Database Technology
1970s - early 1980s:
Database management systems (DBMS)
Hierarchical and network database systems
Relational database systems
Query languages: SQL
Transactions, concurrency control and recovery.
On-line transaction processing (OLTP)
Late 1980s-present
Advanced Data Analysis
Data warehouse and OLAP
Data mining and knowledge discovery
Advanced data mining applications
Data mining and society
1990s-present:
XML-based database systems
Integration with information retrieval
Data and information integration
Present – future:
A new generation of integrated data and information systems.
What Is Data Mining?
Data Mining: A KDD Process
Data mining is the core of the knowledge discovery process.
[Figure: KDD process. Recoverable stages, from source to result: Databases, Data Cleaning, Data Integration, Task-relevant Data, Data Mining, Pattern Evaluation]
Steps of a KDD Process
1. Data cleaning
2. Data integration
3. Data selection
4. Data transformation
5. Data mining
6. Pattern evaluation
7. Knowledge presentation
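The seven steps above can be sketched as a chain of small functions. This is only an illustrative sketch: the record layout, the two hypothetical sources (store_a, store_b), the age-band discretization and the "average spend per age band" pattern are all assumptions made for the example, not part of the slides.

```python
# Hypothetical toy records from two sources; None marks a missing value.
store_a = [{"id": 1, "age": 25, "spend": 120.0},
           {"id": 2, "age": None, "spend": 80.0}]
store_b = [{"id": 3, "age": 41, "spend": 300.0},
           {"id": 4, "age": 37, "spend": 40.0}]

def clean(records):
    """Step 1: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Step 2: combine several sources into one data set."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Step 3: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Step 4: discretize age into decade bands such as '20..29'."""
    out = []
    for r in records:
        low = (r["age"] // 10) * 10
        out.append({"age_band": f"{low}..{low + 9}", "spend": r["spend"]})
    return out

def mine(records):
    """Step 5: a trivial 'pattern', the average spend per age band."""
    groups = {}
    for r in records:
        groups.setdefault(r["age_band"], []).append(r["spend"])
    return {band: sum(v) / len(v) for band, v in groups.items()}

def evaluate(patterns, min_spend):
    """Step 6: keep only the patterns judged interesting."""
    return {band: avg for band, avg in patterns.items() if avg >= min_spend}

def present(patterns):
    """Step 7: render the surviving patterns for the user."""
    return [f"age {band}: average spend {avg:.2f}"
            for band, avg in sorted(patterns.items())]

data = integrate(clean(store_a), clean(store_b))
data = transform(select(data, ["age", "spend"]))
report = present(evaluate(mine(data), min_spend=100.0))
```

With this toy data, the record with the missing age is dropped during cleaning, and only the age bands whose average spend reaches the threshold appear in the report.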
Steps of a KDD Process
Learning the application domain:
relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing
Data reduction and transformation:
Find useful features, dimensionality/variable reduction,
invariant representation.
Architecture of a Typical Data Mining System
[Figure: system architecture. Recoverable components: graphical user interface, pattern evaluation, databases, data warehouse]
Data Mining and Business Intelligence
[Figure: pyramid showing increasing potential to support business decisions. Recoverable layers, from top: Making Decisions (End User); Data Exploration (Statistical Analysis, Querying and Reporting)]
Data Mining: On What Kind of Data?
Answer: on any kind of data, for example:
Relational databases
Data warehouses
Transactional databases
Data Mining Functionalities: What Kind of Patterns Can Be Mined?
Data mining tasks are generally classified into two categories: descriptive and predictive.
Concept description: characterization and discrimination
Data can be associated with classes or concepts.
Example: in the All Electronics store, classes of items for sale include computers and printers.
The description of a class or concept is called a class/concept description.
Data characterization: summarization of the general features of a target class of data.
Data discrimination: comparison of the target class with one or more contrasting classes.
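To make the two terms concrete, here is a small sketch of characterization (summarizing one class) and discrimination (comparing it with a contrasting class). The sales records, the chosen features and the function names are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical sales records: (item class, price, units sold).
sales = [
    ("computer", 900.0, 3),
    ("computer", 1100.0, 5),
    ("printer", 150.0, 10),
    ("printer", 250.0, 6),
]

def characterize(records, target):
    """Summarize general features of the target class."""
    rows = [r for r in records if r[0] == target]
    return {
        "count": len(rows),
        "avg_price": mean(r[1] for r in rows),
        "avg_units": mean(r[2] for r in rows),
    }

def discriminate(records, target, contrast):
    """Compare the target class with a contrasting class, feature by feature."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: t[key] - c[key] for key in t}

summary = characterize(sales, "computer")
diff = discriminate(sales, "computer", "printer")
```

Here `summary` describes computers on their own, while `diff` shows how computers differ from printers (higher average price, fewer units sold in this toy data).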
Association Analysis
Multi-dimensional vs. single-dimensional association
Multi-dimensional: age(X, “20..29”) ^ income(X, “20..29K”) => buys(X, “PC”) [support = 2%, confidence = 60%]
Single-dimensional: contains(T, “computer”) => contains(T, “software”) [support = 1%, confidence = 75%]
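The support and confidence figures attached to a rule can be computed directly from transaction counts. The sketch below assumes the usual definitions (support = fraction of transactions containing all items of the rule; confidence = fraction of transactions containing the antecedent that also contain the consequent) and uses a made-up list of five transactions, so its numbers differ from those on the slide.

```python
# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"software"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing antecedent and consequent."""
    return count(antecedent | consequent, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Fraction of antecedent transactions that also contain the consequent."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return count(antecedent | consequent, transactions) / base

# Rule: contains(T, "computer") => contains(T, "software")
s = support({"computer"}, {"software"}, transactions)
c = confidence({"computer"}, {"software"}, transactions)
```

In this toy data, two of five transactions contain both items (support 40%), and two of the three transactions containing a computer also contain software (confidence about 67%).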
Classification and Prediction
Finding models (functions) that describe and distinguish data classes or concepts, in order to predict the class of objects whose class label is unknown.
E.g., classify countries based on climate, or classify cars based on gas mileage.
Models: decision tree, classification rules (if-then), neural network.
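As a minimal illustration of learning an if-then rule from labeled data, the sketch below fits a one-level decision tree (a single threshold) to the gas-mileage example. The training values, the "low"/"high" labels and the function names are assumptions for the example; real classifiers handle many attributes and classes.

```python
# Hypothetical training data: (miles per gallon, class label).
training = [(12, "low"), (15, "low"), (18, "low"),
            (28, "high"), (33, "high"), (40, "high")]

def learn_threshold(samples):
    """Pick the midpoint split that misclassifies the fewest training samples."""
    values = sorted(v for v, _ in samples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        # A sample is wrong when "mpg > threshold" disagrees with its label.
        return sum((label == "high") != (v > threshold) for v, label in samples)

    return min(candidates, key=errors)

def classify(mpg, threshold):
    """The learned rule: IF mpg > threshold THEN 'high' ELSE 'low'."""
    return "high" if mpg > threshold else "low"

threshold = learn_threshold(training)
prediction = classify(30, threshold)  # a car whose label is unknown
```

The learned threshold is the model; `classify` then predicts the class of a car that was not in the training data.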
Cluster analysis
Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without consulting a known class label.
Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
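One common way to group unlabeled objects is k-means, which is used here only as an example technique; the slides do not name a specific algorithm. The sketch works on one-dimensional points, and the data and starting centers are made up.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D points, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

No class labels are used: points end up together purely because they are close to each other (high intra-class similarity) and far from the other group (low inter-class similarity).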
Outlier analysis
Outlier: a data object that does not comply with the general behavior or model of the data.
It can be treated as noise or an exception, but it is quite useful in fraud detection and rare-event analysis.
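A simple way to flag such objects is to measure how far each value lies from the median, in units of the median absolute deviation (MAD). This is one possible technique chosen for illustration, not a method given in the slides; the transaction amounts and the cutoff of 3.0 are assumptions.

```python
from statistics import median

def find_outliers(values, cutoff=3.0):
    """Return values whose distance from the median exceeds cutoff * MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # More than half the values are identical; flag anything that differs.
        return [v for v in values if v != center]
    return [v for v in values if abs(v - center) / mad > cutoff]

# Hypothetical card transaction amounts; one is far from the usual behavior.
amounts = [50, 52, 48, 51, 49, 500]
suspicious = find_outliers(amounts)
```

In a fraud-detection setting, the flagged amount is exactly the record worth a closer look rather than noise to be discarded.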
Data Mining: Classification Schemes
Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the center, drawing on database technology, statistics, information science, machine learning, visualization, and other disciplines]
Data Mining systems: Classification Schemes
General functionality
Descriptive data mining
Predictive data mining
Databases to be mined
Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW, etc.
Knowledge to be mined
Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
Multiple/integrated functions and mining at multiple levels
Techniques utilized
Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
Applications adopted
Retail, telecommunication, banking, fraud analysis, DNA mining, stock market
Major Issues in Data Mining
2. Performance issues
Efficiency and scalability of data mining algorithms
Parallel, distributed and incremental mining methods
Data Mining Task Primitives
ARCHITECTURE OF A TYPICAL DATA MINING SYSTEM
Q: Describe the differences between the following approaches for the integration of a data mining system with a database or data warehouse system: no coupling, loose coupling, semitight coupling, and tight coupling.
State which approach you think is the most popular, and why.