Lecture 3.1.1

The document outlines the course objectives and outcomes for a Data Mining and Warehousing class, focusing on classification and prediction techniques. It covers key concepts such as supervised and unsupervised learning, classification processes, and issues related to data preparation and evaluation. Additionally, it lists assignments and references for further reading on the subject.

Uploaded by

Anshul Kunwar

APEX INSTITUTE OF TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Data Mining and Warehousing (22CSH-380)


Faculty: Dr. Preeti Khera (E16576)

Lecture – 3.1.1
Classification and Prediction, Issues regarding Classification and Prediction
DISCOVER . LEARN . EMPOWER

Data Mining and Warehousing: Course Objectives

COURSE OBJECTIVES
The Course aims to:

1. Develop an understanding of the key concepts of data mining and obtain knowledge of
how to extract useful characteristics from data using data pre-processing techniques.
2. Demonstrate methods to apply and analyze relevant attributes, perform statistical
measures to look for meaningful variation in data, and mine association rules for
transactional datasets.
3. Teach the use and application of data mining techniques such as classification, decision
trees, neural networks, and back propagation in various applications.

COURSE OUTCOMES
On completion of this course, the students shall be able to:-

CO1: Understand the concept of data mining and the usage of various tools for data warehousing and data mining.
CO2: Demonstrate the strengths and weaknesses of different methods of meaningful data mining.
CO3: Apply association rule, classification, and clustering algorithms for large data sets.
CO4: Evaluate and employ correct data mining techniques depending on characteristics of the dataset.
CO5: Verify and formulate the performance of various data mining techniques according to the dataset.

Unit-3 Syllabus

Unit-3
What is Classification & Prediction, Issues regarding Classification and Prediction,
Decision tree, Bayesian Classification, Classification by Back propagation,
Multilayer feed-forward Neural Network, Back propagation Algorithm,
Classification methods: K-nearest neighbor classifiers, Genetic Algorithm.
Cluster Analysis: Data types in cluster analysis, Categories of clustering methods,
Partitioning methods. Hierarchical Clustering: CURE and Chameleon. Density-Based
Methods: DBSCAN, OPTICS. Grid-Based Methods: STING, CLIQUE.
Model-Based Methods: Statistical Approach, Neural Network Approach, Outlier
Analysis

Table of Contents
• Concept of Classification and Prediction
• Issues regarding Classification and Prediction
Classification vs. Prediction

• Classification:
  • predicts categorical class labels
  • constructs a model from the training set and the values (class labels) of a
    classifying attribute, then uses that model to classify new data
• Prediction:
  • models continuous-valued functions, i.e., predicts unknown
    or missing values
• Typical applications:
  • credit approval
  • target marketing
  • medical diagnosis
  • treatment effectiveness analysis
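The difference between the two tasks can be sketched in a few lines of Python. This is a minimal illustration, not a method from the lecture: the applicant incomes, the labels, the numeric series, and the helper names `classify_1nn` and `fit_line` are all hypothetical. The classifier returns a categorical label, while the predictor returns a continuous value.

```python
# Hypothetical labeled applicants: (income in thousands, class label).
LABELED_APPLICANTS = [(20, "reject"), (30, "reject"), (60, "approve"), (80, "approve")]


def classify_1nn(labeled, income):
    """Classification: return the categorical label of the nearest labeled sample."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - income))
    return nearest[1]


def fit_line(xs, ys):
    """Prediction: least-squares fit of the continuous function y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Classification output is a class label.
label = classify_1nn(LABELED_APPLICANTS, 55)

# Prediction output is a number (hypothetical numeric series).
slope, intercept = fit_line([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
predicted_value = slope * 5 + intercept

print(label, predicted_value)
```

The nearest-neighbour rule and the straight-line fit are stand-ins chosen for brevity; any classifier or regression model would show the same contrast.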
June 1, 2025 Data Mining: Concepts and Techniques 6
Classification: A Two-Step Process
• Model construction: describing a set of predetermined classes
  • Each tuple/sample is assumed to belong to a predefined class, as
    determined by the class label attribute
  • The set of tuples used for model construction is the training set
  • The model is represented as classification rules, decision trees, or
    mathematical formulae
• Model usage: classifying future or unknown objects
  • Estimate the accuracy of the model
    • The known label of each test sample is compared with the
      model's classification result
    • The accuracy rate is the percentage of test set samples that are
      correctly classified by the model
    • The test set must be independent of the training set; otherwise
      over-fitting inflates the accuracy estimate
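A minimal sketch of the accuracy estimate described above, assuming a simple holdout split. The sample data, the threshold "learner", and the function names `holdout_split` and `accuracy_rate` are hypothetical and only serve to show that the model is built from the training set alone and scored on disjoint test samples.

```python
import random


def holdout_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into disjoint training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_rate(model, test_set):
    """Percentage of test samples whose known label matches the model's output."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return 100.0 * correct / len(test_set)


# Hypothetical labeled samples: (feature value, class label).
samples = [(x, "high" if x >= 5 else "low") for x in range(10)]
training_set, test_set = holdout_split(samples)

# Toy model construction: a threshold halfway between the two classes,
# derived from the training set only.
highest_low = max(x for x, label in training_set if label == "low")
lowest_high = min(x for x, label in training_set if label == "high")
threshold = (highest_low + lowest_high) / 2


def model(x):
    return "high" if x > threshold else "low"


print(f"accuracy on the held-out test set: {accuracy_rate(model, test_set):.1f}%")
```

Scoring the same model on `training_set` instead would tend to give a higher figure, which is why the test set has to be kept separate.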



Classification Process (1): Model Construction

[Figure: Training Data → Classification Algorithms → Classifier (Model)]

Training data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (model):

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
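Model construction can be pictured as a search for the rule that best fits the training data. The sketch below is a toy search, not the classification algorithm from the slide: it assumes the rule has the fixed form "rank is professor OR years > t" and only looks for the threshold t. The names `make_rule`, `training_accuracy`, and `construct_model` are hypothetical.

```python
# Training data from the slide: (name, rank, years, tenured).
TRAINING_DATA = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]


def make_rule(threshold):
    """Build a classifier of the assumed form: professor OR years > threshold."""
    def rule(rank, years):
        return "yes" if rank.lower() == "professor" or years > threshold else "no"
    return rule


def training_accuracy(rule):
    """Fraction of training tuples whose class label the rule reproduces."""
    correct = sum(1 for _, rank, years, label in TRAINING_DATA
                  if rule(rank, years) == label)
    return correct / len(TRAINING_DATA)


def construct_model():
    """Try each observed YEARS value as the threshold and keep the best one."""
    candidates = sorted({years for _, _, years, _ in TRAINING_DATA})
    best = max(candidates, key=lambda t: training_accuracy(make_rule(t)))
    return best, make_rule(best)


best_threshold, classifier = construct_model()
print(best_threshold, training_accuracy(classifier))
```

On this training set the search selects a threshold of 6, which matches the rule shown on the slide and classifies all six training tuples correctly.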
Classification Process (2): Use the Model in Prediction

[Figure: Testing Data → Classifier; Unseen Data (Jeff, Professor, 4) → Classifier → Tenured?]

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured?
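The model-usage step can be traced with the slide's own rule and testing data. This is a minimal sketch; the function names `classify` and `accuracy_rate` are hypothetical, and the rank comparison is made case-insensitive as an assumption, since the rule writes 'professor' while the table writes "Professor".

```python
# Testing data from the slide: (name, rank, years, tenured).
TEST_DATA = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def classify(rank, years):
    """The slide's rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


def accuracy_rate(test_data):
    """Percentage of test tuples whose known label matches the rule's output."""
    correct = sum(1 for _, rank, years, label in test_data
                  if classify(rank, years) == label)
    return 100.0 * correct / len(test_data)


test_accuracy = accuracy_rate(TEST_DATA)
jeff_prediction = classify("Professor", 4)

print(f"test accuracy: {test_accuracy}%")       # 75.0%
print(f"Jeff tenured? {jeff_prediction}")       # yes
```

The rule misclassifies Merlisa (7 years, not tenured), so three of the four test tuples are correct. Jeff is classified as tenured because his rank is Professor.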
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
  • Supervision: the training data (observations,
    measurements, etc.) are accompanied by labels indicating
    the class of the observations
  • New data is classified based on the training set
• Unsupervised learning (clustering)
  • The class labels of the training data are unknown
  • Given a set of measurements, observations, etc., the aim is
    to establish the existence of classes or clusters in
    the data
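The contrast can be sketched on a single numeric attribute. The example below is an illustration under stated assumptions: the values are the YEARS column of the training table, the supervised learner is a nearest-neighbour lookup, and the unsupervised learner is a one-dimensional k-means with k = 2. Neither technique is prescribed by the lecture, and the function names are hypothetical.

```python
# Supervised input: (years, class label) pairs, so labels are available.
LABELED = [(3, "no"), (7, "yes"), (2, "yes"), (7, "yes"), (6, "no"), (3, "no")]

# Unsupervised input: the same measurements with the labels removed.
UNLABELED = [years for years, _ in LABELED]


def nearest_label(labeled, value):
    """Supervised: assign the class label of the closest labeled sample."""
    return min(labeled, key=lambda pair: abs(pair[0] - value))[1]


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k = 2).

    Assumes the values are not all identical.
    """
    low_center, high_center = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        low_center, high_center = sum(low) / len(low), sum(high) / len(high)
    return sorted(low), sorted(high)


print(nearest_label(LABELED, 5))   # a named class taken from the labels
print(two_means(UNLABELED))        # two unnamed groups found in the data
```

The supervised result is one of the predefined classes, while the clustering result is only a grouping; deciding what each group means is left to the analyst.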
Issues regarding classification and prediction (1): Data Preparation

• Data cleaning
  • Preprocess data in order to reduce noise and handle
    missing values
• Relevance analysis (feature selection)
  • Remove the irrelevant or redundant attributes
• Data transformation
  • Generalize and/or normalize data
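The three preparation steps can be sketched on a tiny table. Everything here is an assumption made for illustration: the table contents, the choice of mean imputation for missing values, the "drop constant columns" test as a crude form of relevance analysis, and min-max scaling as the normalization.

```python
def fill_missing(column):
    """Data cleaning: replace missing values (None) with the mean of the observed ones."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def drop_constant_attributes(table):
    """Relevance analysis (crude): drop attributes whose value never varies."""
    return {name: col for name, col in table.items() if len(set(col)) > 1}


def min_max_normalize(column):
    """Data transformation: rescale numeric values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical raw table: one missing value and one attribute that carries no information.
raw = {
    "years": [3, 7, None, 7, 6, 3],
    "campus": ["main"] * 6,
}

cleaned = dict(raw, years=fill_missing(raw["years"]))
relevant = drop_constant_attributes(cleaned)
prepared = dict(relevant, years=min_max_normalize(relevant["years"]))

print(prepared)
```

Real pipelines usually use stronger relevance measures (for example, correlation with the class label), but the order of the steps is the same.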



Issues regarding classification and prediction (2): Evaluating Classification Methods

• Predictive accuracy
• Speed
  • time to construct the model
  • time to use the model
• Robustness
  • handling noise and missing values
• Scalability
  • efficiency in disk-resident databases
• Interpretability
  • understanding and insight provided by the model
• Goodness of rules
  • decision tree size
  • compactness of classification rules
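Two of these criteria, the time to construct the model and the time to use it, can be measured directly. The sketch below assumes a trivial majority-class learner and hypothetical labels; `timed` and `construct_majority_model` are illustrative names, and the absolute timings depend entirely on the machine.

```python
import time


def construct_majority_model(training_labels):
    """Toy learner: always predict the most frequent training label."""
    majority = max(set(training_labels), key=training_labels.count)
    return lambda _features: majority


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical class labels with a clear majority.
training_labels = ["no", "no", "no", "no", "yes", "yes"]

model, construct_seconds = timed(construct_majority_model, training_labels)
predictions, use_seconds = timed(lambda: [model(x) for x in range(1000)])

print(f"time to construct the model: {construct_seconds:.6f} s")
print(f"time to use the model on 1000 samples: {use_seconds:.6f} s")
```

The same `timed` helper can wrap any learner, which makes it possible to compare methods on speed alongside predictive accuracy.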
Summary
• Classification - Definition and Process
• Supervised vs. unsupervised learning
• Issues regarding Classification and Prediction

Assignment
• Discuss the concept of Classification.
• List the issues related to classification and prediction.
• Differentiate between classification and prediction in data mining.

References
TEXT BOOKS
T1: Tan, Steinbach and Kumar. Introduction to Data Mining, Pearson Education, 2016.
T2: Zaki MJ, Meira Jr W. Data mining and machine learning: Fundamental concepts and algorithms.
Cambridge University Press; 2020 Jan 30.
T3: King RS. Cluster analysis and data mining: An introduction. Mercury Learning and Information; 2015 May 12.

REFERENCE BOOKS
R1: Han, Kamber and Pei. Data Mining: Concepts and Techniques, Elsevier, 2011.
R2: Halgamuge SK, Wang L, editors. Classification and clustering for knowledge discovery. Springer Science
& Business Media; 2005 Sep 2.
R3: Bhatia P. Data mining and data warehousing: principles and practical techniques. Cambridge University
Press; 2019 Jun 27.

JOURNALS
• https://www.igi-global.com/journal/international-journal-data-warehousing-mining/1085
• https://www.springer.com/journal/41060
• https://link.springer.com/journal/10618
RESEARCH PAPERS
• Alasadi SA, Bhaya WS. Review of data preprocessing techniques in data mining. Journal of Engineering and Applied
Sciences. 2017 Sep;12(16):4102-7.
• Freitas AA. A survey of evolutionary algorithms for data mining and knowledge discovery. In Advances in evolutionary
computing: theory and applications 2003 Jan 1 (pp. 819-845). Berlin, Heidelberg: Springer Berlin Heidelberg.
• Kumbhare TA, Chobe SV. An overview of association rule mining algorithms. International Journal of Computer
Science and Information Technologies. 2014 Feb;5(1):927-30.
• Srivastava S. Weka: a tool for data preprocessing, classification, ensemble, clustering and association rule mining.
International Journal of Computer Applications. 2014 Jan 1;88(10).
• Dol SM, Jawandhiya PM. Classification technique and its combination with clustering and association rule mining in
educational data mining—A survey. Engineering Applications of Artificial Intelligence. 2023 Jun 1; 122:106071.

WEB LINK
• https://www.tutorialspoint.com/data_mining/dm_classification_prediction.htm

VIDEO LINK
• https://youtu.be/uFydF-g-AJs
THANK YOU

For queries
Email: [email protected]