Lecture 3.1.1
Classification and Prediction, Issues regarding Classification and Prediction
Data Mining and Warehousing: Course Objectives
The Course aims to:
1. Develop an understanding of the key concepts of data mining and of how to
extract useful characteristics from data using data pre-processing techniques.
2. Demonstrate methods to select and analyze relevant attributes, apply statistical
measures to look for meaningful variation in data, and mine association rules from
transactional datasets.
3. Teach the use of data mining techniques such as classification, decision trees,
neural networks and back propagation in various applications.
COURSE OUTCOMES
On completion of this course, the students shall be able to:
CO1: Understand the concept of data mining and the usage of various tools for data warehousing and data mining.
Unit-3 Syllabus
Classification and Prediction: What is Classification and Prediction, Issues regarding
Classification and Prediction, Decision tree, Bayesian Classification, Classification by
Back propagation, Multilayer feed-forward Neural Network, Back propagation Algorithm,
Classification methods: K-nearest neighbor classifiers, Genetic Algorithm.
Cluster Analysis: Data types in cluster analysis, Categories of clustering methods,
Partitioning methods. Hierarchical Clustering: CURE and Chameleon. Density Based
Methods: DBSCAN, OPTICS. Grid Based Methods: STING, CLIQUE.
Model Based Methods: Statistical Approach, Neural Network approach. Outlier
Analysis.
Table of Contents
• Concept of Classification and Prediction
• Issues regarding Classification and Prediction
Classification vs. Prediction
• Classification:
  • predicts categorical class labels
  • constructs a model from the training set and the values (class labels) of a
    classifying attribute, then uses that model to classify new data
• Prediction:
  • models continuous-valued functions, i.e., predicts unknown or missing
    numeric values
• Typical applications:
  • credit approval
  • target marketing
  • medical diagnosis
  • treatment effectiveness analysis
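The contrast between the two tasks can be sketched in a few lines of Python. The example below is only an illustration: the credit data, the attribute names and the choice of a k-nearest-neighbor method (listed in the Unit-3 syllabus) are assumptions, not material from the lecture. The same neighbors are used twice, once to vote on a categorical label (classification) and once to average a continuous value (prediction).

```python
from collections import Counter

# Hypothetical toy data: (income in thousands, years employed) per applicant.
train_X = [(20, 1), (25, 2), (60, 8), (75, 10)]
train_class = ["reject", "reject", "approve", "approve"]  # categorical class labels
train_limit = [1.0, 1.5, 6.0, 8.0]                        # continuous target (credit limit)


def nearest_neighbors(x, X, k):
    """Return the indices of the k rows of X closest to x (squared Euclidean distance)."""
    dist = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
    return sorted(range(len(X)), key=lambda i: dist[i])[:k]


def classify(x, k=3):
    """Classification: majority vote over the neighbors' categorical labels."""
    votes = Counter(train_class[i] for i in nearest_neighbors(x, train_X, k))
    return votes.most_common(1)[0][0]


def predict(x, k=2):
    """Prediction: average of the neighbors' continuous target values."""
    idx = nearest_neighbors(x, train_X, k)
    return sum(train_limit[i] for i in idx) / len(idx)


new_applicant = (70, 9)
label = classify(new_applicant)  # a class label
limit = predict(new_applicant)   # a number
print(label, limit)
```

The output of `classify` is one of a fixed set of labels, while `predict` can return any real value; that difference in the target attribute is what separates the two tasks.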
June 1, 2025 Data Mining: Concepts and Techniques 6
Classification: A Two-Step Process
• Model construction: describing a set of predetermined classes
  • Each tuple/sample is assumed to belong to a predefined class, as
    determined by the class label attribute
  • The set of tuples used for model construction is the training set
  • The model is represented as classification rules, decision trees, or
    mathematical formulae
• Model usage: classifying future or unknown objects
  • Estimate the accuracy of the model
    • The known label of each test sample is compared with the model's
      classification
    • The accuracy rate is the percentage of test set samples that the
      model classifies correctly
    • The test set must be independent of the training set; otherwise the
      accuracy estimate is overly optimistic and over-fitting goes undetected
  • If the accuracy is acceptable, use the model to classify tuples whose
    class labels are unknown
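[Figure: the training data is fed to a classification algorithm, which produces the classifier (model); the classifier is then applied to the testing data and to unseen data such as (Jeff, Professor, 4).]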
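The two steps can be sketched as follows. Everything in this example is an assumption made for illustration: the tuples are hypothetical, the unseen tuple is read as (name, rank, years), and the "learning algorithm" is a deliberately tiny rule search rather than any method named in the lecture. Step 1 builds a classification rule from the training set; step 2 estimates its accuracy on a separate test set and then classifies the unseen tuple.

```python
# Hypothetical labelled tuples: (rank, years, tenured).
training_set = [
    ("Assistant Prof", 3, "no"),
    ("Assistant Prof", 7, "yes"),
    ("Professor", 2, "yes"),
    ("Associate Prof", 7, "yes"),
    ("Assistant Prof", 6, "no"),
    ("Associate Prof", 3, "no"),
]
# The test set shares no tuples with the training set.
test_set = [
    ("Assistant Prof", 2, "no"),
    ("Associate Prof", 7, "no"),
    ("Professor", 5, "yes"),
    ("Assistant Prof", 7, "yes"),
]


def classify(model, rank, years):
    """Apply the rule 'IF rank == r OR years > t THEN yes ELSE no'."""
    r, t = model
    return "yes" if rank == r or years > t else "no"


def accuracy(model, rows):
    """Fraction of rows whose known label matches the model's classification."""
    correct = sum(classify(model, rank, years) == label for rank, years, label in rows)
    return correct / len(rows)


def learn_rule(rows):
    """Step 1 (model construction): pick the (rank, threshold) pair whose rule
    fits the training rows best."""
    ranks = sorted({rank for rank, _, _ in rows})
    thresholds = sorted({years for _, years, _ in rows})
    candidates = [(r, t) for r in ranks for t in thresholds]
    return max(candidates, key=lambda m: accuracy(m, rows))


model = learn_rule(training_set)                # step 1: construct the model
train_accuracy = accuracy(model, training_set)  # optimistic: measured on seen data
test_accuracy = accuracy(model, test_set)       # step 2: estimate on independent data
jeff_label = classify(model, "Professor", 4)    # step 2: classify an unseen tuple
print(model, train_accuracy, test_accuracy, jeff_label)
```

On this toy data the rule fits the training set perfectly but misclassifies one of the four test tuples, which is why the accuracy reported for a model should come from the independent test set.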
Issues regarding Classification and Prediction (1): Data Preparation
• Data cleaning
  • Preprocess the data to reduce noise and handle missing values
• Relevance analysis (feature selection)
  • Remove irrelevant or redundant attributes
• Data transformation
  • Generalize and/or normalize the data
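A minimal sketch of the three preparation steps, assuming a small hypothetical table stored as a dictionary of columns. Mean imputation, dropping constant attributes and min-max scaling are just one simple choice per step, picked here for brevity; the lecture does not prescribe specific techniques.

```python
def impute_mean(column):
    """Data cleaning: replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def drop_constant_attributes(table):
    """Relevance analysis (very crude): drop attributes that take a single value,
    since they cannot help to separate classes."""
    return {name: col for name, col in table.items() if len(set(col)) > 1}


def min_max_normalize(column):
    """Data transformation: rescale values linearly to the range [0, 1].
    Assumes the column is not constant (otherwise hi - lo would be zero)."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical raw table with one missing value and one uninformative attribute.
raw = {
    "age": [20, None, 40, 60],
    "income": [30.0, 50.0, 70.0, 50.0],
    "country": ["IN", "IN", "IN", "IN"],
}

cleaned = dict(raw, age=impute_mean(raw["age"]))  # 1. data cleaning
relevant = drop_constant_attributes(cleaned)      # 2. relevance analysis
prepared = {name: min_max_normalize(col)          # 3. data transformation
            for name, col in relevant.items()}
print(prepared)
```

The order matters in this sketch: constant attributes are removed before normalization so that `min_max_normalize` never divides by zero.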
Issues regarding Classification and Prediction (2): Evaluating Classification Methods
• Predictive accuracy
• Speed
  • time to construct the model
  • time to use the model
• Robustness
  • handling noise and missing values
• Scalability
  • efficiency in disk-resident databases
• Interpretability
  • understanding and insight provided by the model
• Goodness of rules
  • decision tree size
  • compactness of classification rules
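Two of these criteria, predictive accuracy and speed, are easy to measure directly. The sketch below is an assumed evaluation harness, not something taken from the lecture: `MajorityClassifier` and `evaluate` are hypothetical names, and the baseline model simply predicts the most frequent training label so that the example stays self-contained.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Trivial baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Measure time to construct the model, time to use it, and test accuracy."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    construct_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predicted = model.predict(X_test)
    use_seconds = time.perf_counter() - start

    correct = sum(p == t for p, t in zip(predicted, y_test))
    return {
        "accuracy": correct / len(y_test),
        "construct_seconds": construct_seconds,
        "use_seconds": use_seconds,
    }


# Hypothetical data: 1000 training tuples, 10 test tuples.
X_train = [[i] for i in range(1000)]
y_train = ["yes"] * 700 + ["no"] * 300
X_test = [[i] for i in range(10)]
y_test = ["yes"] * 6 + ["no"] * 4

report = evaluate(MajorityClassifier(), X_train, y_train, X_test, y_test)
print(report)
```

Any classifier exposing the same `fit` and `predict` methods could be passed to `evaluate`, which makes it possible to compare methods on the same criteria. Robustness, scalability and interpretability need separate experiments or human judgment and are not captured by this harness.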
Summary
• Classification - Definition and Process
• Supervised vs. unsupervised learning
• Issues regarding Classification and Prediction
Assignment
• Discuss the concept of Classification.
• List the issues related to classification and prediction.
• Differentiate between classification and prediction in data mining.
References
TEXT BOOKS
T1: Tan, Steinbach and Kumar. Introduction to Data Mining, Pearson Education, 2016.
T2: Zaki MJ, Meira Jr W. Data mining and machine learning: Fundamental concepts and algorithms. Cambridge University Press; 2020 Jan 30.
T3: King RS. Cluster analysis and data mining: An introduction. Mercury Learning and Information; 2015 May 12.
REFERENCE BOOKS
R1: Han, Kamber and Pei. Data Mining: Concepts and Techniques, Elsevier, 2011.
R2: Halgamuge SK, Wang L, editors. Classification and clustering for knowledge discovery. Springer Science & Business Media; 2005 Sep 2.
R3: Bhatia P. Data mining and data warehousing: principles and practical techniques. Cambridge University Press; 2019 Jun 27.
JOURNALS
• https://www.igi-global.com/journal/international-journal-data-warehousing-mining/1085
• https://www.springer.com/journal/41060
• https://link.springer.com/journal/10618
RESEARCH PAPERS
• Alasadi SA, Bhaya WS. Review of data preprocessing techniques in data mining. Journal of Engineering and Applied Sciences. 2017 Sep;12(16):4102-7.
• Freitas AA. A survey of evolutionary algorithms for data mining and knowledge discovery. In Advances in evolutionary computing: theory and applications 2003 Jan 1 (pp. 819-845). Berlin, Heidelberg: Springer Berlin Heidelberg.
• Kumbhare TA, Chobe SV. An overview of association rule mining algorithms. International Journal of Computer Science and Information Technologies. 2014 Feb;5(1):927-30.
• Srivastava S. Weka: a tool for data preprocessing, classification, ensemble, clustering and association rule mining. International Journal of Computer Applications. 2014 Jan 1;88(10).
• Dol SM, Jawandhiya PM. Classification technique and its combination with clustering and association rule mining in educational data mining: A survey. Engineering Applications of Artificial Intelligence. 2023 Jun 1;122:106071.
WEB LINK
• https://www.tutorialspoint.com/data_mining/dm_classification_prediction.htm
VIDEO LINK
• https://youtu.be/uFydF-g-AJs