Lecture 3.1.5 and 3.1.6
Data Mining and Warehousing: Course Objectives
COURSE OBJECTIVES
The Course aims to:
1. Develop an understanding of the key concepts of data mining and of how to
extract useful characteristics from data using data pre-processing techniques.
2. Demonstrate methods to select and analyze relevant attributes, apply statistical
measures to find meaningful variation in data, and mine association rules from
transactional datasets.
3. Teach the use of data mining techniques such as classification, decision
trees, neural networks, and backpropagation in a range of applications.
COURSE OUTCOMES
On completion of this course, students shall be able to:
CO1: Understand the concepts of data mining and the use of various tools for
data warehousing and data mining.
Unit-3 Syllabus
What is classification and prediction, issues regarding classification and prediction,
decision trees, Bayesian classification, classification by backpropagation,
multilayer feed-forward neural networks, the backpropagation algorithm,
k-nearest neighbor classifiers, genetic algorithms.
Cluster Analysis: Data types in cluster analysis, categories of clustering methods,
partitioning methods. Hierarchical clustering: CURE and Chameleon. Density-based
methods: DBSCAN, OPTICS. Grid-based methods: STING, CLIQUE.
Model-based methods: statistical approach, neural network approach. Outlier
analysis.
Table of Contents
• k-Nearest Neighbor Algorithm
• Genetic Algorithm
Instance-Based Methods
• Instance-based learning:
• Store training examples and delay the processing (“lazy
evaluation”) until a new instance must be classified
• Typical approaches
• k-nearest neighbor approach
• Instances represented as points in a Euclidean space.
• Locally weighted regression
• Constructs local approximation
• Case-based reasoning
• Uses symbolic representations and knowledge-based inference
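The k-nearest neighbor approach listed above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes numeric attributes, Euclidean distance, and an unweighted majority vote, and the function name `knn_classify` and the toy data are hypothetical. Note that nothing is "trained"; the stored examples are only consulted when a query arrives, which is the lazy evaluation described above.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of (point, label) pairs; points are tuples of numbers.
    """
    # Lazy learning: no model is built in advance. All work happens here,
    # when a new instance must be classified.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in a 2-D Euclidean space.
train = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"),
    ((6.5, 7.0), "B"),
    ((7.0, 6.5), "B"),
]

label_near_a = knn_classify(train, (1.8, 1.6), k=3)
label_near_b = knn_classify(train, (6.2, 6.4), k=3)
print(label_near_a, label_near_b)  # A B
```

With two classes, an odd k avoids tied votes. In practice attributes are usually rescaled first so that no single attribute dominates the distance.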
Fuzzy Set Approaches
• Fuzzy logic uses truth values between 0.0 and 1.0 to represent
the degree of membership (for example, as read from a fuzzy
membership graph)
• Attribute values are converted to fuzzy values
• e.g., income is mapped onto the discrete categories {low,
medium, high}, with a fuzzy membership value calculated for each
• For a given new sample, more than one fuzzy value may apply
• Each applicable rule contributes a vote for membership in the
categories
• Typically, the truth values for each predicted category are
summed
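The fuzzy steps above can be sketched as follows. Everything specific here is an assumption made for illustration: the piecewise-linear membership shapes, the income breakpoints (20, 40, 60, in thousands), the three rules, and the class names "approve" and "reject" are all hypothetical. The sketch only shows the mechanics: fuzzify an attribute, let each applicable rule vote with its truth value, and sum the votes per predicted category.

```python
def falling(x, lo, hi):
    """Membership that is 1.0 up to `lo`, 0.0 from `hi`, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Mirror image of `falling`: 0.0 up to `lo`, 1.0 from `hi`."""
    return 1.0 - falling(x, lo, hi)


def fuzzify_income(income):
    """Map an income (in thousands) to degrees of membership in each category."""
    return {
        "low": falling(income, 20, 40),
        "medium": min(rising(income, 20, 40), falling(income, 40, 60)),
        "high": rising(income, 40, 60),
    }


# Hypothetical rule base: (fuzzy income category, predicted class).
RULES = [("low", "reject"), ("medium", "approve"), ("high", "approve")]


def classify(income):
    """Sum the truth values of all applicable rules for each predicted class."""
    memberships = fuzzify_income(income)
    votes = {}
    for category, predicted in RULES:
        truth = memberships[category]
        if truth > 0.0:  # the rule applies only if its condition holds to some degree
            votes[predicted] = votes.get(predicted, 0.0) + truth
    return votes


print(fuzzify_income(35))  # {'low': 0.25, 'medium': 0.75, 'high': 0.0}
print(classify(35))        # {'reject': 0.25, 'approve': 0.75}
```

An income of 35 belongs partly to "low" and partly to "medium", so two rules fire and the class with the larger summed truth value ("approve" here) would be predicted.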
June 1, 2025 Data Mining: Concepts and Techniques
Summary
• Instance based Methods
• k-Nearest Neighbor Algorithm
• Genetic Algorithm
Assignment
• Examine the key features of the k-nearest neighbor classifier.
• Describe how the value of k is selected in k-NN.
• Explain in detail the motivation for using genetic algorithms.
THANK YOU