
APEX INSTITUTE OF TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Data Mining and Warehousing (22CSH-380)


Faculty: Dr. Preeti Khera (E16576)

Lecture – 3.1.5 & 3.1.6


Classification Methods: K-Nearest Neighbor Classifiers, Genetic Algorithm

Data Mining and Warehousing: Course Objectives

COURSE OBJECTIVES
The Course aims to:

1. Develop an understanding of the key concepts of data mining and obtain knowledge of how to extract useful characteristics from data using data pre-processing techniques.
2. Demonstrate methods to apply and analyze relevant attributes, perform statistical measures to look for meaningful variation in data, and mine association rules for transactional datasets.
3. Teach the use and application of data mining techniques such as classification, decision trees, neural networks, back-propagation, and many more, in various applications.

COURSE OUTCOMES
On completion of this course, the students shall be able to:

CO1: Understand the concept of data mining and the usage of various tools for data warehousing and data mining.

CO2: Demonstrate the strengths and weaknesses of different methods of meaningful data mining.

CO3: Apply association rule, classification, and clustering algorithms for large data sets.

CO4: Evaluate and employ correct data mining techniques depending on the characteristics of the dataset.

CO5: Verify and formulate the performance of various data mining techniques according to the dataset.

Unit-3 Syllabus

Unit-3
What is Classification & Prediction, Issues regarding Classification and Prediction, Decision Tree, Bayesian Classification, Classification by Back-propagation, Multilayer Feed-forward Neural Network, Back-propagation Algorithm, Classification Methods: K-nearest Neighbor Classifiers, Genetic Algorithm.
Cluster Analysis: Data Types in Cluster Analysis, Categories of Clustering Methods, Partitioning Methods. Hierarchical Clustering: CURE and Chameleon. Density-Based Methods: DBSCAN, OPTICS. Grid-Based Methods: STING, CLIQUE. Model-Based Methods: Statistical Approach, Neural Network Approach. Outlier Analysis.

Table of Contents
• k-Nearest Neighbor Algorithm
• Genetic Algorithm
Instance-Based Methods
• Instance-based learning:
• Store training examples and delay the processing (“lazy
evaluation”) until a new instance must be classified
• Typical approaches
• k-nearest neighbor approach
• Instances represented as points in a Euclidean space.
• Locally weighted regression
• Constructs local approximation
• Case-based reasoning
• Uses symbolic representations and knowledge-based inference



The k-Nearest Neighbor Algorithm

• All instances correspond to points in the n-D space.


• The nearest neighbors are defined in terms of Euclidean distance.
• The target function could be discrete- or real-valued.
• For discrete-valued target functions, k-NN returns the most common value among the k training examples nearest to xq.
• Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples.
[Figure: positive (+) and negative (-) training examples around the query point xq; the decision surface induced by 1-NN forms a Voronoi diagram.]
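To make the majority-vote rule concrete, here is a minimal Python sketch of a k-NN classifier (an illustrative example only; the toy dataset and function names are assumptions, not part of the lecture).

```python
# Minimal k-NN classifier sketch: label the query with the most common class
# among its k nearest training examples under Euclidean distance.
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train_X, train_y, x_q, k=3):
    """Return the majority class among the k training points nearest to x_q."""
    neighbors = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], x_q))
    k_labels = [label for _, label in neighbors[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy usage: two 2-D classes separated along both axes.
X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
y = ["neg", "neg", "pos", "pos"]
print(knn_classify(X, y, (4.8, 5.1), k=3))   # expected: "pos"
```

For practical work, a library implementation such as scikit-learn's KNeighborsClassifier is usually preferable to hand-rolled code.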
Discussion on the k-NN Algorithm
• The k-NN algorithm for continuous-valued target functions
• Calculate the mean values of the k nearest neighbors
• Distance-weighted nearest neighbor algorithm
• Weight the contribution of each of the k neighbors according
to their distance to the query point xq
• giving greater weight to closer neighbors: w ≡ 1/d(xq, xi)^2
• Similarly, for real-valued target functions
• Robust to noisy data by averaging k-nearest neighbors
• Curse of dimensionality: distance between neighbors could be
dominated by irrelevant attributes.
• To overcome it, stretch the axes or eliminate the least relevant attributes.
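As a sketch of the distance-weighted variant, the example below predicts a real-valued target as a weighted mean of the k nearest neighbors using the w ≡ 1/d(xq, xi)^2 weighting from the slide (the dataset and helper names are illustrative assumptions).

```python
# Distance-weighted k-NN for a real-valued target: closer neighbors count more.
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_regress_weighted(train_X, train_y, x_q, k=3, eps=1e-12):
    """Weighted mean of the k nearest targets, with weights w = 1/d^2."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], x_q))[:k]
    num = den = 0.0
    for x_i, y_i in ranked:
        d = euclidean(x_i, x_q)
        if d < eps:                  # query coincides with a training point
            return y_i
        w = 1.0 / (d * d)            # w = 1 / d(x_q, x_i)^2
        num += w * y_i
        den += w
    return num / den

# Toy usage: 1-D inputs with a roughly linear target.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.1, 0.9, 2.1, 3.0]
print(knn_regress_weighted(X, y, (1.4,), k=3))   # about 1.2, between 0.9 and 2.1
```

scikit-learn offers a similar option via weights='distance' on its neighbor estimators, though that weighting is 1/d rather than 1/d^2.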
Case-Based Reasoning

• Also uses lazy evaluation and analysis of similar instances


• Difference: Instances are not “points in a Euclidean space”
• Example: Water faucet problem in CADET (Sycara et al., 1992)
• Methodology
• Instances represented by rich symbolic descriptions (e.g.,
function graphs)
• Multiple retrieved cases may be combined
• Tight coupling between case retrieval, knowledge-based
reasoning, and problem solving
• Research issues
• Indexing based on syntactic similarity measures and, when this fails, backtracking and adapting to additional cases
Remarks on Lazy vs. Eager Learning

• Instance-based learning: lazy evaluation


• Decision-tree and Bayesian classification: eager evaluation
• Key differences
• Lazy methods may consider the query instance xq when deciding how to generalize beyond the training data D
• Eager methods cannot, since they have already chosen a global approximation before seeing the query
• Efficiency: Lazy - less time training but more time predicting
• Accuracy
• Lazy method effectively uses a richer hypothesis space since it uses many
local linear functions to form its implicit global approximation to the target
function
• Eager: must commit to a single hypothesis that covers the entire instance
space
Genetic Algorithms

• GA: based on an analogy to biological evolution


• Each rule is represented by a string of bits
• An initial population is created consisting of randomly generated rules
• e.g., IF A1 AND NOT A2 THEN C2 can be encoded as 100
• Based on the notion of survival of the fittest, a new population is formed consisting of the fittest rules and their offspring
• The fitness of a rule is represented by its classification accuracy on a set of training examples
• Offspring are generated by crossover and mutation

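The following toy Python sketch illustrates the ideas on this slide: rules encoded as bit strings, fitness measured on training examples, and offspring produced by crossover and mutation. The dataset, fitness definition, and parameter values are illustrative assumptions, not taken from the lecture.

```python
# Toy genetic algorithm for evolving a classification rule encoded as 3 bits:
# (A1 value, A2 value, class bit), so "100" means IF A1 AND NOT A2 THEN C2.
import random

# Training data: (A1, A2) attribute bits and a class bit (0 = C2, 1 = C1).
DATA = [((1, 0), 0), ((1, 1), 1), ((0, 0), 1), ((0, 1), 1),
        ((1, 0), 0), ((1, 1), 1)]

def fitness(rule):
    """Accuracy of 'IF A1=b1 AND A2=b2 THEN class=b3' on the rows it covers."""
    b1, b2, b3 = rule
    covered = [(x, c) for x, c in DATA if x == (b1, b2)]
    if not covered:
        return 0.0
    return sum(1 for _, c in covered if c == b3) / len(covered)

def crossover(p, q):
    """Single-point crossover of two parent bit strings."""
    point = random.randint(1, len(p) - 1)
    return p[:point] + q[point:]

def mutate(rule, rate=0.1):
    """Flip each bit independently with the given probability."""
    return tuple(b ^ 1 if random.random() < rate else b for b in rule)

def evolve(pop_size=20, generations=30):
    pop = [tuple(random.randint(0, 1) for _ in range(3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # survival of the fittest
        parents = pop[: pop_size // 2]
        offspring = [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(pop_size - len(parents))]
        pop = parents + offspring
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))   # e.g. (1, 0, 0), i.e. "100": IF A1 AND NOT A2 THEN C2
```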


Rough Set Approach

• Rough sets are used to approximately, or “roughly”, define equivalence classes
• A rough set for a given class C is approximated by two sets: a
lower approximation (certain to be in C) and an upper
approximation (cannot be described as not belonging to C)
• Finding the minimal subsets (reducts) of attributes (for
feature reduction) is NP-hard but a discernibility matrix is
used to reduce the computation intensity

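A small sketch of how the lower and upper approximations can be computed from the equivalence classes induced by the conditional attributes (the objects and attribute values below are made-up assumptions):

```python
# Lower/upper approximation of a class C in rough set terms.
from collections import defaultdict

# Objects described by their conditional attribute values, and a target class C.
objects = {"o1": ("high", "yes"), "o2": ("high", "yes"),
           "o3": ("low", "no"), "o4": ("high", "no")}
C = {"o1", "o3"}                       # the class we want to approximate

# Equivalence classes: objects that are indiscernible on the attributes.
blocks = defaultdict(set)
for name, attrs in objects.items():
    blocks[attrs].add(name)

lower, upper = set(), set()
for block in blocks.values():
    if block <= C:                     # entirely inside C: certainly in C
        lower |= block
    if block & C:                      # overlaps C: cannot be ruled out of C
        upper |= block

print("lower approximation:", lower)   # {'o3'}
print("upper approximation:", upper)   # {'o1', 'o2', 'o3'}
```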


Fuzzy Set Approaches

• Fuzzy logic uses truth values between 0.0 and 1.0 to represent
the degree of membership (such as using fuzzy membership
graph)
• Attribute values are converted to fuzzy values
• e.g., income is mapped into the discrete categories {low,
medium, high} with fuzzy values calculated
• For a given new sample, more than one fuzzy value may apply
• Each applicable rule contributes a vote for membership in the
categories
• Typically, the truth values for each predicted category are
summed
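As an illustration, the sketch below maps a crisp income value into fuzzy memberships for {low, medium, high} using triangular membership functions; the breakpoints are illustrative assumptions, not values from the lecture.

```python
# Fuzzy membership sketch: a crisp income gets a degree of membership in each set.
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def income_memberships(income):
    """Degrees of membership of a crisp income in the fuzzy sets low/medium/high."""
    return {
        "low":    triangular(income, 0, 20_000, 45_000),
        "medium": triangular(income, 30_000, 55_000, 80_000),
        "high":   triangular(income, 65_000, 90_000, 200_000),
    }

# A single income can belong to more than one fuzzy set at once; each applicable
# rule then contributes a vote, and the votes per category are summed.
print(income_memberships(40_000))   # {'low': 0.2, 'medium': 0.4, 'high': 0.0}
```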
Summary
• Instance-based Methods
• k-Nearest Neighbor Algorithm
• Genetic Algorithm

Assignment
• Examine the key features of the k-nearest neighbor classifier.
• Determine the process of selecting the value of k in kNN.
• Explain in detail the motivation of using Genetic Algorithm.

References
TEXT BOOKS
T1: Tan, Steinbach and Vipin Kumar. Introduction to Data Mining, Pearson Education, 2016.
T2: Zaki MJ, Meira Jr W, Meira W. Data mining and machine learning: Fundamental concepts and algorithms.
Cambridge University Press; 2020 Jan 30.
T3: King RS. Cluster analysis and data mining: An introduction. Mercury Learning and Information; 2015 May
12.

REFERENCE BOOKS
R1: Pei, Han and Kamber. Data Mining: Concepts and Techniques, Elsevier, 2011.
R2: Halgamuge SK, Wang L, editors. Classification and clustering for knowledge discovery. Springer Science
& Business Media; 2005 Sep 2.
R3: Bhatia P. Data mining and data warehousing: principles and practical techniques. Cambridge University
Press; 2019 Jun 27.

JOURNALS
• https://www.igi-global.com/journal/international-journal-data-warehousing-mining/1085
• https://www.springer.com/journal/41060
• https://link.springer.com/journal/10618
References
RESEARCH PAPER
• Alasadi SA, Bhaya WS. Review of data preprocessing techniques in data mining. Journal of Engineering and Applied Sciences. 2017 Sep;12(16):4102-7.
• Freitas AA. A survey of evolutionary algorithms for data mining and knowledge discovery. In: Advances in Evolutionary Computing: Theory and Applications. 2003 Jan 1 (pp. 819-845). Berlin, Heidelberg: Springer Berlin Heidelberg.
• Kumbhare TA, Chobe SV. An overview of association rule mining algorithms. International Journal of Computer Science and Information Technologies. 2014 Feb;5(1):927-30.
• Srivastava S. Weka: a tool for data preprocessing, classification, ensemble, clustering and association rule mining. International Journal of Computer Applications. 2014 Jan 1;88(10).
• Dol SM, Jawandhiya PM. Classification technique and its combination with clustering and association rule mining in educational data mining—A survey. Engineering Applications of Artificial Intelligence. 2023 Jun 1;122:106071.

• WEB LINK
https://www.datacamp.com/tutorial/k-nearest-neighbor-classification-scikit-learn
https://www.javatpoint.com/genetic-algorithm-in-machine-learning

• VIDEO LINK
https://youtu.be/Z_8MpZeMdD4
THANK YOU

For queries
Email: [email protected]
