Unit-4 AML (1. Basics and K-NN)
• Labelled training data containing past information comes as an input. Based on the
training data, the machine builds a predictive model that can be used on test data to
assign a label for each record in the test data.
• Some examples of supervised learning are
• Predicting the results of a game
• Predicting whether a tumour is malignant or benign
• Predicting the price of domains like real estate, stocks, etc.
• Classifying texts such as classifying a set of emails as
spam or non-spam
• When we are trying to predict a categorical or nominal variable, the problem is known as
a classification problem, whereas when we are trying to predict a real-valued variable,
the problem falls under the category of regression.
• Some typical classification problems include: Image classification, Prediction of disease,
Recognition of handwriting etc.
• Typical applications of regression can be seen in demand forecasting in retail, weather
forecasting, etc.
• Note: Supervised machine learning is as good as the data used to train it. If the training
data is of poor quality, the prediction will also be far from being precise.
• In the kNN algorithm, the class label of the test data elements is decided
by the class label of the training data elements which are neighbouring,
i.e. similar in nature. But there are two challenges:
1. What is the basis of this similarity or when can we say that two data
elements are similar?
2. How many similar elements should be considered for deciding the class
label of each test data element?
• To answer the first question: though there are many measures of similarity, the
most common approach adopted by kNN to measure similarity between two data
elements is Euclidean distance. Considering a very simple data set having two
features (say f1 and f2), the Euclidean distance between two data
elements d1 and d2 can be measured by

distance(d1, d2) = sqrt( (f1(d1) − f1(d2))² + (f2(d1) − f2(d2))² )
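The distance formula above can be written as a small Python function; the points are assumed to be (f1, f2) tuples, matching the two-feature data set in the text.

```python
import math

def euclidean_distance(d1, d2):
    """Euclidean distance between two data elements,
    each given as a (f1, f2) feature tuple."""
    return math.sqrt((d1[0] - d2[0]) ** 2 + (d1[1] - d2[1]) ** 2)

# Points (1, 2) and (4, 6) form a 3-4-5 triangle, so the distance is 5.0
euclidean_distance((1, 2), (4, 6))
```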
• The answer to the second question, i.e. how many similar elements should be considered,
lies in the value of ‘k’, a user-defined parameter given as an input to the algorithm.
• In the kNN algorithm, the value of ‘k’ indicates the number of neighbours that need to be
considered.
• For example, if the value of k is 3, only three nearest neighbours or three training data
elements closest to the test data element are considered. Out of the three data elements, the
class which is predominant is considered as the class label to be assigned to the test data.
• In case the value of k is 1, only the closest training data element is considered. The class
label of that data element is directly assigned to the test data element.
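The majority-vote step described above can be sketched with `collections.Counter`; with k = 1 the list holds a single label, so the same function covers both cases. The label strings here are illustrative.

```python
from collections import Counter

def majority_class(neighbour_labels):
    """Return the class label that is predominant among the
    k nearest neighbours; with k = 1 this is simply the one label."""
    return Counter(neighbour_labels).most_common(1)[0][0]

majority_class(["malignant", "benign", "malignant"])  # → "malignant"
majority_class(["benign"])                            # k = 1 case → "benign"
```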
• But deciding the value of k is often tricky. The reasons are as follows:
• If the value of k is very large (in the extreme case equal to the total number of records in the training
data), the class label of the majority class of the training data set will be assigned to the test data
regardless of the class labels of the neighbours nearest to the test data.
• If the value of k is very small (in the extreme case equal to 1), the class value of a noisy data point or
outlier in the training data set which happens to be the nearest neighbour to the test data will be assigned to the
test data.
• The best k value is somewhere between these two extremes.
• A few strategies are adopted by machine learning practitioners to arrive at a value for k.
• One common practice is to set k equal to the square root of the number of training records.
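The square-root rule of thumb is easy to implement. A sketch follows; note that rounding k to an odd number, so that majority votes cannot tie between two classes, is a common refinement and an assumption beyond the text.

```python
import math

def default_k(n_training_records):
    """Heuristic: k ≈ sqrt(number of training records).
    Forcing k to be odd (so two-class votes cannot tie) is a
    common refinement, not part of the rule as stated in the text."""
    k = max(1, round(math.sqrt(n_training_records)))
    return k if k % 2 == 1 else k + 1

default_k(100)  # sqrt(100) = 10, bumped to 11 to stay odd
```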
• Input: Training data set, test data set (or data points), value of ‘k’ (i.e. number of nearest neighbours to be considered)
• Steps:
• Do for all test data points
• Calculate the distance (usually Euclidean distance) of the test data point from the different training data points.
• Find the closest ‘k’ training data points, i.e. training data points whose distances are least from the test data point.
• If k = 1
• Then assign class label of the training data point to the test data point
• Else
• Whichever class label is predominantly present in the training data points, assign that class label to the test data point
• End do
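The steps above can be sketched end to end in Python. The data layout (a list of `(features, label)` pairs for training data, feature tuples for test points) is an assumption for illustration; `math.dist` computes the Euclidean distance from the earlier formula.

```python
import math
from collections import Counter

def knn_classify(training_data, test_points, k):
    """kNN following the steps above.
    training_data: list of (features, label) pairs,
    test_points:   list of feature tuples,
    k:             number of nearest neighbours to consider."""
    predictions = []
    for test_point in test_points:  # Do for all test data points
        # Calculate the Euclidean distance of the test data point
        # from the different training data points
        dists = [
            (math.dist(test_point, features), label)
            for features, label in training_data
        ]
        # Find the closest 'k' training data points (least distances)
        nearest = sorted(dists)[:k]
        if k == 1:
            # Assign the class label of the single closest training point
            predictions.append(nearest[0][1])
        else:
            # Assign the predominant class label among the k neighbours
            labels = [label for _, label in nearest]
            predictions.append(Counter(labels).most_common(1)[0][0])
    return predictions

training = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((5, 6), "B")]
knn_classify(training, [(0, 0.5)], k=3)  # → ["A"]
knn_classify(training, [(5, 5)], k=1)    # → ["B"]
```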