Unit-2

Supervised Learning
Contents
Types of Supervised Learning
Supervised Machine Learning Algorithms
k-Nearest Neighbors
Regression Models
Naive Bayes Classifiers
Decision Trees
Ensembles of Decision Trees
Kernelized Support Vector Machines
Uncertainty Estimates from Classifiers
Supervised Machine Learning

 Supervised learning is a type of machine learning in which machines are trained using well "labelled" training data, and on the basis of that data, machines predict the output.
 Labelled data means the input data is already tagged with the correct output.
 In supervised learning, the training data provided to the machine works as the supervisor that teaches the machine to predict the output correctly.
 It applies the same concept as a student learning under the supervision of a teacher.
 Supervised learning is the process of providing input data as well as correct output data to the machine learning model.
 The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
How Supervised Learning Works?

 In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data.
 Once the training process is completed, the model is tested using test data (a held-out subset of the dataset that was not used for training), and then it predicts the output.
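A minimal illustrative sketch of this train/test workflow with scikit-learn; the tiny dataset below is made up purely for illustration and is not from the slides:

# Illustrative sketch of the labelled-data train/test workflow (toy data).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = [[1], [2], [3], [4], [5], [6], [7], [8]]   # input variable (x)
y = [0, 0, 0, 0, 1, 1, 1, 1]                   # labelled output variable (y)

# Hold out part of the data for testing instead of reusing the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)                    # train on labelled data
print(model.score(X_test, y_test))             # evaluate on unseen test data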
KNN Algorithm:
K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
The K-NN algorithm can be used for Regression as well as Classification, but it is mostly used for Classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
The KNN algorithm simply stores the dataset during the training phase; when it gets new data, it classifies that data into the category most similar to the new data.
Why do we need a K-NN Algorithm?
How does K-NN work?
The K-NN working can be explained on the basis of
the below algorithm:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to every point in the training data.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category with the maximum number of neighbors.
Step-6: Our model is ready.
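A minimal from-scratch sketch of Steps 1-5 in plain Python; the sample points and the choice K=3 are made up for illustration:

# Illustrative, from-scratch version of the steps above (toy data, K chosen arbitrarily).
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    # Step-2: Euclidean distance from the new point to every training point
    distances = [math.dist(p, new_point) for p in train_points]
    # Step-3: take the K nearest neighbors
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Step-4: count the data points in each category among those neighbors
    votes = Counter(train_labels[i] for i in nearest)
    # Step-5: assign the category with the maximum number of neighbors
    return votes.most_common(1)[0][0]

points = [(4, 21), (5, 19), (10, 24), (4, 17), (3, 16), (11, 25)]
labels = [0, 0, 1, 0, 0, 1]
print(knn_predict(points, labels, (8, 21), k=3))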
How to select the value of K in the K-NN Algorithm:
There is no particular way to determine the best value for "K", so we need to try several values and pick the one that works best. A commonly used default value for K is 5.
A very low value of K, such as K=1 or K=2, can be noisy and make the model sensitive to outliers.
Larger values of K are generally more robust to noise, but too large a value can include points from other categories and blur the distinction between classes.
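One common way to choose K in practice is to try several candidate values with cross-validation; a small illustrative sketch with scikit-learn (toy data and arbitrary candidate values):

# Illustrative sketch: trying several K values with cross-validation (toy data).
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = [[4, 21], [5, 19], [10, 24], [4, 17], [3, 16],
     [11, 25], [14, 24], [8, 22], [10, 21], [12, 21]]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

for k in (1, 3, 5):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=2)   # small cv because the dataset is tiny
    print(k, scores.mean())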
Advantages of KNN Algorithm:
It is simple to implement.
It is robust to noisy training data.
It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
The value of K always needs to be determined, which can sometimes be complex.
The computation cost is high because the distance to every training sample must be calculated for each prediction.
Example
Start by visualizing some data points:

import matplotlib.pyplot as plt

x = [4, 5, 10, 4, 3, 11, 14, 8, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
classes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

plt.scatter(x, y, c=classes)
plt.show()
 Now we fit the KNN algorithm with K=1:

from sklearn.neighbors import KNeighborsClassifier

data = list(zip(x, y))

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(data, classes)

 And use it to classify a new data point:
Example

new_x = 8
new_y = 21
new_point = [(new_x, new_y)]

prediction = knn.predict(new_point)

plt.scatter(x + [new_x], y + [new_y], c=classes + [prediction[0]])
plt.text(x=new_x-1.7, y=new_y-0.7, s=f"new point, class: {prediction[0]}")
plt.show()
Regression Analysis

Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables.
 More specifically, regression analysis helps us understand how the value of the dependent variable changes in response to one independent variable while the other independent variables are held fixed.
 It predicts continuous/real values such as temperature, age, salary, price, etc.
Example: Suppose there is a marketing company A, which runs various advertisements every year and gets sales from them. The below list shows the advertisements made by the company in the last 5 years and the corresponding sales:
"Regression shows a line or curve that passes through the datapoints on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells us whether the model has captured a strong relationship or not.
Some examples of regression are:
Prediction of rain using temperature and
other factors
Determining Market trends
Prediction of road accidents due to rash
driving.
Types of Regression
Linear Regression:
 Linear regression is a statistical regression method which is used for
predictive analysis.
 It is one of the simplest regression algorithms and shows the relationship between continuous variables.
 It is used for solving the regression problem in machine learning.
 Linear regression shows the linear relationship between the
independent variable (X-axis) and the dependent variable (Y-axis),
hence called linear regression.
 If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one
input variable, then such linear regression is called multiple linear
regression.
 The relationship between variables in the linear regression model can be explained using the below image. Here we are predicting the salary of an employee on the basis of years of experience.
Y = aX + b
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
Some popular applications of linear
regression are:
Analyzing trends and sales estimates
Salary forecasting
Real estate prediction
Arriving at ETAs in traffic.
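A brief illustrative sketch of fitting Y = aX + b with scikit-learn; the experience/salary numbers below are made up for illustration, not taken from the slides:

# Illustrative sketch of simple linear regression (Y = aX + b) with made-up data.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]            # years of experience (independent variable)
y = [30000, 35000, 41000, 45000, 50000]  # salary (dependent variable), illustrative values

model = LinearRegression()
model.fit(X, y)

print("a (slope):", model.coef_[0])
print("b (intercept):", model.intercept_)
print("predicted salary for 6 years:", model.predict([[6]])[0])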
Logistic Regression:
Logistic regression is another supervised learning algorithm which is used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.
The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or not spam, etc.
It is a predictive analysis algorithm which works on the
concept of probability.
Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.
Logistic regression uses the sigmoid function (also called the logistic function), which maps any real-valued input to a value between 0 and 1. This sigmoid function is used to model the data in logistic regression.
The function can be represented as:
f(x) = 1 / (1 + e^(-x))
where f(x) = output between the 0 and 1 value,
x = input to the function,
e = base of the natural logarithm.
When we provide the input values (data) to
the function, it gives the S-curve as follows:
It uses the concept of a threshold level: values above the threshold are rounded up to 1, and values below the threshold are rounded down to 0.
There are three types of logistic regression:
Binary (0/1, pass/fail)
Multinomial (cats, dogs, lions)
Ordinal (low, medium, high)
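A minimal illustrative sketch of the sigmoid function and a binary logistic regression in scikit-learn; the data below is made up:

# Illustrative sketch: the sigmoid function and binary logistic regression (toy data).
import math
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)), output always between 0 and 1
    return 1 / (1 + math.exp(-x))

X = [[1], [2], [3], [4], [5], [6], [7], [8]]   # e.g. hours studied (illustrative)
y = [0, 0, 0, 0, 1, 1, 1, 1]                   # pass/fail labels

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[4.5]]))          # class above/below the 0.5 threshold
print(clf.predict_proba([[4.5]]))    # probabilities from the S-curve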
Polynomial Regression:
Polynomial Regression is a type of regression which
models the non-linear dataset using a linear model.
It is similar to multiple linear regression, but it fits a
non-linear curve between the value of x and
corresponding conditional values of y.
Suppose there is a dataset whose datapoints follow a non-linear pattern; in such a case, linear regression will not fit those datapoints well. To cover such datapoints, we need Polynomial regression.
In Polynomial regression, the original features are transformed into polynomial features of a given degree and then modelled using a linear model, which means the datapoints are fitted using a polynomial curve.
The equation for polynomial regression is also derived from the linear regression equation: the linear regression equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ.
Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is our independent/input variable.
The model is still linear because the coefficients enter the equation linearly; only the features (x², x³, etc.) are non-linear.
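A short illustrative sketch of polynomial regression as described above, transforming the features to a given degree and then fitting a linear model (toy data, with degree 2 chosen arbitrarily):

# Illustrative sketch: polynomial features + a linear model (toy non-linear data).
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = [[1], [2], [3], [4], [5], [6]]
y = [1, 4, 9, 16, 25, 36]            # roughly quadratic relationship, made up

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[7]]))          # close to 49 for this toy quadratic data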
Support Vector Regression:
 Support Vector Machine is a supervised learning algorithm
which can be used for regression as well as classification
problems. So if we use it for regression problems, then it is
termed as Support Vector Regression.
 Support Vector Regression is a regression algorithm which
works for continuous variables. Below are some keywords
which are used in Support Vector Regression:
 Kernel: It is a function used to map lower-dimensional data into a higher-dimensional space.
 Hyperplane: In a general SVM, it is the separation line between two classes, but in SVR, it is the line which helps to predict the continuous variable and covers most of the datapoints.
 Boundary lines: These are the two lines drawn apart from the hyperplane, which create a margin for the datapoints.
 Support vectors: Support vectors are the datapoints that lie nearest to the hyperplane and the boundary lines.
In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of datapoints is covered within that margin. The main goal of SVR is to include as many datapoints as possible within the boundary lines, and the hyperplane (best-fit line) must cover the maximum number of datapoints. Consider the below image:
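A brief illustrative sketch of Support Vector Regression with scikit-learn; the toy data, RBF kernel, and hyperparameter values are arbitrary choices for illustration:

# Illustrative sketch of Support Vector Regression (toy data, RBF kernel as an example).
from sklearn.svm import SVR

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 15.9]   # made-up continuous targets

# epsilon controls the width of the margin (boundary lines) around the hyperplane
svr = SVR(kernel="rbf", C=100, epsilon=0.5)
svr.fit(X, y)
print(svr.predict([[9]]))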
Decision Tree Regression:
Decision Tree is a supervised learning algorithm which
can be used for solving both classification and
regression problems.
It can handle both categorical and numerical data.
Decision Tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents the result of the test, and each leaf node represents the final decision or result.
A decision tree is constructed starting from the root node/parent node (the full dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own children and so become parent nodes of those nodes. Consider the below image:
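A minimal illustrative sketch with scikit-learn's DecisionTreeRegressor; the toy data and max_depth value are arbitrary choices:

# Illustrative sketch of decision tree regression (toy data).
from sklearn.tree import DecisionTreeRegressor

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [5, 7, 9, 15, 18, 21, 30, 33]     # made-up numerical targets

tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict([[4.5]]))          # prediction = value stored at the reached leaf node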
Random Forest Regression:
Random forest is one of the most powerful supervised learning algorithms, capable of performing regression as well as classification tasks.
Random Forest regression is an ensemble learning method which combines multiple decision trees and predicts the final output based on the average of each tree's output. The combined decision trees are called base models, and the ensemble can be represented more formally as:
g(x) = f0(x) + f1(x) + f2(x) + .... (for regression, the final prediction is the average of the individual tree outputs)
Random forest uses the Bagging or Bootstrap Aggregation technique of ensemble learning, in which the aggregated decision trees run in parallel and do not interact with each other.
With the help of Random Forest regression, we can reduce overfitting in the model by training each tree on a random subset of the dataset.
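A short illustrative sketch with scikit-learn's RandomForestRegressor, which averages many bagged trees; the toy data and number of trees are arbitrary choices:

# Illustrative sketch of random forest regression (toy data).
from sklearn.ensemble import RandomForestRegressor

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [5, 7, 9, 15, 18, 21, 30, 33]

# n_estimators trees are trained on bootstrap samples and their outputs averaged
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict([[4.5]]))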
Ridge Regression:
 Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
 The amount of bias added to the model is known as the Ridge Regression penalty. This penalty term is computed by multiplying lambda by the squared weight of each individual feature.
 The cost function for ridge regression will be:
Loss = Σ(yi − ŷi)² + λ Σ wj²
 A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, Ridge regression can be used.
 Ridge regression is a regularization technique which is used to reduce the complexity of the model. It is also called L2 regularization.
 It helps to solve problems where we have more parameters than samples.
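A minimal illustrative sketch of ridge (L2) regression with scikit-learn; alpha plays the role of lambda, and its value and the toy data are arbitrary:

# Illustrative sketch of ridge regression (L2 regularization); alpha acts as lambda.
from sklearn.linear_model import Ridge

X = [[1, 2], [2, 4.1], [3, 5.9], [4, 8.2], [5, 10.1]]   # correlated features (made up)
y = [3, 6, 9, 12, 15]

ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
print(ridge.coef_, ridge.intercept_)   # coefficients are shrunk toward (but not to) 0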
Lasso Regression:
Lasso regression is another regularization technique used to reduce the complexity of the model.
It is similar to Ridge Regression, except that the penalty term contains the absolute values of the weights instead of the squares of the weights.
Since it takes absolute values, it can shrink a slope exactly to 0, whereas Ridge Regression can only shrink it close to 0.
It is also called L1 regularization. The cost function for Lasso regression will be:
Loss = Σ(yi − ŷi)² + λ Σ |wj|

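A matching illustrative sketch for lasso (L1) regression; alpha again plays the role of lambda, and the toy data is made up:

# Illustrative sketch of lasso regression (L1 regularization); alpha acts as lambda.
from sklearn.linear_model import Lasso

X = [[1, 2], [2, 4.1], [3, 5.9], [4, 8.2], [5, 10.1]]   # correlated features (made up)
y = [3, 6, 9, 12, 15]

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_, lasso.intercept_)   # lasso can drive some coefficients exactly to 0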