Algorithms
K-Nearest Neighbour (K-NN) Algorithm for Machine Learning
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised
Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts
the new case into the category most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity.
This means that when new data appears, it can be easily classified into a well-suited category using the
K-NN algorithm.
o The K-NN algorithm can be used for Regression as well as Classification, but it is mostly used for
Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumptions about the
underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately;
instead, it stores the dataset and, at the time of classification, performs an action on that dataset.
o At the training phase, the K-NN algorithm just stores the dataset; when it gets new data, it classifies
that data into the category most similar to the new data.
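To make the "similarity plus majority vote" idea concrete, here is a minimal from-scratch sketch in Python, assuming NumPy is available; the toy dataset, k = 3, and the helper name knn_predict are illustrative choices, not part of the original text:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every stored point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Labels of the k nearest neighbours
    nearest_labels = y_train[np.argsort(distances)[:k]]
    # Majority vote decides the category
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy dataset: two features, two categories (0 and 1)
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.2, 1.9])))  # -> 0
```

In line with the lazy-learner description above, no work happens at training time: the "model" is just the stored dataset, and all distance computation is deferred to prediction time.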
Linear Regression in Machine Learning
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method
that is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric
variables such as sales, salary, age, product price, etc.
The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more
independent variables (x), hence the name linear regression. Because the relationship is linear, the algorithm
finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between the variables.
[Figure: a sloped regression line through the training points. The values of the x and y variables are the
training dataset for the Linear Regression model representation.]
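Mathematically, simple linear regression fits a line of the form y = a0 + a1*x. Below is a minimal sketch, assuming NumPy; the experience-vs-salary numbers are invented purely for illustration:

```python
import numpy as np

# Toy training data: years of experience (x) vs. salary (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

# Least-squares fit of the line y = a0 + a1 * x;
# np.polyfit returns coefficients from highest degree down, so slope first
a1, a0 = np.polyfit(x, y, deg=1)
print(f"y = {a0:.2f} + {a1:.2f} * x")

# Predict a continuous value for a new input
x_new = 6.0
print(a0 + a1 * x_new)
```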
One assumption of the linear regression model is that there is no autocorrelation in the error terms.
Model Performance:
The goodness of fit determines how well the regression line fits the set of observations. The process of finding
the best model out of various models is called optimization. It can be achieved by the method below:
1. R-squared method: R-squared is a statistical measure of goodness of fit; it indicates how much of the
variation in the dependent variable is explained by the independent variables, on a scale from 0 to 1 (often
quoted as 0-100%). A value close to 1 means the model fits the observations well.
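The standard definition is R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. A minimal sketch of the computation, assuming NumPy (the numbers reuse the toy fit above):

```python
import numpy as np

def r_squared(y_true, y_pred):
    # SS_res: sum of squared residuals between observations and predictions
    ss_res = np.sum((y_true - y_pred) ** 2)
    # SS_tot: total variation of the observations around their mean
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([30.0, 35.0, 41.0, 44.0, 50.0])
y_pred = np.array([30.2, 35.0, 39.8, 44.6, 49.4])
print(r_squared(y_true, y_pred))  # close to 1 -> good fit
```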
Decision Tree Classification Algorithm

o Decision Tree is a Supervised learning technique that can be used for both Classification and
Regression problems, but it is mostly preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches represent
the decision rules, and each leaf node represents the outcome.
o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes
are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those
decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision based on
given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands on
further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification And Regression
Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into
subtrees.
The tree is built with the following steps (a code sketch follows the list):
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM), such as
the Gini index or information gain.
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where the nodes cannot be classified further; the final
nodes are called leaf nodes.
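As a minimal sketch of these steps, assuming scikit-learn is available: DecisionTreeClassifier implements a CART-style tree, and criterion="gini" selects the Gini index as the Attribute Selection Measure. The toy age/income data below is invented purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy dataset: [age, income] -> buys the product (1) or not (0)
X = [[25, 30], [35, 60], [45, 80], [20, 20], [52, 110], [23, 25]]
y = [0, 1, 1, 0, 1, 0]

# criterion="gini" uses the Gini index as the Attribute Selection Measure;
# the tree recursively splits the dataset until the leaves are pure
tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(X, y)

print(tree.predict([[30, 50]]))  # follows decision rules from root to a leaf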
Random Forest Algorithm

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various
subsets of the given dataset and takes the average to improve the predictive accuracy of that
dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and,
based on the majority vote of predictions, predicts the final output.
A greater number of trees in the forest leads to higher accuracy and helps prevent the problem of
overfitting.
Assumptions
o There should be some actual values in the feature variables of the dataset, so that the classifier can
predict accurate results rather than guessed results.
o The predictions from each tree must have very low correlations.
The working process can be explained as follows: each tree is trained on a random subset of the dataset, and
the forest combines the individual tree predictions by majority vote, as in the sketch below.
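Here is a minimal sketch, assuming scikit-learn; n_estimators and the toy data (reused from the decision tree sketch) are illustrative choices, not values from the original text:

```python
from sklearn.ensemble import RandomForestClassifier

# Same toy dataset as the decision tree sketch above
X = [[25, 30], [35, 60], [45, 80], [20, 20], [52, 110], [23, 25]]
y = [0, 1, 1, 0, 1, 0]

# n_estimators decision trees, each trained on a bootstrap sample
# (a random subset of the dataset); predictions are combined by majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print(forest.predict([[30, 50]]))
```

Each of the 100 trees sees a different bootstrap sample of the data, which keeps the trees' predictions weakly correlated, matching the second assumption above.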