
KNN algorithm

The K-Nearest Neighbors (K-NN) algorithm is a supervised machine learning method used for classification and regression, which classifies new data points based on their similarity to existing data. It operates by storing all available data and determining the category of a new data point by analyzing the closest neighbors using Euclidean distance. Additionally, logistic regression is introduced as a classification algorithm that predicts probabilities for discrete classes using a logistic function.

K-Nearest Neighbors (K-NN) Algorithm

The K-Nearest Neighbors (K-NN) algorithm is a simple and intuitive supervised machine learning algorithm that can be used to solve both regression and classification problems. K-NN assumes that the new case and the existing cases are similar, and it places the new case in the category most similar to the existing categories. The algorithm stores all available data and classifies a new data point based on its similarity to the existing data, so when new data appears, it can quickly be classified into a suitable category. K-NN is a non-parametric algorithm, meaning it makes no assumptions about the underlying data distribution. It is also known as a lazy learner algorithm because it does not learn from the training set right away; instead, it stores the dataset and acts on it only when a new point needs to be classified. Pattern recognition, data mining, and intrusion detection are some of its demanding applications.

Need for the K-NN Algorithm

Assume there are two categories, Category A and Category B, and we have a new data point x1. Which of these categories will this data point fall into? A K-NN algorithm is required to solve this type of problem. With the help of K-NN, we can easily identify the category or class of the new point, as shown in Figure 2.12.

Figure 2.12: Category or class of a dataset with the help of K-NN

The following steps explain how K-NN works:

Step-I: Select the number K of neighbors.
Step-II: Calculate the Euclidean distance between the new data point and the existing data points.
Step-III: Take the K nearest neighbors according to the calculated Euclidean distances.
Step-IV: Among these K neighbors, count the number of data points in each category.
Step-V: Assign the new data point to the category with the greatest number of neighbors.
Step-VI: Our model is complete.

Example: Suppose we have a new data point that needs to be placed in the correct category. Consider the illustration in Figure 2.13.

Figure 2.13: K-NN example

First, we decide on the number of neighbors; we will go with k = 5. The Euclidean distance between the data points is then calculated, as shown in Figure 2.14. The Euclidean distance is the straight-line distance between two points that we learned about in geometry class. For two points A(x_1, y_1) and B(x_2, y_2), it is calculated using the following formula:

Euclidean distance between A and B = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}

Figure 2.14: The Euclidean distance between the data points

By calculating the Euclidean distances, we find the five nearest neighbors: three in Category A and two in Category B, as shown in Figure 2.15.

Figure 2.15: Closest neighbors in Category A and Category B

As can be seen, the majority of the nearest neighbors (three of the five) are from Category A, so the new data point is assigned to Category A.
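The steps above translate directly into a few lines of Python. The following is a minimal sketch, not code from this book: the function name knn_classify and the toy coordinates are invented for illustration, and the call mirrors the k = 5 example with three Category A neighbors and two Category B neighbors.

```python
import math
from collections import Counter

def knn_classify(training_data, new_point, k=5):
    """Classify new_point by majority vote among its k nearest neighbors.

    training_data: list of ((x, y), label) pairs.
    """
    # Step-II: Euclidean distance from the new point to every stored point.
    distances = [
        (math.sqrt((x - new_point[0]) ** 2 + (y - new_point[1]) ** 2), label)
        for (x, y), label in training_data
    ]
    # Step-III: keep the k closest neighbors.
    nearest = sorted(distances)[:k]
    # Steps IV-V: count categories among the neighbors and take the majority.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy dataset (hypothetical coordinates) for Category A and Category B.
training_data = [
    ((1.0, 2.0), "A"), ((1.5, 1.8), "A"), ((2.0, 2.2), "A"),
    ((6.0, 6.5), "B"), ((6.5, 7.0), "B"), ((7.0, 6.0), "B"),
]
print(knn_classify(training_data, (2.1, 2.0), k=5))  # prints "A"
```

Because K-NN is a lazy learner, all the work happens at prediction time: there is no training step beyond storing the data.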
Logistic Regression

The classification algorithm logistic regression is used to assign observations to a discrete set of classes. Unlike linear regression, which produces continuous output values, logistic regression produces a probability value that can be mapped to two or more discrete classes using the logistic (sigmoid) function. A regression model with a categorical target variable is known as logistic regression; to model binary dependent variables, it employs a logistic function.

The target variable in logistic regression has two possible values, such as yes/no. Represent the target variable y so that "yes" is 1 and "no" is 0. According to the logistic model, the log-odds of y being 1 is a linear combination of one or more predictor variables. Suppose we have two predictors, x_1 and x_2, and let p be the probability that y equals 1. The logistic model then states:

\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2

We can recover the odds by exponentiating both sides:

\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}

Solving for p gives the probability that y equals 1:

p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}

If p is closer to 1, y is predicted to be 1; if p is closer to 0, y is predicted to be 0. This equation can be generalized to n parameters and independent variables, giving the logistic regression equation:

y = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}}
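To make the equations concrete, here is a minimal Python sketch of the prediction side of logistic regression. The coefficient values in betas are hypothetical; in practice they are estimated from training data (for example, by maximum likelihood).

```python
import math

def sigmoid(z):
    # Logistic (sigmoid) function: maps the log-odds z to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, betas):
    # p = 1 / (1 + e^-(beta_0 + beta_1*x_1 + ... + beta_n*x_n))
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return sigmoid(z)

# Hypothetical fitted coefficients (beta_0, beta_1, beta_2) for two predictors.
betas = [-1.0, 0.8, 0.5]
p = predict_probability([2.0, 1.0], betas)
print(round(p, 3))             # ~0.75, the probability that y = 1
print(1 if p >= 0.5 else 0)    # p is closer to 1, so predict y = 1
```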
