
ML 2

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm used for classification and regression by predicting based on the 'k' closest data points. It operates as a lazy learner, storing the dataset and making predictions based on proximity and majority voting. The choice of 'k' is crucial for performance, and various distance metrics like Euclidean and Manhattan are employed to determine nearest neighbors.


K-Nearest Neighbors (KNN) is a supervised machine learning algorithm generally used for
classification, but it can also be used for regression tasks. It works by finding the "k" closest data
points (neighbors) to a given input and making a prediction based on the majority class (for
classification) or the average value (for regression). Since KNN makes no assumptions about the
underlying data distribution, it is a non-parametric, instance-based learning method.

K-Nearest Neighbors is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs the computation only at the
time of classification.

For example, consider a set of data points described by two features, as shown in the visualization below.

[Figure: KNN algorithm working visualization]


The image shows how KNN predicts the category of a new data point based on its closest
neighbors. The new point is classified as Category 2 because most of its closest neighbors are blue
squares: KNN assigns the category based on the majority of nearby points.

The red diamonds represent Category 1 and the blue squares represent Category 2.

The new data point checks its closest neighbors (circled points).

Since the majority of its closest neighbors are blue squares (Category 2) KNN predicts the new data
point belongs to Category 2.

KNN works by using proximity and majority voting to make predictions.

What is 'K' in K Nearest Neighbour?


In the k-Nearest Neighbours algorithm, k is just a number that tells the algorithm how many nearby
points (neighbors) to look at when it makes a decision.
Example: Imagine you're deciding which fruit a new item is based on its shape and size. You compare
it to fruits you already know.

If k = 3, the algorithm looks at the 3 closest fruits to the new one.

If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an apple
because most of its neighbors are apples.
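
A minimal sketch of this idea using scikit-learn's KNeighborsClassifier; the fruit measurements (size in cm, a roundness score) are invented purely for illustration:

```python
# A minimal sketch of the fruit example (feature values are made up).
from sklearn.neighbors import KNeighborsClassifier

# Known fruits: [size in cm, roundness 0-1]
X_train = [[7.0, 0.9], [7.5, 0.95], [8.0, 0.85],   # apples
           [18.0, 0.2], [20.0, 0.25]]              # bananas
y_train = ["apple", "apple", "apple", "banana", "banana"]

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3
knn.fit(X_train, y_train)                  # lazy learner: just stores the data

# A new fruit of size 8.2 and roundness 0.8: its 3 nearest neighbors
# are all apples, so majority voting predicts "apple".
print(knn.predict([[8.2, 0.8]]))           # ['apple']
```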

How to choose the value of k for KNN Algorithm?


The value of k in KNN decides how many neighbors the algorithm looks at when making a prediction.

Choosing the right k is important for good results.

If the data has lots of noise or outliers, using a larger k can make the predictions more stable.

But if k is too large, the model may become too simple and miss important patterns; this is called
underfitting.

So k should be picked carefully based on the data.

Statistical Methods for Selecting k


Cross-Validation: A good way to find the best value of k is k-fold cross-validation (note that the "k"
in k-fold is unrelated to the "k" in KNN). This means dividing the dataset into several parts (folds);
the model is trained on some of these parts and tested on the remaining one, and the process is
repeated for each fold. The value of k that gives the highest average accuracy across these tests is
usually the best one to use.
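
As a sketch, this selection loop could be written with scikit-learn's cross_val_score; the Iris dataset, the k range, and the 5-fold split are illustrative choices:

```python
# Sketch: choosing k for KNN via cross-validation (illustrative setup).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, 0.0
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: average accuracy across the folds
    score = cross_val_score(knn, X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, mean CV accuracy = {best_score:.3f}")
```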

Elbow Method: In the Elbow Method we draw a graph showing the error rate or accuracy for
different k values. As k increases, the error usually drops at first, but after a certain point it stops
decreasing quickly. The point where the curve changes direction and looks like an "elbow" is usually
the best choice for k.

Odd Values for k: It's a good idea to use an odd number for k, especially in binary classification
problems, because it helps avoid ties when deciding which class is the most common among the neighbors.

Distance Metrics Used in KNN Algorithm


KNN uses distance metrics to identify the nearest neighbors, and these neighbors are then used for
the classification or regression task. To identify the nearest neighbors, the following distance
metrics are commonly used:

1. Euclidean Distance
Euclidean distance is defined as the straight-line distance between two points in a plane or space.
You can think of it like the shortest path you would walk if you were to go directly from one point to
another.

$\text{distance}(x, X_i) = \sqrt{\sum_{j=1}^{d} (x_j - X_{ij})^2}$

2. Manhattan Distance
This is the total distance you would travel if you could only move along horizontal and vertical lines
like a grid or city streets. It’s also called "taxicab distance" because a taxi can only drive along the
grid-like streets of a city.
$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$

3. Minkowski Distance
Minkowski distance is like a family of distances, which includes both Euclidean and Manhattan
distances as special cases.

$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
From the formula above, when p=2, it becomes the same as the Euclidean distance formula and
when p=1, it turns into the Manhattan distance formula. Minkowski distance is essentially a flexible
formula that can represent either Euclidean or Manhattan distance depending on the value of p.
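
This relationship is easy to see in a small code sketch; the points (0, 0) and (3, 4) below are arbitrary:

```python
# Minkowski distance; p = 1 gives Manhattan, p = 2 gives Euclidean.
def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = [0, 0], [3, 4]
print(minkowski(x, y, 1))  # 7.0 -> Manhattan: |3| + |4|
print(minkowski(x, y, 2))  # 5.0 -> Euclidean: sqrt(3^2 + 4^2)
```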

Working of KNN algorithm


The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity, where it predicts
the label or value of a new data point by considering the labels or values of its K nearest neighbors in
the training dataset.

Step 1: Selecting the optimal value of K


K represents the number of nearest neighbors that need to be considered while making a prediction.

Step 2: Calculating distance


To measure the similarity between the target point and the training data points, a distance metric
such as Euclidean distance is used. The distance is calculated between the target point and each
data point in the dataset.

Step 3: Finding Nearest Neighbors


The k data points with the smallest distances to the target point are the nearest neighbors.

Step 4: Voting for Classification or Taking Average for Regression


When you want to classify a data point into a category like spam or not spam, the KNN algorithm
looks at the K closest points in the dataset. These closest points are called neighbors. The algorithm
then looks at which category the neighbors belong to and picks the one that appears the most. This
is called majority voting.

In regression, the algorithm still looks for the K closest points. But instead of voting for a class as in
classification, it takes the average of the values of those K neighbors. This average is the algorithm's
predicted value for the new point.
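
These four steps translate directly into a short from-scratch sketch; Euclidean distance is assumed (as in Step 2), and the function name and signature are purely illustrative:

```python
# From-scratch KNN following Steps 1-4 (Euclidean distance assumed).
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(X_train, y_train, target, k=5, task="classification"):
    # Step 2: distance from the target to every training point
    distances = [(euclidean(x, target), label)
                 for x, label in zip(X_train, y_train)]
    # Step 3: the k points with the smallest distances
    neighbors = sorted(distances)[:k]
    labels = [label for _, label in neighbors]
    if task == "classification":
        # Step 4a: majority voting among the neighbors
        return Counter(labels).most_common(1)[0][0]
    # Step 4b: average of the neighbors' values (regression)
    return sum(labels) / len(labels)

# e.g. 2 apples vs 1 banana among the 3 nearest -> "apple"
print(knn_predict([[7, 0.9], [18, 0.2], [8, 0.85]],
                  ["apple", "banana", "apple"], [8, 0.8], k=3))
```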

[Figure: a test point classified by its nearest neighbors]

The figure shows how a test point is classified based on its nearest neighbors: as the test point
moves, the algorithm identifies the closest k data points (5 in this case) and assigns the test point
the majority class label (the grey class here).

The Perceptron is a type of neural network that performs binary classification: it maps input
features to an output decision, usually classifying data into one of two categories, such as 0 or 1.

A Perceptron consists of a single layer of input nodes that are fully connected to a layer of output
nodes. It is particularly good at learning linearly separable patterns. It utilizes a variation of artificial
neurons called Threshold Logic Units (TLUs), which were first introduced by Warren McCulloch and
Walter Pitts in the 1940s. This foundational model has played a crucial role in the development of
more advanced neural networks and machine learning algorithms.

Types of Perceptron
A Single-Layer Perceptron is limited to learning linearly separable patterns. It is effective for tasks
where the data can be divided into distinct categories by a straight line. While powerful in its
simplicity, it struggles with more complex problems where the relationship between inputs and
outputs is non-linear.

A Multi-Layer Perceptron possesses enhanced processing capabilities, as it consists of two or more
layers that are adept at handling more complex patterns and relationships within the data.

Basic Components of Perceptron


A Perceptron is composed of key components that work together to process information and make
predictions.

Input Features: The perceptron takes multiple input features, each representing a characteristic of
the input data.

Weights: Each input feature is assigned a weight that determines its influence on the output. These
weights are adjusted during training to find the optimal values.

Summation Function: The perceptron calculates the weighted sum of its inputs, combining them
with their respective weights.

Activation Function: The weighted sum is passed through the Heaviside step function, comparing it
to a threshold to produce a binary output (0 or 1).

Output: The final output is determined by the activation function, often used for binary
classification tasks.

Bias: The bias term helps the perceptron make adjustments independent of the input, improving its
flexibility in learning.

Learning Algorithm: The perceptron adjusts its weights and bias using a learning algorithm, such as
the Perceptron Learning Rule, to minimize prediction errors.

These components enable the perceptron to learn from data and make predictions. While a single
perceptron can handle simple binary classification, complex tasks require multiple perceptrons
organized into layers, forming a neural network.

How does Perceptron work?


A weight is assigned to each input node of a perceptron, indicating the importance of that input in
determining the output. The Perceptron’s output is calculated as a weighted sum of the inputs,
which is then passed through an activation function to decide whether the Perceptron will fire.

The weighted sum is computed as:

$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = X^T W$
The step function compares this weighted sum to a threshold. If the input is larger than the
threshold value, the output is 1; otherwise, it is 0. The most common activation function used in
perceptrons is the Heaviside step function:

$h(z) = \begin{cases} 0 & \text{if } z < \text{Threshold} \\ 1 & \text{if } z \ge \text{Threshold} \end{cases}$


A perceptron consists of a single layer of Threshold Logic Units (TLU), with each TLU fully connected
to all input nodes.

In a fully connected layer, also known as a dense layer, all neurons in one layer are connected to
every neuron in the previous layer.

The output of the fully connected layer is computed as:

$f_{W,b}(X) = h(XW + b)$

where $X$ is the input matrix, $W$ is the weight matrix for the input neurons, $b$ is the bias
vector, and $h$ is the step function.
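
As a small sketch with made-up numbers, this layer computation $h(XW + b)$ looks like the following (two samples, two TLUs):

```python
# Sketch: output of a fully connected perceptron layer, f(X) = h(XW + b).
import numpy as np

def h(z):                          # Heaviside step, applied element-wise
    return (z >= 0).astype(int)    # threshold fixed at 0 here

X = np.array([[0.5, 1.0],          # 2 samples, 2 input features
              [1.0, -2.0]])
W = np.array([[0.4, -0.3],         # 2 inputs x 2 TLUs
              [0.6,  0.8]])
b = np.array([-0.5, 0.1])

print(h(X @ W + b))                # one binary output per TLU per sample
```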

During training, the Perceptron's weights are adjusted to minimize the difference between the
predicted output and the actual output. This is achieved using supervised learning algorithms like
the delta rule or the Perceptron learning rule.

The weight update formula is:

$w_{i,j} = w_{i,j} + \eta (y_j - \hat{y}_j) x_i$
Where:

$w_{i,j}$ is the weight between the $i$th input and $j$th output neuron,
$x_i$ is the $i$th input value,
$y_j$ is the actual value and $\hat{y}_j$ is the predicted value,
$\eta$ is the learning rate, controlling how much the weights are adjusted.
This process enables the perceptron to learn from data and improve its prediction accuracy over
time.
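
Here is a compact, single-output sketch of this training loop (so the $j$ index is dropped); learning the AND function is an illustrative choice, since it is linearly separable:

```python
# Sketch of the perceptron learning rule on the AND function.
def step(z, threshold=0.0):            # Heaviside step activation
    return 1 if z >= threshold else 0

# Training data for logical AND (linearly separable).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]

w = [0.0, 0.0]   # weights
b = 0.0          # bias
eta = 0.1        # learning rate

for epoch in range(20):
    for (x1, x2), target in zip(X, y):
        z = w[0] * x1 + w[1] * x2 + b  # weighted sum
        y_hat = step(z)
        error = target - y_hat         # (y_j - y_hat_j)
        # Perceptron learning rule: w_i <- w_i + eta * error * x_i
        w[0] += eta * error * x1
        w[1] += eta * error * x2
        b += eta * error

print([step(w[0] * x1 + w[1] * x2 + b) for x1, x2 in X])  # [0, 0, 0, 1]
```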

A Multi-Layer Network Algorithm in machine learning typically refers to a Multi-Layer Perceptron
(MLP), which is a type of feedforward artificial neural network. It consists of multiple layers of
neurons and is used for tasks like classification, regression, and pattern recognition.

Here’s a simple breakdown:

1. Structure of Multi-Layer Network (MLP):

- Input Layer: Takes the input features.
- Hidden Layers: One or more layers where computation happens (non-linear transformations).
- Output Layer: Produces the final prediction.

Each neuron in a layer is connected to every neuron in the next layer (fully connected).

2. Algorithm Working (Training a Multi-Layer Network):

The most common algorithm used is Backpropagation with Gradient Descent.

Steps:

1. Forward Pass:
   - Inputs are passed through the network.
   - Each neuron applies a weighted sum followed by an activation function (like ReLU, sigmoid, tanh).
2. Loss Calculation:
   - The output is compared with the actual target using a loss function (e.g., Mean Squared Error, Cross-Entropy).
3. Backward Pass (Backpropagation):
   - Calculates the gradient of the loss function with respect to each weight using the chain rule.
   - Updates the weights using Gradient Descent or its variants (SGD, Adam, etc.).
4. Repeat:
   - This process is repeated for several epochs until the loss is minimized.
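
As a sketch of these steps end to end, here is a tiny MLP trained with backpropagation and full-batch gradient descent on XOR; the 2-4-1 architecture, the seed, and the learning rate are illustrative choices:

```python
# Tiny MLP (2-4-1, sigmoid) trained with backpropagation on XOR.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # output layer
lr = 1.0

for epoch in range(5000):
    # 1. Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # 2-3. Loss gradient (mean squared error) and backward pass (chain rule)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

# Typically [0. 1. 1. 0.] once training has converged (may vary with seed).
print(out.round().ravel())
```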

3. Activation Functions:

- Sigmoid: Good for binary classification.
- ReLU (Rectified Linear Unit): Popular in hidden layers.
- Softmax: For multi-class classification in the output layer.
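
These can be sketched in a few lines of numpy; the max-subtraction in softmax is the usual numerical-stability trick:

```python
import numpy as np

def sigmoid(z):                 # squashes to (0, 1); binary classification
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):                    # max(0, z); common in hidden layers
    return np.maximum(0.0, z)

def softmax(z):                 # normalizes scores into a probability distribution
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # probabilities that sum to 1
```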
