classification but can also be used for regression tasks. It works by finding the "k" closest data points (neighbors) to a given input and making a prediction based on the majority class (for classification) or the average value (for regression). Because KNN makes no assumptions about the underlying data distribution, it is a non-parametric, instance-based learning method.
K-Nearest Neighbors is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and only performs computation at the time of classification.
For example, consider a set of data points plotted on two features, where the red diamonds represent Category 1 and the blue squares represent Category 2. To classify a new data point, KNN checks its closest neighbors (the circled points). Since the majority of those closest neighbors are blue squares (Category 2), KNN predicts that the new data point belongs to Category 2.
If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an apple
because most of its neighbors are apples.
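As a rough sketch of how this looks in code, the snippet below uses scikit-learn's KNeighborsClassifier on made-up two-feature points standing in for the two categories above (the data values and k = 3 are only illustrative):

# Minimal KNN classification sketch (assumes scikit-learn is installed)
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two features per point; labels 1 and 2 stand in for Category 1 and Category 2
X = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],   # Category 1
              [6.0, 6.5], [6.5, 7.0], [7.0, 6.8]])  # Category 2
y = np.array([1, 1, 1, 2, 2, 2])

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 closest neighbors
knn.fit(X, y)                              # "lazy": simply stores the data

new_point = np.array([[6.2, 6.6]])
print(knn.predict(new_point))              # majority vote among the 3 neighbors -> [2]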
If the data has lots of noise or outliers, using a larger k can make the predictions more stable.
But if k is too large, the model may become too simple and miss important patterns; this is called underfitting.
Elbow Method: In the Elbow Method we draw a graph showing the error rate or accuracy for different k values. As k increases, the error usually drops at first, but after a certain point it stops decreasing quickly. The point where the curve changes direction and looks like an "elbow" is usually the best choice for k.
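A minimal sketch of the Elbow Method, assuming scikit-learn and matplotlib are available and using the Iris dataset purely as a stand-in for any labeled dataset:

# Elbow method sketch: plot validation error rate for a range of k values
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

ks = range(1, 21)
errors = []
for k in ks:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    errors.append(1 - model.score(X_val, y_val))  # error rate = 1 - accuracy

plt.plot(ks, errors, marker="o")
plt.xlabel("k")
plt.ylabel("validation error rate")
plt.show()  # pick k near the "elbow" where the curve flattens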
Odd Values for k: It’s a good idea to use an odd number for k especially in classification problems.
This helps avoid ties when deciding which class is the most common among the neighbors.
1. Euclidean Distance
Euclidean distance is defined as the straight-line distance between two points in a plane or space.
You can think of it like the shortest path you would walk if you were to go directly from one point to
another.
$\text{distance}(x, X_i) = \sqrt{\sum_{j=1}^{d} (x_j - X_{ij})^2}$
2. Manhattan Distance
This is the total distance you would travel if you could only move along horizontal and vertical lines
like a grid or city streets. It’s also called "taxicab distance" because a taxi can only drive along the
grid-like streets of a city.
$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
3. Minkowski Distance
Minkowski distance is like a family of distances, which includes both Euclidean and Manhattan
distances as special cases.
$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
From the formula above, when p=2, it becomes the same as the Euclidean distance formula and
when p=1, it turns into the Manhattan distance formula. Minkowski distance is essentially a flexible
formula that can represent either Euclidean or Manhattan distance depending on the value of p.
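The relationship between the three distances can be checked with a small NumPy sketch (the vectors here are arbitrary illustrative values):

import numpy as np

def minkowski(x, y, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))
manhattan = np.sum(np.abs(x - y))

print(euclidean, minkowski(x, y, p=2))  # same value
print(manhattan, minkowski(x, y, p=1))  # same value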
In regression, the algorithm still looks for the k closest points. But instead of voting for a class as in classification, it takes the average of the values of those k neighbors, and that average becomes the predicted value for the new point.
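A minimal sketch of KNN regression using scikit-learn's KNeighborsRegressor, with made-up one-dimensional data:

# KNN regression sketch: the prediction is the mean target of the k nearest points
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # single feature
y = np.array([1.2, 1.9, 3.1, 4.0, 5.2])            # continuous targets

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, y)

print(reg.predict([[3.5]]))  # average of the targets of the 3 nearest neighbors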
This shows how a test point is classified based on its nearest neighbors: as the test point moves, the algorithm identifies the closest k data points (5 in this case) and assigns the test point the majority class label among them (the grey class here).
Perceptron
A Perceptron is a type of neural network that performs binary classification, mapping input features to an output decision and classifying data into one of two categories, such as 0 or 1.
A Perceptron consists of a single layer of input nodes that are fully connected to a layer of output nodes. It is particularly good at learning linearly separable patterns. It uses a variation of artificial neurons called Threshold Logic Units (TLUs), first introduced by Warren McCulloch and Walter Pitts in the 1940s. This foundational model has played a crucial role in the development of more advanced neural networks and machine learning algorithms.
Types of Perceptron
Single-Layer Perceptron is a type of perceptron that is limited to learning linearly separable patterns. It is effective for tasks where the data can be divided into distinct categories by a straight line. While powerful in its simplicity, it struggles with more complex problems where the relationship between inputs and outputs is non-linear.
Multi-Layer Perceptrons possess enhanced processing capabilities as they consist of two or more layers, making them adept at handling more complex patterns and relationships within the data.
Input Features: The perceptron takes multiple input features, each representing a characteristic of
the input data.
Weights: Each input feature is assigned a weight that determines its influence on the output. These
weights are adjusted during training to find the optimal values.
Summation Function: The perceptron calculates the weighted sum of its inputs, combining them
with their respective weights.
Activation Function: The weighted sum is passed through the Heaviside step function, comparing it
to a threshold to produce a binary output (0 or 1).
Output: The final output is determined by the activation function, often used for binary
classification tasks.
Bias: The bias term helps the perceptron make adjustments independent of the input, improving its
flexibility in learning.
Learning Algorithm: The perceptron adjusts its weights and bias using a learning algorithm, such as
the Perceptron Learning Rule, to minimize prediction errors.
These components enable the perceptron to learn from data and make predictions. While a single
perceptron can handle simple binary classification, complex tasks require multiple perceptrons
organized into layers, forming a neural network.
$z = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n = X^T W$
The step function compares this weighted sum to a threshold. If the input is larger than the threshold value, the output is 1; otherwise, it is 0. The most common activation function used in Perceptrons is the Heaviside step function:
$h(z) = 1$ if $z \geq \text{threshold}$, otherwise $h(z) = 0$
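A small NumPy sketch of the weighted sum and the Heaviside step follows; the weights, bias, and threshold here are arbitrary placeholders rather than learned values:

import numpy as np

def heaviside(z, threshold=0.0):
    """Heaviside step: 1 if z is at or above the threshold, else 0."""
    return np.where(z >= threshold, 1, 0)

x = np.array([0.5, -1.0, 2.0])   # input features (illustrative values)
w = np.array([0.4, 0.3, 0.9])    # weights (would normally be learned)
b = -0.5                         # bias

z = np.dot(w, x) + b             # weighted sum z = w1*x1 + ... + wn*xn + b
print(heaviside(z))              # binary output: 0 or 1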
In a fully connected layer, also known as a dense layer, all neurons in one layer are connected to
every neuron in the previous layer.
$f_{W,b}(X) = h(XW + b)$
where $X$ is the input, $W$ is the weight for each input neuron, $b$ is the bias, and $h$ is the step function.
During training, the Perceptron's weights are adjusted to minimize the difference between the
predicted output and the actual output. This is achieved using supervised learning algorithms like
the delta rule or the Perceptron learning rule.
$w_{i,j} = w_{i,j} + \eta (y_j - \hat{y}_j) x_i$
Where:
$w_{i,j}$ is the weight between the $i$th input and $j$th output neuron,
$x_i$ is the $i$th input value,
$y_j$ is the actual value and $\hat{y}_j$ is the predicted value,
$\eta$ is the learning rate, controlling how much the weights are adjusted.
This process enables the perceptron to learn from data and improve its prediction accuracy over
time.
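As an illustration, the sketch below applies the Perceptron learning rule to the AND problem, a simple linearly separable toy example; the learning rate and epoch count are arbitrary choices:

import numpy as np

# Perceptron learning rule on the AND problem (linearly separable toy data)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
eta = 0.1         # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        y_hat = 1 if np.dot(w, xi) + b >= 0 else 0   # step activation
        error = target - y_hat
        w += eta * error * xi                         # w_i = w_i + eta*(y - y_hat)*x_i
        b += eta * error

print(w, b)  # learned weights and bias that separate the two classes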
Each neuron in a layer is connected to every neuron in the next layer (fully connected).
Steps:
1. Forward Pass:
o Inputs are passed through the network.
o Each neuron applies a weighted sum followed by an activation function (like ReLU,
sigmoid, tanh).
2. Loss Calculation:
o The output is compared with the actual target using a loss function (e.g., Mean
Squared Error, Cross-Entropy).
3. Backward Pass (Backpropagation):
o Calculates the gradient of the loss function with respect to each weight using the
chain rule.
o Updates the weights using Gradient Descent or its variants (SGD, Adam, etc.).
4. Repeat:
o This process is repeated for several epochs until the loss is minimized (see the sketch after this list).
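A minimal sketch of this training loop, assuming PyTorch is available and using arbitrary toy data and layer sizes, could look like this:

# Forward pass / loss / backpropagation / update loop (assumes PyTorch is installed)
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 8),   # fully connected layer: 2 inputs -> 8 hidden neurons
    nn.ReLU(),         # activation function
    nn.Linear(8, 1),   # output layer
)
loss_fn = nn.MSELoss()                                     # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # gradient descent variant

X = torch.randn(16, 2)        # toy inputs
y = torch.randn(16, 1)        # toy targets

for epoch in range(100):      # 4. repeat for several epochs
    y_pred = model(X)         # 1. forward pass
    loss = loss_fn(y_pred, y) # 2. loss calculation
    optimizer.zero_grad()
    loss.backward()           # 3. backward pass (gradients via the chain rule)
    optimizer.step()          #    weight update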
3. Activation Functions: