ML4_ML_Algorithms

LO3 Develop a machine learning application using an appropriate programming language or machine learning tool for solving a real-world problem

RAJAD SHAKYA
KNN Algorithm
● A popular machine learning algorithm used mostly for solving classification problems.
● It compares a new data entry to the values in a given data set.
● Based on its closeness or similarity to a given number (K) of neighbors, the algorithm assigns the new data entry to a class or category in the data set.
KNN Algorithm
● 1 - Assign a value to K
● 2 - Calculate the distance between the new data
entry and all other existing data entries
● 3 - Find the K nearest neighbors to the new entry
based on the calculated distances.
● 4 - Assign the new data entry to the majority class among its K nearest neighbors, as in the sketch below.
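A minimal sketch of these four steps in plain Python; the points, labels, and k=3 are illustrative assumptions, not data from the slides.

```python
# Minimal KNN sketch of the four steps above.
from collections import Counter
import math

def knn_predict(train_points, train_labels, new_point, k=3):
    # Step 2: distance between the new entry and every existing entry.
    dists = [math.dist(p, new_point) for p in train_points]
    # Step 3: the K nearest neighbors by distance.
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # Step 4: majority class among those neighbors.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = [(1, 1), (2, 1), (4, 5), (5, 4), (1, 2)]  # hypothetical data
labels = ["A", "A", "B", "B", "A"]
print(knn_predict(points, labels, (3, 3), k=3))    # -> "B"
```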
K-Means Clustering
● popular unsupervised machine learning algorithm
used to partition a dataset into a set of distinct,
non-overlapping groups (or clusters) based on their
features.
● Centroids: Each cluster is represented by a centroid,
which is the average of all points in the cluster.
● Clusters: Groups of data points that are assigned to
the nearest centroid.
Steps of K-Means Algorithm
● Initialization: Select k initial centroids randomly from
the dataset.
● Assignment: Assign each data point to the nearest
centroid, forming k clusters.
● Update: Recompute the centroid of each cluster as the
mean of all points assigned to that cluster.
● Repeat: Repeat the assignment and update steps until
convergence (i.e., centroids no longer change or change
very little).
Example 1
{2, 4, 10, 12, 3, 20, 30, 11, 25}

Let's assume we want to divide these data points into k=2 clusters.

We randomly select two data points as initial cluster centroids. Suppose we choose 4 and 20.

Assign each data point to the nearest centroid:
Example 1
Data Point   Dist. to C1 (4)   Dist. to C2 (20)   Assigned Cluster
2            2                 18                 Cluster 1
4            0                 16                 Cluster 1
10           6                 10                 Cluster 1
12           8                 8                  Cluster 1
3            1                 17                 Cluster 1
20           16                0                  Cluster 2
30           26                10                 Cluster 2
11           7                 9                  Cluster 1
25           21                5                  Cluster 2

(12 is equidistant from both centroids; the tie is broken toward Cluster 1.)
Example 1
Recompute the centroids:

New Centroid 1: (2+4+10+12+3+11)/6 = 42/6 = 7
New Centroid 2: (20+30+25)/3 = 75/3 = 25

Reassign data points based on the new centroids:
Example 1
Data Point   Dist. to C1 (7)   Dist. to C2 (25)   Assigned Cluster
2            5                 23                 Cluster 1
4            3                 21                 Cluster 1
10           3                 15                 Cluster 1
12           5                 13                 Cluster 1
3            4                 22                 Cluster 1
20           13                5                  Cluster 2
30           23                5                  Cluster 2
11           4                 14                 Cluster 1
25           18                0                  Cluster 2
Example 1
Recompute the centroids (repeat until convergence):

New Centroid 1: (2+4+10+12+3+11)/6 = 42/6 = 7
New Centroid 2: (20+30+25)/3 = 75/3 = 25

The centroids are unchanged, so the algorithm has converged.
Example 1
Final Clusters

Cluster 1: {2, 4, 10, 12, 3, 11} (Centroid: 7)
Cluster 2: {20, 30, 25} (Centroid: 25)
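The whole loop fits in a few lines of Python; a minimal 1-D sketch that reproduces Example 1, assuming ties are broken toward the first centroid (as the table above does for the point 12).

```python
# 1-D k-means sketch on the Example 1 data (k=2, initial centroids 4 and 20).
data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
centroids = [4.0, 20.0]

while True:
    # Assignment step: each point goes to its nearest centroid.
    clusters = [[], []]
    for x in data:
        j = min(range(2), key=lambda i: abs(x - centroids[i]))
        clusters[j].append(x)
    # Update step: each centroid becomes the mean of its cluster.
    new_centroids = [sum(c) / len(c) for c in clusters]
    if new_centroids == centroids:  # convergence: centroids unchanged
        break
    centroids = new_centroids

print(clusters)   # [[2, 4, 10, 12, 3, 11], [20, 30, 25]]
print(centroids)  # [7.0, 25.0]
```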
Example 2
{2, 3, 8, 10, 15, 18}

Let's assume k=2 clusters.

Randomly select two data points as initial cluster centroids. Suppose we choose 3 and 15.
Example 2
Data Point   Dist. to C1 (3)   Dist. to C2 (15)   Assigned Cluster
2            1                 13                 Cluster 1
3            0                 12                 Cluster 1
8            5                 7                  Cluster 1
10           7                 5                  Cluster 2
15           12                0                  Cluster 2
18           15                3                  Cluster 2
Example 2
Recompute the centroids:

New Centroid 1: (2+3+8)/3 = 13/3 ≈ 4.33
New Centroid 2: (10+15+18)/3 = 43/3 ≈ 14.33
Example 2
Data Point   Dist. to C1 (4.33)   Dist. to C2 (14.33)   Assigned Cluster
2            2.33                 12.33                 Cluster 1
3            1.33                 11.33                 Cluster 1
8            3.67                 6.33                  Cluster 1
10           5.67                 4.33                  Cluster 2
15           10.67                0.67                  Cluster 2
18           13.67                3.67                  Cluster 2
Example 2
Recompute the centroids:

New Centroid 1: (2+3+8)/3 = 13/3 ≈ 4.33
New Centroid 2: (10+15+18)/3 = 43/3 ≈ 14.33

The centroids have not changed, so the algorithm has converged.
Example 2
Cluster 1: {2, 3, 8} (Centroid: 4.33)

Cluster 2: {10, 15, 18} (Centroid: 14.33)


Example 3

Point   X   Y
A       1   1
B       2   1
C       4   5
D       5   4

Using k=2 clusters.

Suppose we choose A (1, 1) and C (4, 5) as the initial centroids.
Advantages
● Simple to implement and computationally efficient.
● Easy to interpret results.

Disadvantages
● The number of clusters (k) must be specified beforehand.
● Outliers can skew the centroids and affect the clustering results.
Question
● Cluster the following eight points (with (x, y) representing locations) into three clusters:
● A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
● Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
Question
● Apply the K-Means algorithm (K=2) to the data (185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for two iterations and show the clusters. Initially choose the first two objects as the centroids.
Elbow Method
● Used to determine the number of clusters in a dataset.
● This method involves running the K-Means algorithm on the dataset for a range of values of k (the number of clusters) and, for each value of k, computing the within-cluster sum of squares (WCSS).
● The k at the "elbow" of the WCSS-versus-k curve, where adding more clusters stops reducing WCSS substantially, is the one chosen, as in the sketch below.
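A minimal sketch with scikit-learn, which exposes the WCSS of a fitted model as its `inertia_` attribute; the data reuses Example 1 from earlier.

```python
# Elbow-method sketch: run K-Means for a range of k and inspect WCSS.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([2, 4, 10, 12, 3, 20, 30, 11, 25], dtype=float).reshape(-1, 1)

for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))  # WCSS drops sharply, then flattens

# Choose the k at the "elbow" where the curve flattens; for this data
# the big drops happen by k = 2 or 3.
```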

Decision Tree
● A supervised machine learning algorithm that can be used for both classification and regression tasks.
● It splits the data into subsets based on the most significant attribute, making decisions at each node to reach the final prediction.
● Root Node: the top node representing the entire dataset, which is split into two or more homogeneous sets.
Decision Tree
● Decision Node: nodes that represent a decision or test on an attribute.
● Leaf/Terminal Node: nodes that represent the final class label or value (in the case of regression).
● Branch/Sub-tree: a subsection of the entire tree.
Steps to Build a Decision Tree
● Select the Best Attribute
○ Choose the attribute that best splits the dataset.
■ using a measure such as the Gini Index or Information Gain
● Splitting
○ Divide the dataset into subsets based on the best
attribute. This creates decision nodes and leaf
nodes.
● Repeat Step 1 and Step 2
○ Recursively split the nodes until one of the
stopping conditions is met
Steps to Build a Decision Tree
● Stopping Criteria
○ Maximum tree depth is reached.
○ The minimum number of samples in a node is reached.
○ No further information gain can be achieved.
Entropy
● Used to measure the impurity in a given attribute.
● It specifies the randomness in the data.
● Entropy(S) = −P(yes)·log2 P(yes) − P(no)·log2 P(no), where:
○ S = the set of samples
○ P(yes) = probability of yes
○ P(no) = probability of no
Information Gain
● A measurement of the change in entropy after segmenting a dataset on an attribute.
● It calculates how much information a feature provides about a class.
● We aim to maximize information gain; the node/attribute with the highest information gain is split first.

For a dataset D with k classes: Entropy(D) = −Σ_{i=1..k} p_i · log2(p_i)

Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v), where v ranges over the values of attribute A.
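As a concrete check, this sketch computes Entropy(S) and the information gain of the Outlook attribute, assuming the standard 14-record play-golf dataset (9 "yes" / 5 "no") that the Naive Bayes example later in these slides also uses.

```python
# Entropy and information gain for the Outlook split (play-golf data).
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

parent = entropy([9, 5])  # Entropy(S) for 9 yes / 5 no ≈ 0.940

# (yes, no) counts within each Outlook value.
splits = {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)}
n = 14
remainder = sum(sum(c) / n * entropy(c) for c in splits.values())
gain = parent - remainder
print(round(parent, 3), round(gain, 3))  # 0.94 0.247
```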
Advantages of the Decision Tree
● Simple to understand, as it follows the same process a human follows when making a decision in real life.
● Helps to think through all the possible outcomes for a problem.
Disadvantages of the DT
● A deep tree contains many layers, which makes it complex.
● With more class labels, the computational complexity of the decision tree may increase.
Numerical of the DT
https://medium.datadriveninvestor.com/decision-tree-algorithm-with-hands-on-example-e6c2afb40d38

https://www.cs.cmu.edu/~aarti/Class/10315_Fall20/recs/DecisionTreesBoostingExampleProblem.pdf
Naive Bayes Algorithm
● A probabilistic machine learning algorithm based on Bayes' Theorem: P(A | B) = P(B | A) · P(A) / P(B).
● Called "naive" because it assumes that the features (or predictors) are independent of each other given the class label.
Naive Bayes Algorithm
● The worked example below uses the standard 14-record "play golf" weather dataset (attributes: Outlook, Temperature, Humidity, Windy; class: Play).
● Calculate Prior Probabilities
○ Out of 14 records, 9 are "yes", so P(yes) = 9/14 and P(no) = 5/14.
● Calculate Likelihoods
○ Out of 14 records: 5 Sunny, 4 Overcast, 5 Rainy.
○ From the dataset, the number of sunny days on which we can play is 2, and the total number of days on which we can play is 9.
○ So P(Sunny | yes) = 2/9.
● Let's predict: given the conditions Sunny, Mild, Normal, False, can he/she play golf?

P(yes | Sunny, Mild, Normal, False)
∝ P(Sunny, Mild, Normal, False | yes) · P(yes)
= P(Sunny | yes) · P(Mild | yes) · P(Normal | yes) · P(False | yes) · P(yes)
= 2/9 · 4/9 · 6/9 · 6/9 · 9/14
≈ 0.0282
● Let's now calculate the corresponding score for "no":

P(no | Sunny, Mild, Normal, False)
∝ P(Sunny, Mild, Normal, False | no) · P(no)
= P(Sunny | no) · P(Mild | no) · P(Normal | no) · P(False | no) · P(no)
= 3/5 · 2/5 · 1/5 · 2/5 · 5/14
≈ 0.0068

Since 0.0282 > 0.0068 (the "yes" score exceeds the "no" score), play is predicted as yes for the conditions Sunny, Mild, Normal, False.
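The same calculation takes only a few lines of Python; the probabilities below are taken straight from the slides.

```python
# Naive Bayes score for (Sunny, Mild, Normal, False) on the play-golf data.
p_yes, p_no = 9 / 14, 5 / 14

# P(feature | class) factors: Sunny, Mild, Normal, False.
likelihood_yes = (2 / 9) * (4 / 9) * (6 / 9) * (6 / 9)
likelihood_no = (3 / 5) * (2 / 5) * (1 / 5) * (2 / 5)

score_yes = likelihood_yes * p_yes  # ≈ 0.0282
score_no = likelihood_no * p_no     # ≈ 0.0069
print("play" if score_yes > score_no else "don't play")  # -> play

# Normalizing the scores gives a proper posterior probability:
print(score_yes / (score_yes + score_no))  # ≈ 0.80
```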
Advantages of Naïve Bayes
● One of the fastest and easiest ML algorithms for predicting a class.
● Can be used for binary as well as multi-class classification.
● A popular choice for text classification problems.
Disadvantages of Naïve Bayes
● assumes that all features are independent or
unrelated, so it cannot learn the relationship
between features.
Types of Naïve Bayes Model
● Gaussian

● Multinomial
Random Forest
● A collaborative team of decision trees that work together to produce a single output.
● It works by creating a number of decision trees during the training phase.
● Each tree is constructed using a random subset of the data set and considers a random subset of the features at each split.
Random Forest
● This randomness introduces variability among individual trees, reducing the risk of overfitting and improving overall prediction performance.
● The forest aggregates the results of all trees, by voting for classification tasks (or averaging for regression tasks).
● The final prediction is therefore supported by multiple trees and their insights.
Random Forest
● Step 1: Select K random data points from the training set.
● Step 2: Build the decision trees associated with the selected data points (subsets).
● Step 3: Choose the number N of decision trees that you want to build.
Random Forest
● Step 4: Repeat Steps 1 and 2 until N trees are built.
● Step 5: For a new data point, find the prediction of each decision tree, and assign the new data point to the category that wins the majority of votes, as in the sketch below.
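A compact scikit-learn sketch of this procedure; the iris dataset, n_estimators=100, and max_features="sqrt" are illustrative assumptions (scikit-learn performs the bootstrap sampling of Step 1 and the per-split feature subsets internally).

```python
# Random-forest classification sketch with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# N = 100 trees; each tree sees a bootstrap sample of the rows and a
# random subset of features (max_features) at every split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=42)
rf.fit(X_train, y_train)

# Prediction = majority vote across the 100 trees.
print(rf.score(X_test, y_test))
```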
Key Features of Random Forest
● High Predictive Accuracy

● Resistance to Overfitting

● Large Datasets Handling

● Parallelization for Speed


Linear Regression
● type of supervised machine learning algorithm that
computes the linear relationship between the
dependent variable and one or more independent
features by fitting a linear equation to observed
data.
● When there is only one independent feature, it is known as Simple Linear Regression;
● when there is more than one feature, it is known as Multiple Linear Regression.
Linear Regression
● The interpretability of linear regression is a notable strength.
● It is transparent, easy to implement, and serves as a foundational concept for more complex algorithms.
Simple Linear Regression
● It involves only one independent variable and one dependent variable: Y = β0 + β1·X
● Y is the dependent variable
● X is the independent variable
● β0 is the intercept
● β1 is the slope
Multiple Linear Regression
● Involves more than one independent variable and one dependent variable: Y = β0 + β1·X1 + β2·X2 + … + βp·Xp
● where:
● Y is the dependent variable
● X1, X2, …, Xp are the independent variables
● β0 is the intercept
● β1, β2, …, βp are the slopes
Simple Linear Regression
● The goal of the algorithm is to find the best-fit line equation that can predict values based on the independent variables.
● What is the best-fit line?
○ The straight line that best represents the relationship between the dependent and independent variables.
Simple Linear Regression
● The slope of the line indicates how much the dependent variable changes for a unit change in the independent variable(s).
● Here Y is called the dependent or target variable, and X is called the independent variable, also known as the predictor of Y.
Simple Linear Regression
● Performs the task of predicting a dependent variable value (y) based on a given independent variable (x).
● Since different values for the weights (the coefficients of the line) produce different regression lines, we use the cost function to compute the best values and obtain the best-fit line.
How to update θ1 and θ2 values to get the best-fit line?

● To achieve the best-fit regression line, the model aims to predict the target value Y_pred such that the error between the predicted value Y_pred and the true value Y is minimum.
● So, it is very important to update the θ1 and θ2 values to reach the values that minimize the error between the predicted y value (Y_pred) and the true y value (y).
Cost function for LR
● The cost function (or loss function) is nothing but the error or difference between the predicted value Y_pred and the true value Y.
● In linear regression, the Mean Squared Error (MSE) cost function is employed, which calculates the average of the squared errors between the predicted values ŷ_i and the actual values y_i: J = (1/n) · Σ_{i=1..n} (ŷ_i − y_i)²
Cost function for LR
● Utilizing the MSE function, the iterative process of gradient descent is applied to update the values of θ1 and θ2.
● This ensures that the MSE value converges to the global
minima, signifying the most accurate fit of the linear
regression line to the dataset.
● The final result is a linear regression line that minimizes
the overall squared differences between the predicted
and actual values, providing an optimal representation of
the underlying relationship in the data.
Numerical
● Data: (1, 2), (2, 3), (3, 5), (4, 4), (5, 6)
● Least squares gives slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = 9/10 = 0.9 and intercept = ȳ − 0.9·x̄ = 4 − 2.7 = 1.3, so the best-fit line is y = 0.9x + 1.3.
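A quick numpy check of this result using the least-squares formulas for the slope and intercept.

```python
# Ordinary least squares on the slide's data; expect y = 0.9x + 1.3.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)  # 0.9 1.3
```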
Gradient Descent for LR
● The model can be trained using the optimization algorithm gradient descent, by iteratively modifying the model's parameters to reduce the mean squared error (MSE) of the model on a training dataset.
● To update the θ1 and θ2 values in order to reduce the cost function (minimizing the RMSE value) and achieve the best-fit line, the model uses gradient descent.
Gradient Descent for LR
● The idea is to start with random θ1 and θ2 values and then iteratively update them, reaching the minimum cost.
● A gradient is nothing but a derivative that describes the effect on the output of a function of a small variation in its inputs.
Gradient Descent for LR
● Finding the coefficients of a linear equation that best fits the training data is the objective of linear regression.
● The coefficients are changed by moving in the direction of the negative gradient of the Mean Squared Error with respect to the coefficients.
● If α is the learning rate, the intercept and the coefficient of X are updated as θ1 := θ1 − α · ∂J/∂θ1 and θ2 := θ2 − α · ∂J/∂θ2, as in the sketch below.
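A minimal batch gradient-descent sketch on the earlier numerical example; the learning rate α = 0.05 and the iteration count are assumptions, not values from the slides.

```python
# Batch gradient descent for simple linear regression (MSE cost).
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

theta1, theta2 = 0.0, 0.0  # intercept and slope, starting from zero
alpha, n = 0.05, len(x)

for _ in range(5000):
    error = theta1 + theta2 * x - y
    # Gradients of MSE with respect to the intercept and the slope.
    grad1 = (2 / n) * np.sum(error)
    grad2 = (2 / n) * np.sum(error * x)
    theta1 -= alpha * grad1
    theta2 -= alpha * grad2

print(theta1, theta2)  # converges toward 1.3 and 0.9
```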
Alpha – The Learning Rate
● It must be chosen carefully so that gradient descent converges to the minimum.
● If the learning rate is too high, we might OVERSHOOT the minimum and keep bouncing around without reaching it.
● If the learning rate is too small, training might take too long.
Types of gradient descent
● Batch gradient descent
○ Works by updating the parameter values based on the average gradient of the loss function over the entire training dataset.
○ This is the standard form of gradient descent; it takes a long time on large datasets because it computes the gradient over all the training data at each iteration while searching for the optimal parameter values.
Types of gradient descent
● Stochastic gradient descent (SGD)

○ Works one training example per iteration, which gives a higher modelling speed.
○ Here, the model's parameter values are updated using the gradient of the loss function on a single training example at a time.
Types of gradient descent
● Mini-batch gradient descent

○ We can think of it as a compromise between the two types above: the model parameters are updated based on the average gradient over a small random subset of the training data.
○ This subset is called a mini-batch, and this variant runs at a medium speed, as contrasted in the sketch below.
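All three variants differ only in how many examples feed each parameter update; in the sketch below, batch_size = len(x) gives batch GD, batch_size = 1 gives SGD, and anything in between gives mini-batch GD (the learning rate, batch size, and iteration count are assumptions).

```python
# One update step that works for batch, stochastic, and mini-batch GD.
import numpy as np

def gd_step(theta1, theta2, xb, yb, alpha):
    # Gradient of MSE on the batch (xb, yb), whatever its size.
    error = theta1 + theta2 * xb - yb
    m = len(xb)
    return (theta1 - alpha * (2 / m) * error.sum(),
            theta2 - alpha * (2 / m) * (error * xb).sum())

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)
rng = np.random.default_rng(0)

theta1 = theta2 = 0.0
batch_size = 2  # mini-batch; 1 -> SGD, len(x) -> batch GD
for _ in range(3000):
    idx = rng.choice(len(x), size=batch_size, replace=False)
    theta1, theta2 = gd_step(theta1, theta2, x[idx], y[idx], alpha=0.01)

print(theta1, theta2)  # hovers near 1.3 and 0.9
```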
Evaluation Metrics for LR
● Mean Squared Error (MSE)
○ An evaluation metric that calculates the average of the squared differences between the actual and predicted values over all the data points:
○ MSE = (1/n) · Σ_{i=1..n} (y_i − ŷ_i)²
Evaluation Metrics for LR
● Mean Squared Error (MSE)
○ Here,
■ n is the number of data points,
■ y_i is the actual or observed value for the i-th data point,
■ ŷ_i is the predicted value for the i-th data point.
○ MSE is a way to quantify the accuracy of a model's predictions.
○ MSE is sensitive to outliers, as large errors contribute significantly to the overall score.
Evaluation Metrics for LR
● Coefficient of Determination (R-squared)
○ A statistic that indicates how much of the variation the developed model can explain or capture.
○ It is always in the range of 0 to 1.
○ In general, the better the model fits the data, the greater the R-squared value.
Evaluation Metrics for LR
● Coefficient of Determination (R-squared): R² = 1 − RSS / TSS
Evaluation Metrics for LR
● Coefficient of Determination (R-squared)
○ Residual Sum of Squares (RSS):
■ The sum of the squared residuals over the data points: RSS = Σ_{i=1..n} (y_i − ŷ_i)²
■ It measures the difference between the observed output and what was predicted.
Evaluation Metrics for LR
● Coefficient of Determination (R-squared)
○ Total Sum of Squares (TSS):
■ The sum of the squared differences between each observed value and the mean of the response variable: TSS = Σ_{i=1..n} (y_i − ȳ)²
Evaluation Metrics for LR
● Coefficient of Determination (R-squared)
○ The R-squared metric measures the proportion of variance in the dependent variable that is explained by the independent variables in the model.
Evaluation Metrics for LR
● Adjusted R-Squared Error
○ The proportion of variance in the dependent variable that is explained by the independent variables in a regression model.
○ It accounts for the number of predictors in the model and penalizes the model for including irrelevant predictors that do not contribute significantly to explaining the variance in the dependent variable.
Evaluation Metrics for LR
● Adjusted R-Squared Error
○ Adjusted R² is expressed as: Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k − 1)
○ Here,
○ n is the number of observations,
○ k is the number of predictors in the model,
○ R² is the coefficient of determination.
Evaluation Metrics for LR
● Adjusted R-Squared Error
○ Helps to prevent overfitting: it penalizes the model for additional predictors that do not contribute significantly to explaining the variance in the dependent variable.
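These three metrics can be checked numerically on the earlier fit y = 0.9x + 1.3; a small numpy sketch with n = 5 observations and k = 1 predictor.

```python
# MSE, R², and adjusted R² for the fitted line y = 0.9x + 1.3.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)
y_pred = 0.9 * x + 1.3

n, k = len(y), 1
mse = np.mean((y - y_pred) ** 2)
rss = np.sum((y - y_pred) ** 2)    # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(mse, r2, adj_r2)  # 0.38 0.81 ≈ 0.747
```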
Train Test Validation
● Fundamental in machine learning and data analysis, particularly during model development.
● It involves dividing a dataset into three subsets: training, validation, and testing.
● The train-test split is a model validation process that lets you check how your model would perform with a new data set.
Train Test Validation
● The train-validation-test split helps assess how well a machine learning model will generalize to new, unseen data.
● It also prevents overfitting, where a model performs
well on the training data but fails to generalize to
new instances.
● By using a validation set, practitioners can iteratively
adjust the model’s parameters to achieve better
performance on unseen data.
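A minimal scikit-learn sketch: two successive train_test_split calls produce the three subsets (the iris data and 60/20/20 proportions are assumptions).

```python
# Train / validation / test split via two successive splits.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve out the 20% test set...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# ...then split the remainder: 0.25 of the remaining 80% is another 20%.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```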
Train Test Validation
● Performance Evaluation
○ After training and validation, the model faces the testing set, which simulates real-world scenarios.
○ A model that performs well on the testing set has successfully adapted to new, unseen data.
Train Test Validation
● Bias and Variance Assessment
○ Helps in understanding the bias-variance trade-off.
○ The training set provides information about the model's bias, capturing inherent patterns,
○ while the validation and testing sets help assess variance, indicating the model's sensitivity to fluctuations in the dataset.
○ Striking the right balance between bias and variance is vital for achieving a model that generalizes well across different datasets.
Confusion Matrix
● A matrix that summarizes the performance of a machine learning model on a set of test data.
● It is a means of displaying the number of accurate and inaccurate instances based on the model's predictions.
● It is often used to measure the performance of classification models, which aim to predict a categorical label for each input instance.
Confusion Matrix

For a binary problem, the matrix has four cells:

                  Predicted Positive     Predicted Negative
Actual Positive   True Positive (TP)     False Negative (FN)
Actual Negative   False Positive (FP)    True Negative (TN)
● True Positive:
○ Interpretation: you predicted positive and it's true.
○ You predicted that a woman is pregnant, and she actually is.
Confusion Matrix
● True Negative:
○ Interpretation: you predicted negative and it's true.
○ You predicted that a man is not pregnant, and he actually is not.
Confusion Matrix
● False Positive (Type 1 Error):
○ Interpretation: you predicted positive and it's false.
○ You predicted that a man is pregnant, but he actually is not.
Confusion Matrix
● False Negative (Type 2 Error):
○ Interpretation: you predicted negative and it's false.
○ You predicted that a woman is not pregnant, but she actually is.
Confusion Matrix
● Just remember: "Positive/Negative" describes the predicted value, and "True/False" describes whether that prediction matches the actual value.
Confusion Matrix
● Recall
○ Recall = TP / (TP + FN)
○ This can be read as: out of all the actual positive classes, how many did we predict correctly?
○ Recall should be as high as possible.
Confusion Matrix
● Precision
○ Precision = TP / (TP + FP)
○ Out of all the classes we predicted as positive, how many are actually positive?
○ Precision should be as high as possible.
Confusion Matrix
● Accuracy
○ Accuracy = (TP + TN) / (TP + TN + FP + FN)
○ Out of all the classes (positive and negative), how many did we predict correctly?
○ Accuracy should be as high as possible.
Confusion Matrix
● F-measure
○ It is difficult to compare two models when one has low precision and high recall, or vice versa.
○ So, to make them comparable, we use the F-score, which measures recall and precision at the same time: F1 = 2 · (Precision · Recall) / (Precision + Recall)
○ It uses the harmonic mean in place of the arithmetic mean, punishing extreme values more.
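The four formulas above translate directly into code; a small sketch using illustrative counts (deliberately not the values from the exercises on the next slides).

```python
# Confusion-matrix metrics from raw TP / FN / FP / TN counts.
def metrics(tp, fn, fp, tn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only.
print(metrics(tp=40, fn=10, fp=20, tn=30))
# -> approximately (0.70, 0.667, 0.80, 0.727)
```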
Confusion Matrix
                  Predicted Positive   Predicted Negative
Actual Positive   50                   10
Actual Negative   5                    35

● Calculate the following metrics:
a) Accuracy
b) Precision
c) Recall
d) F1 Score
Confusion Matrix
                  Predicted Spam   Predicted Not Spam
Actual Spam       90               10
Actual Not Spam   20               80

● Calculate the following metrics:
a) Accuracy
b) Precision
c) Recall
d) F1 Score
Support Vector Machine (SVM)
● A supervised machine learning algorithm in which we try to find a hyperplane that best separates two classes.
● SVM is a powerful supervised algorithm that works best on smaller but complex datasets.
Support Vector Machine (SVM)
● Support Vectors:
○ These are the points that are closest to the hyperplane. The separating line is defined with the help of these data points.
Support Vector Machine (SVM)
● Margin:
○ The distance between the hyperplane and the observations closest to it (the support vectors). In SVM, a large margin is considered a good margin. There are two types of margins: hard margin and soft margin.
How Does SVM Work?
● The best hyperplane is the plane that has the maximum distance from both classes; this is the main aim of SVM.
● It is found by comparing different candidate hyperplanes and choosing the one that classifies the labels best, as in the sketch below.
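A minimal scikit-learn sketch; the six points and C=1.0 are illustrative assumptions.

```python
# Linear SVM: fit a maximum-margin hyperplane between two classes.
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable classes.
X = np.array([[1, 1], [2, 1], [1, 2], [4, 5], [5, 4], [5, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)  # larger C -> harder margin
clf.fit(X, y)

print(clf.support_vectors_)   # the points closest to the hyperplane
print(clf.predict([[3, 3]]))  # classify a new point
```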
Thank You

RAJAD SHAKYA
