
Machine Learning Algorithms

Michael Melese (Ph.D.)


[email protected]
January 22, 2024
Algorithm
¡ An algorithm is a procedure or set of steps or rules to
accomplish a task. Algorithms are one of the fundamental
concepts in, or building blocks of, computer science.
¡ Some of the basic types of tasks that algorithms can solve are
§ sorting, searching, and graph-based computational problems
¡ In data science, there are at least three classes of algorithms
one should be aware of:
§ Data munging, preparation, and processing algorithms, such as
sorting, MapReduce, or Pregel.
§ Optimization algorithms for parameter estimation, including
Stochastic Gradient Descent, Newton’s Method, and Least Squares.
§ Machine learning algorithms.

January 22, 2024


Machine Learning
¡ Machine learning algorithms are largely used to
predict, classify, or cluster.
¡ Machine learning algorithms are the basis of artificial
intelligence (AI) applications such as image recognition, speech
recognition, recommendation systems, and ranking and
personalization of content.
¡ Machine learning algorithms are described as
learning a target function (f) that best maps input
variables (X) to an output variable (Y): Y = f(X)

January 22, 2024


Algorithm
¡ Linear Regression
¡ Logistic Regression
¡ K-Means
¡ Support Vector Machines
¡ Naive Bayes
¡ K-Nearest Neighbors
¡ PCA
¡ Classification and Regression Trees
¡ Hierarchical Clustering
¡ Learning Vector Quantization
¡ Bagging and Random Forest
¡ Boosting and AdaBoost
¡ Linear Discriminant Analysis
January 22, 2024
Linear Regression
¡ Linear regression is one of the fundamental supervised machine-learning
algorithms, due to its relative simplicity and well-known properties.
¡ It is one of the most well known and understood algorithms in
statistics and ML.
¡ It expresses the mathematical relationship between two variables or
attributes.
¡ Predictive modeling is primarily concerned with minimizing the
error of a model, or making the most accurate predictions possible, at the
expense of explainability.

January 22, 2024



¡ Linear regression may be simple or multiple.
§ The case of one explanatory variable is called simple linear
regression.
§ With more than one explanatory variable, the process is
called multiple linear regression.
¡ Assumption
§ There is a linear relationship between an outcome variable
(dependent variable) and a predictor (independent variable
or feature).

January 22, 2024



¡ The relationship between the independent and dependent variables is
modeled by fitting a best-fit line; the coefficients b and m are derived
from the given input by minimizing the sum of squared distances between
the data points and the regression line.

y = b + m*x

where: y – dependent variable, x – independent variable, b – intercept and m – slope

January 22, 2024



The slope and intercept can be computed from the correlation coefficient and the standard deviations:

m = r * (s_y / s_x)        b = ȳ − m * x̄

s_x = sqrt( Σ(x − x̄)² / (n − 1) )        s_y = sqrt( Σ(y − ȳ)² / (n − 1) )

r = Σ(x − x̄)(y − ȳ) / sqrt( Σ(x − x̄)² * Σ(y − ȳ)² )

Equivalently, directly from the sums:

b = [ (Σy)(Σx²) − (Σx)(Σxy) ] / [ n(Σx²) − (Σx)² ]

m = [ n(Σxy) − (Σx)(Σy) ] / [ n(Σx²) − (Σx)² ]

where: r – Pearson correlation coefficient, s_x, s_y – standard deviations, x̄ – mean of x and ȳ – mean of y

January 22, 2024



No   Age (X)   Glucose level (Y)   XY      X²      Y²
1    43        99                  4257    1849    9801
2    21        65                  1365    441     4225
3    25        79                  1975    625     6241
4    42        75                  3150    1764    5625
5    57        87                  4959    3249    7569
6    59        81                  4779    3481    6561
Σ    247       486                 20485   11409   40022

January 22, 2024



Using the totals from the table above (n = 6, Σx = 247, Σy = 486, Σxy = 20485, Σx² = 11409):

b = [ (486)(11409) − (247)(20485) ] / [ 6(11409) − (247)² ] = 65.1416

m = [ 6(20485) − (247)(486) ] / [ 6(11409) − (247)² ] = 0.38522

y = 0.38522*x + 65.1416
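As a quick check, here is a minimal Python sketch (not part of the original slides) that reproduces b and m from the age/glucose data using the closed-form sums above; the prediction for age 55 is only an illustrative assumption.

# Minimal sketch: simple linear regression via the closed-form sums (illustrative only)
ages = [43, 21, 25, 42, 57, 59]          # X: age
glucose = [99, 65, 79, 75, 87, 81]       # Y: glucose level

n = len(ages)
sum_x = sum(ages)
sum_y = sum(glucose)
sum_xy = sum(x * y for x, y in zip(ages, glucose))
sum_x2 = sum(x * x for x in ages)

# b = [(Σy)(Σx²) − (Σx)(Σxy)] / [n(Σx²) − (Σx)²]
b = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)
# m = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²]
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

print(b, m)           # ≈ 65.1416, 0.38522
print(b + m * 55)     # predicted glucose level for a 55-year-old (illustrative query)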
January 22, 2024
Logistic Regression
¡ Logistic regression is a supervised learning
algorithm.
¡ LR analysis studies the association between a
categorical dependent variable and one or more independent
variables.
¡ The goal of Logistic Regression is to discover
a link between characteristics and the
likelihood of a specific outcome.

January 22, 2024



¡ Unlike linear regression, logistic regression is
used when the response variable is a
categorical variable with two possible options.
For example:
§ A student applicant receives or does not receive
aid.
§ A patient lives or dies during emergency surgery.
§ Phone coverage is acceptable or not.

January 22, 2024



¡ Generally the two outcomes of the response
variable are coded as 1 for "success" and 0 for "failure".
§ The mean is then the proportion of 1's, p = P(success),
while 1 − p = P(failure) for the 0's.
§ For example, if a student passes or fails an exam based on the
number of hours spent studying, the response
variable has two values: pass and fail.

January 22, 2024



¡ Generally, logistic regression can be classified into:
Ø Binary logistic regression – two (binary)
outcomes, such as yes or no.
Ø Multinomial logistic regression – three or more
unordered outcomes, such as Married, Single, Divorced, or
Widowed.
Ø Ordinal logistic regression – three or more outcomes as in
multinomial logistic regression, but with an
order, such as a customer rating in a supermarket from
1 to 5.

January 22, 2024



¡ The LR model works for the majority of datasets, but for
good performance there are some
assumptions to consider:
Ø The dependent variable in binary logistic regression must
be binary.
Ø Only relevant attributes should be included.
Ø The independent variables must be unrelated to one another.
Ø That is, there should be minimal or no multicollinearity in the model.
Ø The log odds are a linear function of the independent
variables.
Ø Large sample sizes are required for logistic regression.
January 22, 2024

¡ Logistic regression utilizes a more sophisticated cost function,
which is known as the "sigmoid function" or "logistic function".
¡ The function produces an S-shaped curve.
§ It returns a probability value between 0 and 1,
converting expected values to probabilities.

x     transformed (sigmoid)
-3    0.04742587318
-2    0.119202922
-1    0.2689414214
0     0.5
1     0.7310585786
2     0.880797078
3     0.9525741268
January 22, 2024

Ø Like linear regression, logistic regression has a coefficient for each
independent variable; here there are 2, so:

output = b0 + b1*x1 + b2*x2

x1           x2           y
2.7810836    2.550537     0
1.46548937   2.36212508   0
3.39656169   4.40029353   0
1.38807019   1.85022032   0
3.06407232   3.00530597   0
7.62753121   2.75926224   1
5.33244125   2.08862678   1
6.92259672   1.77106367   1
8.67541865   -0.2420687   1
7.67375647   3.50856301   1

January 22, 2024



¡ The logistic regression model takes real-valued inputs
and makes a prediction as to the probability of the
input belonging to class 1.
§ If the probability is > 0.5 we can take the output as a
prediction for class 1, otherwise the
prediction is for class 0.
¡ For this dataset, the logistic regression has three
coefficients, just like linear regression, for example:
output = b0 + b1*x1 + b2*x2
¡ The job of the learning algorithm is to discover the best
values for the coefficients (b0, b1 and b2) based on the
training data.
January 22, 2024

¡ Unlike linear regression, the output is transformed into a probability
using the logistic function:
p(class=1) = 1 / (1 + e^(-output))
¡ Let's start off by assigning 0.0 to each coefficient and calculating
the probability that the first training instance belongs to class 1.
¡ b0 = 0.0, b1 = 0.0, b2 = 0.0
¡ The first training instance is: x1=2.7810836, x2=2.550537003, y=0
¡ Using the above equation we can plug in all of these numbers and
calculate a prediction:
prediction = 1 / (1 + e^(-(b0 + b1*x1 + b2*x2)))
prediction = 1 / (1 + e^(-(0.0 + 0.0*2.7810836 + 0.0*2.550537003)))
prediction = 0.5

January 22, 2024



¡ We can calculate the new coefficient values using a simple update
equation:
b = b + alpha * (y - prediction) * prediction * (1 - prediction) * x
ü b is the coefficient we are updating and prediction is the output of making a
prediction using the model.
ü Alpha is a parameter (the learning rate) that you must specify at the beginning of the
training run. Good values might be in the range 0.1 to 0.3. Let's use a value of
0.3.
ü b0 does not have an input. This coefficient is often called the bias or the
intercept and we can assume it always has an input value of 1.0.
¡ After calculating the output value, it is transformed into a probability
using the logistic (sigmoid) function:
p(class=1) = 1 / (1 + e^(-output))

January 22, 2024



¡ Let’s update the coefficients using the prediction (0.5) and coefficient
values (0.0) from the previous section.
¡ For the first instance
Ø b0 = b0 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 1.0 = -0.0375
Ø b1 = b1 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 2.7810836 = -0.104290635
Ø b2 = b2 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 2.550537003 = -0.09564513761
¡ Substituting the values of b0, b1 and b2 to calculate the prediction for the first instance:
prediction = 1 / (1 + e^(-(b0 + b1*x1 + b2*x2)))
= 1 / (1 + e^(-(-0.0375 + (-0.104290635 * 2.7810836) + (-0.09564513761 *
2.550537003)))) = 0.2987569857

January 22, 2024



¡ For the second instance
Ø b0 = b0 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 1.0 = -0.0375
Ø b1 = b1 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 1.46548937 = -
0.054955851375
Ø b2 = b2 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 2.36212508 = -
0.0885796905
¡ Substituting the values b0, b1, b2 to calculate prediction for
the second instance:
prediction = 1 / (1 + e^(-(b0 + b1*x1 + b2*x2)))
= 1 / (1 + e^(-(-0.0375 +( -0.054955851375 * 1.46548937) +
(-0.0885796905 * 2.36212508)))) = 0.145951056

January 22, 2024



¡ For the third instance
Ø b0 = b0 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 1.0 = -0.0375
Ø b1 = b1 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 3.39656169 = -
0.127371063375
Ø b2 = b2 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 4.40029353 = -
0.165011007375
¡ Substituting the values of b0, b1 and b2 to calculate the prediction for
the third instance:
prediction = 1 / (1 + e^(-(b0 + b1*x1 + b2*x2)))
= 1 / (1 + e^(-(-0.0375 +(-0.127371063375 * 3.39656169) + (-
0.165011007375 * 4.40029353)))) = 0.0853326531
¡ Repeat this process and update the model for each training
instance in the dataset.

January 22, 2024



Predicted probability     Predicted class     Reference (actual class)
0.2987569857 0 0
0.145951056 0 0
0.0853326531 0 0
0.2197373144 0 0
0.2470590002 0 0
0.9547021348 1 1
0.8620341908 1 1
0.9717729051 1 1
0.9992954521 1 1
0.905489323 1 1
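The per-instance update described above can be scripted. Below is a minimal Python sketch (not from the slides) that repeats the update over the dataset; the learning rate of 0.3 comes from the slides, but the number of passes (epochs) is an assumption, so the exact probabilities printed will depend on it and need not match the table exactly.

import math

# Dataset from the slides: [x1, x2, y]
data = [
    [2.7810836, 2.550537003, 0], [1.465489372, 2.362125076, 0],
    [3.396561688, 4.400293529, 0], [1.38807019, 1.850220317, 0],
    [3.06407232, 3.005305973, 0], [7.627531214, 2.759262235, 1],
    [5.332441248, 2.088626775, 1], [6.922596716, 1.77106367, 1],
    [8.675418651, -0.242068655, 1], [7.673756466, 3.508563011, 1],
]

def predict(row, b):
    # output = b0 + b1*x1 + b2*x2, squashed by the logistic function
    output = b[0] + b[1] * row[0] + b[2] * row[1]
    return 1.0 / (1.0 + math.exp(-output))

alpha = 0.3            # learning rate, as suggested on the slides
b = [0.0, 0.0, 0.0]    # b0, b1, b2 start at zero

for epoch in range(10):                 # number of passes over the data (assumed)
    for row in data:
        p = predict(row, b)
        error = row[2] - p
        b[0] += alpha * error * p * (1 - p) * 1.0      # bias input is 1.0
        b[1] += alpha * error * p * (1 - p) * row[0]
        b[2] += alpha * error * p * (1 - p) * row[1]

for row in data:
    p = predict(row, b)
    print(round(p, 4), 1 if p >= 0.5 else 0, row[2])   # probability, predicted class, reference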

January 22, 2024


K-means
¡ Clustering is the process of partitioning a group of
data points into a small number of clusters.
¡ K-means clustering is a type of unsupervised
learning, which is used when you have unlabeled
data.
§ The goal of this algorithm is to find groups in the data, with
the number of groups represented by the variable K.
¡ The algorithm works iteratively to assign each data
point to one of K groups based on the features
provided.

January 22, 2024



¡ Data points are clustered based on feature
similarity. The results of the K-means
clustering algorithm are:
¡ The centroids of the K clusters, which can be
used to label new data
¡ Each data point is assigned to a single cluster.

January 22, 2024



¡ Behavioral segmentation
§ Segment by purchase history, activities on application, website,
or platform
§ Define personas based on interests
¡ Inventory categorization
§ Group inventory by sales activity and manufacturing metrics
¡ Sorting sensor measurements
§ Detect activity types in motion sensors
§ Group images, Separate audio and Identify groups in health
monitoring
¡ Detecting bots or anomalies
§ Separate valid activity groups from bots
§ Group valid activity to clean up outlier detection

January 22, 2024


K-means algorithm
¡ The K-means clustering algorithm uses iterative refinement to produce the
clusters. The algorithm inputs are the number of clusters K and the data set.
§ The data set is a set of features for each data point. The algorithm starts with initial
estimates of the K centroids, which can either be randomly generated or randomly
selected from the data set.
¡ The algorithm then iterates between the following steps (a minimal sketch follows after this list):
§ Initially, randomly pick K centroids (or points that will be the centers of your
clusters) in d-space. Try to make them near the data but different from one
another.
§ Then assign each data point to the closest centroid.
§ Move each centroid to the average location of the data points assigned to it.
§ Repeat the preceding two steps until the assignments don't change, or change
very little (i.e., no data points change clusters, the sum of the distances is
minimized, or some maximum number of iterations is reached).
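A minimal Python sketch of these steps; the random initialisation, the choice of K = 2 and the reuse of the height/weight points from a later slide are assumptions for illustration, not part of the original slides.

import random

def kmeans(points, k, iters=100):
    # Step 1: pick k initial centroids randomly from the data set
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Step 2: assign each point to its closest centroid (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Step 3: move each centroid to the mean of its assigned points
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:   # assignments stable -> stop
            break
        centroids = new_centroids
    return centroids, clusters

# Example: height/weight points used later in the slides, with k = 2 (assumed)
data = [(185, 72), (170, 56), (169, 60), (179, 68), (182, 72), (188, 77)]
print(kmeans(data, 2))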
January 22, 2024

¡ The algorithm finds the clusters and data set labels for a
particular pre-chosen K. To find the number of clusters, the
user needs to run the K-means clustering algorithm for a range
of K and compare the results.
§ There is no method for determining exact value of K, but an accurate
estimate can be obtained using the following techniques.
Given n data points xi, i = 1...n, to be partitioned into k clusters, the goal is to
find assignments that minimize the within-cluster sum of squares
Σ(i=1..k) Σ(x∈ci) ǁx − μiǁ², where ci is the set of points that belong to cluster i
and μi is its centroid. K-means clustering uses the square of the Euclidean
distance d(x, μi) = ǁx − μiǁ².

January 22, 2024


K-means
¡ Given the following two tables, cluster the data using the k-means algorithm.

Table 1:
No   x     y     Cluster
1    185   72
2    170   56
3    169   60
4    179   68
5    182   72
6    188   77

Table 2:
No   x   y   Cluster
A    1   1
B    1   0
C    0   2
D    2   4
E    3   4
F    1   2
G    2   3
H    1   3
January 22, 2024
Hierarchical Clustering
¡ Separate the data into different groups using a
hierarchy of clusters built from a similarity measure.
§ Agglomerative (bottom-up approach)
▪ Starts by merging clusters based on the distance between them.
This is done until one big cluster is formed.
§ Divisive (top-down approach)
▪ Starts with one huge cluster and breaks it into smaller and smaller clusters
until it reaches individual data points (or
single-point clusters).

January 22, 2024


Agglomerative
        A    B    C    D    E    F
Height  185  170  168  179  182  188
Weight  72   56   60   68   72   77

Pairwise Euclidean distances (P0 = A ... P5 = F):

            P0(185,72)  P1(170,56)  P2(168,60)  P3(179,68)  P4(182,72)  P5(188,77)
P0(185,72)  0
P1(170,56)  21.93       0
P2(168,60)  20.81       4.47        0
P3(179,68)  7.21        15          13.6        0
P4(182,72)  3           20          18.44       5           0
P5(188,77)  5.83        27.66       26.25       12.73       7.81        0

January 22, 2024



After merging the closest pair, P0 and P4 (distance 3), and using the minimum distance between clusters:

              [P0, P4]  P1     P2     P3     P5
[P0, P4]      0
P1            20.0      0
P2            18.44     4.47   0
P3            5         15     13.6   0
P5            5.83      27.66  26.25  12.73  0

Next, P1 and P2 are merged (distance 4.47):

              [P0, P4]  [P1, P2]  P3     P5
[P0, P4]      0
[P1, P2]      18.44     0
P3            5         13.6      0
P5            5.83      26.25     12.73  0
January 22, 2024

Next, P3 is merged with [P0, P4] (distance 5):

                  [P3, [P0, P4]]  [P1, P2]  P5
[P3, [P0, P4]]    0
[P1, P2]          13.6            0
P5                5.83            26.25     0

Next, P5 is merged with [P3, [P0, P4]] (distance 5.83):

                        [P5, [P3, [P0, P4]]]  [P1, P2]
[P5, [P3, [P0, P4]]]    0
[P1, P2]                13.6                  0

January 22, 2024



Finally, [P1, P2] is merged with [P5, [P3, [P0, P4]]] (distance 13.6), leaving a single cluster:
[[P1, P2], [P5, [P3, [P0, P4]]]]
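The same merge sequence can be reproduced with SciPy. Below is a minimal sketch (not from the slides) assuming single linkage, i.e. the minimum distance between clusters, which is what the matrices above use.

import numpy as np
from scipy.cluster.hierarchy import linkage

# Height/weight of the six people A..F (P0..P5) from the slides
points = np.array([
    [185, 72],  # P0 / A
    [170, 56],  # P1 / B
    [168, 60],  # P2 / C
    [179, 68],  # P3 / D
    [182, 72],  # P4 / E
    [188, 77],  # P5 / F
])

# Each row of the linkage matrix is one merge: [cluster i, cluster j, distance, size]
merges = linkage(points, method="single", metric="euclidean")
print(merges)
# Expected merge order: (P0,P4) at 3, (P1,P2) at 4.47, then P3 at 5, P5 at 5.83, all at 13.6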

January 22, 2024


DBSCAN
¡ Unlike K-means or K-medoids, the desired number of
clusters (K) is not given as input; rather, DBSCAN
determines dense clusters from the data points.
¡ The main aim of DBSCAN is to create clusters with a minimum
size and density.
§ Density is defined as a minimum number of points within a certain
distance of each other.
§ It uses the concepts of minimum points (MinPts) and ε (the threshold value Eps).
¡ It handles outliers easily and efficiently because
outliers are not dense and hence cannot form clusters.

1/22/24

¡ In DBSCAN there are a few main concepts: core points, border points, noise points and ε.
§ ε: defines the neighborhood around a data point, i.e. if the distance between two
points is lower than or equal to ε then they are considered neighbors.
¡ MinPts: the minimum number of neighboring data points within the ε
radius.
§ The larger the dataset, the larger the value of MinPts that should be chosen.
¡ Core point: a point is a core point if it has at least MinPts points
(including itself) within ε.
¡ Border point: a point which has fewer than MinPts within ε
but is in the neighborhood of a core point.
¡ Noise point: a point which is neither a core point nor a border point.

1/22/24

¡ DBSCAN divides the data points into core, border
and noise points:
1. p is a core point if
▪ |{q : dist(p, q) <= ε}| >= MinPts
2. p is a border point if
▪ |{q : dist(p, q) <= ε}| < MinPts, but dist(p, q) <= ε for some core point q
3. p is a noise point
▪ otherwise (neither a core point nor a border point)

1/22/24

¡ Given a minimum point count of 4 and a radius (ε) of 1.9, find the
core, border and noise points.

distance = sqrt( (x1 − x2)² + (y1 − y2)² )

Point   X   Y
P1      7   4
P2      6   4
P3      5   6
P4      4   2
P5      6   3
P6      5   2
P7      3   3
P8      4   5
P9      6   5
P10     3   6
P11     4   4
P12     8   2

1/22/24

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
P1 0 1 2.828 3.606 1.4142 2.82843 4.1231 3.1623 1.4142 4.47214 3 2.2361
P2 1 0 2.236 2.828 1 2.23607 3.1623 2.2361 1 3.60555 2 2.8284
P3 2.8284 2.236 0 4.123 3.1623 4 3.6056 1.4142 1.4142 2 2.2361 5
P4 3.6056 2.828 4.123 0 2.2361 1 1.4142 3 3.6056 4.12311 2 4
P5 1.4142 1 3.162 2.236 0 1.41421 3 2.8284 2 4.24264 2.2361 2.2361
P6 2.8284 2.236 4 1 1.4142 0 2.2361 3.1623 3.1623 4.47214 2.2361 3
P7 4.1231 3.162 3.606 1.414 3 2.23607 0 2.2361 3.6056 3 1.4142 5.099
P8 3.1623 2.236 1.414 3 2.8284 3.16228 2.2361 0 2 1.41421 1 5
P9 1.4142 1 1.414 3.606 2 3.16228 3.6056 2 0 3.16228 2.2361 3.6056
P10 4.4721 3.606 2 4.123 4.2426 4.47214 3 1.4142 3.1623 0 2.2361 6.4031
P11 3 2 2.236 2 2.2361 2.23607 1.4142 1 2.2361 2.23607 0 4.4721
P12 2.2361 2.828 5 4 2.2361 3 5.099 5 3.6056 6.40312 4.4721 0

1/22/24

Point   Neighbors within ε   Status
P1      P2, P5, P9           Core
P2      P1, P5, P9           Core
P3      P8, P9               Border
P4      P6, P7               Noise
P5      P1, P2, P6           Core
P6      P4, P5               Border
P7      P4, P11              Noise
P8      P3, P10, P11         Core
P9      P1, P2, P3           Core
P10     P8                   Border
P11     P7, P8               Border
P12     (none)               Noise
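A minimal Python sketch (not from the slides) that reproduces this classification, assuming a point's own position counts toward MinPts, as the table above implies.

from math import dist   # Euclidean distance, Python 3.8+

points = {
    "P1": (7, 4), "P2": (6, 4), "P3": (5, 6), "P4": (4, 2),
    "P5": (6, 3), "P6": (5, 2), "P7": (3, 3), "P8": (4, 5),
    "P9": (6, 5), "P10": (3, 6), "P11": (4, 4), "P12": (8, 2),
}
eps, min_pts = 1.9, 4

# Neighborhood of each point (including the point itself, per the assumption above)
neigh = {a: [b for b in points if dist(points[a], points[b]) <= eps] for a in points}
core = {a for a, nb in neigh.items() if len(nb) >= min_pts}

for a in points:
    if a in core:
        status = "Core"
    elif any(b in core for b in neigh[a]):
        status = "Border"      # not core, but within eps of a core point
    else:
        status = "Noise"
    print(a, status)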
1/22/24

¡ Given a minimum point count of 3 and a radius (ε) of 2.0, find the
core, border and noise points.

Point   X   Y
P1      2   10
P2      2   5
P3      8   4
P4      5   8
P5      7   5
P6      6   4
P7      1   2
P8      4   9

1/22/24
Divisive Clustering
¡ A type of clustering algorithm that groups data objects in a top-down manner.
§ Initially all the objects are in one cluster.
§ Then the cluster is subdivided into smaller and smaller pieces until
each object forms a cluster on its own, or until a certain
termination condition, such as the desired number of clusters, is satisfied.
¡ The algorithm can work using a minimal spanning tree (MST); a code sketch
is given after the worked example below.
§ Compute an MST for the given adjacency matrix.
§ Repeat: create a new cluster by breaking the link with the largest
distance.
§ Continue until every object is in its own cluster (or the desired
number of clusters is reached).

1/22/24

From the adjacency matrix, sort the edges in ascending order of cost.

    A   B   C   D   E
A   0   1   2   2   3
B   1   0   2   4   3
C   2   2   0   1   5
D   2   4   1   0   3
E   3   3   5   3   0

1/22/24

Edge   Cost
A-B    1
C-D    1
A-C    2
A-D    2
B-C    2
A-E    3
D-E    3
B-E    3
B-D    4
C-E    5

Select the minimal-cost edge from the list without creating a loop.
Thus A is connected to B at a cost of 1.

MST so far: A-B (1)
1/22/24

Select the next minimal-cost edge from the list without creating a loop.
Thus C is connected to D at a cost of 1.

MST so far: A-B (1), C-D (1)
1/22/24

Select the next minimal cost from the list without creating a loop, which is 2.
Thus A is connected to C at a cost of 2.

MST so far: A-B (1), C-D (1), A-C (2)
1/22/24

The remaining cost-2 edges (A-D and B-C) would each create a loop, since
A, B, C and D are already connected, so they are skipped.

Selected so far: A-B (1), C-D (1), A-C (2); skipped: A-D (2), B-C (2)
1/22/24

Select the next minimal cost from the list without creating a loop, which is 3.
Thus A is connected to E at a cost of 3.

MST so far: A-B (1), C-D (1), A-C (2), A-E (3)

1/22/24

The remaining edges (D-E 3, B-E 3, B-D 4, C-E 5) would each create a loop,
so they are all skipped. The minimal spanning tree is complete:

A-B (1), C-D (1), A-C (2), A-E (3)
1/22/24

¡ The largest edge in the MST is between A and E.
§ Cutting this edge results in two clusters:
{E} and {A, B, C, D}.
¡ Next, removing the edge between A and C.
§ Three clusters: {A, B}, {C, D} and {E}.
¡ Next, breaking A and B.
§ Four clusters: {A}, {B}, {C, D} and {E}.
¡ Next, breaking C and D.
§ Five clusters: {A}, {B}, {C}, {D} and {E}.
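A minimal SciPy sketch (not from the slides) of this MST-based divisive procedure on the adjacency matrix above; note that A-C and A-D are tied at cost 2, so a library implementation may pick either edge for its spanning tree.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

labels = ["A", "B", "C", "D", "E"]
adj = np.array([            # adjacency (cost) matrix from the slides
    [0, 1, 2, 2, 3],
    [1, 0, 2, 4, 3],
    [2, 2, 0, 1, 5],
    [2, 4, 1, 0, 3],
    [3, 3, 5, 3, 0],
])

mst = minimum_spanning_tree(csr_matrix(adj)).toarray()   # MST edge weights, 0 = no edge

# Divisive step: repeatedly break the largest remaining MST edge and report the clusters
while mst.any():
    i, j = np.unravel_index(np.argmax(mst), mst.shape)
    mst[i, j] = 0                                         # cut the largest edge
    n, comp = connected_components(csr_matrix(mst), directed=False)
    clusters = [[labels[k] for k in range(len(labels)) if comp[k] == c] for c in range(n)]
    print(f"after cutting {labels[i]}-{labels[j]}: {clusters}")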

1/22/24
KNN
¡ KNN can be used for both classification and regression
predictive problems.
¡ K nearest neighbors is a simple algorithm that stores all
available cases and classifies new cases based on majority
(similarity) vote.
¡ KNN has been used in statistical estimation and pattern
recognition. Three important aspects of KNN:
§ Ease to interpret output
§ Calculation time and
§ Predictive Power

January 22, 2024


Distance Measure
¡ Distance measures such as the Euclidean distance are only valid for
continuous variables.

¡ For categorical variables, the Hamming distance should be used.
January 22, 2024


KNN Exercise
¡ Given the table below, classify the data using the k-NN algorithm.

No   Durability   Strength   Class    Distance
A    7            7          Weak
B    7            4          Weak
C    3            4          Strong
D    4            4          Strong
E    6            7          Weak
F    3            5          Strong
G    1            3          Strong
H    5            4          Weak
I    4            3          ?????

January 22, 2024



¡ Given the following data, predict the class for Weight = 60 and Height = 180.

Weight   Height   Class         Distance
51       167      Underweight
62       182      Normal
69       176      Normal
64       160      Overweight
65       172      Normal
56       174      Underweight
68       158      Overweight
57       173      Normal
58       169      Normal
68       158      Overweight
55       170      Normal
58       184      Underweight
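A minimal Python sketch (not from the slides) of k-NN for this exercise; the choice of k = 3 and the Euclidean distance are assumptions for illustration.

from math import dist
from collections import Counter

# (weight, height) -> class, from the table above
train = [
    ((51, 167), "Underweight"), ((62, 182), "Normal"), ((69, 176), "Normal"),
    ((64, 160), "Overweight"), ((65, 172), "Normal"), ((56, 174), "Underweight"),
    ((68, 158), "Overweight"), ((57, 173), "Normal"), ((58, 169), "Normal"),
    ((68, 158), "Overweight"), ((55, 170), "Normal"), ((58, 184), "Underweight"),
]

def knn_predict(query, k=3):
    # Sort training points by distance to the query and take a majority vote of the k nearest
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((60, 180)))   # predicted class for weight 60, height 180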

January 22, 2024


SVM
¡ SVM is a supervised ML algorithm used for both
classification and regression, though it is mostly used for classification
problems.
¡ The algorithm plots each data item as a point in n-
dimensional space, with the value of each feature being
the value of a particular coordinate.
§ Classification is done by finding the hyper-plane that differentiates the two
classes best.
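A minimal scikit-learn sketch (not from the slides) of fitting a linear SVM; reusing the two-feature logistic regression data from earlier, and the two query points, are assumptions purely for illustration.

from sklearn.svm import SVC

X = [
    [2.7810836, 2.550537003], [1.465489372, 2.362125076], [3.396561688, 4.400293529],
    [1.38807019, 1.850220317], [3.06407232, 3.005305973], [7.627531214, 2.759262235],
    [5.332441248, 2.088626775], [6.922596716, 1.77106367], [8.675418651, -0.242068655],
    [7.673756466, 3.508563011],
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# A linear kernel looks for the separating hyper-plane directly in the input space
clf = SVC(kernel="linear")
clf.fit(X, y)
print(clf.predict([[4.0, 3.0], [7.0, 2.0]]))   # classify two new points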

January 22, 2024


Naïve Bayes
¡ Naïve Bayes is a classification technique based on Bayes' theorem, with an
assumption of independence among predictors.
¡ A Naive Bayes model is easy to build and particularly useful for very
large data sets. Along with simplicity, Naive Bayes is known
to outperform even highly sophisticated classification methods.

P(c|x) = P(x|c) * P(c) / P(x)

¡ Given
§ P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
§ P(c) is the prior probability of the class.
§ P(x|c) is the likelihood, which is the probability of the predictor given the class.
§ P(x) is the prior probability of the predictor.

January 22, 2024


Exercise
¡ Given the following:

Weather    Play
Sunny      No
Overcast   Yes
Rainy      Yes
Sunny      Yes
Sunny      Yes
Overcast   Yes
Rainy      No
Rainy      No
Sunny      Yes
Rainy      Yes
Sunny      No
Overcast   Yes
Overcast   Yes
Rainy      No

Frequency table:

Weather       No            Yes           Probability
Sunny         2             3             5/14 (0.36)
Overcast      0             4             4/14 (0.29)
Rainy         3             2             5/14 (0.36)
Total         5             9
Probability   5/14 (0.36)   9/14 (0.64)

§ What is the probability that the players will play if the weather is sunny?

January 22, 2024



¡ Using the frequency table above:

§ What is the probability that the players will play if the weather is sunny?

P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny) = (3/9 * 9/14) / (5/14) = 0.6

§ What is the probability that the players will play if the weather is rainy?
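A minimal Python sketch (not from the slides) that computes these posteriors directly from the weather/play counts above.

from collections import Counter

weather = ["Sunny", "Overcast", "Rainy", "Sunny", "Sunny", "Overcast", "Rainy",
           "Rainy", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play    = ["No", "Yes", "Yes", "Yes", "Yes", "Yes", "No",
           "No", "Yes", "Yes", "No", "Yes", "Yes", "No"]

n = len(weather)
play_counts = Counter(play)                  # class counts, for P(c)
joint_counts = Counter(zip(weather, play))   # counts of (weather, play) pairs

def posterior(c, x):
    # P(c|x) = P(x|c) * P(c) / P(x)
    p_x_given_c = joint_counts[(x, c)] / play_counts[c]
    p_c = play_counts[c] / n
    p_x = weather.count(x) / n
    return p_x_given_c * p_c / p_x

print(posterior("Yes", "Sunny"))   # 0.6, as computed on the slide
print(posterior("Yes", "Rainy"))   # the question left as an exercise on the slide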

January 22, 2024


Decision Tree

¡ Decision trees are a non-parametric
supervised learning approach for both
regression and classification problems.
¡ The goal is to create a model that predicts the
value of a target variable by learning simple
decision rules inferred from the data features.

January 22, 2024



¡ Following the tree analogy,
§ DTs implement a sequential decision process starting
from the root node until a final leaf is reached, which
normally represents the target.
¡ Decision trees are also attractive models if we care
about interpretability.

January 22, 2024


DT Varieties

¡ ID3
¡ C5, which is a modified version of C4.5
¡ CART (Classification and Regression Trees)

January 22, 2024




Examples
Day Outlook Temperature Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
January 22, 2024

¡ Gini index
§ A metric for classification tasks in CART, computed from the
sum of squared probabilities of each class:

Gini = 1 − Σ (Pi)²    where Pi is the probability of class i

Outlook    Yes   No   Inst
Sunny      2     3    5
Overcast   4     0    4
Rain       3     2    5

Temp   Yes   No   Inst
Hot    2     2    4
Cool   3     1    4
Mild   4     2    6

Wind     Yes   No   Inst
Weak     6     2    8
Strong   3     3    6

Humidity   Yes   No   Inst
High       3     4    7
Normal     6     1    7
January 22, 2024

Outlook    Yes   No   Inst
Sunny      2     3    5
Overcast   4     0    4
Rain       3     2    5

¡ Gini(Outlook = Sunny) = 1 − [ (2/5)² + (3/5)² ] = 0.48
¡ Gini(Outlook = Overcast) = 1 − [ (4/4)² + (0/4)² ] = 0
¡ Gini(Outlook = Rain) = 1 − [ (3/5)² + (2/5)² ] = 0.48
¡ Gini(Outlook) = 5/14 * 0.48 + 4/14 * 0 + 5/14 * 0.48 = 0.3428

January 22, 2024



Temp   Yes   No   Inst
Hot    2     2    4
Cool   3     1    4
Mild   4     2    6

Wind     Yes   No   Inst
Weak     6     2    8
Strong   3     3    6

Humidity   Yes   No   Inst
High       3     4    7
Normal     6     1    7

Gini(Wind) = 8/14 * 0.375 + 6/14 * 0.5 = 0.4286

Gini(Temp) = 4/14 * 0.5 + 4/14 * 0.375 + 6/14 * 0.444 = 0.439

Gini(Humidity) = 7/14 * 0.489 + 7/14 * 0.245 = 0.367

January 22, 2024



¡ The Gini index provides a calculated cost for each feature.
The feature with the lowest Gini index value will be at the
top of the tree.

Feature        Gini
Outlook        0.342
Temperature    0.439
Humidity       0.367
Wind           0.428
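A minimal Python sketch (not from the slides) that computes these weighted Gini indices from the play-tennis table above.

from collections import Counter, defaultdict

# (Outlook, Temperature, Humidity, Wind, Decision) rows from the example table
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
features = ["Outlook", "Temperature", "Humidity", "Wind"]

def gini(labels):
    # Gini = 1 - sum(p_i^2) over the class probabilities
    counts = Counter(labels)
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in counts.values())

for i, feature in enumerate(features):
    groups = defaultdict(list)
    for row in rows:
        groups[row[i]].append(row[-1])            # decision labels per feature value
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    print(feature, round(weighted, 3))            # the lowest value is chosen as the split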

January 22, 2024


Splitting on Outlook:

Outlook = Sunny:
Day  Outlook   Temperature  Humidity  Wind    Decision
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes

Outlook = Rain:
Day  Outlook   Temperature  Humidity  Wind    Decision
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
10   Rain      Mild         Normal    Weak    Yes
14   Rain      Mild         High      Strong  No

Outlook = Overcast:
Day  Outlook   Temperature  Humidity  Wind    Decision
3    Overcast  Hot          High      Weak    Yes
7    Overcast  Cool         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes

January 22, 2024



Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes

For the Outlook = Sunny branch:

Temp   Yes   No   Inst
Hot    0     2    2
Cool   1     0    1
Mild   1     1    2

Wind     Yes   No   Inst
Weak     1     2    3
Strong   1     1    2

Humidity   Yes   No   Inst
High       0     3    3
Normal     2     0    2

Gini(Humidity) = 3/5 * 0 + 2/5 * 0 = 0
Gini(Wind) = 3/5 * 0.444 + 2/5 * 0.5 = 0.467
Gini(Temp) = 2/5 * 0 + 1/5 * 0 + 2/5 * 0.5 = 0.2

¡ The Gini index of 0 for humidity shows that, after selecting Outlook = Sunny, the split
follows humidity irrespective of wind and temperature.

January 22, 2024


Tree so far:

Outlook = Sunny → Humidity: High → No, Normal → Yes
Outlook = Overcast → Yes
Outlook = Rain → (to be split, using the subset below)

Day  Outlook  Temperature  Humidity  Wind    Decision
4    Rain     Mild         High      Weak    Yes
5    Rain     Cool         Normal    Weak    Yes
6    Rain     Cool         Normal    Strong  No
10   Rain     Mild         Normal    Weak    Yes
14   Rain     Mild         High      Strong  No

January 22, 2024



Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No

Wind     Yes   No   Inst
Weak     3     0    3
Strong   0     2    2

Temp   Yes   No   Inst
Cool   1     1    2
Mild   2     1    3

Humidity   Yes   No   Inst
High       1     1    2
Normal     2     1    3

Gini(Humidity) = 2/5 * 0.5 + 3/5 * 0.444 = 0.467
Gini(Wind) = 3/5 * 0 + 2/5 * 0 = 0
Gini(Temp) = 2/5 * 0.5 + 3/5 * 0.444 = 0.467

¡ The Gini index of 0 for wind shows that, after selecting Outlook = Rain, the split
follows wind irrespective of temperature and humidity.

January 22, 2024


Final decision tree:

Outlook = Sunny → Humidity: High → No, Normal → Yes
Outlook = Overcast → Yes
Outlook = Rain → Wind: Weak → Yes, Strong → No

January 22, 2024
