
Machine Learning Algorithms

Michael Melese (Ph.D.)


[email protected]
January 22, 2024
Algorithm
¡ An algorithm is a procedure or set of steps or rules to
accomplish a task. Algorithms are one of the fundamental
concepts in, or building blocks of, computer science.
¡ Some of the basic types of tasks that algorithms can solve are
§ sorting, searching, and graph-based computational problems
¡ In data science, there are at least three classes of algorithms
one should be aware of:
§ Data munging, preparation, and processing algorithms, such as
sorting, MapReduce, or Pregel.
§ Optimization algorithms for parameter estimation, including
Stochastic Gradient Descent, Newton’s Method, and Least Squares.
§ Machine learning algorithms.

January 22, 2024


Machine Learning
¡ Machine learning algorithms are largely used to
predict, classify, or cluster.
¡ Machine learning algorithms are the basis of artificial
intelligence (AI) applications such as image recognition, speech
recognition, recommendation systems, and ranking and
personalization of content.
¡ Machine learning algorithms are described as
learning a target function (f) that best maps input
variables (X) to an output variable (Y): Y = f(X)

January 22, 2024


Algorithm
¡ Linear Regression
¡ Logistic Regression
¡ K-Means
¡ Support Vector Machines
¡ Naive Bayes
¡ K-Nearest Neighbors
¡ PCA
¡ Classification and Regression Trees
¡ Hierarchical Clustering
¡ Learning Vector Quantization
¡ Bagging and Random Forest
¡ Boosting and AdaBoost
¡ Linear Discriminant Analysis
January 22, 2024
Linear Regression
¡ Linear regression is one of the fundamental supervised machine-learning
algorithms, due to its relative simplicity and well-known properties.
¡ It is one of the most well known and understood algorithms in
statistics and ML.
¡ It expresses the mathematical relationship between two variables or
attributes.
¡ Predictive modeling is primarily concerned with minimizing the
error of a model, or making the most accurate predictions possible, at the
expense of explainability.

January 22, 2024



¡ Linear regression may be simple or multiple.
§ The case of one explanatory variable is called simple linear
regression.
§ With more than one explanatory variable, the process is
called multiple linear regression.
¡ Assumption
§ There is a linear relationship between an outcome variable
(dependent variable) and a predictor (independent variable
or feature).

January 22, 2024



¡ The relationship between the independent and dependent variables is
modeled by fitting a best-fit line; the coefficients b and m are derived
from the given input by minimizing the sum of squared distances between
the data points and the regression line.

y = b + m*x

where: y – dependent variable, x – independent variable, b – intercept and m – slope

January 22, 2024



The slope and intercept can be computed from the correlation coefficient and the standard deviations:

m = r * (s_y / s_x)        b = ȳ − m * x̄

s_x = sqrt( Σ(x − x̄)² / (n − 1) )        s_y = sqrt( Σ(y − ȳ)² / (n − 1) )

r = Σ(x − x̄)(y − ȳ) / sqrt( Σ(x − x̄)² * Σ(y − ȳ)² )

Equivalently, directly from the sums:

b = [ (Σy)(Σx²) − (Σx)(Σxy) ] / [ n(Σx²) − (Σx)² ]

m = [ n(Σxy) − (Σx)(Σy) ] / [ n(Σx²) − (Σx)² ]

where: r – Pearson correlation coefficient, s_x, s_y – standard deviations, x̄ – mean of x and ȳ – mean of y

January 22, 2024



No   Age (X)   Glucose level (Y)   XY      X²      Y²
1    43        99                  4257    1849    9801
2    21        65                  1365    441     4225
3    25        79                  1975    625     6241
4    42        75                  3150    1764    5625
5    57        87                  4959    3249    7569
6    59        81                  4779    3481    6561
Σ    247       486                 20485   11409   40022

January 22, 2024



Using the totals from the table above (n = 6, Σx = 247, Σy = 486, Σxy = 20485, Σx² = 11409):

b = [ (486)(11409) − (247)(20485) ] / [ 6(11409) − (247)² ] = 65.1416

m = [ 6(20485) − (247)(486) ] / [ 6(11409) − (247)² ] = 0.38522

y = 0.38522*x + 65.1416
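As a quick check, here is a minimal Python sketch (not part of the original slides) that reproduces b and m from the age/glucose data using the closed-form sums above; the prediction for age 55 is only an illustrative assumption.

# Minimal sketch: simple linear regression via the closed-form sums (illustrative only)
ages = [43, 21, 25, 42, 57, 59]          # X: age
glucose = [99, 65, 79, 75, 87, 81]       # Y: glucose level

n = len(ages)
sum_x = sum(ages)
sum_y = sum(glucose)
sum_xy = sum(x * y for x, y in zip(ages, glucose))
sum_x2 = sum(x * x for x in ages)

# b = [(Σy)(Σx²) − (Σx)(Σxy)] / [n(Σx²) − (Σx)²]
b = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)
# m = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²]
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

print(b, m)           # ≈ 65.1416, 0.38522
print(b + m * 55)     # predicted glucose level for a 55-year-old (illustrative query)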
January 22, 2024
Logistic Regression
¡ Logistic regression is a supervised learning
algorithm.
¡ LR analysis studies the association between a
categorical dependent variable and one or more independent
variables.
¡ The goal of Logistic Regression is to discover
a link between characteristics and the
likelihood of a specific outcome.

January 22, 2024



¡ Unlike linear regression, logistic regression is
used when the response variable is a
categorical variable with two possible options.
For example:
§ A student applicant receives or does not receive
aid.
§ A patient lives or dies during emergency surgery.
§ Phone coverage is acceptable or not.

January 22, 2024



¡ Generally the two outcomes of the response
variable are coded as 1 for "success" and 0 for "failure".
§ The mean is then the proportion of 1's, p = P(success),
while 1 − p = P(failure) for the 0's.
§ For example, if a student passes or fails an exam based on the
number of hours spent studying, the response
variable has two values: pass and fail.

January 22, 2024



¡ Generally, logistic regression can be classified into:
Ø Binary logistic regression – two (binary)
outcomes, such as yes or no.
Ø Multinomial logistic regression – three or more
unordered outcomes, such as Married, Single, Divorced, or
Widowed.
Ø Ordinal logistic regression – three or more outcomes as in
multinomial logistic regression, but with an
order, such as a customer rating in a supermarket from
1 to 5.

January 22, 2024



¡ The LR model works for the majority of datasets, but for
good performance there are some
assumptions to consider:
Ø The dependent variable in binary logistic regression must
be binary.
Ø Only relevant attributes should be included.
Ø The independent variables must be unrelated to one another.
Ø That is, there should be minimal or no multicollinearity in the model.
Ø The log odds are a linear function of the independent
variables.
Ø Large sample sizes are required for logistic regression.
January 22, 2024

¡ Logistic regression utilizes a more sophisticated cost function,
which is known as the "sigmoid function" or "logistic function".
¡ The function produces an S-shaped curve.
§ It returns a probability value between 0 and 1,
converting expected values to probabilities.

x     transformed (sigmoid)
-3    0.04742587318
-2    0.119202922
-1    0.2689414214
0     0.5
1     0.7310585786
2     0.880797078
3     0.9525741268
January 22, 2024

Ø Like linear regression, logistic regression has a coefficient for each
independent variable; here there are 2, so:

output = b0 + b1*x1 + b2*x2

x1           x2           y
2.7810836    2.550537     0
1.46548937   2.36212508   0
3.39656169   4.40029353   0
1.38807019   1.85022032   0
3.06407232   3.00530597   0
7.62753121   2.75926224   1
5.33244125   2.08862678   1
6.92259672   1.77106367   1
8.67541865   -0.2420687   1
7.67375647   3.50856301   1

January 22, 2024



¡ The logistic regression model takes real-valued inputs
and makes a prediction as to the probability of the
input belonging to class 1.
§ If the probability is > 0.5 we can take the output as a
prediction for class 1, otherwise the
prediction is for class 0.
¡ For this dataset, the logistic regression has three
coefficients, just like linear regression, for example:
output = b0 + b1*x1 + b2*x2
¡ The job of the learning algorithm is to discover the best
values for the coefficients (b0, b1 and b2) based on the
training data.
January 22, 2024

¡ Unlike linear regression, the output is transformed into a probability
using the logistic function:
p(class=1) = 1 / (1 + e^(-output))
¡ Let's start off by assigning 0.0 to each coefficient and calculating
the probability that the first training instance belongs to class 1.
¡ b0 = 0.0, b1 = 0.0, b2 = 0.0
¡ The first training instance is: x1=2.7810836, x2=2.550537003, y=0
¡ Using the above equation we can plug in all of these numbers and
calculate a prediction:
prediction = 1 / (1 + e^(-(b0 + b1*x1 + b2*x2)))
prediction = 1 / (1 + e^(-(0.0 + 0.0*2.7810836 + 0.0*2.550537003)))
prediction = 0.5

January 22, 2024



¡ We can calculate the new coefficient values using a simple update
equation:
b = b + alpha * (y - prediction) * prediction * (1 - prediction) * x
ü b is the coefficient we are updating and prediction is the output of making a
prediction using the model.
ü Alpha is a parameter (the learning rate) that you must specify at the beginning of the
training run. Good values might be in the range 0.1 to 0.3. Let's use a value of
0.3.
ü b0 does not have an input. This coefficient is often called the bias or the
intercept and we can assume it always has an input value of 1.0.
¡ After calculating the output value, it is transformed into a probability
using the logistic (sigmoid) function:
p(class=1) = 1 / (1 + e^(-output))

January 22, 2024



¡ Let’s update the coefficients using the prediction (0.5) and coefficient
values (0.0) from the previous section.
¡ For the first instance
Ø b0 = b0 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 1.0 = -0.0375
Ø b1 = b1 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 2.7810836 = -0.104290635
Ø b2 = b2 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 2.550537003 = -0.09564513761
¡ Substituting the values of b0, b1 and b2 to calculate the prediction for the first instance:
prediction = 1 / (1 + e^(-(b0 + b1*x1 + b2*x2)))
= 1 / (1 + e^(-(-0.0375 + (-0.104290635 * 2.7810836) + (-0.09564513761 *
2.550537003)))) = 0.2987569857

January 22, 2024



¡ For the second instance
Ø b0 = b0 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 1.0 = -0.0375
Ø b1 = b1 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 1.46548937 = -
0.054955851375
Ø b2 = b2 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 2.36212508 = -
0.0885796905
¡ Substituting the values b0, b1, b2 to calculate prediction for
the second instance:
prediction = 1 / (1 + e^(-(b0 + b1*x1 + b2*x2)))
= 1 / (1 + e^(-(-0.0375 +( -0.054955851375 * 1.46548937) +
(-0.0885796905 * 2.36212508)))) = 0.145951056

January 22, 2024



¡ For the third instance
Ø b0 = b0 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 1.0 = -0.0375
Ø b1 = b1 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 3.39656169 = -
0.127371063375
Ø b2 = b2 + 0.3 * (0 – 0.5) * 0.5 * (1 – 0.5) * 4.40029353 = -
0.165011007375
¡ Substituting the values of b0, b1 and b2 to calculate the prediction for
the third instance:
prediction = 1 / (1 + e^(-(b0 + b1*x1 + b2*x2)))
= 1 / (1 + e^(-(-0.0375 +(-0.127371063375 * 3.39656169) + (-
0.165011007375 * 4.40029353)))) = 0.0853326531
¡ Repeat this process and update the model for each training
instance in the dataset.

January 22, 2024



Predicted probability     Predicted class     Reference (actual class)
0.2987569857 0 0
0.145951056 0 0
0.0853326531 0 0
0.2197373144 0 0
0.2470590002 0 0
0.9547021348 1 1
0.8620341908 1 1
0.9717729051 1 1
0.9992954521 1 1
0.905489323 1 1
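The per-instance update described above can be scripted. Below is a minimal Python sketch (not from the slides) that repeats the update over the dataset; the learning rate of 0.3 comes from the slides, but the number of passes (epochs) is an assumption, so the exact probabilities printed will depend on it and need not match the table exactly.

import math

# Dataset from the slides: [x1, x2, y]
data = [
    [2.7810836, 2.550537003, 0], [1.465489372, 2.362125076, 0],
    [3.396561688, 4.400293529, 0], [1.38807019, 1.850220317, 0],
    [3.06407232, 3.005305973, 0], [7.627531214, 2.759262235, 1],
    [5.332441248, 2.088626775, 1], [6.922596716, 1.77106367, 1],
    [8.675418651, -0.242068655, 1], [7.673756466, 3.508563011, 1],
]

def predict(row, b):
    # output = b0 + b1*x1 + b2*x2, squashed by the logistic function
    output = b[0] + b[1] * row[0] + b[2] * row[1]
    return 1.0 / (1.0 + math.exp(-output))

alpha = 0.3            # learning rate, as suggested on the slides
b = [0.0, 0.0, 0.0]    # b0, b1, b2 start at zero

for epoch in range(10):                 # number of passes over the data (assumed)
    for row in data:
        p = predict(row, b)
        error = row[2] - p
        b[0] += alpha * error * p * (1 - p) * 1.0      # bias input is 1.0
        b[1] += alpha * error * p * (1 - p) * row[0]
        b[2] += alpha * error * p * (1 - p) * row[1]

for row in data:
    p = predict(row, b)
    print(round(p, 4), 1 if p >= 0.5 else 0, row[2])   # probability, predicted class, reference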

January 22, 2024


K-means
¡ Clustering is the process of partitioning a group of
data points into a small number of clusters.
¡ K-means clustering is a type of unsupervised
learning, which is used when you have unlabeled
data.
§ The goal of this algorithm is to find groups in the data, with
the number of groups represented by the variable K.
¡ The algorithm works iteratively to assign each data
point to one of K groups based on the features
provided.

January 22, 2024



¡ Data points are clustered based on feature
similarity. The results of the K-means
clustering algorithm are:
¡ The centroids of the K clusters, which can be
used to label new data
¡ Each data point is assigned to a single cluster.

January 22, 2024



¡ Behavioral segmentation
§ Segment by purchase history, activities on application, website,
or platform
§ Define personas based on interests
¡ Inventory categorization
§ Group inventory by sales activity and manufacturing metrics
¡ Sorting sensor measurements
§ Detect activity types in motion sensors
§ Group images, Separate audio and Identify groups in health
monitoring
¡ Detecting bots or anomalies
§ Separate valid activity groups from bots
§ Group valid activity to clean up outlier detection

January 22, 2024


K-means algorithm
¡ The K-means clustering algorithm uses iterative refinement to produce the
clusters. The algorithm inputs are the number of clusters K and the data set.
§ The data set is a set of features for each data point. The algorithm starts with initial
estimates of the K centroids, which can either be randomly generated or randomly
selected from the data set.
¡ The algorithm then iterates between the following steps (a minimal sketch follows after this list):
§ Initially, randomly pick K centroids (or points that will be the centers of your
clusters) in d-space. Try to make them near the data but different from one
another.
§ Then assign each data point to the closest centroid.
§ Move each centroid to the average location of the data points assigned to it.
§ Repeat the preceding two steps until the assignments don't change, or change
very little (i.e., no data points change clusters, the sum of the distances is
minimized, or some maximum number of iterations is reached).
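A minimal Python sketch of these steps; the random initialisation, the choice of K = 2 and the reuse of the height/weight points from a later slide are assumptions for illustration, not part of the original slides.

import random

def kmeans(points, k, iters=100):
    # Step 1: pick k initial centroids randomly from the data set
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Step 2: assign each point to its closest centroid (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Step 3: move each centroid to the mean of its assigned points
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:   # assignments stable -> stop
            break
        centroids = new_centroids
    return centroids, clusters

# Example: height/weight points used later in the slides, with k = 2 (assumed)
data = [(185, 72), (170, 56), (169, 60), (179, 68), (182, 72), (188, 77)]
print(kmeans(data, 2))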
January 22, 2024

¡ The algorithm finds the clusters and data set labels for a
particular pre-chosen K. To find the number of clusters, the
user needs to run the K-means clustering algorithm for a range
of K and compare the results.
§ There is no method for determining exact value of K, but an accurate
estimate can be obtained using the following techniques.
Given n data points xi, i = 1...n, to be partitioned into k clusters, the goal is to
find assignments that minimize the within-cluster sum of squares
Σ(i=1..k) Σ(x∈ci) ǁx − μiǁ², where ci is the set of points that belong to cluster i
and μi is its centroid. K-means clustering uses the square of the Euclidean
distance d(x, μi) = ǁx − μiǁ².

January 22, 2024


K-means
¡ Given the following two tables, cluster the data using the k-means algorithm.

Table 1:
No   x     y     Cluster
1    185   72
2    170   56
3    169   60
4    179   68
5    182   72
6    188   77

Table 2:
No   x   y   Cluster
A    1   1
B    1   0
C    0   2
D    2   4
E    3   4
F    1   2
G    2   3
H    1   3
January 22, 2024
Hierarchical Clustering
¡ Separate the data into different groups using a
hierarchy of clusters built from a similarity measure.
§ Agglomerative (bottom-up approach)
▪ Starts by merging clusters based on the distance between them.
This is done until one big cluster is formed.
§ Divisive (top-down approach)
▪ Starts with one huge cluster and breaks it into smaller and smaller clusters
until it reaches individual data points (or
single-point clusters).

January 22, 2024


Agglomerative
        A    B    C    D    E    F
Height  185  170  168  179  182  188
Weight  72   56   60   68   72   77

Pairwise Euclidean distances (P0 = A ... P5 = F):

            P0(185,72)  P1(170,56)  P2(168,60)  P3(179,68)  P4(182,72)  P5(188,77)
P0(185,72)  0
P1(170,56)  21.93       0
P2(168,60)  20.81       4.47        0
P3(179,68)  7.21        15          13.6        0
P4(182,72)  3           20          18.44       5           0
P5(188,77)  5.83        27.66       26.25       12.73       7.81        0

January 22, 2024



After merging the closest pair, P0 and P4 (distance 3), and using the minimum distance between clusters:

              [P0, P4]  P1     P2     P3     P5
[P0, P4]      0
P1            20.0      0
P2            18.44     4.47   0
P3            5         15     13.6   0
P5            5.83      27.66  26.25  12.73  0

Next, P1 and P2 are merged (distance 4.47):

              [P0, P4]  [P1, P2]  P3     P5
[P0, P4]      0
[P1, P2]      18.44     0
P3            5         13.6      0
P5            5.83      26.25     12.73  0
January 22, 2024

Next, P3 is merged with [P0, P4] (distance 5):

                  [P3, [P0, P4]]  [P1, P2]  P5
[P3, [P0, P4]]    0
[P1, P2]          13.6            0
P5                5.83            26.25     0

Next, P5 is merged with [P3, [P0, P4]] (distance 5.83):

                        [P5, [P3, [P0, P4]]]  [P1, P2]
[P5, [P3, [P0, P4]]]    0
[P1, P2]                13.6                  0

January 22, 2024



Finally, [P1, P2] is merged with [P5, [P3, [P0, P4]]] (distance 13.6), leaving a single cluster:
[[P1, P2], [P5, [P3, [P0, P4]]]]
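The same merge sequence can be reproduced with SciPy. Below is a minimal sketch (not from the slides) assuming single linkage, i.e. the minimum distance between clusters, which is what the matrices above use.

import numpy as np
from scipy.cluster.hierarchy import linkage

# Height/weight of the six people A..F (P0..P5) from the slides
points = np.array([
    [185, 72],  # P0 / A
    [170, 56],  # P1 / B
    [168, 60],  # P2 / C
    [179, 68],  # P3 / D
    [182, 72],  # P4 / E
    [188, 77],  # P5 / F
])

# Each row of the linkage matrix is one merge: [cluster i, cluster j, distance, size]
merges = linkage(points, method="single", metric="euclidean")
print(merges)
# Expected merge order: (P0,P4) at 3, (P1,P2) at 4.47, then P3 at 5, P5 at 5.83, all at 13.6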

January 22, 2024


DBSCAN
¡ Unlike K-means or K-medoids, the desired number of
clusters (K) is not given as input; rather, DBSCAN
determines dense clusters from the data points.
¡ The main aim of DBSCAN is to create clusters with a minimum
size and density.
§ Density is defined as a minimum number of points within a certain
distance of each other.
§ It uses the concepts of minimum points (MinPts) and ε (the threshold value Eps).
¡ It handles outliers easily and efficiently because
outliers are not dense and hence cannot form clusters.

1/22/24

¡ In DBSCAN there are a few main concepts: core points, border points, noise points and ε.
§ ε: defines the neighborhood around a data point, i.e. if the distance between two
points is lower than or equal to ε then they are considered neighbors.
¡ MinPts: the minimum number of neighboring data points within the ε
radius.
§ The larger the dataset, the larger the value of MinPts that should be chosen.
¡ Core point: a point is a core point if it has at least MinPts points
(including itself) within ε.
¡ Border point: a point which has fewer than MinPts within ε
but is in the neighborhood of a core point.
¡ Noise point: a point which is neither a core point nor a border point.

1/22/24

¡ DBSCAN divides the data points into core, border
and noise points:
1. p is a core point if
▪ |{q : dist(p, q) <= ε}| >= MinPts
2. p is a border point if
▪ |{q : dist(p, q) <= ε}| < MinPts, but dist(p, q) <= ε for some core point q
3. p is a noise point
▪ otherwise (neither a core point nor a border point)

1/22/24

¡ Given a minimum point count of 4 and a radius (ε) of 1.9, find the
core, border and noise points.

distance = sqrt( (x1 − x2)² + (y1 − y2)² )

Point   X   Y
P1      7   4
P2      6   4
P3      5   6
P4      4   2
P5      6   3
P6      5   2
P7      3   3
P8      4   5
P9      6   5
P10     3   6
P11     4   4
P12     8   2

1/22/24

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
P1 0 1 2.828 3.606 1.4142 2.82843 4.1231 3.1623 1.4142 4.47214 3 2.2361
P2 1 0 2.236 2.828 1 2.23607 3.1623 2.2361 1 3.60555 2 2.8284
P3 2.8284 2.236 0 4.123 3.1623 4 3.6056 1.4142 1.4142 2 2.2361 5
P4 3.6056 2.828 4.123 0 2.2361 1 1.4142 3 3.6056 4.12311 2 4
P5 1.4142 1 3.162 2.236 0 1.41421 3 2.8284 2 4.24264 2.2361 2.2361
P6 2.8284 2.236 4 1 1.4142 0 2.2361 3.1623 3.1623 4.47214 2.2361 3
P7 4.1231 3.162 3.606 1.414 3 2.23607 0 2.2361 3.6056 3 1.4142 5.099
P8 3.1623 2.236 1.414 3 2.8284 3.16228 2.2361 0 2 1.41421 1 5
P9 1.4142 1 1.414 3.606 2 3.16228 3.6056 2 0 3.16228 2.2361 3.6056
P10 4.4721 3.606 2 4.123 4.2426 4.47214 3 1.4142 3.1623 0 2.2361 6.4031
P11 3 2 2.236 2 2.2361 2.23607 1.4142 1 2.2361 2.23607 0 4.4721
P12 2.2361 2.828 5 4 2.2361 3 5.099 5 3.6056 6.40312 4.4721 0

1/22/24

Point   Neighbors within ε   Status
P1      P2, P5, P9           Core
P2      P1, P5, P9           Core
P3      P8, P9               Border
P4      P6, P7               Noise
P5      P1, P2, P6           Core
P6      P4, P5               Border
P7      P4, P11              Noise
P8      P3, P10, P11         Core
P9      P1, P2, P3           Core
P10     P8                   Border
P11     P7, P8               Border
P12     (none)               Noise
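A minimal Python sketch (not from the slides) that reproduces this classification, assuming a point's own position counts toward MinPts, as the table above implies.

from math import dist   # Euclidean distance, Python 3.8+

points = {
    "P1": (7, 4), "P2": (6, 4), "P3": (5, 6), "P4": (4, 2),
    "P5": (6, 3), "P6": (5, 2), "P7": (3, 3), "P8": (4, 5),
    "P9": (6, 5), "P10": (3, 6), "P11": (4, 4), "P12": (8, 2),
}
eps, min_pts = 1.9, 4

# Neighborhood of each point (including the point itself, per the assumption above)
neigh = {a: [b for b in points if dist(points[a], points[b]) <= eps] for a in points}
core = {a for a, nb in neigh.items() if len(nb) >= min_pts}

for a in points:
    if a in core:
        status = "Core"
    elif any(b in core for b in neigh[a]):
        status = "Border"      # not core, but within eps of a core point
    else:
        status = "Noise"
    print(a, status)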
1/22/24

¡ Given a minimum point count of 3 and a radius (ε) of 2.0, find the
core, border and noise points.

Point   X   Y
P1      2   10
P2      2   5
P3      8   4
P4      5   8
P5      7   5
P6      6   4
P7      1   2
P8      4   9

1/22/24
Divisive Clustering
¡ A type of clustering algorithm that groups data objects in a top-down manner.
§ Initially all the objects are in one cluster.
§ Then the cluster is subdivided into smaller and smaller pieces until
each object forms a cluster on its own, or until a certain
termination condition, such as the desired number of clusters, is satisfied.
¡ The algorithm can work using a minimal spanning tree (MST); a code sketch
is given after the worked example below.
§ Compute an MST for the given adjacency matrix.
§ Repeat: create a new cluster by breaking the link with the largest
distance.
§ Continue until every object is in its own cluster (or the desired
number of clusters is reached).

1/22/24

From the adjacency matrix, sort the edges in ascending order of cost.

    A   B   C   D   E
A   0   1   2   2   3
B   1   0   2   4   3
C   2   2   0   1   5
D   2   4   1   0   3
E   3   3   5   3   0

1/22/24

Edge   Cost
A-B    1
C-D    1
A-C    2
A-D    2
B-C    2
A-E    3
D-E    3
B-E    3
B-D    4
C-E    5

Select the minimal-cost edge from the list without creating a loop.
Thus A is connected to B at a cost of 1.

MST so far: A-B (1)
1/22/24

Select the next minimal-cost edge from the list without creating a loop.
Thus C is connected to D at a cost of 1.

MST so far: A-B (1), C-D (1)
1/22/24

Select the next minimal cost from the list without creating a loop, which is 2.
Thus A is connected to C at a cost of 2.

MST so far: A-B (1), C-D (1), A-C (2)
1/22/24

The remaining cost-2 edges (A-D and B-C) would each create a loop, since
A, B, C and D are already connected, so they are skipped.

Selected so far: A-B (1), C-D (1), A-C (2); skipped: A-D (2), B-C (2)
1/22/24

Select the next minimal cost from the list without creating a loop, which is 3.
Thus A is connected to E at a cost of 3.

MST so far: A-B (1), C-D (1), A-C (2), A-E (3)

1/22/24

The remaining edges (D-E 3, B-E 3, B-D 4, C-E 5) would each create a loop,
so they are all skipped. The minimal spanning tree is complete:

A-B (1), C-D (1), A-C (2), A-E (3)
1/22/24

¡ The largest edge in the MST is between A and E.
§ Cutting this edge results in two clusters:
{E} and {A, B, C, D}.
¡ Next, removing the edge between A and C.
§ Three clusters: {A, B}, {C, D} and {E}.
¡ Next, breaking A and B.
§ Four clusters: {A}, {B}, {C, D} and {E}.
¡ Next, breaking C and D.
§ Five clusters: {A}, {B}, {C}, {D} and {E}.
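A minimal SciPy sketch (not from the slides) of this MST-based divisive procedure on the adjacency matrix above; note that A-C and A-D are tied at cost 2, so a library implementation may pick either edge for its spanning tree.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

labels = ["A", "B", "C", "D", "E"]
adj = np.array([            # adjacency (cost) matrix from the slides
    [0, 1, 2, 2, 3],
    [1, 0, 2, 4, 3],
    [2, 2, 0, 1, 5],
    [2, 4, 1, 0, 3],
    [3, 3, 5, 3, 0],
])

mst = minimum_spanning_tree(csr_matrix(adj)).toarray()   # MST edge weights, 0 = no edge

# Divisive step: repeatedly break the largest remaining MST edge and report the clusters
while mst.any():
    i, j = np.unravel_index(np.argmax(mst), mst.shape)
    mst[i, j] = 0                                         # cut the largest edge
    n, comp = connected_components(csr_matrix(mst), directed=False)
    clusters = [[labels[k] for k in range(len(labels)) if comp[k] == c] for c in range(n)]
    print(f"after cutting {labels[i]}-{labels[j]}: {clusters}")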

1/22/24
KNN
¡ KNN can be used for both classification and regression
predictive problems.
¡ K nearest neighbors is a simple algorithm that stores all
available cases and classifies new cases based on majority
(similarity) vote.
¡ KNN has been used in statistical estimation and pattern
recognition. Three important aspects of KNN:
§ Ease to interpret output
§ Calculation time and
§ Predictive Power

January 22, 2024


Distance Measure
¡ Distance measures such as the Euclidean distance are only valid for
continuous variables.

¡ For categorical variables, the Hamming distance should be used.
January 22, 2024


KNN Exercise
¡ Given the table below, classify the data using the k-NN algorithm.

No   Durability   Strength   Class    Distance
A    7            7          Weak
B    7            4          Weak
C    3            4          Strong
D    4            4          Strong
E    6            7          Weak
F    3            5          Strong
G    1            3          Strong
H    5            4          Weak
I    4            3          ?????

January 22, 2024



¡ Given the following data, predict the class for Weight = 60 and Height = 180.

Weight   Height   Class         Distance
51       167      Underweight
62       182      Normal
69       176      Normal
64       160      Overweight
65       172      Normal
56       174      Underweight
68       158      Overweight
57       173      Normal
58       169      Normal
68       158      Overweight
55       170      Normal
58       184      Underweight
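A minimal Python sketch (not from the slides) of k-NN for this exercise; the choice of k = 3 and the Euclidean distance are assumptions for illustration.

from math import dist
from collections import Counter

# (weight, height) -> class, from the table above
train = [
    ((51, 167), "Underweight"), ((62, 182), "Normal"), ((69, 176), "Normal"),
    ((64, 160), "Overweight"), ((65, 172), "Normal"), ((56, 174), "Underweight"),
    ((68, 158), "Overweight"), ((57, 173), "Normal"), ((58, 169), "Normal"),
    ((68, 158), "Overweight"), ((55, 170), "Normal"), ((58, 184), "Underweight"),
]

def knn_predict(query, k=3):
    # Sort training points by distance to the query and take a majority vote of the k nearest
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((60, 180)))   # predicted class for weight 60, height 180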

January 22, 2024


SVM
¡ SVM is a supervised ML algorithm used for both
classification and regression, though it is mostly used for classification
problems.
¡ The algorithm plots each data item as a point in n-
dimensional space, with the value of each feature being
the value of a particular coordinate.
§ Classification is done by finding the hyper-plane that differentiates the two
classes best.
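A minimal scikit-learn sketch (not from the slides) of fitting a linear SVM; reusing the two-feature logistic regression data from earlier, and the two query points, are assumptions purely for illustration.

from sklearn.svm import SVC

X = [
    [2.7810836, 2.550537003], [1.465489372, 2.362125076], [3.396561688, 4.400293529],
    [1.38807019, 1.850220317], [3.06407232, 3.005305973], [7.627531214, 2.759262235],
    [5.332441248, 2.088626775], [6.922596716, 1.77106367], [8.675418651, -0.242068655],
    [7.673756466, 3.508563011],
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# A linear kernel looks for the separating hyper-plane directly in the input space
clf = SVC(kernel="linear")
clf.fit(X, y)
print(clf.predict([[4.0, 3.0], [7.0, 2.0]]))   # classify two new points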

January 22, 2024


Naïve Bayes
¡ Naïve Bayes is a classification technique based on Bayes' theorem, with an
assumption of independence among predictors.
¡ A Naive Bayes model is easy to build and particularly useful for very
large data sets. Along with simplicity, Naive Bayes is known
to outperform even highly sophisticated classification methods.

P(c|x) = P(x|c) * P(c) / P(x)

¡ Given
§ P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
§ P(c) is the prior probability of the class.
§ P(x|c) is the likelihood, which is the probability of the predictor given the class.
§ P(x) is the prior probability of the predictor.

January 22, 2024


Exercise
¡ Given the following:

Weather    Play
Sunny      No
Overcast   Yes
Rainy      Yes
Sunny      Yes
Sunny      Yes
Overcast   Yes
Rainy      No
Rainy      No
Sunny      Yes
Rainy      Yes
Sunny      No
Overcast   Yes
Overcast   Yes
Rainy      No

Frequency table:

Weather       No            Yes           Probability
Sunny         2             3             5/14 (0.36)
Overcast      0             4             4/14 (0.29)
Rainy         3             2             5/14 (0.36)
Total         5             9
Probability   5/14 (0.36)   9/14 (0.64)

§ What is the probability that the players will play if the weather is sunny?

January 22, 2024



¡ Using the frequency table above:

§ What is the probability that the players will play if the weather is sunny?

P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny) = (3/9 * 9/14) / (5/14) = 0.6

§ What is the probability that the players will play if the weather is rainy?
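A minimal Python sketch (not from the slides) that computes these posteriors directly from the weather/play counts above.

from collections import Counter

weather = ["Sunny", "Overcast", "Rainy", "Sunny", "Sunny", "Overcast", "Rainy",
           "Rainy", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play    = ["No", "Yes", "Yes", "Yes", "Yes", "Yes", "No",
           "No", "Yes", "Yes", "No", "Yes", "Yes", "No"]

n = len(weather)
play_counts = Counter(play)                  # class counts, for P(c)
joint_counts = Counter(zip(weather, play))   # counts of (weather, play) pairs

def posterior(c, x):
    # P(c|x) = P(x|c) * P(c) / P(x)
    p_x_given_c = joint_counts[(x, c)] / play_counts[c]
    p_c = play_counts[c] / n
    p_x = weather.count(x) / n
    return p_x_given_c * p_c / p_x

print(posterior("Yes", "Sunny"))   # 0.6, as computed on the slide
print(posterior("Yes", "Rainy"))   # the question left as an exercise on the slide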

January 22, 2024


Decision Tree

¡ Decision trees are a non-parametric
supervised learning approach for both
regression and classification problems.
¡ The goal is to create a model that predicts the
value of a target variable by learning simple
decision rules inferred from the data features.

January 22, 2024



¡ Following the tree analogy,
§ DTs implement a sequential decision process starting
from the root node until a final leaf is reached, which
normally represents the target.
¡ Decision trees are also attractive models if we care
about interpretability.

January 22, 2024


DT Varieties

¡ ID3
¡ C5, which is a modified version of C4.5
¡ CART (Classification and Regression Trees)

January 22, 2024




Examples
Day Outlook Temperature Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
January 22, 2024

¡ Gini index
§ A metric for classification tasks in CART, computed from the
sum of squared probabilities of each class:

Gini = 1 − Σ (Pi)²    where Pi is the probability of class i

Outlook    Yes   No   Inst
Sunny      2     3    5
Overcast   4     0    4
Rain       3     2    5

Temp   Yes   No   Inst
Hot    2     2    4
Cool   3     1    4
Mild   4     2    6

Wind     Yes   No   Inst
Weak     6     2    8
Strong   3     3    6

Humidity   Yes   No   Inst
High       3     4    7
Normal     6     1    7
January 22, 2024

Outlook    Yes   No   Inst
Sunny      2     3    5
Overcast   4     0    4
Rain       3     2    5

¡ Gini(Outlook = Sunny) = 1 − [ (2/5)² + (3/5)² ] = 0.48
¡ Gini(Outlook = Overcast) = 1 − [ (4/4)² + (0/4)² ] = 0
¡ Gini(Outlook = Rain) = 1 − [ (3/5)² + (2/5)² ] = 0.48
¡ Gini(Outlook) = 5/14 * 0.48 + 4/14 * 0 + 5/14 * 0.48 = 0.3428

January 22, 2024



Temp   Yes   No   Inst
Hot    2     2    4
Cool   3     1    4
Mild   4     2    6

Wind     Yes   No   Inst
Weak     6     2    8
Strong   3     3    6

Humidity   Yes   No   Inst
High       3     4    7
Normal     6     1    7

Gini(Wind) = 8/14 * 0.375 + 6/14 * 0.5 = 0.4286

Gini(Temp) = 4/14 * 0.5 + 4/14 * 0.375 + 6/14 * 0.444 = 0.439

Gini(Humidity) = 7/14 * 0.489 + 7/14 * 0.245 = 0.367

January 22, 2024



¡ The Gini index provides a calculated cost for each feature.
The feature with the lowest Gini index value will be at the
top of the tree.

Feature        Gini
Outlook        0.342
Temperature    0.439
Humidity       0.367
Wind           0.428
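A minimal Python sketch (not from the slides) that computes these weighted Gini indices from the play-tennis table above.

from collections import Counter, defaultdict

# (Outlook, Temperature, Humidity, Wind, Decision) rows from the example table
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
features = ["Outlook", "Temperature", "Humidity", "Wind"]

def gini(labels):
    # Gini = 1 - sum(p_i^2) over the class probabilities
    counts = Counter(labels)
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in counts.values())

for i, feature in enumerate(features):
    groups = defaultdict(list)
    for row in rows:
        groups[row[i]].append(row[-1])            # decision labels per feature value
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    print(feature, round(weighted, 3))            # the lowest value is chosen as the split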

January 22, 2024


Splitting on Outlook:

Outlook = Sunny:
Day  Outlook   Temperature  Humidity  Wind    Decision
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes

Outlook = Rain:
Day  Outlook   Temperature  Humidity  Wind    Decision
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
10   Rain      Mild         Normal    Weak    Yes
14   Rain      Mild         High      Strong  No

Outlook = Overcast:
Day  Outlook   Temperature  Humidity  Wind    Decision
3    Overcast  Hot          High      Weak    Yes
7    Overcast  Cool         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes

January 22, 2024



Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes

For the Outlook = Sunny branch:

Temp   Yes   No   Inst
Hot    0     2    2
Cool   1     0    1
Mild   1     1    2

Wind     Yes   No   Inst
Weak     1     2    3
Strong   1     1    2

Humidity   Yes   No   Inst
High       0     3    3
Normal     2     0    2

Gini(Humidity) = 3/5 * 0 + 2/5 * 0 = 0
Gini(Wind) = 3/5 * 0.444 + 2/5 * 0.5 = 0.467
Gini(Temp) = 2/5 * 0 + 1/5 * 0 + 2/5 * 0.5 = 0.2

¡ The Gini index of 0 for humidity shows that, after selecting Outlook = Sunny, the split
follows humidity irrespective of wind and temperature.

January 22, 2024


Tree so far:

Outlook = Sunny → Humidity: High → No, Normal → Yes
Outlook = Overcast → Yes
Outlook = Rain → (to be split, using the subset below)

Day  Outlook  Temperature  Humidity  Wind    Decision
4    Rain     Mild         High      Weak    Yes
5    Rain     Cool         Normal    Weak    Yes
6    Rain     Cool         Normal    Strong  No
10   Rain     Mild         Normal    Weak    Yes
14   Rain     Mild         High      Strong  No

January 22, 2024



Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No

Wind     Yes   No   Inst
Weak     3     0    3
Strong   0     2    2

Temp   Yes   No   Inst
Cool   1     1    2
Mild   2     1    3

Humidity   Yes   No   Inst
High       1     1    2
Normal     2     1    3

Gini(Humidity) = 2/5 * 0.5 + 3/5 * 0.444 = 0.467
Gini(Wind) = 3/5 * 0 + 2/5 * 0 = 0
Gini(Temp) = 2/5 * 0.5 + 3/5 * 0.444 = 0.467

¡ The Gini index of 0 for wind shows that, after selecting Outlook = Rain, the split
follows wind irrespective of temperature and humidity.

January 22, 2024


Final decision tree:

Outlook = Sunny → Humidity: High → No, Normal → Yes
Outlook = Overcast → Yes
Outlook = Rain → Wind: Weak → Yes, Strong → No

January 22, 2024
