Lecture 01 & 02

BIO3603: Medical Pattern Recognition

Dr. Lamees Nasser
E-mail: [email protected]
Third Year – Biomedical Engineering Department
Academic Year 2024–2025
Relationship between AI, ML, ANN, and DL

• Artificial Intelligence (AI): programs that mimic human behavior.
• Machine Learning (ML): comprises algorithms and statistical methods used by computers to perform a specific task.
• Artificial Neural Networks (ANN): models inspired by how neurons in the human brain work.
• Deep Learning (DL): a kind of artificial neural network characterized by a deep structure (several layers), a huge number of artificial neurons, and the capability to automatically extract features from data.
Artificial Intelligence Applications

What is Machine Learning?
• In 1959, Arthur Samuel, a pioneer in the field of machine learning (ML), defined it as "the field of study that gives computers the ability to learn without being explicitly programmed".

Traditional Programming: Data (input) + Program (equation) → Output
Machine Learning: Data (input) + Output → Program (model)
Machine Learning Workflow
Training Phase:
Training data → Preprocessing (filters, normalization) → Feature extraction and selection → Machine learning algorithms (supervised / unsupervised) → Model

Testing/Prediction Phase:
Testing data → Preprocessing (filters, normalization) → Feature extraction and selection → Model → Prediction
What is Pattern Recognition?
• Pattern recognition (PR) is a field in machine learning that uses data analysis to recognize patterns and regularities, and then uses these regularities to take actions such as classifying the data into different categories.
• PR is a complex cognitive process in the brain. It involves analyzing various forms of data, including images, video, and audio, with the intent of identifying and detecting specific visual patterns (objects).
What is Pattern Classification?
• Pattern classification is a subfield of pattern recognition that involves categorizing (classifying)
patterns into pre-defined classes or categories. In other words, it is the process of assigning labels
to data based on their content.
Pattern Recognition and Classification Applications

• Computer-aided diagnosis (CAD): helping doctors make diagnostic decisions based on interpreting medical data such as mammographic images, ultrasound images, electrocardiograms (ECGs), and electroencephalograms (EEGs).
• Medical imaging: classifying cells as malignant or benign based on the results of magnetic
resonance imaging (MRI) scans or classifying different emotional and cognitive states from the
images of brain activity in functional MRI.
• Speech recognition: helping handicapped patients to control machines.
• Bioinformatics: DNA sequence analysis to detect genes related to particular diseases.
Pattern Recognition Workflow
Pattern Recognition Process
• The sensing/acquisition uses a transducer such as a camera or a microphone. The acquired signal
(e.g., an image) must be of sufficient quality that distinguishing “features” can be adequately
measured.
• Preprocessing: required prior to segmentation, including normalization, and image enhancement
(e.g., brightness adjustment, histogram equalization, contrast enhancement, image averaging,
frequency domain filtering, edge enhancement)
• Segmentation and labeling: isolate different objects from each other and from the background, and label the different objects. The foreground comprises the objects of interest; the background is everything else.
Pattern Recognition Process Cont.
• Postprocessing: used to prepare segmented images for feature extraction. For example, partial objects can
be removed from around the periphery of the image, disconnected objects can be merged, objects smaller
or larger than certain limits can be removed, or holes in the objects or background can be filled by
morphological opening or closing.
• Feature Extraction: reduce the data by measuring certain features (such as size, shape, and
texture) of the labeled objects.
• Classification: divide the feature space into decision regions.

Figure: Classes mapped as decision regions, with


decision boundaries
Figure: Example of segmentation,
postprocessing, and labeling
(a) Original image,
(b) variable background [from blurring (a)],
(c) improved image [¼(a) (b)],
(d) segmented image [Otsu thresholding of (c)],
(e) partial objects removed from (d),
(f) labeled components image,
(g) color-coded labeled components image
Data Splitting
• Data splitting is the process of splitting data into 3 sets:
▪ Training set: used to design our models
▪ Validation set: used to evaluate how well these models perform on new data
(refine our models )
▪ Testing set: used to test our models

• Common split percentages include:
  ▪ Train: 80%, Validation: 10%, Test: 10%
  ▪ Train: 70%, Validation: 15%, Test: 15%
  ▪ Train: 60%, Validation: 20%, Test: 20%
Training/validation/test data split
(Source: https://www.v7labs.com/blog/train-validation-test-set)
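A minimal sketch of such a three-way split in Python (assuming scikit-learn is available; the array names and the 70/15/15 ratio are illustrative, not part of the original slides):

    from sklearn.model_selection import train_test_split
    import numpy as np

    X = np.random.rand(100, 4)          # toy feature matrix (100 samples, 4 features)
    y = np.random.randint(0, 2, 100)    # toy binary labels

    # First split off the training set (70%), then split the remainder
    # evenly into validation (15%) and test (15%) sets.
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

    print(len(X_train), len(X_val), len(X_test))   # 70 15 15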
Preprocessing- Outlier Removal
• Outliers are data points that lie far from the mean of the corresponding random variables. They can produce large errors during training, especially when they are a result of noise.
• For a normal distribution, we could remove data points that are more than three standard deviations from the mean (since they have less than a 1% chance of belonging to the distribution).
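A minimal NumPy sketch of the three-standard-deviation rule (the data below are synthetic and purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(10, 1, 200), [50.0]])   # 200 inliers + one extreme outlier

    mu, sigma = x.mean(), x.std()
    # Keep only the points within three standard deviations of the mean.
    x_clean = x[np.abs(x - mu) <= 3 * sigma]
    print(x.size, "->", x_clean.size)   # the value 50.0 is removed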
Preprocessing- Outlier Removal Cont.
Preprocessing- Normalization
• Normalization is a technique often applied as part of data preprocessing for
machine learning.
• The goal of normalization is to adjust the values of numeric data to a common scale
without losing information.
❖For example:
Assume your input dataset contains one column with values ranging from 0 to 1,
and another column with values ranging from 10,000 to 100,000. The great
difference in the scale of the numbers could cause problems when you attempt to
combine the values as features during modeling.

Preprocessing- Normalization Cont.
• Min-max normalization: one of the most common ways to normalize data. For every feature, the minimum value of that feature gets transformed into a 0, the maximum value gets transformed into a 1, and every other value gets transformed into a decimal between 0 and 1. The formula to achieve this is the following:

  x_scaled = (x − x_min) / (x_max − x_min)

• Z-score normalization: this technique scales the values of a feature to have a mean of 0 and a standard deviation of 1:

  x_scaled = (x − μ) / σ

• Here, μ is the mean value of the feature and σ is the standard deviation of the feature.
• If x is exactly equal to the mean of all the values of the feature, it will be transformed into a 0. If it is below the mean, it will be a negative number, and if it is above the mean, it will be a positive number.
Normalization Techniques - Numerical Example
• Use the method below to normalize the following group of data: 1000, 2000, 3000, 5000, 9000
▪ Min-max normalization:

  Data     Normalized data
  1000     0
  2000     0.125
  3000     0.25
  5000     0.5
  9000     1
Normalization Techniques - Numerical Example
• Use the method below to normalize the following group of data: 1000, 2000, 3000, 5000, 9000
▪ Z-score normalization:

  Standard deviation: σ = sqrt( Σ(x_i − μ)² / (n − 1) )

  Mean: μ = (1000 + 2000 + 3000 + 5000 + 9000) / 5 = 4000

  σ = sqrt( ((1000 − 4000)² + (2000 − 4000)² + (3000 − 4000)² + (5000 − 4000)² + (9000 − 4000)²) / (5 − 1) ) = 3162.28

  Data     Normalized data
  1000     −0.95
  2000     −0.63
  3000     −0.32
  5000     0.32
  9000     1.58
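A small NumPy sketch that reproduces both normalizations for this dataset (illustrative only; it uses the sample standard deviation, i.e. the n − 1 denominator from the formula above):

    import numpy as np

    x = np.array([1000, 2000, 3000, 5000, 9000], dtype=float)

    # Min-max normalization: map the minimum to 0 and the maximum to 1.
    x_minmax = (x - x.min()) / (x.max() - x.min())
    print(x_minmax)                      # [0.    0.125 0.25  0.5   1.   ]

    # Z-score normalization with the sample standard deviation (ddof=1).
    x_z = (x - x.mean()) / x.std(ddof=1)
    print(x.mean(), x.std(ddof=1))       # 4000.0  3162.2776...
    print(np.round(x_z, 2))              # [-0.95 -0.63 -0.32  0.32  1.58]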
Histogram: Image Gray-Level Occurrence

  Initialize H(i) = 0 for all i
  For each pixel (i, j): H(pixel(i, j))++

Considerations:
• How many times each intensity value occurred in the image.
• Information about image characteristics and quality.
• Two completely different images may have very similar histograms (no spatial information).
• Can we reconstruct the image from its histogram?
Histogram Equalization
• Histogram equalization is a technique in image processing used to enhance the
contrast of an image by effectively redistributing its intensity values.

Histogram Equalization: Manual Calculation

8x8 image:
  52  55  61  66  70  61  64  73
  63  59  55  90 109  85  69  72
  62  59  68 113 144 104  66  73
  63  58  71 122 154 106  70  69
  67  61  68 104 126  88  68  70
  79  65  60  70  77  68  58  75
  85  71  64  59  55  61  65  83
  87  79  69  68  65  76  78  94

Histogram (Value: Count):
  52:1   55:3   58:2   59:3   60:1   61:4   62:1   63:2   64:2   65:3
  66:2   67:1   68:5   69:3   70:4   71:2   72:1   73:2   75:1   76:1
  77:1   78:1   79:2   83:1   85:2   87:1   88:1   90:1   94:1   104:2
  106:1  109:1  113:1  122:1  126:1  144:1  154:1
Histogram Equalization: Manual Calculation Cont.

CDF (Value: CDF):
  52:1   55:4   58:6   59:9   60:10  61:14  62:15  63:17
  64:19  65:22  66:24  67:25  68:30  69:33  70:37  71:39
  72:40  73:42  75:43  76:44  77:45  78:46  79:48  83:49
  85:51  87:52  88:53  90:54  94:55  104:57 106:58 109:59
  113:60 122:61 126:62 144:63 154:64
Histogram Equalization: Manual Calculation
Cont.
○ The general histogram equalization formula is:

  h(v) = round( (cdf(v) − cdf_min) / (M × N − cdf_min) × (L − 1) )

○ cdf_min: the minimum value of the cdf.
○ M × N: the number of pixels (M width, N height).
○ L: the number of gray levels (in most cases, 256).
Histogram Equalization: Manual Calculation Cont.

For example, cdf(83) = 49, so the equalized value becomes:

  h(83) = round( (49 − 1) / (64 − 1) × 255 ) = 194

Equalized image:
    0  12  53  93 146  53  73 166
   65  32   1 215 235 202 130 158
   57  32 117 239 251 227  93 166
   65  20 154 243 255 231 146 130
   97  53 117 227 247 210 117 146
  190  85  36 146 178 117  20 170
  202 154  73  32  12  53  85 194
  206 190 130 117  85 174 182 219
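A minimal from-scratch sketch of histogram equalization for an 8-bit image, applied to the 8x8 example above (illustrative only; library routines such as OpenCV's cv2.equalizeHist perform the same operation):

    import numpy as np

    def equalize(img, L=256):
        """Histogram-equalize an 8-bit grayscale image (2-D NumPy array)."""
        hist = np.bincount(img.ravel(), minlength=L)   # gray-level occurrence counts
        cdf = hist.cumsum()                            # cumulative distribution function
        cdf_min = cdf[cdf > 0].min()                   # smallest non-zero CDF value
        n_pixels = img.size                            # M * N
        # h(v) = round((cdf(v) - cdf_min) / (M*N - cdf_min) * (L - 1))
        lut = np.clip(np.round((cdf - cdf_min) / (n_pixels - cdf_min) * (L - 1)), 0, L - 1)
        return lut.astype(np.uint8)[img]

    # The 8x8 example image from the slides:
    img = np.array([[52,55,61,66,70,61,64,73],
                    [63,59,55,90,109,85,69,72],
                    [62,59,68,113,144,104,66,73],
                    [63,58,71,122,154,106,70,69],
                    [67,61,68,104,126,88,68,70],
                    [79,65,60,70,77,68,58,75],
                    [85,71,64,59,55,61,65,83],
                    [87,79,69,68,65,76,78,94]], dtype=np.uint8)
    print(equalize(img))   # e.g. the pixel with value 83 maps to 194, as computed above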
Histogram: Threshold

Figure: Intensity histograms that can be partitioned (a) by a single threshold, and (b) by dual thresholds.
Histogram: Threshold

  if f(x,y) > T then g(x,y) = 255
  else g(x,y) = 0
  where T is the threshold

Figure: image f, its histogram, and the thresholded results g with T = 100, T = 150, and T = 180.
Histogram: Global vs. Local Thresholding

• Global thresholding: a single threshold T is applied to the entire image.
• Local thresholding: slide a window (e.g., 7x7) over the image and compute the mean of each window; each pixel is then compared with the mean of its window (possibly minus an offset). The pixel's value becomes 255 if it is greater than the threshold and zero otherwise.

Figure: local thresholding results with a 7x7 window and T = mean, T = mean − 7, and T = mean − 10.
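A minimal sketch of global and mean-window local thresholding (illustrative only; the window size and offset are tuning parameters, and the toy image is synthetic):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def global_threshold(img, T):
        """g(x,y) = 255 if f(x,y) > T else 0."""
        return np.where(img > T, 255, 0).astype(np.uint8)

    def local_threshold(img, size=7, offset=0):
        """Compare each pixel with the mean of its size x size neighborhood (minus an offset)."""
        local_mean = uniform_filter(img.astype(float), size=size)
        return np.where(img > local_mean - offset, 255, 0).astype(np.uint8)

    # Example usage on a toy image:
    img = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)
    g_global = global_threshold(img, T=100)
    g_local = local_threshold(img, size=7, offset=10)   # corresponds to "T = mean - 10"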
Segmentation Methods
• Region-based methods: include thresholding (local threshold; global thresholding)
• Boundary-based methods: use an edge detector (e.g., the Canny detector)

Figure: original image, segmented image, and edge detection result.

If this segmentation method results in overlapping objects, how do we solve this problem?
Feature Extraction
The choice of appropriate (well-designed) features depends on the particular image and the application at
hand. However, they should be:

• Robust: invariant to translation, orientation (rotation), scale, and illumination and invariant to the presence
of noise and artifacts; this may require some preprocessing of the image.

• Discriminating: the range of values for objects in different classes should be different and preferably be
well separated and non-overlapping.

• Reliable: all objects of the same class should have similar values.

• Independent: uncorrelated; as a counter-example, length and area are correlated and it would be wasteful to consider both as separate features.
Feature Extraction Cont.
• Measurements obtainable from the gray-level histogram of an object, such as its mean pixel value (grayness or color), its standard deviation, its contrast, and its entropy.
• The size or area, and the perimeter.
• Circularity: a ratio of perimeter² to area, or area to perimeter² (or a scaled version, such as 4πA/P²).
• Aspect ratio: the ratio of the Feret diameters, given by placing a bounding box around the object.
Feature Extraction Cont.
• Skeleton or medial axis transform, or points within it such as branch points and endpoints, which
can be obtained by counting the number of neighboring pixels on the skeleton (viz., 3 and 1,
respectively)
Feature Extraction Cont.
• The Euler number: the number of connected components (i.e., objects) minus the number of holes
in the image.

Formally, the Euler number is given by

  E = n_comp − Σ_{i=1}^{n_comp} n_hole^(i)

where
  n_comp: the number of foreground connected components
  n_hole^(i): the number of holes in the i-th connected component.
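A minimal sketch of computing the Euler number of a binary image with SciPy's connected-component labeling (illustrative only; it treats background regions that do not touch the image border as holes, and the connectivity convention can be adjusted):

    import numpy as np
    from scipy.ndimage import label

    def euler_number(binary):
        """E = (number of foreground components) - (total number of holes)."""
        binary = binary.astype(bool)
        _, n_comp = label(binary)                # foreground connected components
        bg_labels, n_bg = label(~binary)         # background regions (default 4-connectivity)
        # Background regions that touch the image border are not holes.
        border_labels = np.unique(np.concatenate([bg_labels[0, :], bg_labels[-1, :],
                                                  bg_labels[:, 0], bg_labels[:, -1]]))
        n_holes = n_bg - np.count_nonzero(border_labels)
        return n_comp - n_holes

    # A single square object with one hole: E = 1 - 1 = 0
    img = np.zeros((7, 7), dtype=int)
    img[1:6, 1:6] = 1
    img[3, 3] = 0
    print(euler_number(img))   # 0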
Dimensionality Reduction
• Curse of dimensionality
⁃ Increasing computational complexity
⁃ Overfitting
• What PR algorithms want
  ✓ Uncorrelated data or independent variables
  ✓ Fewer features, but still enough information to predict
• Fewer features: the data can be analyzed visually more easily, and we get a better idea about the
underlying process.
• Humans have an extraordinary capacity to discern patterns and clusters in one, two, or three
dimensions, but these abilities degrade drastically for four or higher dimensions.
Dimensionality Reduction Cont.
• Overfitting
⁃ For a finite sample size, N, increasing the number of features will initially improve the performance of a classifier, but after a critical value, a further increase in the number of features (d) will reduce the performance, resulting in overfitting the data (Figure). This is known as the peaking phenomenon.

Simple linear model:
• Poor performance on the training data and poor generalization to other data.
• High bias - the model could not fit the training data well.
• Low variance - any data will produce high error in this model, so all errors will be high and there will not be much difference between errors.

Second-degree polynomial model:
• Good performance on the training data and good generalization to other data.
• Low bias - the model fits the training data very well.
• Low variance - the training and test errors are close, so there is not much difference between them.

Fourth-degree polynomial model:
• Good performance on the training data and poor generalization to other data.
• Low bias - the model fits the training data very well and thus produces low error.
• High variance - for the test data, the model produces very high error, so the difference between training and test error is high.
Dimensionality Reduction Cont.
• There are two main methods for reducing dimensionality: feature selection and feature extraction.
• Feature selection
• Select the 𝑘 features (out of 𝑑) that provide the most information, discarding the other (𝑑 − 𝑘)
features.
• Methods to implement feature selection include using the inter/intraclass distance and subset
selection such as Fisher score
• Feature extraction
• Find a new set of 𝑘 (< 𝑑) features which are combinations of the original 𝑑 features. These
methods may be supervised or unsupervised. The most widely used feature extraction methods
are Principal Components Analysis (PCA) and Linear Discriminant Analysis (LDA), which
are both linear projection methods, unsupervised and supervised respectively
Feature Selection
• What is feature selection?
⁃ If the data has more than 20,000 features and you need
to cut down, it to 1000 features before trying machine
learning. Which 1000 features you should choose it?
⁃ The process of choosing the 1000 features to use it is
called feature selection

• Why feature selection?


⁃ Avoid overfitting and achieve better generalizing ability.
⁃ Reduce the storage requirement and training time.

Overfitting means the model performs well on the training data but does not perform well on the test data.
This is because the model is memorizing the data it has seen and is unable to generalize to unseen examples

Feature Selection Cont.
• Inter/Intraclass Distance:
⁃ Good features are discriminative. Intuitively, there should be a large interclass distance and a small intraclass distance. The figure shows the case for a single feature, two equiprobable class situation. The separability of the classes can be measured by the ratio of the interclass distance to the intraclass distance.
Feature Selection Cont.
• Fisher's score algorithm:
  Is one of the supervised feature selection methods that selects each feature independently according to its score.
• Here is the formula for calculating a score:

  F = Σ_{j=1}^{k} n_j (μ_j − μ)² / Σ_{j=1}^{k} n_j σ_j²

  - n_j: the number of data points belonging to class j for a particular feature
  - μ_j: the mean of the data points belonging to class j for a particular feature
  - μ: the overall mean of the data points for a particular feature
  - σ_j: the standard deviation of the data points belonging to class j for a particular feature

• The larger the Fisher's score is, the better the selected feature.
Example

  F = Σ_{j=1}^{k} n_j (μ_j − μ)² / Σ_{j=1}^{k} n_j σ_j²

  Feature 1   Feature 2   Target (class label)
  1           5           1
  2           6           1
  3           7           2
  4           1           2
  5           2           2

  μ_f1 = (1 + 2 + 3 + 4 + 5) / 5 = 3
  μ_f2 = (5 + 6 + 7 + 1 + 2) / 5 = 4.2
Class means:
  μ_f1,c1 = (1 + 2) / 2 = 1.5          μ_f2,c1 = (5 + 6) / 2 = 5.5
  μ_f1,c2 = (3 + 4 + 5) / 3 = 4        μ_f2,c2 = (7 + 1 + 2) / 3 = 3.33

Class variances:
  σ²_f1,c1 = ((1 − 1.5)² + (2 − 1.5)²) / (2 − 1) = 0.5
  σ²_f2,c1 = ((5 − 5.5)² + (6 − 5.5)²) / (2 − 1) = 0.5
  σ²_f1,c2 = ((3 − 4)² + (4 − 4)² + (5 − 4)²) / (3 − 1) = 1
  σ²_f2,c2 = ((7 − 3.33)² + (1 − 3.33)² + (2 − 3.33)²) / (3 − 1) = 10.33

Class sizes:
  n_f1,c1 = 2, n_f1,c2 = 3, n_f2,c1 = 2, n_f2,c2 = 3
Fisher_score_F1 = [ n_f1,c1 (μ_f1,c1 − μ_f1)² + n_f1,c2 (μ_f1,c2 − μ_f1)² ] / [ n_f1,c1 σ²_f1,c1 + n_f1,c2 σ²_f1,c2 ]
                = [ 2 × (1.5 − 3)² + 3 × (4 − 3)² ] / [ 2 × 0.5 + 3 × 1 ] = 1.8750   (rank 1)

Fisher_score_F2 = 0.1760   (rank 2)
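A minimal from-scratch sketch of this computation (illustrative only; it uses the sample variance, i.e. the n − 1 denominator as in the worked example, and reproduces the two scores above):

    import numpy as np

    def fisher_score(x, y):
        """Fisher score of a single feature x given class labels y."""
        mu = x.mean()
        num, den = 0.0, 0.0
        for c in np.unique(y):
            xc = x[y == c]
            num += xc.size * (xc.mean() - mu) ** 2      # between-class term
            den += xc.size * xc.var(ddof=1)             # within-class term (sample variance)
        return num / den

    X = np.array([[1, 5], [2, 6], [3, 7], [4, 1], [5, 2]], dtype=float)
    y = np.array([1, 1, 2, 2, 2])
    print([round(fisher_score(X[:, j], y), 4) for j in range(X.shape[1])])   # [1.875, 0.176]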
Assignment
In a pattern recognition problem, there were two classes (C1, and C2), and three features F1, F2, F3, the
values for the features for each class are:
F1 F2 F3
C1 C2 C1 C2 C1 C2
2.1424 1.0575 3.7830 1.9366 5.0385 2.0247
2.2580 0.7707 3.3390 1.4727 4.8183 1.5616
2.1337 1.2382 3.6057 1.5228 4.5713 1.8716
2.2382 1.2378 3.5439 1.7134 4.7761 1.9508
1.7595 0.9925 3.3156 1.5119 4.4982 1.3813
1.9960 1.0655 3.0659 1.4809 4.6961 1.4118
1.9687 1.0349 3.4882 1.3335 4.6904 1.8142
Based on the above table,
1. Sort the three features according to Fisher’s score.
2. Write Python code to compute the Fisher score of the three features (write the code from scratch; you can use an existing function to validate your code).
Distance Measures for Machine Learning
• Distance measures are a key part of several machine learning algorithms. These
measures are used in both supervised and unsupervised learning, generally to
calculate the similarity between data points.

• Types of distance measures:


➢Euclidean Distance
➢Manhattan Distance
➢Hamming Distance

Distance Measures for Machine Learning Cont.

• Euclidean distance (L2 norm): represents the shortest distance between two points A(x1, y1) and B(x2, y2):

  D_e = [ (x1 − x2)² + (y1 − y2)² ]^(1/2)

• Manhattan distance (L1 norm): the simple sum of the horizontal and vertical components:

  D_m = |x1 − x2| + |y1 − y2|

• Hamming distance: measures the similarity between two binary data strings of the same length; that is, the number of bits that need to be changed to turn one string into the other.
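Minimal sketches of the three distances in Python/NumPy (the points and strings below are illustrative):

    import numpy as np

    a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])

    euclidean = np.sqrt(np.sum((a - b) ** 2))      # L2 norm: 5.0
    manhattan = np.sum(np.abs(a - b))              # L1 norm: 7.0

    # Hamming distance between two equal-length binary strings:
    s1, s2 = "1011101", "1001001"
    hamming = sum(c1 != c2 for c1, c2 in zip(s1, s2))   # 2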

Types of Machine Learning Algorithms

1. Supervised Learning
➢In this type of machine learning algorithm,
• The training dataset is a labeled dataset.
• In other words, the training dataset contains
the input value (X) and target value (Y).
• The learning algorithm generates a model.
• Then, a new dataset consisting of only the
input value is fed.
• The model then generates the prediction based
on its learning.
➢Supervised learning problems are categorized into
"classification" and "regression" problems.

Types of Supervised Learning Algorithm
• There are two types of supervised learning algorithms:

Classification:
• The output variable (Y) is a category or discrete value such as "red" or "blue", or "disease" and "no disease".
• Examples: Email: spam / not spam; Tumor: malignant / benign.

Regression:
• The output variable (Y) is a real or continuous value such as "salary" or "weight".
• Example: house price prediction.

Figure: shape classification example (square, circle, triangle, with an unknown "??" sample).
K-nearest neighbor (K-NN)
• The k-nearest neighbors (K-NN) algorithm is a supervised
learning and non-parametric algorithm that can be used to solve
both classification and regression problem statements.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and, at the time of classification, performs the computation on the stored data.
How does K-NN work?
• The K-NN working can be explained on the basis of the below algorithm:
  ▪ Step 1: Select the number K of neighbors.
  ▪ Step 2: Calculate the Euclidean distance of the new data point to all training data points.
  ▪ Step 3: Take the K nearest neighbors as per the calculated Euclidean distance.
  ▪ Step 4: Among these K neighbors, count the number of data points in each category (class).
  ▪ Step 5: Assign the new data point to the category (class) for which the number of neighbors is maximum.
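A minimal from-scratch sketch of these steps for classification (illustrative only; the (Age, Loan) values are made up, and in practice features with very different ranges should be normalized first):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=3):
        """Classify x_new by majority vote among its k nearest training points."""
        # Step 2: Euclidean distances from the new point to every training point.
        dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
        # Step 3: indices of the k nearest neighbors.
        nearest = np.argsort(dists)[:k]
        # Steps 4-5: count the classes among the neighbors and return the most common one.
        return Counter(y_train[nearest]).most_common(1)[0][0]

    # Toy example with hypothetical (Age, Loan) values:
    X_train = np.array([[25, 40000], [35, 60000], [45, 80000], [20, 20000], [60, 100000]], dtype=float)
    y_train = np.array(["N", "Y", "Y", "N", "Y"])
    print(knn_predict(X_train, y_train, np.array([40, 70000.0]), k=3))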
Classification using K-NN
• Given the Ages and Loans of N customers. Determine the eligibility
(Yes/No) of an unknown customer for obtaining a loan

Figure: scatter plot of customers by Age and Loan, labeled Yes / No for loan eligibility.
Classification using K-NN Cont.

If K=1, then the nearest neighbor is the last


case in the training set with Y.

If K=3, there are two Y and one N out of


three closest neighbors. The prediction for
the unknown case is again Y.

Regression using K-NN
• Given the Ages and Loans of N customers. Determine the House
Price Index (HPI) of an unknown customer.

Regression using K-NN Cont.

Use the training set to get HPI of an unknown


case (Age=48 and Loan=$142,000) using
Euclidean distance.

• If K=1
then the nearest neighbor is the last case in the
training set with HPI=264.

• If K=3
the prediction for HPI is equal to the average of the HPI of the top three neighbors:

• HPI = (264 + 139 + 139) / 3 = 180.7

• Try K = 5 to determine the HPI of the new customer.
K-NN Algorithm: Limitations
• Always needs to determine the value of K which may be complex
sometimes.
• The computation cost is high because of calculating the distance
between the data points for all the training samples.
• Requires high memory storage

2. Unsupervised Learning
In this type of machine learning algorithm,

• The training dataset is an unlabeled


dataset.
• In other words, the training dataset
contains only the input value (X) and
not the target value (Y).
• The learning algorithm generates a
model.
• Based on the similarity between data, it
tries to draw inferences from the data
such as finding patterns or clusters.

K-means Clustering
• The basic idea of the k-means clustering
algorithm is partitioning n data points into
k clusters by defining k centroids.

• The data clustering is done by minimizing


a chosen Euclidean distance measure
between a data point and cluster center.

K-means Clustering Algorithm: Step 1
• Specify the number 𝑘 of clusters to assign and randomly select K
centroids (K=2).

K-means Clustering Algorithm: Step 2

• Assign every point to a cluster whose centroid is the closest to the


point

K-means Clustering Algorithm: Step 3

• Re-compute the centroid for each cluster based on the newly


assigned points in the cluster

K-means Clustering Algorithm: Step 4
• Repeat step 2: reassign each data point to the new closest centroid of each cluster.
K-means Clustering Algorithm: Step 5
• Iterate until some stopping criterion is met

• There are essentially three stopping criteria that can be adopted to


stop the K-means algorithm:

➢Centroids of newly formed clusters do not change.


➢Points remain in the same cluster.
➢Maximum number of iterations is reached.

K-means Clustering Algorithm: Limitations

• Difficult to predict the number of clusters (K-Value) .


• Initial seeds have a strong impact on the final results.
• It is sensitive to noise and outlier data points.
• k-means assumes that we deal with spherical clusters and that each
cluster has roughly equal numbers of observations

K-means Clustering Algorithm: Summary

K-means Clustering Example
Suppose we have 7 types of medicines and each medicine has two attributes or features as shown in the table below.
Our goal is to group these into two clusters of medicines based on the two features
Suppose that the initial seeds (centers of each cluster) are C1=(1,1) and C2=(5,7).

  Medicine   feature1   feature2
  1          1          1
  2          1.5        2
  3          3          4
  4          5          7
  5          3.5        5
  6          4.5        5
  7          3.5        4.5

Figure: scatter plot of the seven medicines (feature1 vs. feature2).
Step 1: Initialization. K = 2; randomly select 2 centroids C1 = (1, 1) and C2 = (5, 7).

Figure: the two initial centroids C1 and C2 plotted on the feature scatter plot.
Step 2: Assign every point to the cluster whose centroid is closest to the point

  Medicine   Distance to C1 = (1, 1)                Distance to C2 = (5, 7)                Cluster
  1          √((1 − 1)² + (1 − 1)²) = 0             √((5 − 1)² + (7 − 1)²) = 7.21          1
  2          √((1 − 1.5)² + (1 − 2)²) = 1.12        √((5 − 1.5)² + (7 − 2)²) = 6.10        1
  3          √((1 − 3)² + (1 − 4)²) = 3.61          √((5 − 3)² + (7 − 4)²) = 3.61          1
  4          √((1 − 5)² + (1 − 7)²) = 7.21          √((5 − 5)² + (7 − 7)²) = 0             2
  5          √((1 − 3.5)² + (1 − 5)²) = 4.72        √((5 − 3.5)² + (7 − 5)²) = 2.5         2
  6          √((1 − 4.5)² + (1 − 5)²) = 5.31        √((5 − 4.5)² + (7 − 5)²) = 2.06        2
  7          √((1 − 3.5)² + (1 − 4.5)²) = 4.30      √((5 − 3.5)² + (7 − 4.5)²) = 2.92      2
Step3: Re-compute the centroid for each cluster based on the newly
assigned points in the cluster

• Thus, we obtain two clusters containing: {1, 2, 3} and {4, 5, 6, 7}.
• Their new centroids are:

  Group 1 = ( (1 + 1.5 + 3)/3 , (1 + 2 + 4)/3 ) = (1.83, 2.33)
  Group 2 = ( (5 + 3.5 + 4.5 + 3.5)/4 , (7 + 5 + 5 + 4.5)/4 ) = (4.12, 5.38)
Step 4: Repeat step 2, i.e., reassign each data point to the new closest centroid of each cluster

  Medicine   Distance to C1 = (1.83, 2.33)                Distance to C2 = (4.12, 5.38)               Cluster
  1          √((1.83 − 1)² + (2.33 − 1)²) = 1.57          √((4.12 − 1)² + (5.38 − 1)²) = 5.38         1
  2          √((1.83 − 1.5)² + (2.33 − 2)²) = 0.47        √((4.12 − 1.5)² + (5.38 − 2)²) = 4.29       1
  3          √((1.83 − 3)² + (2.33 − 4)²) = 2.04          √((4.12 − 3)² + (5.38 − 4)²) = 1.78         2
  4          √((1.83 − 5)² + (2.33 − 7)²) = 5.64          √((4.12 − 5)² + (5.38 − 7)²) = 1.84         2
  5          √((1.83 − 3.5)² + (2.33 − 5)²) = 3.15        √((4.12 − 3.5)² + (5.38 − 5)²) = 0.73       2
  6          √((1.83 − 4.5)² + (2.33 − 5)²) = 3.78        √((4.12 − 4.5)² + (5.38 − 5)²) = 0.54       2
  7          √((1.83 − 3.5)² + (2.33 − 4.5)²) = 2.74      √((4.12 − 3.5)² + (5.38 − 4.5)²) = 1.08     2
Step3: Re-compute the centroid for each cluster based on the newly
assigned points in the cluster

Therefore, the new clusters are: {1, 2} and {3, 4, 5, 6, 7}.

  Group 1 = ( (1 + 1.5)/2 , (1 + 2)/2 ) = (1.25, 1.5)
  Group 2 = ( (3 + 5 + 3.5 + 4.5 + 3.5)/5 , (4 + 7 + 5 + 5 + 4.5)/5 ) = (3.9, 5.1)
Step5: Iterate until some stopping criterion is met
  Medicine   Distance to C1 = (1.25, 1.5)                 Distance to C2 = (3.9, 5.1)                 Cluster
  1          √((1.25 − 1)² + (1.5 − 1)²) = 0.56           √((3.9 − 1)² + (5.1 − 1)²) = 5.02           1
  2          √((1.25 − 1.5)² + (1.5 − 2)²) = 0.56         √((3.9 − 1.5)² + (5.1 − 2)²) = 3.92         1
  3          √((1.25 − 3)² + (1.5 − 4)²) = 3.05           √((3.9 − 3)² + (5.1 − 4)²) = 1.42           2
  4          √((1.25 − 5)² + (1.5 − 7)²) = 6.66           √((3.9 − 5)² + (5.1 − 7)²) = 2.20           2
  5          √((1.25 − 3.5)² + (1.5 − 5)²) = 4.16         √((3.9 − 3.5)² + (5.1 − 5)²) = 0.41         2
  6          √((1.25 − 4.5)² + (1.5 − 5)²) = 4.78         √((3.9 − 4.5)² + (5.1 − 5)²) = 0.61         2
  7          √((1.25 − 3.5)² + (1.5 − 4.5)²) = 3.75       √((3.9 − 3.5)² + (5.1 − 4.5)²) = 0.72       2
• Therefore, there is no change in the clusters.
• Thus, the algorithm stops here, and the result consists of 2 clusters: {1, 2} and {3, 4, 5, 6, 7}.
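A minimal from-scratch sketch that reproduces this example (illustrative only; in practice a library implementation such as sklearn.cluster.KMeans would normally be used):

    import numpy as np

    X = np.array([[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5], [4.5, 5], [3.5, 4.5]])
    centroids = np.array([[1.0, 1.0], [5.0, 7.0]])     # initial seeds C1 and C2

    for _ in range(100):                               # cap on the number of iterations
        # Steps 2/4: assign every point to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(len(centroids))])
        if np.allclose(new_centroids, centroids):      # Step 5: stop when centroids do not change
            break
        centroids = new_centroids

    print(labels + 1)       # cluster of each medicine: [1 1 2 2 2 2 2]
    print(centroids)        # final centroids, approx. (1.25, 1.5) and (3.9, 5.1)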
3. Reinforcement Learning
• In this type of machine learning algorithm,
  ➢ The model learns from a series of actions by maximizing a reward function.
  ➢ The reward function can be maximized by rewarding good actions (and/or penalizing bad ones).
  ➢ Example: training a self-driving car using feedback from the environment.
  ➢ Unlike supervised learning, no labeled data is provided to the agent.
Thank you!
