Lecture 01 & 02
Relationship between AI, ML, NN, and DL.
Artificial Neural Networks (ANNs): models inspired by how neurons in the human brain work.
What is Machine Learning?
• In 1959, Arthur Samuel, a pioneer in the field of machine learning (ML), defined it as “the field of study that gives computers the ability to learn without being explicitly programmed”.
Traditional Programming
Data (input) + Program (equation) → Output
Machine Learning
Data (input) + Output → Program (learned model)
Machine Learning Workflow
Training phase: Training data → Preprocessing (filters, normalization) → Feature extraction and selection → Learning algorithms (supervised / unsupervised) → Model
Testing/prediction phase: Testing data → Preprocessing (filters, normalization) → Feature extraction and selection → Model → Prediction
What is Pattern Recognition?
• Pattern recognition (PR) is a field in machine learning that uses data analysis to recognize
patterns and regularities and then uses these regularities to take actions such as classifying the data
into different categories
• PR is a complex cognitive process in the brain. It involves analyzing various forms of data,
including images, video, and audio, with the intent of identifying and detecting specific visual
patterns (objects).
What is Pattern Classification?
• Pattern classification is a subfield of pattern recognition that involves categorizing (classifying)
patterns into pre-defined classes or categories. In other words, it is the process of assigning labels
to data based on their content.
Pattern Recognition and Classification Applications
Training/Validation/Test Data
Source: https://www.v7labs.com/blog/train-validation-test-set
Preprocessing- Outlier Removal
• Outliers: data points that lie far from the mean of the corresponding random variables. They can produce large errors during training, especially when they are a result of noise.
• For a normal distribution, we could remove data points that are more than three standard
deviations from the mean (since they have less than a 1% chance of belonging to the distribution).
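A minimal sketch of this three-sigma rule in Python (assuming the feature values are in a NumPy array; the data and variable names below are illustrative only):

    import numpy as np

    def remove_outliers_3sigma(x):
        # Keep only samples within three standard deviations of the mean
        # (assumes x is approximately normally distributed).
        mu, sigma = x.mean(), x.std()
        return x[np.abs(x - mu) <= 3 * sigma]

    rng = np.random.default_rng(0)
    values = np.concatenate([rng.normal(50, 5, 200), [150.0]])   # one injected outlier
    print(len(values), len(remove_outliers_3sigma(values)))      # expect 201 -> 200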
Preprocessing- Outlier Removal Cont.
Preprocessing- Normalization
• Normalization is a technique often applied as part of data preprocessing for
machine learning.
• The goal of normalization is to adjust the values of numeric data to a common scale
without losing information.
❖For example:
Assume your input dataset contains one column with values ranging from 0 to 1,
and another column with values ranging from 10,000 to 100,000. The great
difference in the scale of the numbers could cause problems when you attempt to
combine the values as features during modeling.
Preprocessing- Normalization Cont.
• Min-max normalization: one of the most common ways to normalize data. For every feature, the minimum value of that feature gets transformed into a 0, the maximum value gets transformed into a 1, and every other value gets transformed into a decimal between 0 and 1. The formula to achieve this is:
x_scaled = (x − x_min) / (x_max − x_min)
• Z-score normalization: this technique scales the values of a feature to have a mean of 0 and a standard deviation of 1:
x_scaled = (x − μ) / σ
• Here, 𝜇 is the mean value of the feature and 𝜎 is the standard deviation of the feature.
• If 𝒙 (value) is exactly equal to the mean of all the values of the feature, it will be transformed into a
0. If it is below the mean, it will be a negative number, and if it is above the mean, it will be a
positive number.
Normalization Techniques-Numerical Example
• Use the method below to normalize the following group of data: 1000, 2000, 3000, 5000, 9000
▪ Min-max normalization
Data Normalized data
1000 0
2000 0.125
3000 0.25
5000 0.5
9000 1
Normalization Techniques-Numerical Example
• Use the method below to normalize the following group of data: 1000, 2000, 3000, 5000, 9000
▪ Z-Score Normalization:
• Mean: μ = (1000 + 2000 + 3000 + 5000 + 9000) / 5 = 4000
• Standard deviation: σ = sqrt( Σ(x_i − μ)² / (n − 1) )
• σ = sqrt( ((1000 − 4000)² + (2000 − 4000)² + (3000 − 4000)² + (5000 − 4000)² + (9000 − 4000)²) / (5 − 1) ) ≈ 3162.28
Data Normalized data
1000 −0.95
2000 −0.63
3000 −0.32
5000 0.32
9000 1.58
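Both normalization techniques above can be reproduced with a few lines of NumPy (a sketch; ddof=1 matches the sample standard deviation used in the formula above):

    import numpy as np

    data = np.array([1000, 2000, 3000, 5000, 9000], dtype=float)

    # Min-max normalization: minimum -> 0, maximum -> 1
    min_max = (data - data.min()) / (data.max() - data.min())
    print(min_max)                 # [0.    0.125 0.25  0.5   1.   ]

    # Z-score normalization: zero mean, unit (sample) standard deviation
    z_score = (data - data.mean()) / data.std(ddof=1)
    print(z_score.round(2))        # [-0.95 -0.63 -0.32  0.32  1.58]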
Histogram: Image Gray-Level Occurrence
Initialize H(i) = 0 for all i
For each pixel (i, j):
    H(pixel(i, j))++
Considerations
• How many times each intensity value occurred in the image.
• Information about image characteristics and quality.
• Two completely different images may have very similar histograms (no spatial information).
• Can we reconstruct the image from its histogram?
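A direct Python translation of the pseudocode above (a sketch assuming an 8-bit grayscale image stored as a NumPy integer array):

    import numpy as np

    def gray_level_histogram(image, levels=256):
        hist = np.zeros(levels, dtype=int)   # Initialize H(i) = 0 for all i
        for value in image.ravel():          # For each pixel (i, j)
            hist[value] += 1                 # H(pixel(i, j))++
        return hist

    # In practice the same result is obtained in one call:
    # hist = np.bincount(image.ravel(), minlength=256)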
Histogram Equalization
• Histogram equalization is a technique in image processing used to enhance the
contrast of an image by effectively redistributing its intensity values.
Histogram Equalization: Manual Calculation
Histogram Equalization: Manual Calculation Cont.
○ The general histogram equalization formula is:
h(v) = round( (cdf(v) − cdf_min) / (M × N − cdf_min) × (L − 1) )
where cdf is the cumulative histogram, cdf_min is its smallest non-zero value, M × N is the number of pixels in the image, and L is the number of gray levels (e.g., 256).
Histogram Equalization: Manual Calculation Cont.
For example, with cdf(83) = 49, cdf_min = 1, and M × N = 64 (an 8 × 8 image), the equalized value becomes:
h(83) = round( (49 − 1) / (64 − 1) × 255 ) = 194
0 12 53 93 146 53 73 166
65 32 1 215 235 202 130 158
57 32 117 239 251 227 93 166
65 20 154 243 255 231 146 130
97 53 117 227 247 210 117 146
190 85 36 146 178 117 20 170
202 154 73 32 12 53 85 194
206 190 130 117 85 174 182 219
Equalized image
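The whole manual calculation can be scripted. The sketch below applies the formula from the previous slide to an 8-bit grayscale image (names are illustrative; the original input image of the example is not reproduced here):

    import numpy as np

    def equalize_histogram(image, levels=256):
        hist = np.bincount(image.ravel(), minlength=levels)   # gray-level occurrences H(i)
        cdf = np.cumsum(hist)                                  # cumulative histogram cdf(v)
        cdf_min = cdf[cdf > 0].min()                           # smallest non-zero cdf value
        n_pixels = image.size                                  # M * N
        # h(v) = round((cdf(v) - cdf_min) / (M*N - cdf_min) * (L - 1))
        mapping = np.round((cdf - cdf_min) / (n_pixels - cdf_min) * (levels - 1))
        return mapping.astype(np.uint8)[image]                 # remap every pixel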
Histogram: Threshold
Histogram: Threshold
[Figure: image f and its histogram histogram(f); original image, segmented image, and edge-detection result]
If this segmentation method results in overlapping objects, how do we solve this problem?
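A minimal thresholding sketch (the threshold t would normally be chosen from the valley between the two peaks of the histogram; the value used here is only illustrative):

    import numpy as np

    def threshold_segment(image, t):
        # Pixels brighter than t are labeled as object (1), the rest as background (0).
        return (image > t).astype(np.uint8)

    # segmented = threshold_segment(image, t=128)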
Feature Extraction
The choice of appropriate (well-designed) features depends on the particular image and the application at
hand. However, they should be:
• Robust: invariant to translation, orientation (rotation), scale, and illumination and invariant to the presence
of noise and artifacts; this may require some preprocessing of the image.
• Discriminating: the range of values for objects in different classes should be different and preferably be
well separated and non-overlapping.
• Reliable: all objects of the same class should have similar values.
• Independent: uncorrelated; as a counter-example, length and area are correlated, and it would be wasteful to consider both as separate features.
Feature Extraction Cont.
• Measurements obtainable from the gray-level histogram of an object, such as its mean pixel value (grayness or color), its standard deviation, its contrast, and its entropy.
• The size or area, and the perimeter.
• Circularity: a ratio of perimeter² to area, or area to perimeter² (or a scaled version, such as 4πA/P²).
• Aspect ratio: the ratio of the Feret diameters, given by placing a bounding box around the object.
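A sketch of how some of these region features could be computed from a binary object mask (the perimeter here is a simple boundary-pixel count, and the aspect ratio uses the bounding box rather than true Feret diameters):

    import numpy as np

    def shape_features(mask):
        mask = mask.astype(bool)
        area = mask.sum()
        # Boundary pixels: object pixels with at least one background 4-neighbour
        padded = np.pad(mask, 1)
        interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                    padded[1:-1, :-2] & padded[1:-1, 2:])
        perimeter = (mask & ~interior).sum()
        circularity = 4 * np.pi * area / perimeter ** 2          # ~1 for a filled circle
        rows, cols = np.nonzero(mask)
        aspect_ratio = (np.ptp(rows) + 1) / (np.ptp(cols) + 1)   # bounding-box height / width
        return area, perimeter, circularity, aspect_ratio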
Feature Extraction Cont.
• Skeleton or medial axis transform, or points within it such as branch points and endpoints, which
can be obtained by counting the number of neighboring pixels on the skeleton (viz., 3 and 1,
respectively)
Feature Extraction Cont.
• The Euler number: the number of connected components (i.e., objects) minus the number of holes
in the image.
[Figure: three models of increasing complexity fitted to the same training data]

Simple linear model: poor performance on the training data and poor generalization to other data. High bias - the model could not fit the training data well. Low variance - any data will produce high error in this model, so all errors will be high and there will not be much difference between them.

Second-degree polynomial model: good performance on the training data and good generalization to other data. Low bias - the model fits the training data very well. Low variance - the training and test errors are close, so there is not much difference between them.

Fourth-degree polynomial model: good performance on the training data but poor generalization to other data. Low bias - the model fits the training data very well and thus produces low error. High variance - for the test data, the model produces very high error, so the difference between training and test error is high.
Dimensionality Reduction Cont.
• There are two main methods for reducing dimensionality: feature selection and feature extraction.
• Feature selection
• Select the 𝑘 features (out of 𝑑) that provide the most information, discarding the other (𝑑 − 𝑘)
features.
• Methods to implement feature selection include using the inter/intraclass distance, subset selection, and scoring criteria such as the Fisher score.
• Feature extraction
• Find a new set of 𝑘 (< 𝑑) features which are combinations of the original 𝑑 features. These
methods may be supervised or unsupervised. The most widely used feature extraction methods
are Principal Components Analysis (PCA) and Linear Discriminant Analysis (LDA), which
are both linear projection methods, unsupervised and supervised respectively
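For example, with scikit-learn both projections are one-liners (a sketch on random toy data; the dataset and the choice of k = 3 components are illustrative only):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X = np.random.rand(100, 10)           # 100 samples, d = 10 original features
    y = np.random.randint(0, 2, 100)      # class labels (needed only for LDA)

    X_pca = PCA(n_components=3).fit_transform(X)                            # unsupervised, k = 3
    X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)  # supervised, k <= classes - 1
    print(X_pca.shape, X_lda.shape)       # (100, 3) (100, 1)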
Feature Selection
• What is feature selection?
⁃ If the data has more than 20,000 features and you need to cut it down to 1000 features before trying machine learning, which 1000 features should you choose?
⁃ The process of choosing the 1000 features to use is called feature selection.
Overfitting means the model performs well on the training data but does not perform well on the test data.
This is because the model is memorizing the data it has seen and is unable to generalize to unseen examples
Feature Selection Cont.
• Inter/Intraclass Distance:
⁃ Good features are discriminative. Intuitively, there should be a large interclass distance and a small intraclass distance. The figure shows the case for a single feature and two equiprobable classes. The separability of the classes can be measured by the ratio of the interclass distance to the intraclass distance.
Feature Selection Cont.
• Fisher’s score algorithm:
⁃ One of the supervised feature selection methods; it selects each feature independently according to its score.
• Here's the formula for calculating a score:
F = Σ_{j=1..k} n_j (μ_j − μ)² / Σ_{j=1..k} n_j σ_j²
where k is the number of classes, n_j, μ_j, and σ_j² are the number of samples, the mean, and the variance of the feature in class j, and μ is the overall mean of the feature.
Example
F = Σ_{j=1..k} n_j (μ_j − μ)² / Σ_{j=1..k} n_j σ_j²
μ_f1 = (1 + 2 + 3 + 4 + 5) / 5 = 3        μ_f2 = (5 + 6 + 7 + 1 + 2) / 5 = 4.2
F = Σ_{j=1..k} n_j (μ_j − μ)² / Σ_{j=1..k} n_j σ_j²

Fisher_score_F1 = [ n_f1,c1 (μ_f1,c1 − μ_f1)² + n_f1,c2 (μ_f1,c2 − μ_f1)² ] / [ n_f1,c1 σ²_f1,c1 + n_f1,c2 σ²_f1,c2 ]

Fisher_score_F1 = ( 2 × (1.5 − 3)² + 3 × (4 − 3)² ) / ( 2 × 0.5 + 3 × 1 ) = 1.8750   (rank 1)
Fisher_score_F2 = 0.1760   (rank 2)
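The same computation as a small Python function (a sketch; ddof=1 reproduces the class variances 0.5 and 1 used above):

    import numpy as np

    def fisher_score(feature, labels):
        # F = sum_j n_j (mu_j - mu)^2 / sum_j n_j sigma_j^2
        feature, labels = np.asarray(feature, dtype=float), np.asarray(labels)
        mu = feature.mean()
        num = den = 0.0
        for c in np.unique(labels):
            x_c = feature[labels == c]
            num += len(x_c) * (x_c.mean() - mu) ** 2
            den += len(x_c) * x_c.var(ddof=1)
        return num / den

    labels = np.array(["c1", "c1", "c2", "c2", "c2"])
    print(fisher_score([1, 2, 3, 4, 5], labels))   # 1.875
    print(fisher_score([5, 6, 7, 1, 2], labels))   # ~0.176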
Assignment
In a pattern recognition problem, there were two classes (C1, and C2), and three features F1, F2, F3, the
values for the features for each class are:
F1 F2 F3
C1 C2 C1 C2 C1 C2
2.1424 1.0575 3.7830 1.9366 5.0385 2.0247
2.2580 0.7707 3.3390 1.4727 4.8183 1.5616
2.1337 1.2382 3.6057 1.5228 4.5713 1.8716
2.2382 1.2378 3.5439 1.7134 4.7761 1.9508
1.7595 0.9925 3.3156 1.5119 4.4982 1.3813
1.9960 1.0655 3.0659 1.4809 4.6961 1.4118
1.9687 1.0349 3.4882 1.3335 4.6904 1.8142
Based on the above table,
1. Sort the three features according to Fisher’s score.
2. Write Python code to compute the Fisher score of the three features (write the code from scratch; you can use an existing function to validate your code).
Distance Measures for Machine Learning
• Distance measures are a key part of several machine learning algorithms. These
measures are used in both supervised and unsupervised learning, generally to
calculate the similarity between data points.
Distance Measures for Machine Learning Cont.
• Euclidean distance (L2 norm): the shortest (straight-line) distance between two points A(x₁, y₁) and B(x₂, y₂):
D_e = ( (x₁ − x₂)² + (y₁ − y₂)² )^(1/2)
• Manhattan distance (L1 norm): the simple sum of the horizontal and vertical components:
D_m = |x₁ − x₂| + |y₁ − y₂|
• Hamming distance: measures the similarity between two binary data strings of the same length, i.e., the number of bits that need to be changed to turn one string into the other.
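The three measures as small Python functions (a sketch; the example points and strings are illustrative only):

    import numpy as np

    def euclidean(a, b):                   # L2 norm
        return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

    def manhattan(a, b):                   # L1 norm
        return np.sum(np.abs(np.asarray(a) - np.asarray(b)))

    def hamming(s1, s2):                   # differing positions in equal-length strings
        return sum(c1 != c2 for c1, c2 in zip(s1, s2))

    print(euclidean((1, 1), (4, 5)))       # 5.0
    print(manhattan((1, 1), (4, 5)))       # 7
    print(hamming("10110", "11100"))       # 2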
Types of Machine Learning Algorithms
1. Supervised Learning
➢In this type of machine learning algorithm,
• The training dataset is a labeled dataset.
• In other words, the training dataset contains
the input value (X) and target value (Y).
• The learning algorithm generates a model.
• Then, a new dataset consisting of only the input values is fed to the model.
• The model then generates the prediction based
on its learning.
➢Supervised learning problems are categorized into
"classification" and "regression" problems.
Types of Supervised Learning Algorithm
• There are two types of supervised learning algorithms:
Classification
• The output variable (Y) is a category or discrete value, such as “red” or “blue”, or “disease” and “no disease”.
• Examples: Email: spam / not spam. Tumor: malignant / benign.
Regression
• The output variable (Y) is a real or continuous value, such as “salary” or “weight”.
• Example: house price prediction.
[Figure: classifying a new sample (??) among squares, circles, and triangles]
K-nearest neighbor (K-NN)
• The k-nearest neighbors (K-NN) algorithm is a supervised, non-parametric algorithm that can be used to solve both classification and regression problems.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and, at the time of classification, performs an action on the dataset.
How does K-NN work?
• How K-NN works can be explained with the following algorithm:
[Flowchart of the K-NN algorithm]
Classification using K-NN Cont.
Regression using K-NN
• Given the ages and loans of N customers, determine the House Price Index (HPI) of an unknown customer.
Regression using K-NN Cont.
• If K = 1, the nearest neighbor is the last case in the training set, with HPI = 264.
• If K = 3, the prediction for HPI is the average of the HPI of the top three (nearest) neighbors.
• Try K = 5 to determine the HPI of a new customer.
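A from-scratch sketch of K-NN regression (the (Age, Loan) table from the slide is not reproduced here, so the training data below are hypothetical; note that without normalization the Loan feature dominates the Euclidean distance):

    import numpy as np

    def knn_regress(X_train, y_train, x_new, k=3):
        distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # distance to every training point
        nearest = np.argsort(distances)[:k]                        # indices of the k closest customers
        return y_train[nearest].mean()                             # average their HPI

    # Hypothetical (Age, Loan) training data with HPI targets
    X = np.array([[25, 40000], [35, 60000], [45, 80000], [52, 110000], [60, 150000]], dtype=float)
    hpi = np.array([135.0, 256.0, 231.0, 142.0, 264.0])

    print(knn_regress(X, hpi, np.array([48.0, 142000.0]), k=3))    # mean HPI of the 3 nearest customers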
2. Unsupervised Learning
In this type of machine learning algorithm, the training dataset is unlabeled: only the input values (X) are given, without target values, and the algorithm discovers structure in the data (e.g., groups or clusters) on its own.
K-means Clustering
• The basic idea of the k-means clustering
algorithm is partitioning n data points into
k clusters by defining k centroids.
K-means Clustering Algorithm: Step 1
• Specify the number k of clusters to assign and randomly select k centroids (here k = 2).
K-means Clustering Algorithm: Step 2
• Assign every data point to the cluster whose centroid is closest to it.
K-means Clustering Algorithm: Step 3
• Re-compute the centroid of each cluster based on the newly assigned points in the cluster.
K-means Clustering Algorithm: Step 4
• Repeat step 2: reassign each data point to the new closest centroid of each cluster.
K-means Clustering Algorithm: Step 5
• Iterate until some stopping criterion is met (e.g., until the cluster assignments no longer change).
K-means Clustering Algorithm: Limitations
K-means Clustering Algorithm: Summary
K-means Clustering Example
Suppose we have 7 types of medicines, and each medicine has two attributes (features) as shown in the table below. Our goal is to group these into two clusters of medicines based on the two features.
Suppose that the initial seeds (centers of each cluster) are C1 = (1, 1) and C2 = (5, 7).
Medicine  Feature 1  Feature 2
1         1          1
2         1.5        2
3         3          4
4         5          7
5         3.5        5
6         4.5        5
7         3.5        4.5
[Scatter plot of the seven medicines in the (feature 1, feature 2) plane]
Step 1: Initialization. K = 2; randomly select 2 centroids, C1 = (1, 1) and C2 = (5, 7).
[Scatter plot of the data points with the two initial centroids marked]
Step 2: Assign every point to the cluster whose centroid is closest to the point.
Step 3: Re-compute the centroid for each cluster based on the newly assigned points in the cluster.
Group 1 = ( (1 + 1.5 + 3) / 3, (1 + 2 + 4) / 3 ) = (1.83, 2.33)
Group 2 = ( (5 + 3.5 + 4.5 + 3.5) / 4, (7 + 5 + 5 + 4.5) / 4 ) = (4.12, 5.38)
Step 4: Repeat step 2, i.e., reassign each data point to the new closest centroid of each cluster.
New centroids: C1 = (1.83, 2.33), C2 = (4.12, 5.38).
Medicine 1: distance to C1 = sqrt((1.83 − 1)² + (2.33 − 1)²) = 1.57; distance to C2 = sqrt((4.12 − 1)² + (5.38 − 1)²) = 5.38 → cluster 1
Medicine 2: distance to C1 = sqrt((1.83 − 1.5)² + (2.33 − 2)²) = 0.47; distance to C2 = sqrt((4.12 − 1.5)² + (5.38 − 2)²) = 4.29 → cluster 1
Medicine 3: distance to C1 = sqrt((1.83 − 3)² + (2.33 − 4)²) = 2.04; distance to C2 = sqrt((4.12 − 3)² + (5.38 − 4)²) = 1.78 → cluster 2
Medicine 4: distance to C1 = sqrt((1.83 − 5)² + (2.33 − 7)²) = 5.64; distance to C2 = sqrt((4.12 − 5)² + (5.38 − 7)²) = 1.84 → cluster 2
Medicine 5: distance to C1 = sqrt((1.83 − 3.5)² + (2.33 − 5)²) = 3.15; distance to C2 = sqrt((4.12 − 3.5)² + (5.38 − 5)²) = 0.73 → cluster 2
Medicine 6: distance to C1 = sqrt((1.83 − 4.5)² + (2.33 − 5)²) = 3.78; distance to C2 = sqrt((4.12 − 4.5)² + (5.38 − 5)²) = 0.54 → cluster 2
Medicine 7: distance to C1 = sqrt((1.83 − 3.5)² + (2.33 − 4.5)²) = 2.74; distance to C2 = sqrt((4.12 − 3.5)² + (5.38 − 4.5)²) = 1.08 → cluster 2
Step 3: Re-compute the centroid for each cluster based on the newly assigned points in the cluster.
Group 1 = ( (1 + 1.5) / 2, (1 + 2) / 2 ) = (1.25, 1.5)
Group 2 = ( (3 + 5 + 3.5 + 4.5 + 3.5) / 5, (4 + 7 + 5 + 5 + 4.5) / 5 ) = (3.9, 5.1)
Step 5: Iterate until some stopping criterion is met.
New centroids: C1 = (1.25, 1.5), C2 = (3.9, 5.1).
Medicine 1: distance to C1 = sqrt((1.25 − 1)² + (1.5 − 1)²) = 0.56; distance to C2 = sqrt((3.9 − 1)² + (5.1 − 1)²) = 5.02 → cluster 1
Medicine 2: distance to C1 = sqrt((1.25 − 1.5)² + (1.5 − 2)²) = 0.56; distance to C2 = sqrt((3.9 − 1.5)² + (5.1 − 2)²) = 3.92 → cluster 1
Medicine 3: distance to C1 = sqrt((1.25 − 3)² + (1.5 − 4)²) = 3.05; distance to C2 = sqrt((3.9 − 3)² + (5.1 − 4)²) = 1.42 → cluster 2
Medicine 4: distance to C1 = sqrt((1.25 − 5)² + (1.5 − 7)²) = 6.66; distance to C2 = sqrt((3.9 − 5)² + (5.1 − 7)²) = 2.20 → cluster 2
Medicine 5: distance to C1 = sqrt((1.25 − 3.5)² + (1.5 − 5)²) = 4.16; distance to C2 = sqrt((3.9 − 3.5)² + (5.1 − 5)²) = 0.41 → cluster 2
Medicine 6: distance to C1 = sqrt((1.25 − 4.5)² + (1.5 − 5)²) = 4.78; distance to C2 = sqrt((3.9 − 4.5)² + (5.1 − 5)²) = 0.61 → cluster 2
Medicine 7: distance to C1 = sqrt((1.25 − 3.5)² + (1.5 − 4.5)²) = 3.75; distance to C2 = sqrt((3.9 − 3.5)² + (5.1 − 4.5)²) = 0.72 → cluster 2
The assignments did not change, so the algorithm has converged: cluster 1 = {medicines 1, 2} and cluster 2 = {medicines 3, 4, 5, 6, 7}.
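The whole worked example can be reproduced with a short from-scratch implementation (a sketch; it stops when the assignment no longer changes):

    import numpy as np

    def kmeans(points, centroids, max_iter=100):
        assignment = None
        for _ in range(max_iter):
            # Distance of every point to every centroid, then assign to the closest one
            dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            new_assignment = dists.argmin(axis=1)
            if np.array_equal(new_assignment, assignment):        # stopping criterion
                break
            assignment = new_assignment
            # Re-compute each centroid as the mean of its assigned points
            centroids = np.array([points[assignment == j].mean(axis=0)
                                  for j in range(len(centroids))])
        return centroids, assignment

    medicines = np.array([[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5], [4.5, 5], [3.5, 4.5]])
    centroids, clusters = kmeans(medicines, np.array([[1.0, 1.0], [5.0, 7.0]]))
    print(centroids)   # [[1.25 1.5 ] [3.9  5.1 ]]
    print(clusters)    # [0 0 1 1 1 1 1]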
Assignment
Thank you!