
Week 1-4: Introduction to Machine Learning

Dr. Rajesh Kumar Tripathy
Assistant Professor, EEE
BITS Pilani, Hyderabad Campus
Week 1-3 (Lectures 1-8)

Syllabus: Introduction to machine learning; supervised, unsupervised and semi-supervised learning; classification and regression problems; linear regression; gradient descent (batch gradient descent and stochastic gradient descent); logistic regression; multiclass extension of logistic regression; performance measures for classifiers (binary class and multiclass); likelihood ratio test; Bayesian multiclass classifier with ML and MAP criteria.

Evaluation: Assignment 1 (please submit the report for Assignment 1 along with the pseudo-code).

Textbooks:

T1. Simon Haykin, “Neural Networks – A Comprehensive Foundation”, Pearson Education, 1999.
T2. H. J. Zimmermann, “Fuzzy Set Theory and its Applications”, 3rd Edition, Kluwer Academic, 1996.

Reference books/Materials:

R1: CS229 Lecture notes, Stanford University
R2: CS231 Convolutional Neural Networks for Visual Recognition, Stanford University
R3: https://gyan.iitg.ernet.in/handle/123456789/833
R4: https://www.sciencedirect.com/science/article/pii/S0925231206000385
R5: https://www.springer.com/cda/content/document/cda_downloaddocument/9783319284354-c2.pdf?SGWID=0-0-45-1545215-p177863021
Introduction to Pattern Recognition

❑ Pattern recognition stems from the need for automated machine recognition of objects, signals or images, or the need for automated decision-making based on a given set of parameters or features.

Applications:

❑ Speech recognition (e.g., automated voice-activated customer service)
❑ Speaker identification (forensic applications)
❑ Handwritten character recognition (such as the one used by the postal system to automatically read the addresses on envelopes)
❑ Identification of a system malfunction based on sensor data
❑ Loan or credit card application decisions based on an individual’s credit report data
❑ Automated digital mammography analysis for early detection of cancer
❑ Automated electrocardiogram (ECG) or electroencephalogram (EEG) analysis for cardiovascular or neurological disorder diagnosis
❑ Biometrics (personal identification based on biological data such as iris scan, fingerprint, heart sound, ECG, etc.)
Components of Pattern Recognition System
What is Machine Learning?

• Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

• Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.

• Machine learning is broadly categorized into three types: supervised learning, unsupervised learning and semi-supervised learning.
Supervised Learning

❑ In supervised learning, the model is trained on labeled examples (input-output pairs), and the trained model is then used to predict the label of unseen test data.

[Figure: a trained model receives test data (an image of a fruit) and predicts the test label, “Banana”.]
Unsupervised Learning

• Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance.

Clustering: A clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
Semi-supervised Learning

❑ This kind of learning falls somewhere in between supervised and unsupervised learning, since it uses both labeled and unlabeled data for training – typically a small amount of labeled data and a large amount of unlabeled data.

❑ Systems that use this method can considerably improve learning accuracy.

❑ Usually, semi-supervised learning is chosen when labeling the data requires skilled and relevant resources, so only a limited amount of labeled data can be acquired for training.
Category of Supervised Learning

❑ Supervised learning is classified into two categories of algorithms:

❑ Classification: A classification problem is one where the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”.

❑ Regression: A regression problem is one where the output variable is a real value, such as “dollars” or “weight”.
Linear Regression

❑ Living area and number of bedrooms: features or attributes.

❑ Price of the house: output (for regression problems) or class label (for classification problems).

❑ x^(i) is the feature vector for the ith instance and is given by x^(i) = [x1, x2], where x1 is the living area and x2 is the number of bedrooms. y^(i) is the output, i.e., the price of the house for the ith instance.

Initially, the hypothesis is given as

    h_w(x) = w0 + w1*x1 + w2*x2

We can also write it in the intercept form (with x0 = 1) as

    h_w(x) = Σ_{j=0}^{n} wj*xj = w'x

where 'n' is the number of input variables or features. Here, for the house price prediction problem, n is given as 2.
The cost function can be defined as

    J(w) = (1/2) Σ_{i=1}^{m} ( h_w(x^(i)) - y^(i) )^2

Our objective is to evaluate the parameter vector 'w' so as to minimize the above cost function. The popular least mean squares (LMS) algorithm is widely used to estimate the parameter 'w'.

The LMS algorithm uses gradient descent, which starts with some initial w and repeatedly performs the update

    wj := wj - α * ∂J(w)/∂wj

where α is the learning rate and its value varies from 0 to 1.

This is a natural algorithm that takes a step in the direction of steepest decrease of the cost function 'J'. For one instance, the partial derivative is evaluated as

    ∂J(w)/∂wj = ( h_w(x) - y ) * xj

Thus, for a single training instance (the ith instance), the LMS update rule is given by

    wj := wj + α * ( y^(i) - h_w(x^(i)) ) * xj^(i)

For the entire training set, the LMS update rule is given as

    wj := wj + α * Σ_{i=1}^{m} ( y^(i) - h_w(x^(i)) ) * xj^(i)

where 'j' indexes the attributes or features with j = 0, 1, 2, ..., n and the training instances vary over i = 1, 2, ..., m. This method looks at every example in the entire training set on every step, and is called batch gradient descent.
Stochastic gradient descent

❑ In this algorithm, we repeatedly run through the training set, and each time we
encounter a training example, we update the parameters according to the
gradient of the error with respect to that single training example only. This
algorithm is called stochastic gradient descent (also incremental gradient
descent).

❑ Often, stochastic gradient descent gets w “close” to the minimum much faster
than batch gradient descent.
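As an illustration of the idea above, here is a minimal Octave sketch of stochastic gradient descent for linear regression; the function name and arguments (X, y, alpha, nepochs) are assumptions for this example, matching the batch implementation shown later in these slides.

% Minimal sketch of stochastic gradient descent for linear regression.
% Assumptions: X is an m-by-(n+1) feature matrix with a leading column of
% ones, y is an m-by-1 output vector, alpha is the learning rate.
function w = sgd_linear_regression(X, y, alpha, nepochs)
  [m, n1] = size(X);
  w = zeros(n1, 1);
  for epoch = 1:nepochs
    idx = randperm(m);                  % visit the examples in random order
    for i = idx
      err = y(i) - X(i, :) * w;         % error on this single example
      w = w + alpha * err * X(i, :)';   % LMS update using one example only
    end
  end
end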
Regularized Linear Regression (Ridge Regression)

❑ Regularization helps to deal with the bias-variance problem of model development. When small changes are made to the data, such as switching from the training to the testing data, there can be wild changes in the estimates. Regularization can often smooth this problem out substantially.

❑ For highly correlated features, regularization is helpful for smoother minimization of the cost function.

The cost function for ridge regression is given by

    J(w) = (1/2) Σ_{i=1}^{m} ( h_w(x^(i)) - y^(i) )^2 + (λ/2) Σ_{j=1}^{n} wj^2

The partial derivative with respect to wj is given by

    ∂J(w)/∂wj = Σ_{i=1}^{m} ( h_w(x^(i)) - y^(i) ) * xj^(i) + λ*wj

For one instance, the partial derivative is evaluated as

    ∂J(w)/∂wj = ( h_w(x) - y ) * xj + λ*wj

The gradient descent update for ridge regression is given by

    wj := wj + α * [ ( y^(i) - h_w(x^(i)) ) * xj^(i) - λ*wj ]
Least Angle Regression
Implementation of Linear Regression

% X is the feature matrix X = [x0, x1] and y is the output vector.
% The weight vector is w = [w0; w1].

for i = 1:K                                    % K = number of iterations, assigned initially
    T1 = w(1) - alpha * ((X * w) - y)' * X(:, 1);
    T2 = w(2) - alpha * ((X * w) - y)' * X(:, 2);
    w(1) = T1;                                 % update both weights simultaneously
    w(2) = T2;
    J(i) = evaluatecostfunction(X, y, w);      % record the cost after each iteration
end
Vectorization based Linear regression
Vectorization based Ridge regression
Vectorization based Least Angle regression
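The vectorized implementations referenced in the three slide titles above are not reproduced here. As a rough sketch under stated assumptions, the ordinary and ridge regression updates can be vectorized as follows (least angle regression is omitted); the function name and arguments are illustrative, not from the original slides.

% Sketch of vectorized gradient descent for linear and ridge regression.
% Assumptions: X is m-by-(n+1) with a leading column of ones, y is m-by-1;
% lambda = 0 gives ordinary linear regression, lambda > 0 gives ridge regression.
function [w, J] = vectorized_regression(X, y, alpha, K, lambda)
  n1 = size(X, 2);
  w = zeros(n1, 1);
  J = zeros(K, 1);
  penalty = lambda * [0; ones(n1 - 1, 1)];   % do not penalize the bias weight w0
  for k = 1:K
    grad = X' * (X * w - y) + penalty .* w;  % full-batch gradient in one line
    w = w - alpha * grad;
    J(k) = 0.5 * sum((X * w - y) .^ 2) + 0.5 * lambda * sum(w(2:end) .^ 2);
  end
end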
Linear Regression from a Probabilistic Perspective

Let us assume that the target variables and the inputs are related via the equation

    y^(i) = w'x^(i) + e^(i)

where the error terms e^(i) are independently and identically distributed according to a Gaussian distribution with zero mean and some variance σ^2.

The probability density function is given by

    p(e^(i)) = 1/(sqrt(2π)*σ) * exp( -(e^(i))^2 / (2σ^2) )

This can also be written as

    p(y^(i) | x^(i); w) = 1/(sqrt(2π)*σ) * exp( -(y^(i) - w'x^(i))^2 / (2σ^2) )

The likelihood function considering all the training examples is given by

    L(w) = Π_{i=1}^{m} p(y^(i) | x^(i); w)

The minimization of the squared error is the same as the maximization of the log-likelihood, since

    log L(w) = m*log( 1/(sqrt(2π)*σ) ) - (1/σ^2) * (1/2) Σ_{i=1}^{m} ( y^(i) - w'x^(i) )^2
Logistic regression (Binary Classification)

The hypothesis for logistic regression is given as

    h_w(x) = g(w'x) = 1 / (1 + exp(-w'x))

where g(z) = 1 / (1 + exp(-z)) is the logistic function or the sigmoid function. g(z) tends towards 1 as z → ∞, and g(z) tends towards 0 as z → −∞.

The hypothesis for the classification task is given by

    P(y = 1 | x; w) = h_w(x)
    P(y = 0 | x; w) = 1 - h_w(x)

More compactly, it can be written as

    p(y | x; w) = ( h_w(x) )^y * ( 1 - h_w(x) )^(1-y)

The likelihood function considering 'm' training examples is given by

    L(w) = Π_{i=1}^{m} ( h_w(x^(i)) )^(y^(i)) * ( 1 - h_w(x^(i)) )^(1 - y^(i))

The log-likelihood to be maximized is given by

    l(w) = Σ_{i=1}^{m} [ y^(i)*log h_w(x^(i)) + (1 - y^(i))*log(1 - h_w(x^(i))) ]

The partial derivative of the log-likelihood for a single training instance is given by

    ∂l(w)/∂wj = ( y - h_w(x) ) * xj

Hence, using the LMS-style update rule, the weight values for logistic regression are estimated as

    wj := wj + α * ( y^(i) - h_w(x^(i)) ) * xj^(i)

For test data (xt), the output is evaluated as

    yt = 1 if h_w(xt) >= 0.5, and yt = 0 otherwise.
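A minimal Octave sketch of this training procedure is given below, assuming X has a leading column of ones and y contains 0/1 labels; the function name and the use of batch (rather than per-example) updates are illustrative choices, not from the slides.

% Sketch of logistic regression trained by gradient ascent on the log-likelihood.
% Assumptions: X is m-by-(n+1) with a leading column of ones, y is m-by-1 with
% entries in {0, 1}, alpha is the learning rate, K is the number of iterations.
function w = train_logistic(X, y, alpha, K)
  w = zeros(size(X, 2), 1);
  for k = 1:K
    h = 1 ./ (1 + exp(-X * w));      % sigmoid hypothesis for all examples
    w = w + alpha * X' * (y - h);    % batch gradient ascent on the log-likelihood
  end
end

% Prediction for test data xt (a row vector with a leading 1):
%   yt = (1 / (1 + exp(-xt * w))) >= 0.5;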


Multiclass Extension of Logistic Regression (One Vs All)

❑ Let's consider a four-class classification problem with class labels 1, 2, 3 and 4.

❑ In the One Vs All algorithm, 4 binary-class models are created (for 'n' classes, 'n' models). The following block diagram explains the procedure for the One Vs All based multiclass coding algorithm.

    yt = argmax(y1, y2, y3, y4)

where y1, y2, y3 and y4 are the scores (predicted probabilities of the positive class) obtained from model1, model2, model3 and model4, respectively, and the test instance is assigned to the class whose model produces the maximum score.
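A minimal sketch of the One Vs All scheme, reusing the hypothetical train_logistic function from the earlier sketch; the function names and the score-based argmax decision are assumptions for illustration.

% Sketch of One Vs All multiclass logistic regression.
% Assumptions: X is m-by-(n+1) with a leading column of ones, labels is m-by-1
% with entries in 1..nclasses, and train_logistic is the sketch shown earlier.
function W = train_one_vs_all(X, labels, nclasses, alpha, K)
  W = zeros(size(X, 2), nclasses);
  for c = 1:nclasses
    yc = double(labels == c);          % class c versus the rest
    W(:, c) = train_logistic(X, yc, alpha, K);
  end
end

% Prediction: pick the class whose model gives the largest score.
%   scores = 1 ./ (1 + exp(-xt * W));  % one score per class
%   [~, yt] = max(scores);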
Multiclass Extension of Logistic Regression (One Vs One)

❑ In the One Vs One algorithm, 6 binary-class models are created for the 4-class classification task (for 'n' classes, 'n(n-1)/2' models). The following block diagram explains the procedure for the One Vs One based multiclass coding algorithm.

    yt = mode(y1, y2, y3, y4, y5, y6)

where y1, y2, y3, y4, y5 and y6 are the predicted class labels from model1, model2, model3, model4, model5 and model6, respectively, i.e., the test instance is assigned the class label that receives the most votes.
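A corresponding sketch of the One Vs One scheme, again reusing the hypothetical train_logistic function; the pairwise bookkeeping shown here is one possible layout, not taken from the slides.

% Sketch of One Vs One multiclass logistic regression.
% Assumptions: X is m-by-(n+1) with a leading column of ones, labels is m-by-1
% with entries in 1..nclasses; one binary model is trained per class pair.
function models = train_one_vs_one(X, labels, nclasses, alpha, K)
  models = {};
  for a = 1:nclasses-1
    for b = a+1:nclasses
      idx = (labels == a) | (labels == b);
      yab = double(labels(idx) == a);                 % 1 for class a, 0 for class b
      w = train_logistic(X(idx, :), yab, alpha, K);
      models{end+1} = struct('a', a, 'b', b, 'w', w); % store the pair and its weights
    end
  end
end

% Prediction: each pairwise model votes for one of its two classes; the test
% instance is assigned the label with the most votes (the mode of the votes).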
Logistic Regression with L2-Norm Regularization

The cost function (regularized log-likelihood) for logistic regression with L2-norm regularization is given by

    l(w) = Σ_{i=1}^{m} [ y^(i)*log h_w(x^(i)) + (1 - y^(i))*log(1 - h_w(x^(i))) ] - (λ/2) Σ_{j=1}^{n} wj^2

Now, the gradient of the cost function for a single training instance is given by

    ∂l(w)/∂wj = ( y - h_w(x) ) * xj - λ*wj

Hence, using the LMS-style update rule, the weight values are estimated as

    wj := wj + α * [ ( y^(i) - h_w(x^(i)) ) * xj^(i) - λ*wj ]

Logistic Regression with L1-Norm Regularization

The cost function (regularized log-likelihood) for logistic regression with L1-norm regularization is given by

    l(w) = Σ_{i=1}^{m} [ y^(i)*log h_w(x^(i)) + (1 - y^(i))*log(1 - h_w(x^(i))) ] - λ Σ_{j=1}^{n} |wj|

Now, the gradient (subgradient) of the cost function for a single training instance is given by

    ∂l(w)/∂wj = ( y - h_w(x) ) * xj - λ*sign(wj)

Hence, using the LMS-style update rule, the weight values are estimated as

    wj := wj + α * [ ( y^(i) - h_w(x^(i)) ) * xj^(i) - λ*sign(wj) ]
Unsupervised Learning (k-means clustering)

Let X = {x1, x2, x3, ..., xm} be the set of data points and V = {v1, v2, ..., vc} be the set of cluster centers.

❑ Randomly select 'c' cluster centers.

❑ Calculate the distance between each data point and each cluster center.

❑ Assign each data point to the cluster center whose distance from the data point is the minimum over all the cluster centers.

❑ Recalculate each new cluster center as the mean of the data points assigned to it, using

    vk = (1/ck) * Σ (data points assigned to the kth cluster)

where 'ck' represents the number of data points in the kth cluster.

❑ Repeat the assignment and recalculation steps until the cluster centers no longer change, as sketched below.
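A minimal Octave sketch of the procedure above; the function name and the convergence test are illustrative assumptions.

% Sketch of k-means clustering.
% Assumptions: X is m-by-d (one data point per row), c is the number of clusters.
function [V, assign] = kmeans_sketch(X, c, maxiter)
  m = size(X, 1);
  V = X(randperm(m, c), :);                       % randomly pick c points as initial centers
  for it = 1:maxiter
    D = zeros(m, c);
    for k = 1:c
      D(:, k) = sum((X - V(k, :)) .^ 2, 2);       % squared distance to center k
    end
    [~, assign] = min(D, [], 2);                  % nearest center for each point
    Vold = V;
    for k = 1:c
      if any(assign == k)
        V(k, :) = mean(X(assign == k, :), 1);     % recompute center as the cluster mean
      end
    end
    if isequal(V, Vold), break; end               % stop when centers no longer change
  end
end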


How to Select Training and Test Instances?

Hold-out cross-validation:

❑ The hold-out method is the simplest kind of cross-validation. The data set is separated into two sets, called the training set and the testing set.

❑ The function approximator fits a function using the training set only. Then the function approximator is asked to predict the output values for the data in the testing set (it has never seen these output values before).

❑ Either 60/40, 70/30 or 80/20 based hold-out cross-validation splits are followed.
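A small Octave sketch of a random hold-out split (here 70/30); the variable names are assumptions for this example.

% Sketch of a 70/30 hold-out split.
% Assumptions: X is m-by-d, y is m-by-1.
m = size(X, 1);
idx = randperm(m);                     % shuffle the instances
ntrain = round(0.7 * m);               % 70% for training, 30% for testing
Xtrain = X(idx(1:ntrain), :);      ytrain = y(idx(1:ntrain));
Xtest  = X(idx(ntrain+1:end), :);  ytest  = y(idx(ntrain+1:end));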
K-fold cross-validation

❑ K-fold cross validation is one way to improve over the holdout method. The data set is divided into
k subsets, and the holdout method is repeated k times.

❑ Each time, one of the k subsets is used as the test set and the other k-1 subsets are put together
to form a training set.

❑ Then the average error across all k trials is computed. The advantage of this method is that it
matters less how the data gets divided.

❑ The disadvantage of this method is that the training algorithm has to be rerun from scratch k
times, which means it takes k times as much computation to make an evaluation.
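A minimal Octave sketch of the k-fold procedure; train_and_evaluate is a hypothetical placeholder for fitting a model on the training folds and returning its error on the held-out fold.

% Sketch of k-fold cross-validation.
% Assumptions: X is m-by-d, y is m-by-1, k is the number of folds, and
% train_and_evaluate(Xtr, ytr, Xte, yte) is a user-supplied function that
% returns the test error of a model trained on (Xtr, ytr).
m = size(X, 1);
idx = randperm(m);
fold = mod(0:m-1, k) + 1;              % fold label (1..k) for each shuffled instance
err = zeros(k, 1);
for f = 1:k
  test_rows  = idx(fold == f);
  train_rows = idx(fold ~= f);
  err(f) = train_and_evaluate(X(train_rows, :), y(train_rows), ...
                              X(test_rows, :),  y(test_rows));
end
avg_error = mean(err);                 % average error across the k trials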
Leave-one-out cross validation

❑ Leave-one-out cross-validation is K-fold cross-validation taken to its logical extreme, with K equal to N, the number of data points in the set.

❑ That means that, N separate times, the function approximator is trained on all the data except for one point, and a prediction is made for that point.
Performance Measures for Binary Classifier

❑ The performance of a binary classifier is evaluated using the confusion matrix, which arranges the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).

❑ The sensitivity (SE) is defined as the proportion of abnormal episodes that are accurately classified as abnormal, and it is given by

    SE = TP / (TP + FN)

❑ The specificity (SP) is defined as the proportion of normal episodes that are accurately classified as normal, and it is given by

    SP = TN / (TN + FP)

❑ The accuracy (Acc) is defined as the proportion of episodes that are correctly classified, whether normal or abnormal. It is given by

    Acc = (TP + TN) / (TP + TN + FP + FN)
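A short Octave sketch computing these measures from predicted and true 0/1 labels; the variable names are assumptions.

% Sketch: sensitivity, specificity and accuracy from 0/1 labels.
% Assumptions: ytrue and ypred are column vectors with 1 = abnormal, 0 = normal.
TP = sum(ypred == 1 & ytrue == 1);
TN = sum(ypred == 0 & ytrue == 0);
FP = sum(ypred == 1 & ytrue == 0);
FN = sum(ypred == 0 & ytrue == 1);
SE  = TP / (TP + FN);                  % sensitivity
SP  = TN / (TN + FP);                  % specificity
Acc = (TP + TN) / (TP + TN + FP + FN); % accuracy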
Performance Measures for Multiclass Classifier

❑ The performance measures for a multiclass classifier are the individual class accuracy and the overall accuracy. These measures are evaluated from the multiclass confusion matrix C, whose entry C(i, j) is the number of instances of class i that are classified as class j.

❑ The individual class accuracy (IA) value of the ith class is given by

    IA_i = C(i, i) / Σ_{j} C(i, j)

where i, j = 1, 2, 3, 4, 5 for a five-class problem.

❑ The overall accuracy (OA) of the multiclass classifier is evaluated as

    OA = Σ_{i} C(i, i) / Σ_{i} Σ_{j} C(i, j)
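A corresponding Octave sketch, assuming C is the multiclass confusion matrix with true classes along the rows.

% Sketch: individual class accuracy and overall accuracy from a confusion
% matrix C (rows = true class, columns = predicted class).
IA = diag(C) ./ sum(C, 2);             % one accuracy value per class
OA = trace(C) / sum(C(:));             % overall accuracy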
Bayesian Decision Theory

The feature matrix and the class label vector are denoted by X and y, respectively. The feature matrix consists of 'm' feature vectors x^(i), i = 1, 2, ..., m.

The posterior probability evaluated using Bayes' theorem is given by

    P(Cj | x) = p(x | Cj) * P(Cj) / p(x)

The posterior probability can also be written as

    posterior = ( likelihood * prior ) / evidence

The likelihood is modeled using the Normal or Gaussian distribution. For a single-dimensional feature, the likelihood function is given by

    p(x | Cj) = 1/(sqrt(2π)*σj) * exp( -(x - μj)^2 / (2σj^2) )

For a multi-dimensional feature vector, the likelihood function is given by

    p(x | Cj) = 1/((2π)^(d/2) * |Σj|^(1/2)) * exp( -(1/2)*(x - μj)' * inv(Σj) * (x - μj) )

where μj and Σj are the mean vector and covariance matrix of class Cj, and d is the feature dimension.
Likelihood Ratio Test (LRT) for Two-class Bayesian Classifier

The decision rule for the two-class Bayesian classifier using the a posteriori probabilities is given by

    Decide C1 if P(C1 | x) > P(C2 | x); otherwise decide C2.

We can also write

    Decide C1 if p(x | C1)*P(C1) / p(x) > p(x | C2)*P(C2) / p(x); otherwise decide C2.

As p(x) does not affect the decision rule, it can be eliminated. The LRT is then defined as

    Λ(x) = p(x | C1) / p(x | C2)  >  P(C2) / P(C1)  =>  decide C1; otherwise decide C2.
Maximum a posteriori Probability (MAP) Decision Rule

The LRT is given by

    Λ(x) = p(x | C1) / p(x | C2)  >  P(C2) / P(C1)  =>  decide C1

The LRT can also be written as

    p(x | C1)*P(C1)  >  p(x | C2)*P(C2)  =>  decide C1

For the binary class case, the MAP decision rule is therefore to decide the class with the larger posterior, i.e., the larger product of likelihood and prior.

MAP decision rule for Multiclass Classification: the MAP decision rule for multiclass classification is given by

    y_hat = argmax_j [ p(x | Cj) * P(Cj) ],  j = 1, 2, ..., number of classes
Maximum Likelihood (ML) Decision Rule

The LRT for the two-class classifier is given by

    Λ(x) = p(x | C1) / p(x | C2)  >  P(C2) / P(C1)  =>  decide C1

For equal priors, the decision rule for the LRT reduces to

    Λ(x) = p(x | C1) / p(x | C2)  >  1  =>  decide C1

ML decision rule for Multiclass Classification: the ML decision rule for multiclass classification is given by

    y_hat = argmax_j p(x | Cj),  j = 1, 2, ..., number of classes
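As a rough illustration of the MAP and ML rules above, here is a minimal Octave sketch of a Gaussian Bayesian classifier with per-class mean and covariance; the function name and the use of log-probabilities are assumptions for this example, not taken from the slides.

% Sketch of a multiclass Bayesian classifier with Gaussian likelihoods.
% Assumptions: Xtrain is m-by-d, labels has entries in 1..nclasses, xt is a
% 1-by-d test vector. Equal priors give the ML rule; empirical class
% frequencies as priors give the MAP rule.
function yt = gaussian_bayes_predict(Xtrain, labels, nclasses, xt, use_map)
  d = size(Xtrain, 2);
  logpost = zeros(nclasses, 1);
  for c = 1:nclasses
    Xc = Xtrain(labels == c, :);
    mu = mean(Xc, 1);
    Sigma = cov(Xc) + 1e-6 * eye(d);              % small ridge for numerical stability
    diffv = (xt - mu)';
    loglik = -0.5 * (d*log(2*pi) + log(det(Sigma)) + diffv' * (Sigma \ diffv));
    if use_map
      prior = size(Xc, 1) / size(Xtrain, 1);      % MAP: empirical class prior
    else
      prior = 1 / nclasses;                       % ML: equal priors
    end
    logpost(c) = loglik + log(prior);             % log [ p(x|Cc) * P(Cc) ]
  end
  [~, yt] = max(logpost);                         % argmax over classes
end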
