Machine Learning and Pattern Recognition Week 3 Intro - Classification
So far we have fitted scalar real-valued functions f(x) of a real-valued input vector x. We
have matched these to real-valued observations or targets y, a task often called regression.
However, the inputs and outputs of the function we wish to learn could take different types.
This note begins to look at some of the alternatives.
The most common machine learning task is probably classification. We have a dataset of
inputs and outputs, {x(n) , y(n) } as before, but the y labels now belong to a discrete set
of categories. In binary classification, y takes on one of only two values, often {0, 1} or
{−1, +1}, for example indicating whether an email is spam, or whether an image contains a
particular object of interest.
There are many different ways we could represent and learn a function that predicts discrete
labels. This note gives some of the different ways to represent the function. We will extend
these, and say a lot more about learning later in the course.
1 Just do regression?
Although the labels in classification problems are discrete, 0 and 1 are still numbers. We could
take a training set for a binary classification problem, and use it to fit a linear regression
model where all of the outputs just happen to be 0.0 and 1.0. Is that a good idea?
Given enough basis functions, and data to fit them, we can get a fit close to any function,
so we could fit a function that takes on values close to 0 and 1 for most of its inputs. What
if the labels are noisy, and we can observe both zeros and ones in the same location? Then
to minimize square error, the best function value at a location where p(y = 1) = p_1 would
minimize:
\mathbb{E}[(y - f)^2] = p_1(1 - f)^2 + (1 - p_1)(0 - f)^2 = f^2 - 2p_1 f + p_1,   (1)
which is minimized by f = p_1 (set the derivative 2f - 2p_1 to zero). So in the limit of lots of basis functions and data, a flexible
linear regression model can give the probability that a binary label y ∈ {0, 1} will be one. If
we wanted to pick the most probable class, we would return one if our function was greater
than a half, and zero otherwise.
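As a quick numerical check (a minimal sketch with made-up numbers, not from the note): the constant that best fits noisy 0/1 labels at a single location, in the least squares sense, is the sample mean, which is close to p_1.

import numpy as np
# Minimal sketch (assumed data): noisy binary labels at one input location.
rng = np.random.default_rng(0)
p1 = 0.3
yy = (rng.random(100000) < p1).astype(float)
# The constant minimizing mean square error is the sample mean, close to p1:
print(yy.mean())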
As we can fit regularized least squares regression models quickly, and with very little code,
regressing the labels can be appealing. On the other hand, for a fixed set of basis functions,
it is easy to find examples where the least squares objective does not return the most useful
function for constructing a classification rule. See if you can sketch one before we get to an
example.
Also, regardless of how we fit a linear regression model, the fitted functions will usually
extend outside f ∈ [0, 1] for some inputs, which makes it hard to take the probabilistic
interpretation seriously.
What follows is code to construct an example showing a limitation of least squares fitting.
It shows (in magenta) a least squares quadratic fit to some data, and (in black) another
quadratic curve which, when thresholded at 0.5, is a much better classifier. The data and
basis-function setup below is one illustrative choice; the effect tends to appear when many
well-classified points lie far from the decision boundary.
import numpy as np
import matplotlib.pyplot as plt
# Assumed setup (the note's original data, basis, and fitting code are not shown
# here): positive class clustered near x=3, many negatives at larger x.
np.random.seed(0)
X = np.concatenate([2.5 + np.random.rand(20), 2*np.random.rand(20),
                    4 + 6*np.random.rand(60)])[:, None]
yy = np.concatenate([np.ones(20), np.zeros(80)])
def phi_fn(X):  # quadratic basis: [1, x, x^2]
    return np.hstack([np.ones_like(X), X, X**2])
ww = np.linalg.lstsq(phi_fn(X), yy, rcond=None)[0]  # least squares fit to 0/1 labels
# Predictions
x_grid = np.arange(0, 10, 0.05)[:,None]
f_grid = np.dot(phi_fn(x_grid), ww)
# A hand-picked quadratic that, thresholded at 0.5, classifies much better
f2_grid = 1.0 - 0.5*(x_grid - 3.0)**2
# Show demo
plt.clf()
plt.plot(X[yy==1], yy[yy==1], 'r+')
plt.plot(X[yy==0], yy[yy==0], 'bo')
plt.plot(x_grid, f_grid, 'm-')
plt.plot(x_grid, f2_grid, 'k-')
plt.ylim([-0.1, 1.1])
plt.show()
Remember: we’re not saying we can’t use linear regression for classification. We could fit
the problem above by using more basis functions. It’s just that approaches designed for
classification will usually generalize better. Linear regression could still be a baseline for
comparison (although by the end of the course, you might jump straight to other baselines).
[Later we will cover logistic regression, which is one way of forcing the function that we are
fitting to lie between zero and one.]
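As a preview (using assumed notation, since this note does not define logistic regression), the usual trick is to pass a linear-in-the-parameters function through the logistic sigmoid, which maps any real value into (0, 1):

f(\mathbf{x}) = \sigma\big(\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})\big),
\qquad
\sigma(a) = \frac{1}{1 + e^{-a}} \in (0, 1).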
1. We still have the problems of fitting least squares regression to binary data that were noted above. However,
other methods we will cover, such as generalizations of logistic regression, can also be used to predict these binary
target vectors.
2. In statistics it is sometimes called “dummy variable coding”, an unfortunate clash of terminology with “dummy
variable” elsewhere in mathematics meaning a “bound variable”.
[Figure: scatter plot of the data on axes std(cell radius) and std("texture").]
3. In this setting, the estimate above is probably fine: N is large, and hopefully we have a lot of examples from each
class. In other settings, probabilities estimated from counts are often forced away from zero or one with fictitious
‘counts’ α:
P(y = k) = \pi_k \approx \frac{\alpha + \sum_n \mathbb{I}(y^{(n)} = k)}{N + K\alpha},
where K is the number of classes, and α could be one, but needn’t be an integer, and is often set smaller than one.
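As a concrete sketch of that smoothed estimate (with made-up labels; the values of α and K below are arbitrary choices):

import numpy as np
# Smoothed class-prior estimates with fictitious counts alpha (illustrative values).
yy = np.array([0, 0, 1, 2, 1, 0])        # example labels for K = 3 classes
K, alpha = 3, 0.5
N = len(yy)
counts = np.bincount(yy, minlength=K)    # raw counts per class
pi = (alpha + counts) / (N + K*alpha)    # never exactly zero or one
print(pi, pi.sum())                      # estimates sum to 1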
[Figure: the same scatter plot with log-transformed features, on axes log(std(cell radius)) and log(std("texture")).]
Hopefully if we looked at more features these clouds could be separated rather more than
they are here.
Warning: you would need to think hard about the application and how the data were sampled
before possibly trusting a classifier in an application that matters. Medical applications are
particularly sensitive. One flaw with fitting a classifier as outlined above: the fraction of
patients with cancer in the training data is much higher than among the patients the classifier
would be applied to, for example in screening. As a result, this classifier would give quite high
probabilities of cancer to every test example, which would be impractical to act on and could
cause needless worry.
More subtle problems could depend on exactly how the patients in the training set were
selected within each class, and whether those choices will be reflected at test time.
For D binary features, we can model each feature independently within each class:
P(\mathbf{x} \mid y = k) = \prod_{d=1}^{D} \theta_{d,k}^{x_d} (1 - \theta_{d,k})^{1 - x_d},
where \theta_{d,k} gives the probability that x_d = 1 given that the class is k. The assumption that the
features are independent given the class is known as the Naive Bayes assumption. Not all
Bayes Classifiers are “Naive”!
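A minimal sketch of fitting and using such a model (illustrative code, not this note's; fit_naive_bayes and predict_log_posterior are made-up names, and the smoothing with α follows the fictitious-count idea from the footnote above):

import numpy as np
# Bernoulli Naive Bayes sketch: X is N x D with binary entries, yy has labels 0..K-1.
def fit_naive_bayes(X, yy, K, alpha=1.0):
    N, D = X.shape
    pi = np.array([(alpha + np.sum(yy == k)) / (N + K*alpha) for k in range(K)])
    # theta[d, k] estimates P(x_d = 1 | y = k), smoothed with fictitious counts
    theta = np.stack([(alpha + X[yy == k].sum(0)) / (np.sum(yy == k) + 2*alpha)
                      for k in range(K)], axis=1)
    return pi, theta
def predict_log_posterior(x, pi, theta):
    # log P(y = k | x) = log pi_k + sum_d log P(x_d | y = k) - log P(x)
    log_lik = x @ np.log(theta) + (1 - x) @ np.log(1 - theta)
    log_post = np.log(pi) + log_lik
    return log_post - np.logaddexp.reduce(log_post)

Working with log probabilities, as in predict_log_posterior, avoids numerical underflow when there are many features.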
[The website version of this note has a question here.]
Warning about a common confusion: Later in the course we will cover “Bayesian methods”.
These use Bayes’ rule to express beliefs about the parameters of a model given some training
data. For example, we don't really know the distribution of the features of malignant cells,
and a Bayesian method would represent our uncertainty about the parameters of that
distribution. The Bayes classifiers in this note are not “Bayesian” in that sense: they simply
apply Bayes' rule to a fitted model with fixed parameters.
4. A distribution that is Gaussian on a log scale is a log-normal distribution, which, as Wikipedia discusses, comes
up frequently in models. In detail, most positive distributions are not precisely log-Gaussian, but a log-normal is
often a more sensible starting point than a Gaussian.
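A quick illustration of why the log transform above helps (a sketch with synthetic log-normal data; the numbers are arbitrary):

import numpy as np
# Positive, right-skewed data often looks more Gaussian after a log transform.
rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10000)
print(np.mean(x), np.median(x))    # mean well above median: right-skewed
lx = np.log(x)
print(np.mean(lx), np.median(lx))  # roughly equal: approximately symmetric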
5 Comments
This document covered a couple of approaches to classification: least squares linear
regression, and Bayes classifiers. However, just as important in practice, if not more so, are the
pre-processing methods: one-hot/one-of-K encoding and log-transformations.
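For reference, one-hot / one-of-K encoding is only a couple of lines of NumPy (a sketch with made-up labels):

import numpy as np
# One-of-K encoding: integer labels 0..K-1 become K-dimensional indicator vectors.
yy = np.array([2, 0, 1, 2])
K = 3
Y = np.zeros((len(yy), K))
Y[np.arange(len(yy)), yy] = 1
print(Y)  # each row has a single 1 in the column of its class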
Bayes classifiers are a baseline worth knowing about, but we haven't found time for many
examples in this course. Naive Bayes in particular already appears in several undergraduate
Edinburgh courses, and probably in undergraduate courses at many other universities.
However, if you haven't seen Naive Bayes before, you may want to read more about it. It's a
good baseline to try, and easy to run on enormous datasets, although some practitioners
might turn straight to “logistic regression”, which we get to soon.
If you do ever use Naive Bayes, you should know that the probabilities reported by Naive
Bayes classifiers are usually poorly calibrated, caused by its strong and wrong independence
assumptions. It's possible to construct synthetic examples where Naive Bayes is either
overly confident or under-confident; keen students could try to do so (see the sketch after
this paragraph). In text classification, Naive Bayes is usually extremely overconfident: for
example, it might declare that many emails are spam with probability >99.99%, yet be wrong
on more than 0.01% of them.
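Here is one way such an example can be constructed (a sketch with made-up numbers): duplicate a single informative feature, so that Naive Bayes counts the same evidence several times.

import numpy as np
# Overconfidence from duplicated evidence. One binary feature with
# P(x=1 | y=1) = 0.8 and P(x=1 | y=0) = 0.2, equal class priors.
# Copying that feature D times multiplies in the same likelihood ratio D times.
for D in [1, 5, 20]:
    log_odds = D * (np.log(0.8) - np.log(0.2))   # evidence counted D times
    p = 1 / (1 + np.exp(-log_odds))              # reported P(y=1 | x=1,...,1)
    print(D, p)   # the correct posterior stays 0.8; the reported one tends to 1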
6 Further Reading
Neither Murphy nor Barber covers the idea of using least squares linear regression for binary
classification. They go straight to the more sophisticated logistic regression method when
modelling binary outputs. You can find discussion in Bishop (Section 4.1.3, p184), or the
classic Duda and Hart Pattern Classification book (Section 5.8 in the Duda, Hart and Stork
second edition from 2000). A book that’s free online, which dives straight into the multiple
class case with one-hot encoding, is Hastie et al.’s The Elements of Statistical Learning (Section
4.2 in both the 1st and 2nd editions). The Rasmussen and Williams book Gaussian Processes
for Machine Learning is also free online, and covers this idea in Section 6.5 (we will discuss
Gaussian Processes later in the course).
Murphy has more detail on Bayes classifiers in section 3.5, and Gaussian classifiers in section
4.2. Barber covers Naive Bayes in Chapter 10, and a classifier based on a mixture of Gaussians
(a generalization we haven’t covered yet) in 20.3.3.