Session 02 - Regression and Classification
In this session
• We will learn the fundamental aspects of regression and classification models.
We seek a model that predicts sales based on the TV, radio, and newspaper predictor variables:
$$\text{sales} = f(\text{TV}, \text{radio}, \text{newspaper})$$
[*] http://www-bcf.usc.edu/~gareth/ISL/data.html
Regression
• The goal of regression is to predict the value of one or more continuous target variables $y$ given the value of a p-dimensional vector $x$ of input variables.
[Figure: scatter plot of house price (£, in 1000's) against size (feet²); a query point with Size = 1080 is marked with its unknown "Price?"]
• The simple linear regression model is
$$Y = \beta_0 + \beta_1 X + \epsilon$$
• where $\beta_0$ and $\beta_1$ are two unknown constants that represent the intercept and slope, also known as coefficients or parameters, and $\epsilon$ is the error term.
• The least squares approach chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimise the residual sum of squares (RSS). That is, we solve for $\hat{\beta}_0$ and $\hat{\beta}_1$ from $\partial RSS(\hat{\beta}_0, \hat{\beta}_1)/\partial \hat{\beta}_0 = 0$ and $\partial RSS(\hat{\beta}_0, \hat{\beta}_1)/\partial \hat{\beta}_1 = 0$, respectively.
• The minimising values can be shown to be:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
• where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
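Below is a minimal sketch (not from the slides) of computing these closed-form estimates with numpy; the data, intercept, and slope values are synthetic and purely illustrative.

```python
import numpy as np

# Synthetic example: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)

# Closed-form least squares estimates from the formulas above.
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

print(f"intercept: {beta0_hat:.3f}, slope: {beta1_hat:.3f}")
```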
$$Y = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \epsilon$$
• Or, if 1 is included in $X$,
$$Y = \sum_{j=0}^{p} X_j \beta_j + \epsilon = X^T \beta + \epsilon$$
• Predicting a new output value, assuming $\hat{\beta}$ has already been estimated, is given by
$$\hat{Y} = X^T \hat{\beta}$$
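A sketch of estimating $\beta$ by ordinary least squares and predicting $\hat{Y} = X^T \hat{\beta}$ with numpy; the data and coefficient values are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.7])
y = 4.0 + X @ beta_true + rng.normal(scale=0.5, size=n)

# Include the constant 1 in X so the intercept is part of beta.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Prediction for a new observation: y_hat = x^T beta_hat
x_new = np.array([1.0, 0.2, -1.0, 0.5])  # leading 1 for the intercept
y_hat = x_new @ beta_hat
print(beta_hat, y_hat)
```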
$$RSE = \sqrt{\frac{1}{n-2} RSS} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
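A quick illustrative sketch of computing the RSE for a fitted simple regression; the synthetic data and its noise scale of 1.0 are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)

# Fit by least squares, then measure the residual standard error.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
residuals = y - (beta0_hat + beta1_hat * x)
rse = np.sqrt(np.sum(residuals ** 2) / (len(y) - 2))
print(f"RSE: {rse:.3f}")  # should be close to the simulated noise scale (1.0)
```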
$$Y = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \epsilon$$
• We interpret $\beta_j$ as the average effect on $Y$ of a one-unit increase in $X_j$, holding all other predictors fixed.
Issues
• The ideal scenario is when the predictors are uncorrelated – a balanced design:
• Each coefficient can be estimated and tested separately.
• Interpretations such as “a unit change in $X_j$ is associated with a $\beta_j$ change in $Y$, while all the other variables stay fixed” are possible.
• Correlations amongst predictors cause problems (see the simulation sketch after this list):
• The variance of all coefficients tends to increase, sometimes dramatically.
• Interpretations become hazardous – when $X_j$ changes, everything else changes.
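A small simulation sketch of this effect: the standard deviation of the OLS estimate of one coefficient, with and without strong correlation between the two predictors. All numbers here are synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def coef_sd(corr, n=100, reps=500):
    """Std dev of the OLS estimate of beta_1 over repeated samples."""
    cov = np.array([[1.0, corr], [corr, 1.0]])
    estimates = []
    for _ in range(reps):
        X = rng.multivariate_normal([0, 0], cov, size=n)
        y = 1.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)
        X1 = np.column_stack([np.ones(n), X])
        beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
        estimates.append(beta_hat[1])
    return np.std(estimates)

# The correlated case gives a noticeably larger standard deviation.
print("sd(beta1_hat), uncorrelated:", round(coef_sd(0.0), 3))
print("sd(beta1_hat), corr = 0.95: ", round(coef_sd(0.95), 3))
```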
Categorical predictors
• A categorical (or qualitative, or factor) predictor takes only categorical values, i.e. levels with no particular order.
• Examples: gender (female, male), marital status (single, married, etc), ethnicity
(Caucasian, African American, Asian).
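A sketch of how such a predictor can be encoded as dummy (indicator) variables, here with pandas; the data frame and its values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "balance": [500.0, 1200.0, 300.0],
    "ethnicity": ["Caucasian", "African American", "Asian"],
})

# One dummy column per level, dropping one level as the baseline
# so its effect is absorbed by the intercept.
encoded = pd.get_dummies(df, columns=["ethnicity"], drop_first=True)
print(encoded)
```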
Polynomial regression
• General form:
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \epsilon_i$$
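A sketch of fitting a degree-3 polynomial regression with scikit-learn; the degree, data, and coefficients are illustrative assumptions, not from the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(-3, 3, size=150)).reshape(-1, 1)
y = 1.0 - 2.0 * x.ravel() + 0.5 * x.ravel() ** 3 + rng.normal(scale=1.0, size=150)

# Expand x into [x, x^2, x^3] and fit an ordinary linear model on the expansion.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print(model.predict(np.array([[0.0], [2.0]])))
```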
Piecewise polynomials
• Instead of a single polynomial in $X$ over its whole domain, we can use different polynomials over regions whose boundaries are defined by knots.
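A minimal sketch of the idea with a single knot at $x = 0$ and a separate cubic fitted in each region; the data and knot location are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(-3, 3, size=200))
y = np.where(x < 0, 1 + x, 1 - 2 * x + x ** 2) + rng.normal(scale=0.3, size=200)

knot = 0.0
left, right = x < knot, x >= knot

# Fit a cubic polynomial separately in each region defined by the knot.
coef_left = np.polyfit(x[left], y[left], deg=3)
coef_right = np.polyfit(x[right], y[right], deg=3)

x_new = np.array([-1.0, 1.0])
pred = np.where(x_new < knot, np.polyval(coef_left, x_new), np.polyval(coef_right, x_new))
print(pred)
```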
• It is called an additive model because we calculate a separate 𝑓𝑗 for each 𝑋𝑗 and then add
together all of their contributions.
• e.g. let’s assume the following model:
$$y_i = \beta_0 + f_1(\text{year}_i) + f_2(\text{age}_i) + f_3(\text{education}_i) + \epsilon_i$$
• where year and age are numeric, and education is categorical with levels (“<HS”, ”HS”, “<Coll”, “Coll”).
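A sketch of fitting such an additive model, assuming the pygam package is available: spline terms ($s$) for the numeric predictors and a factor term ($f$) for the categorical one. All data and value ranges here are synthetic and invented for illustration.

```python
import numpy as np
from pygam import LinearGAM, s, f  # assumes the pygam package is installed

rng = np.random.default_rng(5)
n = 500
year = rng.integers(2003, 2010, size=n)
age = rng.uniform(20, 65, size=n)
education = rng.integers(0, 4, size=n)  # four coded education levels

# Synthetic response with a separate contribution from each predictor.
y = 0.3 * (year - 2003) + 0.005 * (age - 40) ** 2 + 2.0 * education + rng.normal(size=n)

X = np.column_stack([year, age, education])

# s(): smooth (spline) term for a numeric column, f(): factor term for a categorical one.
gam = LinearGAM(s(0) + s(1) + f(2)).fit(X, y)
gam.summary()
```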
Classification
• Qualitative variables take values in an unordered set 𝒞, such as:
• Given a feature vector $X$ and a qualitative response $Y$ taking values in the set $\mathcal{C}$, the classification task is to build a function $C(X)$ that takes as input the feature vector $X$ and predicts its value for $Y$; i.e. $C(X) \in \mathcal{C}$.
• Often we are more interested in estimating the probabilities that $X$ belongs to each category in $\mathcal{C}$.
• For example, it is more valuable to have an estimate of the probability that an insurance claim is fraudulent than a classification of fraudulent or not.
Logistic regression
• Let's write $p(X) = \Pr(Y = 1 \mid X)$ for short and consider using balance to predict default.
• Logistic regression uses the form
$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
($e \approx 2.71828$ is Euler's number.)
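A sketch of fitting this model with scikit-learn's LogisticRegression on synthetic balance/default-style data; the coefficients used to simulate the data are invented, and this is not the ISL Default dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
balance = rng.uniform(0, 2500, size=1000)
# Probability of default increases with balance (synthetic ground truth).
p_true = 1 / (1 + np.exp(-(-10.0 + 0.0055 * balance)))
default = rng.binomial(1, p_true)

clf = LogisticRegression()
clf.fit(balance.reshape(-1, 1), default)

# Estimated Pr(default = 1 | balance) for a couple of balances.
print(clf.predict_proba(np.array([[1000.0], [2000.0]]))[:, 1])
```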
Making predictions
Multinomial regression
• So far we have discussed logistic regression with two classes. It is easily generalised to more than two classes. One version (used in the R package glmnet) has the symmetric form
$$\Pr(Y = k \mid X = x) = \frac{e^{\beta_{0k} + \beta_{1k} x_1 + \cdots + \beta_{pk} x_p}}{\sum_{l=1}^{K} e^{\beta_{0l} + \beta_{1l} x_1 + \cdots + \beta_{pl} x_p}}$$
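A small numpy sketch of evaluating this symmetric (softmax) form for one observation, using made-up coefficients for $K = 3$ classes and $p = 2$ predictors.

```python
import numpy as np

# Hypothetical coefficients: one intercept and one coefficient vector per class.
beta0 = np.array([0.5, -0.2, 0.0])             # beta_{0k}
beta = np.array([[1.0, -0.5],
                 [0.3,  0.8],
                 [-1.2, 0.1]])                  # beta_{jk}, one row per class

x = np.array([0.7, -1.1])                       # a single observation

scores = beta0 + beta @ x                       # linear score for each class
probs = np.exp(scores) / np.exp(scores).sum()   # symmetric softmax form
print(probs, probs.sum())                       # probabilities sum to 1
```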
Bayes' theorem
• Bayes' theorem is stated as the following equation:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
where $A$ and $B$ are events, and $P(B) \neq 0$.
• $P(A)$ and $P(B)$ are the probabilities of observing $A$ and $B$ without regard to each other.
• $P(A \mid B)$, a conditional probability, is the probability of observing event $A$ given that $B$ is true.
• $P(B \mid A)$ is the probability of observing event $B$ given that $A$ is true.
• Bayes' theorem is the key to using new observations to modify prior beliefs.
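A tiny worked sketch of the theorem with invented numbers (a rare condition $A$ and an imperfect test $B$), just to show how a prior belief is updated by a new observation.

```python
# Hypothetical numbers: P(A) is the prior probability of a condition,
# P(B | A) the test's sensitivity, P(B | not A) its false-positive rate.
p_a = 0.01
p_b_given_a = 0.95
p_b_given_not_a = 0.05

# Total probability of a positive test, P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A | B) by Bayes' theorem.
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # ~0.161: a positive test still leaves A fairly unlikely
```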
$$P(H \mid e) = \frac{P(e \mid H)\,P(H)}{P(e)}$$
How the prior, combined with the evidence, is reflected in the posterior.
Discriminant Analysis
• Here the approach is to model the distribution of $X$ in each of the classes separately, and then use Bayes' theorem to flip things around and obtain $\Pr(Y \mid X)$.
• When we use normal (Gaussian) distributions for each class, this leads to linear or quadratic
discriminant analysis.
• However, this approach is quite general, and other distributions can be used as well. We will
focus on normal distributions.
$$\Pr(Y = k \mid X = x) = \frac{\Pr(X = x \mid Y = k)\,\Pr(Y = k)}{\Pr(X = x)}$$
$$\Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$
• $f_k(x) = \Pr(X = x \mid Y = k)$ is the density for $X$ in class $k$. Here we will use normal densities for these, separately in each class.
• $\pi_k = \Pr(Y = k)$ is the marginal or prior probability for class $k$.
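A sketch of evaluating this posterior with a single predictor and Gaussian class densities; the priors, means, and standard deviations are made-up values.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters for K = 2 classes on a single predictor.
priors = np.array([0.7, 0.3])   # pi_k
means = np.array([0.0, 2.0])    # class means
sds = np.array([1.0, 1.0])      # shared sd -> an LDA-like setting

x = 1.2                         # observation to classify

# Class-conditional densities f_k(x), then the posterior via Bayes' theorem.
densities = norm.pdf(x, loc=means, scale=sds)
posterior = priors * densities / np.sum(priors * densities)
print(posterior)                # Pr(Y = k | X = x) for k = 1, 2
```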
• When the $f_k(x)$ are Gaussian densities, with the same covariance matrix in each class, this leads to linear discriminant analysis.
• By altering the forms for $f_k(x)$, we get different classifiers:
  • With Gaussians but different $\Sigma_k$ in each class, we get quadratic discriminant analysis.
  • With $f_k(x) = \prod_{j=1}^{p} f_{jk}(x_j)$ (a conditional independence model) in each class, we get naive Bayes. For Gaussians this means the $\Sigma_k$ are diagonal.
  • Many other forms are possible, by proposing specific density models for $f_k(x)$, including nonparametric approaches.
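A sketch comparing these three classifiers with scikit-learn on synthetic two-class Gaussian data; the class means and covariances are invented for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(7)
# Two Gaussian classes with different means (and a different spread for class 1).
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=300)
X1 = rng.multivariate_normal([2, 2], [[1.5, -0.3], [-0.3, 0.8]], size=300)
X = np.vstack([X0, X1])
y = np.array([0] * 300 + [1] * 300)

# Training accuracy of LDA, QDA, and Gaussian naive Bayes on the same data.
for clf in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis(), GaussianNB()):
    clf.fit(X, y)
    print(type(clf).__name__, round(clf.score(X, y), 3))
```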
• Algorithm:
  1. Given a query point $x_0$, find the $k$ training points $x_{(r)}$, $r = 1, \dots, k$, closest in distance to $x_0$.
  2. Classify using a majority vote among the $k$ neighbours.
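A minimal numpy sketch of this algorithm, using Euclidean distance and a majority vote; the training data are synthetic and the function name is illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k=5):
    """Classify x0 by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x0, axis=1)             # distances to the query point
    nearest = np.argsort(dists)[:k]                           # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                          # majority class

rng = np.random.default_rng(8)
X_train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
print(knn_predict(X_train, y_train, np.array([2.5, 2.5]), k=7))
```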
Decision boundaries
• The K-NN algorithm does not explicitly compute decision boundaries.
• The more examples that are stored, the more complex the decision boundaries can become.
[Figure: decision boundary of a 2-class, 2-dimensional problem using 7-NN]
• K-NN heavily suffers from the curse of dimensionality:
• Suppose we have 5000 points uniformly distributed in the unit hypercube and we want to apply 5-NN.
• Suppose our query point is at the origin:
  • 1D – on a one-dimensional line, we must go a distance of 5/5000 = 0.001 on average to capture the 5 nearest neighbours.
  • 2D – in two dimensions, we must go $0.001^{1/2} \approx 0.032$ to get a square that contains 0.001 of the volume.
  • pD – in p dimensions, we must go $0.001^{1/p}$!
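A quick sketch of how fast that required edge length grows with the dimension $p$:

```python
# Edge length of the hypercube needed to capture a 0.001 fraction of the volume.
for p in (1, 2, 3, 10, 100):
    print(p, round(0.001 ** (1 / p), 3))
# 1 -> 0.001, 2 -> 0.032, 3 -> 0.1, 10 -> 0.501, 100 -> 0.933
```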
K-Nearest Neighbours
• Advantages:
  • Simple technique that is easily implemented.
  • Building the model is inexpensive.
  • Extremely flexible classification scheme: does not involve preprocessing.
  • Well suited for:
    • Multi-modal classes (classes of multiple forms).
    • Records with multiple class labels.
  • Asymptotic error rate at most twice the Bayes rate (Cover & Hart, 1967).
  • Can sometimes be the best method.
• Disadvantages:
  • Classifying unknown records is relatively expensive:
    • Requires distance computations to the k nearest neighbours.
    • Computationally intensive, especially as the training set grows.
  • Accuracy can be severely degraded by the presence of noisy or irrelevant features.
  • NN classification expects the class-conditional probability to be locally constant, leading to bias in high dimensions.
Summary
1. We learnt about regression models and several regression methods: simple and multiple linear regression, and non-linear regression (polynomial, splines, GAMs).
2. We learnt about classification models and several classification methods: logistic and multinomial regression, discriminant analysis, naive Bayes, and k-nearest neighbours.
3. We also learnt how to interpret the coefficients of linear and logistic regression models.