1 Introduction
Machine learning
Machine learning approaches to data analysis
cannot be done without computers
Common with statistical modelling/analysis
• For prediction and classification
• Requires an optimisation procedure
• Obtain parameters or functions from observations
• Uncertainty of learning vs. prediction/classification
New elements or techniques
• Explanation or theoretical construct is not the emphasis
• Data can be ‘organic’, such as text or images
• Distinction of training vs. validation/test data
• Reliance on ready-made software for implementation
Some broad remarks
Supervised vs. unsupervised learning
• Target outcome y and covariates/features x?
NB. log-linear models of contingency tables
• Can the learned result be applied to unseen units?
NB. principal components, clustering
Prediction vs. classification
• Best prediction of y is its expectation µx = E(y | x)
$$E\big[(y - \mu)^2 \mid x\big] = (\mu_x - \mu)^2 + E\big[(y - \mu_x)^2 \mid x\big]$$
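The decomposition above can be checked numerically. The sketch below is only an illustration: the conditional distribution of y given x is made up, and the names `p_y_given_x`, `expect` and `mse` are hypothetical. It uses exact fractions, so the identity holds without rounding error, and it shows that the candidate prediction with the smallest expected squared error is the conditional mean.

```python
from fractions import Fraction

# Hypothetical conditional distribution of y at one fixed x: value -> probability.
p_y_given_x = {0: Fraction(1, 4), 2: Fraction(1, 2), 6: Fraction(1, 4)}


def expect(g):
    """Exact conditional expectation E(g(y) | x) under p_y_given_x."""
    return sum(p * g(y) for y, p in p_y_given_x.items())


mu_x = expect(lambda y: y)                 # conditional mean E(y | x)
noise = expect(lambda y: (y - mu_x) ** 2)  # E((y - mu_x)^2 | x)


def mse(mu):
    """Expected squared error E((y - mu)^2 | x) of a fixed prediction mu."""
    return expect(lambda y: (y - mu) ** 2)


# Check the identity for a grid of candidate predictions mu = 0, 1/2, ..., 6.
candidates = [Fraction(k, 2) for k in range(13)]
for mu in candidates:
    assert mse(mu) == (mu_x - mu) ** 2 + noise

# The squared-bias term vanishes only at mu = mu_x, so mu_x minimises the error.
best = min(candidates, key=mse)
print(mu_x, best, noise)
```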
Parametric vs. non-parametric models
• function/model f(x; θ) is fixed given the parameters θ
f(x) = E(y | x) or f(x) = Pr(y | x)
• parametric if θ contains a fixed number of constants
NB. linear regression model as a typical example
• non-parametric if the number of unknowns in θ grows with the
number of observations, or if f is not determined in advance
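As a rough illustration of the distinction, the sketch below contrasts a one-parameter linear model with a nearest-neighbour average. Both choices, the toy data and the function names `fit_linear_through_origin` and `fit_knn` are assumptions made for this example, not taken from the slides. The first fit is summarised by a single constant; the second has to keep every training observation, so what it stores grows with the number of observations.

```python
def fit_linear_through_origin(xs, ys):
    """Parametric: f(x; beta) = beta * x, with one constant whatever n is."""
    beta = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: beta * x


def fit_knn(xs, ys, k=2):
    """Non-parametric: predict by averaging y over the k nearest training x."""
    data = list(zip(xs, ys))  # the whole training set is retained

    def predict(x):
        nearest = sorted(data, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


# Made-up training data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

f_par = fit_linear_through_origin(xs, ys)
f_knn = fit_knn(xs, ys, k=2)

# Inside the range of the data the two agree; outside it they differ,
# because the neighbour average cannot extrapolate the linear trend.
print(f_par(2.5), f_knn(2.5))
print(f_par(10.0), f_knn(10.0))
```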
Error vs. residual
• Given $f(x) = E(y \mid x)$ or $y_0 = \arg\max_{y'} \Pr(y = y' \mid x)$, error
Additional exercise
[Figure: plot with vertical axis y (ticks 4 to 16) and horizontal axis ticks 6 to 14]
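$$\hat\beta = \sum_{i=1}^{n} x_i y_i \Big/ \sum_{i=1}^{n} x_i^2$$

$$V\big(\hat f_1(x)\big) = V(\hat\beta x) = x^2 V(\hat\beta) = x^2 V(y_i \mid x_i) \sum_{i=1}^{n} x_i^2 \Big/ \Big(\sum_{i=1}^{n} x_i^2\Big)^2 = x^2 V(y_i \mid x_i) \Big/ \sum_{i=1}^{n} x_i^2$$

$$\hat V(y_i \mid x_i) = \sum_{i=1}^{n} (y_i - \hat\beta x_i)^2 / (n - 1)$$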
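A minimal sketch of these three quantities for the through-origin fit, assuming a constant variance V(y_i | x_i) and uncorrelated observations, as the variance derivation implicitly does. The data and the names `fit_through_origin` and `var_f1_hat` are hypothetical; the estimated variance of the fitted value at x is obtained by plugging the residual-based variance estimate into the formula for V(β̂x).

```python
def fit_through_origin(xs, ys):
    """Return (beta_hat, sigma2_hat, sxx) for the model E(y | x) = beta * x."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    beta_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
    # Residual-based estimate of V(y_i | x_i), with divisor n - 1.
    sigma2_hat = sum((y - beta_hat * x) ** 2 for x, y in zip(xs, ys)) / (n - 1)
    return beta_hat, sigma2_hat, sxx


def var_f1_hat(x, sigma2_hat, sxx):
    """Plug-in estimate of V(beta_hat * x) = x^2 V(y_i | x_i) / sum of x_i^2."""
    return x * x * sigma2_hat / sxx


# Made-up observations, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 4.0]

beta_hat, sigma2_hat, sxx = fit_through_origin(xs, ys)
print(beta_hat, sigma2_hat, var_f1_hat(2.0, sigma2_hat, sxx))
```

The variance of the fitted value grows with x squared, so predictions far from the origin are less certain than those near it.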
$$V\big(\hat f_2(x)\big) = V(y \mid x) = V(y_i \mid x_i)$$ NB. non-existent $\hat V(y_i \mid x_i)$
$$\hat f(x) = \sum_{j=1}^{K} y_j(x) / K$$

$$V\big(\hat f(x)\big) = \sum_{j=1}^{K} V\big(y_j(x)\big) / K^2$$

$$\hat V\big(y_j(x)\big) = \sum_{j=1}^{K} \big(y_j(x) - \hat f(x)\big)^2 / (K - 1)$$ NB. from K obs.
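The averaging estimator and its variance can be sketched as follows, assuming the K observations y_j(x) are uncorrelated and share a common variance, so that the sum of K equal variances divided by K squared reduces to one variance divided by K. The values and the function name `mean_and_variance` are made up for illustration.

```python
def mean_and_variance(ys_at_x):
    """Return (f_hat, v_hat_y, v_hat_f) from K observations y_j(x) at one x."""
    K = len(ys_at_x)
    f_hat = sum(ys_at_x) / K
    # Sample variance of the K observations, with divisor K - 1.
    v_hat_y = sum((y - f_hat) ** 2 for y in ys_at_x) / (K - 1)
    # K equal variances summed and divided by K^2 give v_hat_y / K.
    v_hat_f = v_hat_y / K
    return f_hat, v_hat_y, v_hat_f


# Made-up observations at a single x, for illustration only.
ys_at_x = [9.0, 11.0, 10.0, 14.0]
f_hat, v_hat_y, v_hat_f = mean_and_variance(ys_at_x)
print(f_hat, v_hat_y, v_hat_f)
```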
Assume unbiasedness in all the cases...