Machine Learning and Deep Learning Supervised Learning
John Pinney
November 2020
Intended learning outcomes
After attending the three sessions of this
workshop, you will be better able to:
Machine learning allows us to:
• detect or learn structures and relationships in data.
• assign observations to different classes.
• make predictions based on current knowledge.
Some essential vocabulary…
• vector
A quantity within a multidimensional space.
• function
A mapping from one vector space (input) to another (output).
• optimisation
A procedure that attempts to find the minimum (or maximum)
of a function.
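The three terms can be seen together in a minimal sketch (not part of the original slides). A vector is represented as a NumPy array, a function maps an input vector to an output value, and gradient descent stands in as one simple optimisation procedure. The function `f`, the step size and the number of iterations are illustrative assumptions.

```python
import numpy as np

# A vector: a quantity in a 2-dimensional space.
target = np.array([3.0, -1.0])

def f(x):
    """A function mapping a 2-D input vector to a single output value
    (the squared distance from x to target)."""
    return np.sum((x - target) ** 2)

def grad_f(x):
    """Gradient of f with respect to x."""
    return 2.0 * (x - target)

# Optimisation: gradient descent attempts to find the minimum of f.
x = np.zeros(2)      # starting guess
step_size = 0.1      # illustrative choice
for _ in range(200):
    x = x - step_size * grad_f(x)

print(x, f(x))
```

Here the minimum is known in advance (it sits at `target`), which makes it easy to check that the procedure has converged.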
A ‘machine’ has inputs and outputs.
[Diagram: training data and a machine.]
What are the features and what are their data types?
Unsupervised learning
• In unsupervised learning, we are looking for structure in
the inputs without any knowledge of associated
outputs: the data are considered to be unlabelled.
• Examples include:
• Dimensionality reduction, e.g. principal component analysis
• Self-organising map
• Clustering
Clustering
To look for structure within a dataset, we often make use
of clustering techniques.
[Diagram: input dataset, x → machine → output cluster labels, y (0, 1 or 2).]
Clustering
• Feature-based clustering
takes as input the set of input feature vectors.
• Distance-based clustering
takes as input a matrix of distances that are
calculated between each pair of input feature vectors.
e.g. Euclidean distance.
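As a rough illustration of the difference between the two kinds of input, the sketch below builds a Euclidean distance matrix from a small set of feature vectors using NumPy. The data are made up. A feature-based method would take `X` directly, whereas a distance-based method would take `D`.

```python
import numpy as np

# Four feature vectors (rows) with two features each: illustrative data.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 1.0],
              [3.0, 5.0]])

# Pairwise differences via broadcasting: shape (4, 4, 2).
diff = X[:, None, :] - X[None, :, :]

# Euclidean distance matrix: D[i, j] is the distance between rows i and j.
D = np.sqrt((diff ** 2).sum(axis=-1))
print(D)
```

The matrix is symmetric with zeros on the diagonal, since each vector is at distance zero from itself.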
[Diagram: a machine producing predictions for inputs whose outputs (?) are unknown.]
Two types of supervised learning
y is a continuous value
=> Regression
(estimate the response to a given input)
y is a categorical value
=> Classification
(assign the input to a class)
Linear regression
Fitting is an optimisation procedure, e.g. minimising the sum of squared errors.
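A minimal sketch of least-squares fitting for a straight line, using NumPy only. The data points are invented for illustration, and `np.linalg.lstsq` is used here as one convenient way to minimise the sum of squared errors.

```python
import numpy as np

# Illustrative data lying roughly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix with a column of ones for the intercept.
A = np.column_stack([np.ones_like(x), x])

# Least squares: choose (intercept, slope) minimising the sum of squared errors.
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

residuals = y - A @ coef
sse = np.sum(residuals ** 2)
print(intercept, slope, sse)
```

Any other line through the same points, for example y = 2x + 1 itself, gives a sum of squared errors at least as large as `sse`.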
Linear regression example
With the iris dataset:
Considering only iris virginica:
1. Split the data into training and testing sets.
2. Use linear regression to predict sepal length
from petal length.
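One possible way to work through these two steps, assuming scikit-learn and its bundled copy of the iris dataset. The split proportion and random seed are arbitrary choices, not taken from the workshop.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
# Keep only Iris virginica (class index 2 in scikit-learn's copy of the dataset).
virginica = iris.data[iris.target == 2]

# Columns: 0 = sepal length, 1 = sepal width, 2 = petal length, 3 = petal width.
X = virginica[:, [2]]   # petal length as a single-column feature matrix
y = virginica[:, 0]     # sepal length

# 1. Split the data into training and testing sets (split size is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 2. Fit on the training set, then evaluate on the held-out testing set.
model = LinearRegression().fit(X_train, y_train)
r2_test = model.score(X_test, y_test)
print(model.coef_[0], model.intercept_, r2_test)
```

Scoring on the testing set rather than the training set gives a fairer estimate of how the model behaves on unseen flowers.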
Linear regression exercise
With the abalone dataset:
Considering only adults:
1. Split the data into training and testing sets.
2. Use linear regression to predict rings from
the numerical features.
Linear regression with many features
Logistic regression
Confusingly, logistic regression is an algorithm for
classification.
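The name comes from the logistic (sigmoid) function, which turns a linear combination of the inputs into a value between 0 and 1 that can be read as a class probability. The sketch below uses hypothetical coefficients for an hours-of-study model; they are not fitted to any data.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen for illustration only (not fitted to real data).
intercept, slope = -4.0, 1.5

hours = np.array([0.0, 2.0, 4.0, 6.0])
p_pass = logistic(intercept + slope * hours)    # modelled probability of passing
predicted_class = (p_pass >= 0.5).astype(int)   # classify with a 0.5 threshold
print(p_pass, predicted_class)
```

The regression part produces the probability; applying a threshold to that probability is what makes it a classifier.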
[Figure: Logistic regression. Probability of passing exam plotted against hours of study.]
Logistic regression example
With the iris dataset:
Considering iris versicolor and virginica:
1. Split the data into training and testing sets.
2. Predict iris (the species) from petal length.
3. Use a confusion matrix to examine the
results.
4. Do the results improve if the other numerical
features are included?
Do the same for a three-class logistic regression.
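Steps 1 to 3 of the two-class example could be sketched as follows, assuming scikit-learn. The split proportion, the seed and the use of a stratified split are assumptions made for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
# Keep Iris versicolor (class 1) and Iris virginica (class 2).
mask = iris.target != 0
X = iris.data[mask][:, [2]]   # petal length only
y = iris.target[mask]

# 1. Split into training and testing sets (stratified so both species appear in each).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# 2. Predict the species from petal length.
clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# 3. Confusion matrix: rows are the true species, columns the predicted species.
cm = confusion_matrix(y_test, y_pred, labels=[1, 2])
print(cm)
```

For step 4, replacing `[2]` with `[0, 1, 2, 3]` would include all four numerical features. Dropping the mask would give the three-class case.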
Logistic regression exercise
With the kickstarter dataset:
1. Split the data into training and testing sets.
2. Predict funded from the numerical features.
3. Use a contingency table to examine the
results.
4. How could we make use of the type feature,
which is a nominal data type?
‘One-hot’ encoding
Useful for converting a categorical variable into
multiple binary features, which can be used in
algorithms that require numerical inputs.
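A small pure-Python sketch of the idea. The category values are hypothetical, and the helper `one_hot` is written for this illustration. In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job.

```python
def one_hot(values):
    """Return (categories, rows): one binary column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical values for a nominal feature such as a project "type".
types = ["film", "music", "games", "music"]
categories, encoded = one_hot(types)
for value, row in zip(types, encoded):
    print(value, row)
```

Each row contains exactly one 1, in the column belonging to that observation's category.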
What about non-linear classification?
[Figure: classes "Visited GP" and "Did not visit GP".]
Decision tree
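[Confusion matrix: ground truth against predictions.]
TP, TN, FP, FN = true positives, true negatives, false positives, false negatives
N+ = TP + FN (ground-truth positives), N- = TN + FP (ground-truth negatives), N = N+ + N-
sensitivity (recall) = TP / N+
specificity = TN / N-
precision = TP / (TP + FP)
accuracy = (TP + TN) / N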
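These definitions translate directly into code. The helper below is a sketch written for this illustration, and the counts passed to it are made up.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common metrics from the four cells of a binary confusion matrix."""
    n_pos = tp + fn          # N+: ground-truth positives
    n_neg = tn + fp          # N-: ground-truth negatives
    n = n_pos + n_neg        # N: all observations
    return {
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "precision": tp / (tp + fp),   # denominator: predicted positives
        "accuracy": (tp + tn) / n,
    }

# Illustrative counts, not taken from any dataset in the workshop.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(m)
```

Note that sensitivity and precision share a numerator but differ in their denominators: ground-truth positives for the former, predicted positives for the latter.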
Receiver operating characteristic
There is always a trade-off between sensitivity and specificity.
If a method reports a probability score for its predictions, we can
adjust the decision threshold to tune between maximum sensitivity
and maximum specificity.
[Figure: ROC curve of sensitivity against 1 - specificity.]
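The threshold sweep behind a ROC curve can be sketched with NumPy alone. The function `roc_points`, the labels and the scores are all invented for this illustration. Library routines such as scikit-learn's `roc_curve` perform the same calculation.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (1 - specificity, sensitivity) for each threshold, highest first."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    # One threshold above every score, then each distinct score in descending order.
    thresholds = np.concatenate([[np.inf], np.sort(np.unique(scores))[::-1]])
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [], []
    for t in thresholds:
        predicted_pos = scores >= t
        tpr.append(np.sum(predicted_pos & (y_true == 1)) / n_pos)  # sensitivity
        fpr.append(np.sum(predicted_pos & (y_true == 0)) / n_neg)  # 1 - specificity
    return np.array(fpr), np.array(tpr)

# Illustrative labels and probability scores (hypothetical).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
fpr, tpr = roc_points(y_true, scores)
print(list(zip(fpr, tpr)))
```

Lowering the threshold can only add predicted positives, so both coordinates move from (0, 0) towards (1, 1) without ever decreasing.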
Classification metrics exercise
With the breast cancer dataset:
Compare the performance of a decision tree and a logistic
regression for the task of predicting recurrence.
Visualise the performance of the two methods
(over 5-fold cross-validation) on a ROC curve.
Improving performance
Bias versus variance
These terms have a special meaning when talking
about ML performance.
Consider a set of regression predictions made from your
testing dataset. There are two different ways in which
these can be wrong:
high bias => the model is too simple to describe the
training data well (under-fitting), so there is a systematic
error in the predictions.
high variance => the model tries to replicate the training
data in too much detail (over-fitting), leading to a lot of
noise in the predictions.
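A rough way to see both failure modes is to fit polynomials of increasing degree to the same noisy data. Everything in this sketch is an assumption made for illustration: the sine-shaped target, the noise level, the sample sizes and the degrees compared.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)

# Noisy training data and a separate noisy testing set (synthetic, for illustration).
x_train = np.linspace(0.0, 1.0, 20)
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0.02, 0.98, 50)
y_test = true_fn(x_test) + rng.normal(scale=0.2, size=x_test.size)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

results = {}
for degree in (1, 3, 9):
    # Polynomial.fit rescales x internally, which keeps high degrees well conditioned.
    poly = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(y_train, poly(x_train)), mse(y_test, poly(x_test)))
    print(degree, results[degree])
```

The training error can only fall as the degree rises, because each model contains the simpler ones. A straight line is too simple for this curve (high bias), while a high-degree polynomial may start to follow the noise (high variance), which would show up as a testing error that stops improving or gets worse.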
What can we do to improve
performance?
A larger training dataset (if available) will reduce variance (over-fitting); it does not fix high bias.
[Diagram: multiple inputs, signal processing.]
Perceptron (1958)
[Diagram labels: multiple inputs; arc; weighted sum of inputs; bias provides a threshold; activation function; single output: 0 or 1.]
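The parts of the perceptron fit together in a few lines of code. This is a sketch with a step activation, and the weights and bias are hypothetical values chosen so that the unit computes a logical AND of two binary inputs.

```python
import numpy as np

def perceptron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    z = np.dot(w, x) + b
    return 1 if z > 0 else 0

# Hypothetical weights and bias that make the perceptron compute logical AND.
w = np.array([1.0, 1.0])
b = -1.5   # the bias sets the threshold: the weighted sum must exceed 1.5

outputs = [perceptron(np.array(x), w, b) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

Only the input (1, 1) produces a weighted sum large enough to overcome the bias, so it is the only case with output 1.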
Single-layer network
Multi-layer network
A feedforward network contains no directed loops.
Multi-layer networks
• Use multiple hidden layers to create more complex
models => can solve more complex problems.
• The model will “discover” its own representation of
the data in a way that best fits the learning task.
• Neurons in hidden layers can therefore represent
complex features without any need to engineer these
a priori.
• This can be very powerful if enough training data is
available.
Fitting a neural network model
• To train our model, we need to find values for the
weights, w. (We can incorporate the bias b as an
additional weight w0).
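Folding the bias into the weights amounts to adding a constant input x0 = 1 whose weight is w0 = b. A minimal numerical check, with made-up values:

```python
import numpy as np

x = np.array([0.5, -2.0, 1.0])   # inputs (illustrative)
w = np.array([0.4, 0.1, -0.3])   # weights
b = 0.7                          # bias

z_separate = np.dot(w, x) + b

# Fold the bias in: prepend a constant input x0 = 1 and a weight w0 = b.
x_aug = np.concatenate([[1.0], x])
w_aug = np.concatenate([[b], w])
z_folded = np.dot(w_aug, x_aug)

print(z_separate, z_folded)
```

Both forms give the same value, so training only has to deal with a single weight vector.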
z = w0 + Σ wi xi (the weighted sum of inputs)
A step activation function has zero gradient everywhere (except z=0),
which means backpropagation won't work!
Solution: continuous activation functions
These continuous
activation functions
(and many others) are
used in multi-layer
neural networks.
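The extracted slide text does not name the functions shown, so the sketch below assumes three common choices (sigmoid, tanh and ReLU) and compares their gradients with that of a step function, using a finite-difference estimate.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def numerical_gradient(fn, z, eps=1e-6):
    """Central finite-difference estimate of d fn / d z."""
    return (fn(z + eps) - fn(z - eps)) / (2 * eps)

z = np.array([-2.0, -0.5, 0.5, 2.0])   # points away from z = 0
for name, fn in [("step", step), ("sigmoid", sigmoid), ("tanh", np.tanh), ("relu", relu)]:
    print(name, numerical_gradient(fn, z))
```

The step function gives a gradient of zero at every one of these points, whereas sigmoid and tanh give a positive gradient everywhere, which is what gradient-based training needs. ReLU has zero gradient for negative inputs but a gradient of one for positive inputs.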
Images
Text
Speech
Music
…
Image classification task
From pixels to features?
Convolution
• Convolution is a way to transform unstructured data
into network inputs in a way that preserves the
relevant relationships in time and/or space.
• A network architecture that uses this technique is
known as a Convolutional Neural Network (CNN).
• In practice, convolutional layers may form part of a
larger architecture.
Convolution
The kernel is a matrix of weights that passes over the image.
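A minimal NumPy sketch of a kernel passing over an image, with no padding and a stride of 1. The image and the kernel values are hypothetical. In a CNN the kernel weights would be learned during training rather than set by hand.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image and take a weighted sum at each position.

    No padding and stride 1. As in most deep learning libraries, the kernel is
    not flipped (strictly speaking this is cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 4x4 image with a vertical edge, and a hypothetical 2x2 edge-detecting kernel.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])
feature_map = convolve2d_valid(image, kernel)
print(feature_map)
```

The output is non-zero only where the kernel straddles the edge between the dark and bright columns, which shows how a kernel can turn raw pixels into a spatially organised feature.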