
Foundations of Data Science, Fall 2024

Introduction to Data Science for Doctoral Students, Fall 2024

1c. Introduction: Machine Learning

Dr. Haozhe Zhang

Sept 16, 2024

MSc: https://lms.uzh.ch/url/RepositoryEntry/17589469505
PhD: https://lms.uzh.ch/url/RepositoryEntry/17589469506
Machine Learning in Action

(Using https://www.betafaceapi.com/demo.html)
Is anything wrong?

(See Guardian article)
What is machine learning?

What is artificial intelligence?

“Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.”

Turing, A.M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.

Programming vs. Learning
Application: Boston Housing Dataset

Task: predict house cost (a regression sketch follows below)

Numerical attributes
• Crime rate per capita
• Non-retail business fraction
• Nitric oxide concentration
• Age of house
• Floor area
• Distance to city centre
• Number of rooms

Categorical attributes
• On the Charles river?
• Index of highway access (1-5)

Source: UCI repository
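A minimal sketch of this regression task, assuming a local copy of the data saved as boston.csv with a target column MEDV (the file name and column name are assumptions, not part of the slides):

# Hedged sketch: fit a linear regression to predict house cost.
# "boston.csv" and the column name "MEDV" are assumed -- adjust to
# match your copy of the UCI data.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv("boston.csv")
X = df.drop(columns=["MEDV"])          # numerical + categorical attributes
y = df["MEDV"]                         # real-valued target: house cost

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))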
Application: Object Detection and Localisation

• 200 basic-level categories

• Here: six pictures containing airplanes and people

• Dataset contains over 400,000 images

• ImageNet competition (2010-)

• All recent successes through very deep neural networks!
Programming vs Learning

Programming, like all engineering, is a lot of work:

We have to build everything from scratch.

Learning is more like farming, which lets nature do most of the work. Farmers
combine seeds with nutrients to grow crops.

Learners combine knowledge with data to grow programs.

What is machine learning?

Definition by Tom Mitchell (1997)

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Face Detection

• E: images with bounding boxes around faces

• T: given an image without boxes, put boxes around faces

• P: number of faces correctly identified
What is machine learning?

Learning = Representation + Evaluation + Optimisation (a toy sketch follows below)

• Representation: the hypothesis space of the learner
  • Choose the set of classifiers that it can possibly learn
• Evaluation: an objective or scoring function
  • Distinguishes good classifiers from bad ones
• Optimisation: a search method for the highest-scoring classifier
  • Key to the efficiency of the learner

• Unlike in most optimisation problems, we do not have access to the function we want to optimise!
• Use training error as a surrogate for test error :(
• The objective function is only a proxy for the true goal ⇒ no need to fully optimise it; a local optimum may be OK

P. Domingos. A Few Useful Things to Know about Machine Learning. CACM, 2012.
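A toy sketch of the three components, on made-up one-dimensional data (the data and the hypothesis class are illustrative assumptions): the representation is the space of threshold classifiers, the evaluation function is training accuracy, and the optimisation is a brute-force search.

# Representation: 1-D threshold classifiers h(x) = sign(x - theta)
# Evaluation: training accuracy
# Optimisation: brute-force search over theta (made-up data below)
import numpy as np

X = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([-1, -1, -1, +1, +1, +1])

def predict(theta, x):
    return np.where(x >= theta, +1, -1)

def accuracy(theta):                            # evaluation function
    return np.mean(predict(theta, X) == y)

thetas = np.linspace(X.min(), X.max(), 100)     # search the hypothesis space
best = max(thetas, key=accuracy)
print(f"best threshold: {best:.2f}, accuracy: {accuracy(best):.2f}")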
Cats vs Dogs

An Early (First?) Example of Automatic Classification

Ronald Fisher: Iris Flowers (1936)

• Three types: setosa, versicolour, virginica

• Data: sepal width, sepal length, petal width, petal length
Histogram Plots for Different Measurements for Versicolour & Virginica

Scatter Plots of Pairwise Measurements for Versicolour & Virginica
An Early (First?) Example of Automatic Classification

Ronald Fisher: Iris Flowers (1936)

• Three types: setosa, versicolour, virginica

• Data: sepal width, sepal length, petal width, petal length

• Method: find linear combinations of the features that maximally differentiate the classes (Fisher Linear Discriminant); a sketch follows below

(Images: setosa, versicolour, virginica)
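A minimal sketch of this analysis using scikit-learn's LinearDiscriminantAnalysis on its built-in copy of the iris data (the choice of library is an assumption; the slides do not prescribe one):

# Fisher Linear Discriminant on the iris measurements
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)      # 150 flowers, 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)            # linear combinations separating classes
print("training accuracy:", lda.score(X, y))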
Frank Rosenblatt and the Perceptron

• Perceptron (1957) - inspired by neurons

• Simple learning algorithm: adjust the weights if the prediction on a new input is incorrect

• Built using specialised hardware

(Figure: inputs x1, x2, x3, x4 feeding a single unit with weights w1, w2, w3, w4)

sign(w0 + w1x1 + · · · + w4x4) = sign(w · x) =
    +1 if w · x ≥ 0
    −1 otherwise
Perceptron Training Algorithm

(Sequence of figures stepping through the training updates)
The Perceptron Algorithm

Initialisation: w = 0 -- all params set to 0

Repeat until convergence:
  For t = 1 . . . n -- go over all examples
    1. y′ = sign(xt · w) -- compute the prediction
    2. If y′ ≠ yt then w := w + yt xt -- update the params
       else leave w unchanged

Convergence: w remains unchanged for an entire pass over the training set. Then, all training examples are classified correctly.
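A runnable sketch of this algorithm (the toy data is invented and linearly separable; the bias w0 can be absorbed by appending a constant 1 to each input):

import numpy as np

def perceptron(X, y, max_epochs=100):
    w = np.zeros(X.shape[1])                 # initialisation: all params 0
    for _ in range(max_epochs):              # repeat until convergence
        changed = False
        for xt, yt in zip(X, y):             # go over all examples
            y_pred = 1 if xt @ w >= 0 else -1    # compute the prediction
            if y_pred != yt:
                w = w + yt * xt              # update the params
                changed = True
        if not changed:                      # a full pass with no update:
            break                            # all examples classified correctly
    return w

X = np.array([[1, 2], [2, 3], [3, 1], [-1, -2], [-2, -1], [-3, -2]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])
w = perceptron(X, y)
print("weights:", w, "predictions:", np.sign(X @ w))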
Machine Learning Models and Methods

k-Nearest Neighbours, Linear Regression, Logistic Regression, Ridge Regression,
Hidden Markov Models, Mixtures of Gaussians, Principal Component Analysis,
Independent Component Analysis, Kernel Methods, Decision Trees, Boosting and
Bagging, Belief Propagation, Variational Inference, EM Algorithm, Monte Carlo
Methods, Spectral Clustering, Hierarchical Clustering, Recurrent Neural Networks,
Linear Discriminant Analysis, Quadratic Discriminant Analysis, Perceptron
Algorithm, Naïve Bayes Classifier, Hierarchical Bayes, k-means Clustering,
Support Vector Machines, Gaussian Processes, Artificial Neural Networks,
Convolutional Neural Networks, Markov Random Fields, Structural SVMs,
Conditional Random Fields, Structure Learning, Restricted Boltzmann Machines,
Multi-dimensional Scaling, Reinforcement Learning, ···
NeurIPS Papers!

Advances in Neural Information Processing Systems (figures showing the proceedings across the years 1988, 1995, 2000, 2005, 2009, 2016, and 2017)

https://www.youtube.com/watch?v=mlXzufEk-2E
Machine Learning vs. Deep Learning

LeCun et al., Deep Learning, Nature (2015)
Supervised Learning

Training data has inputs x (numerical, categorical) as well as outputs y (target)

Regression: when the output is real-valued, e.g., housing price

Classification: when the output is a category (a sketch of both follows below)

• Binary classification: only two classes, e.g., spam filtering

• Multi-class classification: several classes, e.g., object detection
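A small sketch contrasting the two settings on invented data (the data and model choices are illustrative assumptions): a regressor predicts a real value, a classifier predicts a category.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_reg = np.array([1.1, 1.9, 3.2, 3.9])   # real-valued target (regression)
y_clf = np.array([0, 0, 1, 1])           # binary labels (classification)

reg = LinearRegression().fit(X, y_reg)
clf = LogisticRegression().fit(X, y_clf)

print(reg.predict([[2.5]]))   # a real number, roughly 2.5
print(clf.predict([[2.5]]))   # a class label, 0 or 1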
Unsupervised Learning: Genetic Data of European Populations

Source: Novembre et al., Nature (2008)

Dimensionality reduction - map high-dimensional data to low dimensions

Clustering - group together individuals with similar genomes (a sketch of both follows below)
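A minimal sketch of the two operations, PCA for dimensionality reduction followed by k-means clustering, on random stand-in data (the genetic data itself is not reproduced here):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))            # 500 individuals, 1000 features

Z = PCA(n_components=2).fit_transform(X)    # map to 2 dimensions
labels = KMeans(n_clusters=3, n_init=10).fit_predict(Z)
print(Z.shape, np.bincount(labels))         # (500, 2) and the cluster sizes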
Unsupervised Learning: Group Similar News Articles

Group similar articles into categories such as politics, music, sport, etc.

In the dataset, there are no labels for the articles (a sketch follows below)
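One hedged way to do this, not prescribed by the slides: TF-IDF features plus k-means. The four one-line "articles" below are invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

articles = [
    "The government announced new election reforms today",
    "Parliament debates the new election law",
    "The band released a chart-topping new album",
    "Concert tickets for the album tour sold out",
]
X = TfidfVectorizer().fit_transform(articles)        # no labels used
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)    # two groups, e.g. politics vs. music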
Active and Semi-Supervised Learning

Active Learning
• Initially all data is unlabelled
• The learning algorithm can ask a human to label some data
(a toy uncertainty-sampling step is sketched below)

Semi-Supervised Learning
• Limited labelled data, lots of unlabelled data
• How to use the two together to improve learning?
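A toy active-learning step under the common uncertainty-sampling heuristic (a specific strategy the slides do not name; the data is invented): train on the few labelled points, then query the pool example the model is least confident about.

import numpy as np
from sklearn.linear_model import LogisticRegression

X_lab = np.array([[0.0], [1.0], [9.0], [10.0]])
y_lab = np.array([0, 0, 1, 1])
X_pool = np.array([[2.0], [5.1], [8.0]])        # unlabelled pool

clf = LogisticRegression().fit(X_lab, y_lab)
proba = clf.predict_proba(X_pool)               # class probabilities
uncertainty = 1 - proba.max(axis=1)             # least-confident scoring
query = int(np.argmax(uncertainty))
print("ask a human to label:", X_pool[query])   # the point near the boundary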
Collaborative Filtering: Recommender Systems

Movie / User                Alice   Bob   Charlie   Dean   Eve
The Shawshank Redemption      7      9       9       5      2
The Godfather                 3      ?      10       4      3
The Dark Knight               5      9       ?       6      ?
Pulp Fiction                  ?      5       ?       ?     10
Schindler’s List              ?      6       ?       9      ?

Netflix competition to predict user ratings (2008-09)

Any individual user will not have used most products, but most products will have been used by some individuals (a factorisation sketch follows below)
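A compact sketch of one standard approach the slides do not spell out: low-rank matrix factorisation trained with gradient steps. The matrix is the table above with np.nan for "?", and the hyperparameters are arbitrary choices.

import numpy as np

R = np.array([[7, 9, 9, 5, 2],
              [3, np.nan, 10, 4, 3],
              [5, 9, np.nan, 6, np.nan],
              [np.nan, 5, np.nan, np.nan, 10],
              [np.nan, 6, np.nan, 9, np.nan]])

mask = ~np.isnan(R)
rng = np.random.default_rng(0)
k = 2                                             # latent dimensions
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # movie factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # user factors

for _ in range(2000):                      # simple alternating gradient steps
    E = np.where(mask, R - U @ V.T, 0.0)   # error on observed entries only
    U += 0.01 * (E @ V)
    V += 0.01 * (E.T @ U)

print(np.round(U @ V.T, 1))                # filled-in rating matrix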
Reinforcement Learning

• Autonomous helicopter flight; self-driving cars

• Cannot conceivably program these by hand

• Uncertain (stochastic) environment

• Must take sequential decisions

• Can define reward functions

• Fun: playing Atari Breakout! (a minimal Q-learning sketch follows below)

https://www.youtube.com/watch?v=V1eYniJ0Rnk
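A minimal sketch of one reinforcement-learning method, tabular Q-learning, on an invented 1-D chain world (the environment and hyperparameters are illustrative assumptions): states 0..4, actions left/right, reward 1 for reaching state 4.

import numpy as np

n_states, actions = 5, (-1, +1)              # 1-D chain, move left or right
Q = np.zeros((n_states, len(actions)))
alpha, gamma, eps = 0.5, 0.9, 0.5
rng = np.random.default_rng(0)

for _ in range(300):                         # episodes
    s = 0
    for _ in range(1000):                    # step cap per episode
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(max(s + actions[a], 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0                   # reward
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])   # Q-update
        s = s2
        if s == n_states - 1:                # reached the goal state
            break

print(Q.argmax(axis=1))    # learned policy per state (1 = move right)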
