Lec 01 Introduction 2024


Pattern Recognition and Machine Learning A

Advanced Pattern Recognition & Machine Learning

Yuzhu Guo (郭玉柱)
[email protected]

School of Automation Science and Electrical Engineering
Sunday, March 31, 2024
Start from ChatGPT

RLHF (Reinforcement Learning from Human Feedback)
Sora

Embodied AI
Minds live in bodies, and bodies move through a changing world. The goal of embodied
artificial intelligence is to create agents, such as robots, that learn to creatively solve
challenging tasks requiring interaction with the environment. Fantastic advances in deep
learning have enabled superhuman performance on a variety of AI tasks previously thought
intractable. Computer vision, speech recognition, and natural language processing have
experienced transformative revolutions at passive input-output tasks like language translation
and image processing, and reinforcement learning has similarly achieved world-class
performance at interactive tasks like games. These advances have supercharged embodied AI agents, which can:
• See: perceive their environment through vision or other senses.
• Talk: hold a natural language dialog grounded in their environment.
• Listen: understand and react to audio input anywhere in a scene.
• Act: navigate and interact with their environment to accomplish goals.
• Reason: consider and plan for the long-term consequences of their actions.

AI4Science

Simpson’s Paradox



Correlation vs Causation

Monthly ice cream production in the United States and drowning deaths in Florida.
Out-of-distribution
Catastrophic Forgetting
Adversarial Attack
Small Data Learning

Simple network
Small data training
Adaptive

Big Data is all about finding correlations, but Small Data is all about finding the causation, the reason why.

“If one takes the top 100 biggest innovations of our time, perhaps around 60% to 65% are really based on Small Data.”
– Martin Lindstrom, Small Data: The Tiny Clues that Uncover Huge Trends
AGI

“Honeybees are excellent navigators and explorers, using vision extensively in these tasks, despite having a brain of only one million neurons.”

The team aims to create the first flying robot able to sense and act as autonomously as a bee, rather than just carry out a pre-programmed set of instructions.
Probabilistic AI

1. Probabilistic Computing
2. Third wave of AI
3. Ex: Driving a car
4. Role in Explainable AI (XAI)
5. Role of probability in machine learning
Role of Probability in AI

Prediction → Inference
• Probabilistic computing allows us to
1. Deal with uncertainty in natural data around us
2. Predict events in the world with an understanding
of data and model uncertainty
• Predicting what will happen next in a scenario,
as well as effects of our actions, can only be
done if we know how to model the world around
us with probability distributions
Role in XAI

• Augmenting deep learning with probabilistic methods opens the door to understanding why AI systems make the decisions they make,
• Will help with issues like tackling bias in AI systems.
• Research into probabilistic computing is really
about establishing a new way to evaluate the
performance of the next wave of AI — one
that requires real-time assessment of “noisy”
data.
Current AI Models

Rule-based systems (pre-programmed logic): Input → Hand-designed program → Output

Classic machine learning: Input → Hand-designed features → Mapping from features → Output

Representation learning (sense and perceive): Input → Features → Mapping from features → Output

Deep learning: Input → Simple features → Additional layers of more abstract features → Mapping from features → Output

Shaded boxes indicate components that can learn from data


Next step for AI

• First AI systems focused on logic:
– Pre-programmed rules.
• Second wave of AI concerns ability to sense
and perceive information
– Leveraging neural networks to learn over time.
• But, neither solution can do things that human
beings do naturally as we navigate the world.
– They can’t think through multiple potential scenarios
based on data that you have on-hand while
conscious of potential data that you don’t have.
Next step for AI
Driving a Car and Soccer Ball

• If you are driving a car and see a soccer ball roll into the street,
• Your immediate and natural reaction is to stop the car, since we can assume a child is running after the ball and isn't far behind.
Role of Probabilistic System

• Driver reaches the decision to stop the car based on experience of natural data and assumptions about human behavior.
– But, a traditional computer likely wouldn’t reach the
same conclusion in real-time, because today’s
systems are not programmed to mine noisy data
efficiently and to make decisions based on
environmental awareness.
– You would want a probabilistic system calling the
shots—one that could quickly assess the situation
and act (stop the car) immediately.
Explainable AI

Why did you do that?
Why not something else?
When do you succeed?
When do you fail?
When can I trust you?
How do I correct an error?

Anecdote: Medical AI. Decisions can be worse with AI, e.g., patient discharge to a nursing home.
Role of Probability in ML
• In neural networks (discriminative models)
1. Output is a probability distribution over y
2. Instead of error as loss function we use a
surrogate loss function, viz., log-likelihood, so that
it is differentiable (which is necessary for gradient
descent)
• In probabilistic AI (generative models)
– We learn a distribution over observed and latent
variables whose parameters are determined by
gradient descent as well
p(x; \theta) = \frac{1}{Z(\theta)} \tilde{p}(x, \theta), \qquad Z(\theta) = \sum_{x} \tilde{p}(x, \theta)
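To make the normalization concrete, here is a minimal sketch in Python of computing a partition function for a toy unnormalized distribution over a small discrete domain (the energy function and domain are illustrative assumptions, not from the slides):

```python
import numpy as np

theta = 1.5
xs = np.arange(-5, 6)                 # toy discrete domain for x

def p_tilde(x, theta):
    """Unnormalized density p~(x, theta); illustrative choice."""
    return np.exp(-theta * x**2)

Z = p_tilde(xs, theta).sum()          # partition function Z(theta) = sum_x p~(x, theta)
p = p_tilde(xs, theta) / Z            # normalized distribution p(x; theta)

assert np.isclose(p.sum(), 1.0)       # probabilities now sum to one
print(p)
```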

Introduction
What is Artificial Intelligence?

A brief history of AI

Course logistics



Outline

What is Artificial Intelligence?

A brief history of AI

Course logistics



What is “AI”?



Some classic definitions

Building computers that...

Think like a human
- Cognitive science / neuroscience
- Can't there be intelligence without humans?

Think rationally
- Logic and automated reasoning
- But, not all problems can be solved just by reasoning

Act like a human
- Turing test
- ELIZA, Loebner prize
- “What is 1228 x 5873?” … “I don't know, I'm just a human”

Act rationally
- Basis for intelligent agents framework
- Unclear if this captures the current scope of AI research


Symbolicism AI

a.k.a “classical AI,” “rule-based AI,” and “good old-fashioned AI.”

Symbolic AI involves the explicit embedding of human knowledge and behavior


rules into computer programs. The practice showed a lot of promise in the early
decades of AI research. But in recent years, as neural networks, also known as
connectionist AI, gained traction, symbolic AI has fallen by the wayside.

Symbolic AI programs are based on creating explicit structures and behavior rules.

An example of symbolic AI tools is object-oriented programming.


Connectionism AI
What is connectionism?
Connectionism is based on the idea that the brain is made up of a large number of simple
processing units, or neurons, that are interconnected. These neurons are able to learn by
adjusting the strength of the connections between them.
What are the benefits of connectionism?
flexible, scalable, effective at learning from data.
What are the limitations of connectionism?
• Critics argue it is too simplistic and does not capture the true nature of intelligence.
• It can be too reliant on data and may not generalize well to new situations.
Connectionism is well-suited for problems where data is abundant and where there is a need for fast and accurate predictions; it struggles with tasks that require reasoning, and its models are difficult to interpret.
Caenorhabditis elegans, a much-studied worm, has
approximately 300 neurons whose pattern of interconnections
is perfectly known. Yet connectionist models have failed to
mimic even this worm.



Actionism AI (Dynamic)

Reinforcement learning is a type of machine learning that is concerned with how


software agents ought to take actions in an environment so as to maximize some
notion of cumulative reward. The agent learns by interacting with its environment, and
through trial and error discovers which actions yield the most reward.

Reinforcement learning is an important area of machine learning because it is able to


deal with problems that are too difficult for traditional supervised learning methods.
Additionally, reinforcement learning can be used to solve problems that do not have a
clear set of training data, as is the case with many real-world problems.
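To ground the trial-and-error idea, here is a minimal sketch of an ε-greedy agent maximizing cumulative reward on a toy 3-armed bandit (the reward probabilities are illustrative assumptions):

```python
import random

probs = [0.2, 0.5, 0.8]          # true reward probabilities, unknown to the agent
values = [0.0, 0.0, 0.0]         # estimated value of each action
counts = [0, 0, 0]
epsilon = 0.1                    # exploration rate

for step in range(10_000):
    if random.random() < epsilon:
        a = random.randrange(3)                          # explore
    else:
        a = max(range(3), key=lambda i: values[i])       # exploit best estimate
    reward = 1.0 if random.random() < probs[a] else 0.0  # environment feedback
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]        # incremental mean update

print(values)   # estimates approach [0.2, 0.5, 0.8]; the agent favors the last arm
```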



Outline

What is Artificial Intelligence?

A brief history of AI

Course logistics



(Some) history of AI



Prehistory (400 B.C – )

Philosophy: mind/body dualism, materialism

Mathematics: logic, probability, decision theory, game theory

Cognitive psychology

Computer engineering

(Pictured: Aristotle)


Birth of AI (1943 – 1956)

1943 – McCulloch and Pitts: simple neural networks

1950 – Turing test

1955-56 – Newell and Simon: Logic Theorist

1956 – Dartmouth workshop, organized by John McCarthy, Marvin Minsky, Nathaniel Rochester, Claude Shannon

“The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. … We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”
Early successes (1950s – 1960s)

1952 – Arthur Samuel develops checkers program, learns via self-play

1958 – McCarthy: LISP, advice taker, time sharing

1958 – Rosenblatt's Perceptron algorithm learns to recognize letters

1968-72 – Shakey the robot

1971-74 – Blocksworld planning and reasoning domain


First “AI Winter” (Later 1970s)

Many early promises of AI fall short

1969 – Minsky and Papert's “Perceptrons” book shows that a single-layer neural network cannot represent the XOR function

1973 – Lighthill report effectively ends AI funding in U.K.

1970s – DARPA cuts funding for several AI projects


Expert systems (1970s – 1980s)

Move towards encoding domain expert knowledge as logical rules

1971-74 – Feigenbaum's DENDRAL (molecular structure prediction) and MYCIN (medical diagnosis)

1981 – Japan's “fifth generation” computer project, intelligent computers running Prolog

1982 – R1, expert system for configuring computer orders, deployed at DEC


Second “AI Winter” (Late 1980s – Early 1990s)
As with past AI methods, expert systems fail to deliver on promises

Complexity of expert systems made them difficult to develop/maintain

1987 – DARPA cuts AI funding for expert systems

1991 – Japan's 5th generation project fails to meet goals


Splintering of AI (1980s – 2000s)

[Figure: a small neural network with input, hidden, and output layers]

Much of AI focus shifts to subfields: machine learning, multiagent systems, computer vision, natural language processing, robotics, etc.

1986 – Backpropagation for training neural networks popularized by Rumelhart, Hinton, and Williams (amongst many others)

1988 – Judea Pearl's work on Bayesian networks

1995 – NavLab5 automobile drives across country, steering itself 98% of the time


Focus on applications
(1990s – Early 2010s)

Meanwhile, AI (sometimes under a subfield) achieves some notable milestones

1997 – Deep Blue beats Garry Kasparov

2005, 2007 – Stanford and CMU respectively win the DARPA grand challenges in autonomous driving

2000s – Ad placement and prediction for internet companies becomes largely AI-based

2011 – IBM's Watson defeats human Jeopardy! opponents


“AI” Renaissance (2010s – ??)

“AI” is a buzzword again; Google, Facebook, Apple, Amazon, Microsoft, etc., all have large “AI labs”

2012 – Deep neural network wins image classification contest

2013 – Superhuman performance on most Atari games via a single RL algorithm

2016 – DeepMind's AlphaGo beats one of the top human Go players

2017 – CMU's Libratus defeats top pro players at No-limit Texas Hold'em


AI is all around us

Face detection Personal assistants

Machine translation Logistics planning



A broader definition

Artificial intelligence is the development and study of computer systems to address problems typically associated with some form of intelligence


Turing Test




The Chinese Room



Deep Learning



AI Safety



AI Ethics



Singularity



Some parting thoughts

“Computers in the future may have only 1,000 vacuum tubes and weigh only 1.5 tons.”
– Popular Mechanics, 1949

“Machines will be capable, within twenty years, of doing any work a man can do.”
– Herbert Simon, 1965


Outline

What is Artificial Intelligence?

A brief history of AI

Course logistics



Organization of course

• Undergrad AI: broad introduction to a wide range of topics
• Grad AI: more focused on a few topics, leaving out others

The goal of this course is to introduce you to some of the topics and techniques that are at the forefront of modern AI research:
• Probabilistic reasoning
• Graphical models
• Feature engineering
• Learning theory
• Model assessment and selection
• Machine learning and deep learning
Grading

Students taking this course should have experience with: mathematical proofs, linear algebra, calculus, probability, Matlab/Python programming

Grading breakdown for the course:
10% class performance
30% project
60% exams


Academic integrity

Homework/project policy:
• You may discuss homework problems with other
students, but you need to specify all students you
discuss with in your writeup
• Your writeup and code must be written entirely
on your own, without reference to notes that
you took during any group discussion

All code and written material that you submit must be entirely your own unless specifically cited (in quotes for text, or within a comment block for code) from third-party sources


Pattern Recognition References

Pattern Recognition and Machine Learning by Christopher M. Bishop

Probabilistic Machine Learning: An Introduction by Kevin P. Murphy

Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Why Learn Learning?



Motivation

• “We are drowning in information, but we are starved for knowledge”
– John Naisbitt, Megatrends

• Data = raw information
• Knowledge = patterns or models behind the data
Solution: Machine Learning

• Hypothesis: pre-existing data repositories contain a lot of potentially valuable knowledge
• Mission of learning: find it
• Definition of learning:
(semi-)automatic extraction of valid, novel, useful and
comprehensible knowledge – in the form of rules,
regularities, patterns, constraints or models – from arbitrary
sets of data



Applications of ML are Deep and Prevalent
• Online ad selection and placement
• Risk management in finance, insurance, security
• High-frequency trading
• Medical diagnosis
• Mining and natural resources
• Malware analysis
• Drug discovery
• Search engines

Draws on Many Disciplines

• Artificial Intelligence
• Statistics
• Continuous Optimisation
• Databases
• Information Retrieval
• Communications/Information Theory
• Signal Processing
• Computer Science Theory
• Philosophy
• Psychology and Neurobiology



Terminology

• Input to a machine learning system can consist of:
 Instance: measurements about individual entities/objects
  e.g., a loan application
 Attribute (aka feature, explanatory variable): a component of the instance
  e.g., the applicant's salary, number of dependents, etc.
 Label (aka response, dependent variable): an outcome that is categorical, numeric, etc.
  e.g., forfeit vs. paid off
 Example: an instance coupled with a label
  <(100k, 3), “forfeit”>
 Model: a discovered relationship between attributes and/or label
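To make the terminology concrete, here is a minimal sketch in Python of one labeled example from the loan-application scenario above (field names and values are illustrative assumptions):

```python
# One example = an instance (attributes) coupled with a label.
instance = {"salary": 100_000, "dependents": 3}    # attributes / features
label = "forfeit"                                  # label / response
example = (instance, label)

# A dataset is a collection of such examples; a model is a discovered
# relationship mapping attributes to the label (toy rule for illustration).
def toy_model(x):
    return "forfeit" if x["salary"] < 120_000 else "paid off"

print(toy_model(instance))   # -> "forfeit"
```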
Human Perception

• Humans have developed highly sophisticated skills for sensing their environment and taking actions according to what they observe, e.g.,
– Recognizing a face.
– Understanding spoken words.
– Reading handwriting.
– Distinguishing fresh food from its smell.
– ...



Why is Pattern Recognition important?

Kurzweil describes a series of thought experiments which suggest to him that the brain contains a hierarchy of pattern recognizers. Based on this he introduces his Pattern Recognition Theory of Mind. He says the neocortex contains 300 million very general pattern recognition circuits and argues that they are responsible for most aspects of human thought.



Pattern Recognition (PR)

• Pattern Recognition is the study of how machines can:
– observe the environment,
– learn to distinguish patterns of interest,
– make sound and reasonable decisions about the categories of the patterns.


What is a Pattern

• What is a Pattern?
– an abstraction, represented by a set of measurements describing a “physical” object
• Many types of patterns exist:
– visual, temporal, sonic, logical, ...


What is a Pattern

“A pattern is the opposite of a chaos; it is an entity, vaguely defined, that could be given a name.” (Watanabe)


Pattern Recognition (PR)

• What is a Pattern Class (or category)?
– a set of patterns sharing common attributes
– a collection of “similar”, not necessarily identical, objects
– during recognition, given objects are assigned to a prescribed class


Recognition

Identification of a pattern as a member of a category (class) we already know, or we are familiar with

Classification (known categories)
Clustering (learning categories)

[Figure: labeled Category “A” vs Category “B” points for classification; unlabeled groups for clustering]


Pattern Recognition (PR)

• No single theory of Pattern Recognition can possibly cope with such a broad range of problems...
• However, there are several standard models, including:
– Statistical or fuzzy pattern recognition
– Syntactic or structural pattern recognition
– Knowledge-based pattern recognition


PR Systems
Pattern recognition systems have four major components:
 data acquisition and collection
 feature extraction and representation
 similarity detection and pattern classifier design
 performance evaluation


Pattern Recognition

• Two-phase process:
1. Training/Learning
 Learning is hard and time-consuming
 The system must be exposed to several examples of each class
 Creates a “model” for each class
 Once learned, it becomes natural
2. Detecting/Classifying


Methodology



Development of PR

 1929: Tauschek invented the first OCR (Optical Character Recognition) machine, called the Reading Machine, which could read the digits 0-9.

 1930s: Fisher proposed the theory of statistical classification, which was the foundation for statistical pattern recognition.

 1950s: Noam Chomsky proposed formal language theory; King-Sun Fu proposed the syntactic/structural pattern recognition theory.
Development of PR

 1960s: L. A. Zadeh proposed fuzzy set theory; fuzzy pattern recognition methods were developed and applied.

 1980s: The neural network models represented by the Hopfield network and the BP network led to the revival of artificial neural networks, widely used in pattern recognition.

 1990s: Small Sample Size Learning and the Support Vector Machine (SVM) attracted much attention.

 2006: Deep Learning
Applications



Main Contents



Mathematical Foundations



Main Contents



From Evidence-Based Medicine
to Personalised Precision
Medicine

 Complex physiological
and pathological processes
 Data driven machine learning
 Personalised in silico medicine



Prediction of risk of hip fracture

Clinical gold standard: Body + Organ + Tissue
Misses 30-50% of hip fractures

Personalised FE simulation + SVM
AUC increases from 50% to 92%
Multi-Modal Fusion

Detection – Diagnosis – Intervention: integrated intelligent sensing technology

Modalities: EEG, skin conductance, EMG, inertial sensors, plantar pressure distribution; electrical muscle stimulation

Improve patients' quality of life; enable unattended care and relieve the pressure on healthcare
FOG Detection with Wearable Sensors

[Figure: time-frequency maps of the same signal computed with STFT, CWT, TV-ARMA with RLS, and TV-ARMA with LROFR]

Identify the nonlinear function

LSTM: y(k) = f_{\mathrm{lstm}}\left( x(k), x(k-1), \ldots, x(k-n_u), e(k) \right)


Computer Aided Diagnosis
LPRD (laryngopharyngeal reflux disease)
Sensitivity: 24.1% → 82.7%
Deployed at PLA 306 Hospital, used in over 1,500 clinical LPR diagnoses


MCI and AD brain disease

Complex networks

[Figure: average functional connectivity matrices (colour scale 0-1) for a normal control and an MCI patient]

Mechanism

• Playing an important role in analysing the trigger mechanism of MCI pathology
– Tools for more accurately revealing brain activity
– Providing a new approach for challenging brain research



Brain Computer Interface



Brain Mode Decomposition



Brain Inspired Intelligence

Science and Technology Innovation 2030 “New Generation Artificial Intelligence” Major Project
--- Theory and methods for human-in-the-loop hybrid-augmented intelligence



Important Concepts in PR

Pattern, Feature, Algorithm, Model, Machine Learning, Optimization, Validation, Over-fitting, Regularization, Cross-Validation

Feature selection – Feature Extraction


Classification – Regression
Supervised – Unsupervised – Reinforcement
Syntactic – Statistical – ANN
Generative – Discriminative
Linear – Nonlinear



A Case Study: Fish Classification

• Problem:
– sort incoming fish on a conveyor belt according to species
– assume only two classes exist: Sea Bass and Salmon

[Figure: salmon vs. sea bass]


A Case Study: Fish Classification

• What kind of information can distinguish one species from the other?
– length, width, weight, number and shape of fins, tail shape, etc.
• What can cause problems during sensing?
– lighting conditions, position of fish on the conveyor belt, camera noise, etc.
• What are the steps in the process?
1. Capture image.
2. Isolate fish. (Pre-processing)
3. Take measurements. (Feature extraction)
4. Make decision. (Classification: “Sea Bass” or “Salmon”)


A Case Study: Fish Classification

• Selecting Features
– Assume a fisherman told us that a sea bass is generally
longer than a salmon.
– We can use length as a feature and decide between sea
bass and salmon according to a threshold on length.
– How can we choose this threshold?
Histograms of the length feature for the two types of fish in training samples. How can we choose the threshold to make a reliable decision?

Even though “sea bass” is longer than “salmon” on average, there are many examples of fish where this observation does not hold...
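A minimal sketch of choosing such a threshold by exhaustive search over training lengths (the sample values below are made up for illustration):

```python
import numpy as np

# Made-up training lengths: label 0 = salmon, 1 = sea bass.
salmon = np.array([18, 20, 22, 23, 25, 27])
seabass = np.array([24, 26, 28, 30, 31, 33])

lengths = np.concatenate([salmon, seabass])
labels = np.concatenate([np.zeros(len(salmon)), np.ones(len(seabass))])

# Try every observed length as a candidate threshold t;
# classify as "sea bass" whenever length > t.
candidates = np.unique(lengths)
errors = [np.mean((lengths > t).astype(float) != labels) for t in candidates]
best = candidates[int(np.argmin(errors))]
print(f"best threshold = {best}, training error = {min(errors):.2f}")
```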



A Case Study: Fish Classification

• Selecting Features
– Let’s try another feature and see if we get better
discrimination
➡Average Lightness of the fish scales

Histograms of the lightness feature for the two types of fish in training samples. It looks easier to choose the threshold x, but we still cannot make a perfect decision.


A Case Study: Fish Classification

• Multiple Features
– Single features might not yield the best performance.
– To improve recognition, we might have to use more than one
feature at a time.
– Combinations of features might yield better performance.
– Assume we also observed that sea bass are typically wider than
salmon.

x1: lightness, x2: width

Scatter plot of lightness and width features for training samples. Each fish image is now represented by a point in this 2D feature space. We can draw a decision boundary to divide the feature space into two regions. Does it look better than using only lightness?


A Case Study: Fish Classification

• Designing a Classifier
• Can we do better with another decision rule?
• More complex models result in more complex
boundaries.

DANGER OF
We may
OVER
distinguish
FITTING!!
training samples
perfectly but how
can we predict CLASSIFIER
how well we can WILL FAIL TO
generalize to GENERALIZE
unknown TO NEW
samples? DATA...

[email protected] Pattern Recognition & Machine Learning 93


A Case Study: Fish Classification

• Designing a Classifier
• How can we manage the tradeoff between the complexity of decision rules and their performance on unknown samples?

Different criteria lead to different decision boundaries


Architecture of a BCI



Example Calibration Problem

• Task: A person is presented with a sequence of 300 images (one every 2 seconds). Half of the images are exciting, the other half are not. One channel of EEG (at the Cz location) is recorded.
• Question: How to design a BCI that can
determine whether a person is shown an exciting
or a non-exciting image?
• Approach: For each trial k, cut out an epoch Xk of
1s length, extract a short vector of features fk,
and assign a label yk in {E,NE}. Use machine
learning to find an optimal statistical mapping
from fk onto yk.
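A minimal sketch of that calibration recipe in Python (the sampling rate, the synthetic recording, and the placeholder extract_features are all illustrative assumptions; the peak features on a later slide could play this role):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

fs = 250                                  # assumed EEG sampling rate (Hz)
eeg = np.random.randn(600 * fs)           # placeholder 10-minute Cz recording
onsets = np.arange(300) * 2 * fs          # one image every 2 s
y = np.array([1, 0] * 150)                # 1 = exciting (E), 0 = non-exciting (NE)

def extract_features(epoch):
    # placeholder feature vector f_k; replace with real peak features
    return [epoch.max(), epoch.argmax() / fs, epoch.std()]

X = np.array([extract_features(eeg[t:t + fs]) for t in onsets])  # 1 s epochs X_k

model = LogisticRegression().fit(X, y)    # statistical mapping from f_k onto y_k
print(model.score(X, y))                  # training accuracy; cross-validate in practice
```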
Extracting Features of a Peak

• A supposed characteristic peak in a time window (relative to an event) could be characterized by three parameters:
 Latency
 Height
 Width
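A minimal sketch of measuring those three parameters from a 1-D epoch with numpy (defining width as full width at half maximum is an assumption):

```python
import numpy as np

def peak_features(epoch, fs):
    """Latency (s), height, and width (s) of the largest peak in an epoch."""
    i = int(np.argmax(epoch))
    height = float(epoch[i])
    latency = i / fs                          # peak time relative to the event
    above = np.where(epoch >= height / 2)[0]  # samples above half the peak height
    width = (above[-1] - above[0]) / fs       # assumed: full width at half maximum
    return latency, height, width

fs = 250
t = np.arange(fs) / fs
epoch = np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))  # synthetic peak at 300 ms
print(peak_features(epoch, fs))   # latency ~0.3 s, height ~1.0, width ~0.12 s
```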



Resulting Feature Space

• Plotting the 3-element feature vectors for all exciting trials in red, and non-exciting trials in green, we obtain two distributions in a 3D space.


ML with Feature Extraction

• Including the feature extraction, the analysis process is as follows: epochs are cut from the recorded signals, a feature function (here: calc_peak()) maps each epoch to a feature vector, the vectors are stacked into X with labels y, and the training function produces the model parameters θ.

[Figure: pipeline from raw segments, through Extract Features, to the training function and model θ]
Pattern Class

 A collection of similar (not necessarily identical) objects
 A class is defined by class samples (exemplars, prototypes)
 Intra-class variability
 Inter-class similarity
 How to define similarity?


Intra-Class Variability

Handwritten numerals



Inter-class Similarity

Characters that look similar

Identical twins



Feature Extraction
• Designing a Feature Extractor
• Its design is problem specific (e.g. features to
extract from graphic objects may be quite different
from sound events...)
• The ideal feature extractor would produce the
same feature vector X for all patterns in the same
class, and different feature vectors for patterns in
different classes.
• In practice, different inputs to the feature extractor
will always produce different feature vectors, but
we hope that the within-class variability is small
relative to the between-class variability.
• Designing a good set of features is sometimes “more
of an art than a science”...
Curse of Dimensionality

Does adding more features always improve the results?
No!! So we must:
- Avoid unreliable features.
- Be careful about correlations with existing features.
- Be careful about measurement costs.
- Be careful about noise in the measurements.

Is there some curse for working in very high dimensions?
YES THERE IS! ==> CURSE OF DIMENSIONALITY

➡ rule of thumb: n >= d(d-1)/2
n = number of examples in the training dataset
d = number of features
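A quick check of that rule of thumb (a sketch; the bound itself is only a heuristic):

```python
# Rule of thumb: need at least n >= d(d-1)/2 training examples for d features.
def min_examples(d: int) -> int:
    return d * (d - 1) // 2

for d in (5, 10, 20, 50):
    print(f"d = {d:2d} features -> n >= {min_examples(d):4d} examples")
# d = 50 already demands n >= 1225: the data requirement grows quadratically.
```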



Inadequate Features

• Problem: Inadequate Features
– If the features simply do not contain the information needed to separate the classes, it doesn't matter how much effort you put into designing the classifier.
– Solution: go back and design better features.

[Figure: “Good” features vs “Bad” features]


Correlated Features

– It often happens that two features that were meant to measure different characteristics are influenced by some common mechanism and tend to vary together.
• E.g. the perimeter and the maximum width of a figure will both vary
with scale; larger figures will have both larger perimeters and larger
maximum widths.
– This degrades the performance of a classifier based on
Euclidean distance to a template.
• A pattern at the extreme of one class can be closer to the template for
another class than to its own template. A similar problem occurs if
features are badly scaled, for example, by measuring one feature in
microns and another in kilometers.
– Solution: (Use other metrics, e.g. Mahalanobis...) or extract
features known to be uncorrelated!
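A minimal sketch contrasting Euclidean and Mahalanobis distance to a class template under correlated features (the covariance matrix and points are illustrative assumptions):

```python
import numpy as np

# Two correlated features, e.g. perimeter and maximum width varying with scale.
cov = np.array([[4.0, 3.5],
                [3.5, 4.0]])            # illustrative class covariance
inv_cov = np.linalg.inv(cov)
template = np.array([0.0, 0.0])

def euclidean(x):
    return float(np.linalg.norm(x - template))

def mahalanobis(x):
    d = x - template
    return float(np.sqrt(d @ inv_cov @ d))

a = np.array([2.0, 2.0])    # along the correlation direction (typical pattern)
b = np.array([2.0, -2.0])   # against it (atypical pattern)
print(euclidean(a), euclidean(b))      # identical Euclidean distances
print(mahalanobis(a), mahalanobis(b))  # Mahalanobis separates them (~1.03 vs 4.0)
```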



Designing a Classifier

• Model selection:
– Domain dependence and prior information.
– Definition of design criteria.
– Parametric vs. non-parametric models.
– Handling of missing features.
– Computational complexity.
– Types of models: templates, decision-theoretic or
statistical, syntactic or structural, neural, and
hybrid.
– How can we know how close we are to the true
model underlying the patterns?



Designing a Classifier

• How can we manage the tradeoff between the complexity of decision rules and their performance on unknown samples?

Different criteria lead to different decision boundaries


Curved Boundaries
– Linear boundaries produced by a minimum-Euclidean-distance classifier may not be flexible enough.
• For example, if x1 is the perimeter and x2 is the area of a figure, x1 will grow linearly with scale, while x2 will grow quadratically. This will “warp” the feature space and prevent a linear discriminant function from performing well.
– Solutions:
• Redesign the feature set (e.g., let x2 be the square root of the area)
• Try using Mahalanobis distance, which can produce quadratic decision
boundaries
• Try using a neural network (beyond the scope of these notes; see
Haykin)



Designing a Classifier

• Problem: Subclasses in the dataset
– It frequently happens that the classes defined by the end user are not the “natural” classes...
– Solution: use CLUSTERING.


Machine Learning

• How can a machine learn the rule from data?
– Supervised learning: a teacher provides a category
label or cost for each pattern in the training set.
➡Classification
– Unsupervised learning: the system forms clusters or
natural groupings of the input patterns (based on
some similarity criteria).
➡Clustering
• Reinforcement learning: no desired category is given
but the teacher provides feedback to the system such as
the decision is right or wrong.
Supervised Learning

• Supervised Training/Learning
– a “teacher” provides labeled training sets, used to train a classifier

[Figure: two examples. (1) Learn about shape from a training set of triangles, then classify a new object: “It's a Triangle!” (2) Learn about color from training sets of blue and yellow objects, then classify a new object: “It's Yellow!”]


Unsupervised Training/Learning

– No labeled training sets are provided
– The system applies a specified clustering/grouping criterion to an unlabeled dataset
– It clusters/groups together the “most similar” objects (according to the given criteria)

[Figure: an unlabeled training set partitioned into two clusters; clustering criterion = some similarity measure]


Evaluating a Classifier

• Training Set
– used for training the classifier
• Testing Set
– examples not used for training
– avoids overfitting to the data
– tests generalization abilities of the trained classifiers
• Data sets are usually hard to obtain...
– Labeling examples is time and effort consuming
– Large labeled datasets usually not widely available
– Requirement of separate training and testing datasets
imposes higher difficulties...
– Use Cross-Validation techniques!
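A minimal sketch of k-fold cross-validation with scikit-learn (the synthetic dataset and choice of classifier are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data: 100 labeled examples with 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 5-fold CV: each example is tested exactly once on a model that never saw it,
# estimating generalization without needing a large separate test set.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores.mean(), scores.std())
```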



Evaluating a Classifier

• Costs of Error
– We should also consider the costs of the different errors we make in our decisions. For example, if the fish packing company knows that:
• Customers who buy salmon will object
vigorously if they see sea bass in their cans.
• Customers who buy sea bass will not be unhappy
if they occasionally see some expensive salmon
in their cans.
• How does this knowledge affect our decision?



Evaluating a Classifier

• Confusion Matrix



Simple Classifiers

• Minimum-distance Classifiers
– based on some specified “metric” ||x - m||
– e.g. Template Matching


Simple Classifiers

• Template Matching

[Figure: two templates and several noisy examples]

– To classify one of the noisy examples, simply compare it to the two templates. This can be done in a couple of equivalent ways:
1. Count the number of agreements. Pick the class that has the maximum number of agreements. This is a maximum correlation approach.
2. Count the number of disagreements. Pick the class with the minimum number of disagreements. This is a minimum error approach.
• Works well when the variations within a class are due to “additive noise”, and there are no other distortions of the characters -- translation, rotation, shearing, warping, expansion, contraction or occlusion.
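A minimal sketch of both counting rules on toy 3x3 binary patterns (the templates and the noisy example are illustrative assumptions):

```python
import numpy as np

# Toy binary templates for two classes.
templates = {
    "O": np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]]),
    "T": np.array([[1, 1, 1],
                   [0, 1, 0],
                   [0, 1, 0]]),
}

noisy = np.array([[1, 1, 1],
                  [1, 0, 1],
                  [1, 0, 1]])   # an "O" with one corrupted pixel

for name, tpl in templates.items():
    agreements = int((noisy == tpl).sum())     # maximum correlation approach
    disagreements = int((noisy != tpl).sum())  # minimum error approach
    print(name, agreements, disagreements)

best = max(templates, key=lambda n: int((noisy == templates[n]).sum()))
print("classified as:", best)                  # -> "O"
```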
Simple Classifiers

• Metrics
– different ways of measuring distance:
• Euclidean metric: ||u|| = sqrt(u_1^2 + u_2^2 + ... + u_d^2)
• Manhattan (or taxicab) metric: ||u|| = |u_1| + |u_2| + ... + |u_d|
• Contours of constant...
– ... Euclidean distance are circles (or spheres)
– ... Manhattan distance are squares (or boxes)
– ... Mahalanobis distance are ellipses (or ellipsoids)
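A minimal sketch of the first two metrics (numpy's norm handles both through the ord argument):

```python
import numpy as np

u = np.array([3.0, -4.0, 12.0])

euclidean = np.linalg.norm(u, ord=2)   # sqrt(3^2 + 4^2 + 12^2) = 13.0
manhattan = np.linalg.norm(u, ord=1)   # |3| + |-4| + |12|      = 19.0
print(euclidean, manhattan)
```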



Classifiers: Neural Networks



Gaussian Modeling

[Figure: a bivariate Gaussian density p(x1, x2) shown as a 3D surface and as contours in the (x1, x2) plane]


Gaussian Mixture Models

• Use multiple Gaussians to model the data

[Figure: a two-component Gaussian mixture density shown as a 3D surface and as contours in the (x1, x2) plane]
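A minimal sketch of fitting such a mixture with scikit-learn (the two synthetic blobs stand in for the data in the figure):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: two Gaussian blobs, mimicking the two bumps above.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2)),
    rng.normal(loc=[5.0, 5.0], scale=1.0, size=(200, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.means_)     # close to [0, 0] and [5, 5]
print(gmm.weights_)   # close to [0.5, 0.5]
```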



Classifiers: kNN

• k-Nearest Neighbours Classifier
– Lazy classifier: no training is actually performed (hence, lazy ;-))
– An example of Instance-Based Learning

[Figure: X is a pattern to be classified among labeled training patterns. With k = 8, four of the nearest patterns are of category 1, two of category 2, and two of category 3; the plurality are in category 1, so decide X is in category 1.]
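A minimal sketch of the plurality vote (toy 2-D points chosen so the vote comes out 4-2-2 as in the figure; in practice use sklearn's KNeighborsClassifier):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=8):
    """Classify x by plurality vote among its k nearest training patterns."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:k]               # indices of the k neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy training data (illustrative points).
X_train = np.array([[0, 0], [1, 0], [0, 1], [1, 1],   # category 1
                    [5, 5], [6, 5],                   # category 2
                    [0, 6], [1, 6]])                  # category 3
y_train = np.array([1, 1, 1, 1, 2, 2, 3, 3])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=8))  # -> 1
```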



Decision Trees

• Learn rules from data
• Apply each rule at each node
• Classification is at the leaves of the tree

[Figure: a tree testing x3 at the root, with x2 and x4, x1 tested further down and 0/1 leaves; the learned function is f = x3x2 + x3x4x1]
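A minimal sketch of learning rules of this kind from binary data with scikit-learn (the boolean target below is an assumption in the spirit of the slide's f, since the figure's complement bars did not survive extraction):

```python
import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeClassifier, export_text

# All 16 assignments of four binary inputs x1..x4 (columns in that order).
X = np.array(list(product([0, 1], repeat=4)))
x1, x2, x3, x4 = X[:, 0], X[:, 1], X[:, 2], X[:, 3]

# Assumed target: f = x3*x2 OR (not x3)*x4*x1.
y = (((x3 == 1) & (x2 == 1)) | ((x3 == 0) & (x4 == 1) & (x1 == 1))).astype(int)

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["x1", "x2", "x3", "x4"]))  # learned rules
print(tree.predict([[1, 0, 0, 1]]))   # x3=0, x4=1, x1=1 -> class 1
```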



Clustering: k-means



Model Training



Q&A

