Lec 01 Introduction, 2024
School of Automation Science and Electrical Engineering
Sunday, March 31, 2024
Start from ChatGPT
Embodied AI
Minds live in bodies, and bodies move through a changing world. The goal of embodied
artificial intelligence is to create agents, such as robots, that learn to creatively solve
challenging tasks requiring interaction with the environment. Fantastic advances in deep
learning have enabled superhuman performance on a variety of AI tasks previously thought
intractable. Computer vision, speech recognition, and natural language processing have
experienced transformative revolutions at passive input-output tasks like language translation
and image processing, and reinforcement learning has similarly achieved world-class
performance at interactive tasks like games. These advances have supercharged embodied AI
agents, which can:
• See: perceive their environment through vision or other senses.
• Talk: hold a natural language dialog grounded in their environment.
• Listen: understand and react to audio input anywhere in a scene.
• Act: navigate and interact with their environment to accomplish goals.
• Reason: consider and plan for the long-term consequences of their actions.
AI4Science
Simpson’s Paradox
Simple network
Small data training
Adaptive
1. Probabilistic Computing
2. Third wave of AI
3. Ex: Driving a car
4. Role in Explainable AI (XAI)
5. Role of probability in machine learning
Role of Probability in AI
Prediction → Inference
• Probabilistic computing allows us to
1. Deal with uncertainty in natural data around us
2. Predict events in the world with an understanding
of data and model uncertainty
• Predicting what will happen next in a scenario,
as well as effects of our actions, can only be
done if we know how to model the world around
us with probability distributions
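As a concrete illustration (not from the slides), the simplest probabilistic update of a belief about the world is Bayes' rule; all the numbers below are invented for the sketch:

```python
# Hypothetical example: updating our belief that it rained after observing
# a wet road. All probabilities are illustrative assumptions.

def bayes_update(prior, likelihood, likelihood_complement):
    """Posterior P(H | E) via Bayes' rule for a binary hypothesis H."""
    evidence = likelihood * prior + likelihood_complement * (1.0 - prior)
    return likelihood * prior / evidence

prior_rain = 0.2            # P(rain) before any evidence
p_wet_given_rain = 0.9      # P(wet road | rain)
p_wet_given_no_rain = 0.1   # P(wet road | no rain)

posterior = bayes_update(prior_rain, p_wet_given_rain, p_wet_given_no_rain)
print(round(posterior, 3))  # belief in rain after seeing the wet road
```

Observing evidence moves the belief from 0.2 to roughly 0.69, which is exactly the "prediction with an understanding of uncertainty" the slide refers to.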
Role in XAI
Simple features
[Figure: network diagram showing simple features computed from inputs]
Role of Probability in ML
• In neural networks (discriminative models)
1. Output is a probability distribution over y
2. Instead of raw error as the loss function we use a
surrogate loss, viz., the negative log-likelihood, so that
it is differentiable (which is necessary for gradient
descent)
• In probabilistic AI (generative models)
– We learn a distribution over observed and latent
variables whose parameters are determined by
gradient descent as well
$p(x;\theta) = \dfrac{1}{Z(\theta)}\,\tilde{p}(x,\theta), \qquad Z(\theta) = \sum_x \tilde{p}(x,\theta)$
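A minimal sketch of the normalisation above for a discrete variable, with made-up unnormalised scores:

```python
# Sketch of p(x; θ) = p̃(x, θ) / Z(θ) for a discrete variable.
# The unnormalised scores below are illustrative, not from the lecture.

def normalize(unnorm):
    """Turn unnormalised scores p̃(x) into a distribution p(x) = p̃(x) / Z."""
    Z = sum(unnorm.values())              # Z(θ) = Σ_x p̃(x, θ)
    return {x: v / Z for x, v in unnorm.items()}

p_tilde = {"a": 2.0, "b": 1.0, "c": 1.0}  # unnormalised scores p̃(x)
p = normalize(p_tilde)
print(p["a"])  # 0.5 — the probabilities now sum to one
```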
Introduction
What is Artificial Intelligence?
A brief history of AI
Course logistics
Symbolic AI programs are based on creating explicit structures and behavior rules.
Cognitive psychology
Aristotle
Computer engineering
[Figure: neural network with input, hidden, and output layers]
Much of AI focus shifts to subfields: machine learning, multiagent systems, computer vision, natural language processing, robotics, etc.
[email protected] Pattern Recognition & Machine Learning 48
Some parting thoughts
Homework/project policy:
• You may discuss homework problems with other
students, but you must list every student you
discussed with in your writeup
• Your writeup and code must be written entirely
on your own, without reference to notes taken
during any group discussion
• Artificial Intelligence
• Statistics
• Continuous Optimisation
• Databases
• Information Retrieval
• Communications/Information Theory
• Signal Processing
• Computer Science Theory
• Philosophy
• Psychology and Neurobiology
…
• What is a Pattern?
– A pattern is an abstraction, represented by a set of
measurements describing a “physical” object
• Many types of patterns exist:
– visual, temporal, sonic, logical, ...
Category “A”
Category “B”
Clustering
Classification
Development of PR
Applications
Complex physiological
and pathological processes
Data driven machine learning
Personalised in silico medicine
Personalised FE simulation
SVM
Detection – Diagnosis – Intervention: EEG, skin conductance
Integrated intelligent sensing technology
EMG
Inertial sensors
Electrical muscle stimulation, plantar pressure distribution
Improving patients' quality of life
Enabling unattended care and relieving the burden on healthcare
[email protected] Pattern Recognition & Machine Learning 80
FOG Detection with Wearable Sensors
Raw data → time-frequency spectrum → freezing index
LSTM
$y(k) = f_{\text{lstm}}\big(x(k), x(k-1), \ldots, x(k-n_u), e(k)\big)$
Complex networks
[Figure: average connectivity matrices (90×90 brain regions, colour scale 0–1) for a normal control vs. an MCI patient]
Mechanism
• Problem:
– sort incoming fish on a conveyor
belt according to species
–Assume only two classes exist:
• Sea Bass and Salmon
Salmon
Sea-bass
1. Capture an image (sensing)
2. Isolate the fish (segmentation)
3. Take measurements (feature extraction)
4. Make a decision (classification): “Sea Bass” or “Salmon”
• Selecting Features
– Assume a fisherman told us that a sea bass is generally
longer than a salmon.
– We can use length as a feature and decide between sea
bass and salmon according to a threshold on length.
– How can we choose this threshold?
Histograms of the length feature for the two types of fish in the training samples. How can we choose the threshold to make a reliable decision?
Even though “sea bass” is longer than “salmon” on average, there are many examples of fish where this observation does not hold...
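The threshold question above can be sketched as a brute-force search over candidate thresholds on the training data; the fish lengths here are invented for illustration:

```python
# Hypothetical sketch: pick the length threshold that minimises training
# error. The lengths below are made-up numbers, not data from the lecture.

def best_threshold(salmon, sea_bass):
    """Try each observed length as a threshold; classify 'sea bass' when
    length > threshold, and return the threshold with fewest errors."""
    candidates = sorted(salmon + sea_bass)
    best_t, best_err = None, float("inf")
    for t in candidates:
        errors = sum(1 for x in salmon if x > t)      # salmon misread as sea bass
        errors += sum(1 for x in sea_bass if x <= t)  # sea bass misread as salmon
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err

salmon = [3.1, 3.5, 4.0, 4.4, 5.0]
sea_bass = [4.6, 5.2, 5.8, 6.1, 6.5]
t, err = best_threshold(salmon, sea_bass)
print(t, err)  # 4.4 1 — one training fish is still misclassified
```

Because the two histograms overlap, no threshold reaches zero error, which is exactly the reliability problem the slide raises.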
• Selecting Features
– Let’s try another feature and see if we get better
discrimination
➡Average Lightness of the fish scales
• Multiple Features
– Single features might not yield the best performance.
– To improve recognition, we might have to use more than one
feature at a time.
– Combinations of features might yield better performance.
– Assume we also observed that sea bass are typically wider than
salmon.
• Designing a Classifier
• Can we do better with another decision rule?
• More complex models result in more complex
boundaries.
DANGER OF OVERFITTING!! We may distinguish training samples perfectly, but how can we predict how well we generalize to unknown samples? THE CLASSIFIER WILL FAIL TO GENERALIZE TO NEW DATA...
• Designing a Classifier
• How can we manage the tradeoff between complexity of decision
rules and their performance to unknown samples?
Different criteria lead to different decision boundaries
[Figure: from measurements (height, width) of samples, extract feature vectors f = (f1, f2, …); a model function with parameters θ is then trained on (X, y)]
Pattern Class
Handwritten numerals
Identical twins
• Model selection:
– Domain dependence and prior information.
– Definition of design criteria.
– Parametric vs. non-parametric models.
– Handling of missing features.
– Computational complexity.
– Types of models: templates, decision-theoretic or
statistical, syntactic or structural, neural, and
hybrid.
– How can we know how close we are to the true
model underlying the patterns?
• Supervised Training/Learning
– a “teacher” provides labeled training sets, used to train a
classifier
[Figure: a training set of labeled triangles and blue objects; the trained classifier assigns a new sample: “It's a Triangle!”]
• Training Set
– used for training the classifier
• Testing Set
– examples not used for training
– avoids overfitting to the data
– tests generalization abilities of the trained classifiers
• Data sets are usually hard to obtain...
– Labeling examples is time- and effort-consuming
– Large labeled datasets are usually not widely available
– The requirement of separate training and testing
datasets makes things even harder...
– Use Cross-Validation techniques!
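A minimal sketch of k-fold cross-validation, paired with a toy one-dimensional nearest-mean classifier (both the data and the classifier are illustrative assumptions, not the course's prescribed method):

```python
# k-fold cross-validation sketch: every sample is tested exactly once,
# using a model trained on the remaining folds.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal folds."""
    return [list(range(i, n, k)) for i in range(k)]

def cross_validate(X, y, k, train_fn, predict_fn):
    """Average test accuracy over k train/test splits."""
    accs = []
    for test_idx in k_fold_indices(len(X), k):
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return sum(accs) / k

def train_nearest_mean(X, y):
    """Store the mean feature value of each class."""
    return {label: sum(x for x, l in zip(X, y) if l == label) /
                   sum(1 for l in y if l == label)
            for label in set(y)}

def predict_nearest_mean(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

X = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]   # invented 1-D features
y = ["A", "A", "A", "B", "B", "B"]
print(cross_validate(X, y, 3, train_nearest_mean, predict_nearest_mean))
```

The key point is that each fold's accuracy is measured on samples the model never saw during training, which is what makes the estimate a test of generalization rather than memorization.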
• Costs of Error
–We should also consider costs of different
errors we make in our decisions. For example,
if the fish packing company knows that:
• Customers who buy salmon will object
vigorously if they see sea bass in their cans.
• Customers who buy sea bass will not be unhappy
if they occasionally see some expensive salmon
in their cans.
• How does this knowledge affect our decision?
• Confusion Matrix
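A confusion matrix simply counts how often each true class is predicted as each class; a small sketch with invented fish labels:

```python
# Sketch of building a confusion matrix from true vs. predicted labels.
# The labels below are illustrative, not data from the lecture.

def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true class; columns index the predicted class."""
    idx = {label: i for i, label in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

y_true = ["salmon", "salmon", "sea bass", "sea bass", "sea bass"]
y_pred = ["salmon", "sea bass", "sea bass", "sea bass", "salmon"]
print(confusion_matrix(y_true, y_pred, ["salmon", "sea bass"]))
# [[1, 1], [1, 2]]
```

The off-diagonal cells are the two kinds of error, which is precisely where the asymmetric costs from the previous slide would be applied.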
• Minimum-distance Classifiers
– based on some specified “metric” ||x - m||
– e.g. Template Matching
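A minimal sketch of such a minimum-distance classifier, with invented class templates:

```python
import math

# Minimum-distance classifier: assign x to the class whose template m
# minimises ||x - m||. The templates below are made-up (length, lightness)
# means, not values from the lecture.

def min_distance_classify(x, templates):
    """Return the label of the nearest template under the Euclidean metric."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(templates, key=lambda label: dist(x, templates[label]))

templates = {"salmon": (3.5, 6.0), "sea bass": (6.0, 3.0)}
print(min_distance_classify((4.0, 5.5), templates))  # salmon
```

Swapping the distance function here is all it takes to move between the Euclidean, Manhattan, and Mahalanobis variants discussed next.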
• Template Matching
[Figure: template “S” and noisy examples]
• Metrics
– different ways of measuring distance:
• Euclidean metric:
– || u || = sqrt( u1² + u2² + ... + ud² )
• Manhattan (or taxicab) metric:
– || u || = |u1| + |u2| + ... + |ud|
• Contours of constant...
– ... Euclidean distance are circles (or spheres)
– ... Manhattan distance are squares (or boxes)
– ... Mahalanobis distance are ellipses (or ellipsoids)
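The three metrics can be sketched directly (the Mahalanobis case is shown for a diagonal covariance, a simplifying assumption not stated on the slide):

```python
import math

# The three metrics above, for a vector u measured from the class mean.

def euclidean(u):
    """sqrt(u1² + u2² + ... + ud²) — constant contours are circles."""
    return math.sqrt(sum(x * x for x in u))

def manhattan(u):
    """|u1| + |u2| + ... + |ud| — constant contours are squares."""
    return sum(abs(x) for x in u)

def mahalanobis_diag(u, variances):
    """With a diagonal covariance, Mahalanobis distance reduces to a
    per-dimension variance-scaled Euclidean norm — contours are ellipses."""
    return math.sqrt(sum(x * x / v for x, v in zip(u, variances)))

u = (3.0, 4.0)
print(euclidean(u))                      # 5.0
print(manhattan(u))                      # 7.0
print(mahalanobis_diag(u, (9.0, 16.0)))  # sqrt(2) ≈ 1.414
```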
[Figure: bivariate Gaussian density p(x1, x2) shown as a surface and as an elliptical contour plot]
Classifier
– Lazy Classifier
• no training is actually performed
– An example of Instance-Based Learning
[Figure: pattern X to be classified with k = 8 nearest neighbours: four patterns of category 1, two of category 2, two of category 3]
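The lazy, instance-based behaviour described above can be sketched as a k-nearest-neighbour majority vote over stored patterns (the coordinates are invented to mirror the 4/2/2 vote in the figure):

```python
import math
from collections import Counter

# k-NN sketch: no training step — classification is a majority vote
# among the k stored patterns closest to x. Data points are made up.

def knn_classify(x, patterns, k):
    """patterns: list of (point, label) pairs. Returns the majority label
    among the k nearest stored patterns."""
    by_dist = sorted(patterns, key=lambda p: math.dist(x, p[0]))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

patterns = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1),
            ((5, 5), 2), ((5, 6), 2), ((9, 0), 3), ((9, 1), 3)]
print(knn_classify((1, 2), patterns, k=3))  # 1
```

All the work happens at query time, which is exactly what "lazy" means: the stored training set itself is the model.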
• classification is at the leaves of the tree
[Figure: decision tree over x1…x4 computing f = x3·x2 + x3·x4·x1]
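The Boolean function from the slide, taken literally as written (any complement bars lost in extraction are not reconstructed), can be evaluated directly as a sum of products:

```python
# Evaluating f = x3·x2 + x3·x4·x1 with 1 = true, 0 = false, as read off
# the slide; the decision tree computes the same function branch by branch.

def f(x1, x2, x3, x4):
    return (x3 and x2) or (x3 and x4 and x1)

print(f(x1=0, x2=1, x3=1, x4=0))  # 1: the x3·x2 branch fires
print(f(x1=1, x2=0, x3=1, x4=1))  # 1: the x3·x4·x1 branch fires
print(f(x1=1, x2=1, x3=0, x4=1))  # 0: neither product term is true
```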