AI - Foundations of Machine Learning I
• Learning is an essential characteristic of any living system: organisms, from amoebas to humans, adapt to their environment in order to survive; in other words, they learn.
• One of the earliest definitions of Machine Learning is due to
Arthur Samuel (1959):
“the field of study that gives computers the ability to learn without
being explicitly programmed”
• Important concepts:
E: Experience
P: Performance
T: Task
• ML enormously simplifies the process of building artificial decision systems.
[Flowchart: human intervention. The hand-written rules are tested; if NO (they fail), a human updates them; if YES, IMPLEMENT.]
• While in ML the flow would be similar to this:
[Flowchart: machine intervention. The learned model is tested; if NO (it fails), the machine updates it; if YES, IMPLEMENT.]
• Note also that the Expert Systems approach can be unfeasible when there are no clear "rules" or associations between input and output: it is very easy to tell whether we like some particular piece of music or not, but very complicated or even impossible to explain the reasons why.
• Examples of ML (Mitchell, 1997):
• A robot driving learning problem:
• Task T: driving on public four-lane highways using vision
sensors
• Performance measure P: average distance traveled before an
error (as judged by human overseer)
• Training experience E: a sequence of images and steering
commands recorded while observing a human driver
• The power of Machine Learning lies in the fact that it enables computers to program themselves.
• Any ML algorithm has three basic components:
• In terms of evaluation, there are also dozens of performance measures: Accuracy, Precision and Recall, Mean Squared Error, Absolute Error, Confusion Matrices, Likelihood, Expected Utility, Entropy, Kullback-Leibler Divergence, Mahalanobis Distance… (a short sketch of a few of these follows below).
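A minimal sketch of a few of the measures just listed, computed with NumPy on made-up toy arrays (all values here are illustrative assumptions, not data from the slides):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])   # ground-truth class labels (toy data)
y_pred = np.array([1, 0, 0, 1, 1, 1])   # predicted class labels (toy data)

tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives

accuracy = np.mean(y_pred == y_true)
precision = tp / (tp + fp)   # fraction of predicted positives that are correct
recall = tp / (tp + fn)      # fraction of actual positives that are found

# Mean Squared Error for a regression-style comparison
y = np.array([2.0, 1.5, 3.0])
y_hat = np.array([2.1, 1.0, 2.8])
mse = np.mean((y - y_hat) ** 2)

print(accuracy, precision, recall, mse)
```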
• Combinations are explosive: since we can employ different algorithms or performance measures under the same model, the possible combinations are almost unlimited:
• ….
• Most of the uses of ML are predictive in the sense that we want to know what will happen in the future or what the outcome of some particular action will be; examples are:
• Both interests do not necessarily fit into any particular kind of learning, but it is common that prediction employs supervised algorithms (defined later) and description employs unsupervised (or semi-supervised) algorithms (also defined later).
• Since there is a close relationship between ML and Statistics, it is very
useful to have a good knowledge of the formal methods employed in
Statistics (and also Mathematics).
Learning = Estimation
Network = Model
• In terms of the degree of intervention of an external "teacher" we can differentiate among three kinds of learning: "Supervised", "Unsupervised" and "Reinforcement".
• Supervised learning means that we have to know, in advance, which are the correct labels for a set of examples that are used to build the model.
[Diagram: in the TRAINING PHASE, a MODEL is built from labeled images of the digits 0 to 9; in the PREDICTION PHASE, the trained model assigns the label 2 to an unseen image.]
• The predictor variables (also called independent variables or features) allow us to determine the responses (also called the dependent variables):
Y: responses
X: independent variables
• After the optimal parameters Θ̂ are found, the model is generally used to forecast unseen examples X′:
Ŷ = f_Θ̂(X′)
performance = g(Ŷ, Y)
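A hedged sketch of this generic workflow, assuming scikit-learn and synthetic data; using a linear regression for f is purely an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])          # hypothetical true parameters
X = rng.normal(size=(100, 3))                     # training predictors
Y = X @ true_theta + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, Y)              # learning = estimating Θ̂

X_new = rng.normal(size=(10, 3))                  # unseen examples X′
Y_new = X_new @ true_theta                        # their true responses
Y_hat = model.predict(X_new)                      # Ŷ = f_Θ̂(X′)

performance = mean_squared_error(Y_new, Y_hat)    # performance = g(Ŷ, Y)
print(performance)
```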
• One important problem in supervised learning is dimensionality reduction, which consists of discovering the explanatory variables that account for the greatest changes in the response variable.
• The problem of transforming raw data into a usable dataset is called feature engineering, and it is often a labor-intensive process that demands time and skills from the data analyst (a toy illustration follows below).
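A toy illustration of feature engineering with pandas: turning raw records into a usable feature matrix. The column names and derived features are hypothetical assumptions, not from the slides:

```python
import numpy as np
import pandas as pd

# Hypothetical raw records as they might arrive from a database
raw = pd.DataFrame({
    "signup_date": pd.to_datetime(["2020-01-05", "2020-03-17", "2020-06-30"]),
    "last_seen":   pd.to_datetime(["2020-02-01", "2020-07-01", "2020-07-02"]),
    "spend":       [120.0, 0.0, 45.5],
})

features = pd.DataFrame({
    # tenure in days, derived from two raw timestamps
    "tenure_days": (raw["last_seen"] - raw["signup_date"]).dt.days,
    # binary flag derived from a numeric column
    "is_paying":   (raw["spend"] > 0).astype(int),
    # log-transform to tame a skewed monetary variable
    "log_spend":   np.log1p(raw["spend"]),
})
print(features)
```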
• Supervised learning can also be employed in regression problems, where the response is not restricted to belong to a finite set of labels.
• In some other regression examples, the response variable is not limited to a finite number of classes, but it can only take some bounded values.
• Note also that supervised problems are not perfectly defined; for example, in the case of credit scoring the output could be a rating (unbounded), a class (default/non-default), or a probability (in [0, 1]).
• Some examples of the techniques that are used in supervised learning are the following (a brief sketch follows the list):
• Linear Regression
• Logistic Regression
• k-Nearest Neighbors
• Decision Trees
• Support Vector Machines (SVMs)
• Feedforward Neural Networks
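A brief sketch fitting several of the listed techniques on the same synthetic classification task, assuming scikit-learn; the dataset and hyperparameters are illustrative choices only:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data, split into training and held-out sets
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": LogisticRegression(),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision tree":       DecisionTreeClassifier(max_depth=3),
    "SVM":                 SVC(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, model.score(X_te, y_te))   # accuracy on held-out data
```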
• In Unsupervised Learning we do not know the labels of the examples X, and the objective is to classify them into K categories, where K can be a specified or unspecified number.
• Note that in unsupervised learning the system tries to learn
without a “teacher”.
• After the typologies have been established, firms may target each
of the groups with specific marketing strategies.
• Unsupervised learning is widely employed in dimensionality reduction, i.e. problems where we want to have a simplified version of the data.
• Note that, in some cases, the problem can be considered as
supervised or unsupervised depending on the objectives or the
availability of data.
• Some examples of the techniques that are used in unsupervised learning are the following (a short sketch follows the list):
• K-Means
• DBSCAN
• Hierarchical Cluster Analysis (HCA)
• Isolation Forest
• Principal Component Analysis (PCA)
• Locally-Linear Embedding (LLE)
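A short sketch of two of the listed techniques, assuming scikit-learn: K-Means groups unlabeled points into K clusters, and PCA yields a simplified (lower-dimensional) version of the same data. The synthetic data are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two synthetic "typologies" of customers, with no labels attached
X = np.vstack([rng.normal(0, 1, (50, 4)),
               rng.normal(5, 1, (50, 4))])

labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)  # no teacher: K chosen by us
X_2d = PCA(n_components=2).fit_transform(X)              # simplified version of the data

print(labels[:5], X_2d.shape)
```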
• Finally, there exists an approach that is in some sense a mixture of supervised and unsupervised learning and which is referred to as semi-supervised learning.
• Semi-supervised learning essentially consists of two steps:
1. In the first step, the unlabeled examples are grouped into clusters according to their similarity.
2. In the second step, after one of the examples of each of the groups is labeled, the rest of the remaining examples are also labeled (a toy sketch follows below).
[Diagram: step 1, clustering of the unlabeled examples; step 2, propagation of the labels within each cluster.]
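A toy sketch of these two steps, assuming scikit-learn; the data and the "cat"/"dog" labels are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Unlabeled examples drawn from two synthetic groups
X = np.vstack([rng.normal(0, 1, (30, 2)),
               rng.normal(6, 1, (30, 2))])

# Step 1: group the unlabeled examples.
groups = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Step 2: a human labels one representative per group...
representative_label = {0: "cat", 1: "dog"}   # hypothetical labels

# ...and every remaining example inherits the label of its group.
y = np.array([representative_label[g] for g in groups])
print(y[:5])
```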
• In reinforcement learning the objective consists of using observations gathered from the interaction with the environment and taking actions that maximize some reward.
• The agent obtains some rewards in return (or penalties in the form of negative rewards).
• The agent must learn by itself (changing its state) the best strategy, called a policy, to get the most reward over time.
• The action would be optimal if it maximizes the expected average reward.
[Diagram: the AGENT takes actions on the ENVIRONMENT and receives a new state and a reward in return.]
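As a hedged formalization (using standard reinforcement learning notation that the slide does not spell out), the average-reward objective can be written as:

$$\pi^{*} = \arg\max_{\pi} \; \lim_{T \to \infty} \frac{1}{T}\,\mathbb{E}\!\left[\sum_{t=1}^{T} r_t \,\middle|\, \pi\right]$$

where $r_t$ is the reward received at time $t$ while following policy $\pi$.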
• Reinforcement Learning differs significantly from the other two paradigms in several aspects, but the most important is that rewards are delayed: past performance affects the future state of the system (other architectures share this characteristic though, e.g. recurrent networks).
• Some examples of the techniques that are used in reinforcement learning are the following (a sketch of the first follows the list):
• Q-Learning
• State-Action-Reward-State-Action (SARSA)
• Deep Q Network (DQN)
• Dyna-Q
• Deep Deterministic Policy Gradient (DDPG)
• Twin Delayed Deep Deterministic Policy Gradients (TD3)
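A compact tabular Q-Learning sketch (the first technique in the list) on a made-up five-state corridor where moving right eventually reaches a rewarded terminal state; the environment and hyperparameters are illustrative assumptions:

```python
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # action-value table
alpha, gamma = 0.1, 0.9               # learning rate and discount factor
epsilon = 0.3                         # deliberately high exploration for this tiny example
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0
    while s != n_states - 1:          # episode ends at the right end
        # epsilon-greedy policy: mostly exploit, sometimes explore
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # delayed reward
        # Q-learning update: move Q[s, a] toward r + gamma * max_a' Q[s', a']
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(Q)   # the learned policy is the argmax over each row
```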
• Note that learning can happen incrementally (the algorithm acts and learns at the same time, as we humans do) or all at once on a fixed dataset.
• In the first case we say that the algorithm learns in an online mode, while in the second we refer to it as batch learning or offline learning.
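A brief sketch of the two modes, assuming scikit-learn: the batch model fits once on the full dataset, while the online model updates incrementally as chunks arrive (simulating a stream). Data and model choices are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=1000)

# Batch (offline) learning: one fit over the whole dataset at once
batch_model = LinearRegression().fit(X, y)

# Online learning: incremental updates as the data streams in
online_model = SGDRegressor()
for X_chunk, y_chunk in zip(np.array_split(X, 10), np.array_split(y, 10)):
    online_model.partial_fit(X_chunk, y_chunk)
```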
• In general, both kinds of learning are compatible, and the use of one or the other depends on how gradual the changes introduced in the training data are, as well as on the speed needed to produce a response.