Topic 07 - Data Modelling - Part I
Le Ngoc Thanh
[email protected]
Department of Computer Science
2
Process
3
After preprocessing
4
Data Science Process
◎ Define the question to answer
◎ Collect data
◎ Discover and preprocess the data to obtain data that can be analyzed
◎ Analyze the data (visualizations, statistics, machine learning) → answers (hypotheses) to the question
◎ Evaluate
◎ Make decisions
5
Data Science vs. Machine Learning
https://www.coursera.org/articles/data-science-vs-machine-learning
6
ML Tasks
7
Machine Learning Choice
◎ Before implementing a machine learning (ML) model, the data
scientist needs to identify the branch (or branches) of ML that can
solve the given problem.
8
The course’s focus
◎ In this course, we focus on three main groups of ML:
○ Regression
○ Classification
○ Clustering
9
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
10
After hypothesis
◎ The job of a learning algorithm is to find the most suitable
hypothesis for a given problem.
11
After hypothesis
◎ To choose a suitable hypothesis, we need to define a loss
function.
12
After loss function design
◎ We look for the parameters that produce the lowest loss for a given
dataset, so we need a process to optimize (fit) the loss function.
13
General model learning architecture
(Hypothesis)
14
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
15
Regression
◎ Consider a set of n data points:
(x_1, y_1), (x_2, y_2), (x_3, y_3), …, (x_n, y_n)
◎ Purpose:
○ Select a function f(·) and fit it to the data (curve fitting = regression)
Y = f(A, β)
16
Linear regression
◎ Assume that a line is fitted through the points (hypothesis):
f(x) = β_1 x + β_2
◎ The loss function is the MSE (mean squared error):
E(f) = (1/n) Σ_{i=1}^{n} (f(x_i) − y_i)^2 = (1/n) Σ_{i=1}^{n} (β_1 x_i + β_2 − y_i)^2
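As a minimal sketch (assuming NumPy; the data points and function names are illustrative, not from the slides), the MSE loss of a candidate line can be computed directly from the formula above:

```python
import numpy as np

# Illustrative toy data: n points (x_i, y_i) roughly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.0])

def mse(beta1, beta2, x, y):
    """E(f) = (1/n) * sum_i (beta1*x_i + beta2 - y_i)^2"""
    residuals = beta1 * x + beta2 - y
    return np.mean(residuals ** 2)

print(mse(2.0, 1.0, x, y))  # near 0: the line y = 2x + 1 fits well
print(mse(0.0, 0.0, x, y))  # a poor line gives a much larger loss
```

A good fit drives the loss toward zero; the optimizer's job is to find the (β_1, β_2) that minimize it.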
17
Linear regression
◎ The optimization method: set the partial derivatives of E with respect to β_1 and β_2 to zero.
◎ This generalizes to a 2 × 2 linear system:
Ax = b
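Setting ∂E/∂β_1 = ∂E/∂β_2 = 0 yields the 2 × 2 normal equations; a sketch of building and solving them with NumPy (the data is illustrative):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.0])
n = len(x)

# dE/dbeta1 = 0 and dE/dbeta2 = 0 give the 2x2 system A @ beta = b:
#   beta1*sum(x^2) + beta2*sum(x) = sum(x*y)
#   beta1*sum(x)   + beta2*n      = sum(y)
A = np.array([[np.sum(x * x), np.sum(x)],
              [np.sum(x),     n        ]])
b = np.array([np.sum(x * y), np.sum(y)])

beta1, beta2 = np.linalg.solve(A, b)
print(beta1, beta2)  # close to the generating line y = 2x + 1
```

The same answer can be obtained with `np.polyfit(x, y, 1)`; writing out the system makes the connection to Ax = b explicit.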
18
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
◉ Fit Function
◉ Gradient descent
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
19
Nonlinear regression
◎ What about nonlinear regression? For example:
f(x) = β_2 exp(β_1 x)
◎ The MSE loss function:
E(f) = (1/n) Σ_{i=1}^{n} (β_2 exp(β_1 x_i) − y_i)^2
20
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
◉ Fit Function
◉ Gradient descent
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
21
Going downhill
22
Going downhill
23
Going downhill
◎ What does it mean if the direction vector is:
[x, y]
= [which way is down in the x direction, which way is down in the y direction]
= [−1, 1]
◎ To actually move downhill, we step to:
⇒ (x_new, y_new) = (x, y) + [−1, 1]
24
Going downhill
◎ Generally, to move through xy space toward the
minimum point, we need to identify:
○ The direction to move (increase/decrease x and y)
○ The rate of change (based on the slope)
⇒ Together, these form a direction vector
25
Direction vector
◎ The derivative of a function at a specific
point gives the slope of the tangent line:
f′(x_0) = lim_{x_1 → x_0} (f(x_1) − f(x_0)) / (x_1 − x_0)
◎ Why is the tangent line considered a
direction vector?
26
Directional derivative
◎ If you stand at some point a = (x_0, y_0), the slope of the ground in front of you
depends on the direction you are facing.
◎ To compute the slope in an arbitrary direction, we take the derivative in that direction
⇒ called the directional derivative:
D_u f(x_0, y_0)
where u = (u_1, u_2) is a unit vector pointing in the direction in which we want
to compute the slope.
27
Gradient
◎ The gradient of f at any point tells you:
○ which direction is steepest from that point with respect to the x, y plane
○ how steep it is (the slope of the hill in that direction)
∇f(x, y) = (∂f(x, y)/∂x, ∂f(x, y)/∂y) = (∂f(x, y)/∂x) x̂ + (∂f(x, y)/∂y) ŷ
◎ The partial derivatives give the slope in the positive x direction
and the slope in the positive y direction.
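A quick numerical sketch of this idea (the function f and the test point are illustrative assumptions): the gradient's components are the two partial slopes, which we can verify with finite differences:

```python
import numpy as np

def f(x, y):
    return x**2 + 3 * y**2  # illustrative bowl-shaped surface

def grad_f(x, y):
    """Analytic gradient: (df/dx, df/dy) = (2x, 6y)."""
    return np.array([2 * x, 6 * y])

def numeric_grad(f, x, y, h=1e-6):
    """Central finite-difference approximation of the two partial derivatives."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return np.array([dfdx, dfdy])

print(grad_f(1.0, 2.0))           # [ 2. 12.]
print(numeric_grad(f, 1.0, 2.0))  # approximately the same
```

The finite-difference check is a handy way to validate a hand-derived gradient before using it in an optimizer.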
28
Gradient Descent
◎ As we update, we want the value
of f(x, y) to decrease.
○ When it stops decreasing, (x_t, y_t) has
arrived at the position giving the
minimum value of f(x, y).
◎ The next position at time step t:
x_{t+1} = x_t − ∇f(x_t)
29
Issues: Learning rate
◎ We need to restrict the size of the steps by shrinking the direction vector
with a learning rate η, whose value is less than 1:
x_{t+1} = x_t − η∇f(x_t)
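Putting the pieces together, here is a minimal gradient-descent sketch of the update x_{t+1} = x_t − η∇f(x_t) (the surface, starting point, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + 3 * y**2  # illustrative loss surface, minimum at (0, 0)

def grad(p):
    x, y = p
    return np.array([2 * x, 6 * y])  # gradient of f

eta = 0.1                     # learning rate (< 1)
p = np.array([3.0, -2.0])     # starting point
for t in range(100):
    p = p - eta * grad(p)     # x_{t+1} = x_t - eta * grad f(x_t)

print(p, f(p))  # close to the minimum at (0, 0)
```

With η too large the iterates overshoot and diverge; with η too small convergence is needlessly slow, which is why the learning rate is a key tuning knob.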
30
Issues: Starting point (non-linear function)
31
Momentum
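The slide's figure is not reproduced here; as a hedged sketch of the standard momentum variant of gradient descent (the surface, the coefficient γ, and the step count are all illustrative assumptions, not taken from the slides), a velocity term accumulates past gradients to damp oscillations:

```python
import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x, 6 * y])  # gradient of f(x, y) = x^2 + 3y^2

eta, gamma = 0.1, 0.9          # learning rate and momentum coefficient (illustrative)
p = np.array([3.0, -2.0])      # starting point
v = np.zeros(2)                # velocity
for t in range(500):
    v = gamma * v + eta * grad(p)  # accumulate gradient history
    p = p - v                      # step using the velocity

print(p)  # near the minimum (0, 0)
```

Momentum helps the iterates coast through shallow, elongated valleys where plain gradient descent zig-zags.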
32
Summary for nonlinear regression
◎ The nonlinear optimization procedure depends on:
○ The initial guess
○ The step size η
○ Computing the gradient efficiently
33
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
34
Over-determined systems
◎ Over-determined systems have
more constraints (equations) than
unknown variables.
○ In general, no solution satisfies the linear system exactly.
○ Instead, we seek approximate solutions that
minimize a given error.
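For example, a system with 3 equations and 2 unknowns generally has no exact solution, but `np.linalg.lstsq` returns the least-squares approximation that minimizes the error (the data here is illustrative):

```python
import numpy as np

# Over-determined: 3 equations, 2 unknowns -- generally no exact solution
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.2, 2.9])

# The least-squares solution minimizes ||A @ x - b||^2
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)                           # approximate solution
print(np.linalg.norm(A @ x - b))   # small but generally nonzero error
```

This is exactly the linear-regression fit from earlier, rephrased as solving an over-determined system.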
35
Under-Determined Systems
◎ Under-determined systems
have more unknowns than
constraints.
○ There are infinitely many solutions.
○ Some additional constraint must be
chosen to pick one of them.
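Conversely, a system with 1 equation and 2 unknowns has infinitely many solutions; one common extra constraint is to pick the minimum-norm solution, which `np.linalg.lstsq` returns (the data is illustrative):

```python
import numpy as np

# Under-determined: 1 equation, 2 unknowns -> infinitely many solutions
A = np.array([[1.0, 2.0]])   # the single constraint: x + 2y = 5
b = np.array([5.0])

# lstsq picks the minimum-norm solution among all exact solutions
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)       # [1. 2.]: the solution closest to the origin
print(A @ x)   # [5.]: the constraint is satisfied exactly
```

Any point on the line x + 2y = 5 solves the system; the minimum-norm choice is simply one principled way to resolve the ambiguity.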
36
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
37
Model Selection
◎ Model selection is not simply
about reducing error; it is about
producing a model with a high
degree of interpretability,
generalization, and predictive
capability.
38
Overfitting
◎ The model fits a particular set of data too closely, and may
therefore fail to predict future observations reliably.
○ Overfitting prevents generalization.
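As an illustrative sketch (NumPy with a fixed random seed; the data, degrees, and noise level are assumptions): a high-degree polynomial drives the training error toward zero by fitting the noise, whereas a simple model captures only the trend:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + 1 + rng.normal(0, 0.1, size=10)  # noisy line

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    y_hat = np.polyval(coeffs, x_train)
    return np.mean((y_hat - y_train) ** 2)

print(train_mse(1))  # small: the simple model captures the trend
print(train_mse(7))  # near zero: the flexible model also fits the noise
```

A near-zero training error here is a warning sign, not a success: the degree-7 curve has memorized the noise and will typically predict new points worse than the straight line.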
39
40