Lecture 3 - Machine learning and data driven analysis
❑ Machine learning is a method of achieving artificial intelligence by learning from data

Artificial intelligence: the definition is broad, and the concept appeared first. It includes traditional expert systems and traditional automatic voice customer service, which respond according to hard-coded programs and have no ability to learn independently. Every detail needs to be designed in advance.

Machine learning: compared with traditional AI, it has a data-based learning function. Commonly used models include kNN, SVM, etc., but their ability to process complex data such as images and voices is limited.

Deep learning: a further development of machine learning, generally referring to bionic neural network models. These are more powerful and can process unstructured data such as images and voices more effectively, but require more computing power.

Development line: from "every detail designed in advance" (artificial intelligence) towards "less human design and intervention" (deep learning).
How do machines learn?
[Diagram: Training (data) → Predicting (output)]
What is Machine Learning
Machine learning (ML) is the study of computer algorithms that
improve automatically through experience. Machine learning
algorithms build a model based on sample data, known as "training
data", in order to make predictions or decisions without being
explicitly programmed to do so.
What is Machine Learning
In short, a machine learning model learns the mapping f from the input features X₁, X₂, …, Xₙ to the output Y:

Y = f(X₁, X₂, …, Xₙ)
In summary, machine learning is an essential tool in the toolbox of data-driven analysis, allowing for the
extraction of valuable information and insights from data. By leveraging the power of machine learning,
data-driven analysis can lead to more informed decisions, better predictions, and increased efficiency in
various domains.
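As a concrete (toy) illustration of "learning" the mapping f from training data, the sketch below fits a linear f to a few samples and then predicts an unseen input; the data and variable names are our own, not from the lecture:

```python
import numpy as np

# Training data: input feature X and observed output Y (toy values)
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.1, 3.9, 6.2, 7.8])   # roughly Y = 2*X

# "Learn" f from the training data, here assuming a linear form Y = a*X + b
a, b = np.polyfit(X, Y, deg=1)

# Use the learned f to predict the output for a new, unseen input
x_new = 5.0
print(f"f({x_new:.1f}) = {a * x_new + b:.2f}")
```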
Examples of ML methods

Machine Learning
● Supervised Learning: Linear Regression, SVR, GPR, Ensemble Methods, Decision Trees, Neural Networks, Support Vector Machine, Discriminant Analysis, Naive Bayes, Nearest Neighbor
● Unsupervised Learning: K-Means, K-Medoids, Fuzzy C-Means, Gaussian Mixture, Neural Networks, Hidden Markov Model
Build the simplest machine learning/data-driven model: linear regression
Regression, one of the most established
supervised learning approaches
A brief history of the term “regression”
The method of least squares, which was first introduced by Legendre in
1805 and later by Gauss in 1809, was the earliest form of regression. They
both applied this method to determine the orbits of celestial bodies around
the sun, particularly comets and newly discovered minor planets. Gauss
further developed the theory of least squares in 1821, which also included
his version of the Gauss-Markov theorem.
Francis Galton coined the term "regression" in the 19th century to describe
the biological phenomenon of the tendency for the heights of descendants
of tall ancestors to regress down towards a normal average, also known as
regression toward the mean. Initially, Galton's work only focused on this
biological meaning of regression, but Udny Yule and Karl Pearson later
extended it to a more general statistical context. In the work of Yule and
Pearson, the joint distribution of the response and explanatory variables is
assumed to be a Gaussian distribution.

[Photo: Francis Galton, British biologist and statistician]
Regression examples: find the relation between
input X and target/output Y
Regression problems are usually used to predict a value, such as prices or temperature. For example, if the actual price of a product is 500 yuan and the value predicted by regression analysis is 499 yuan, we would consider this a fairly good regression.
[Example plots: height (cm) vs. age; life expectancy vs. age]
Regression can be categorised by:
● No. of variables: Univariate y = f(x); Multivariate y = f(x₁, x₂, ⋯, xₙ)
● Function: Linear y = ax + b; Non-linear y = ax² + bx + c
Linear regression: the simplest algorithm
• The simplest linear regression algorithm is univariate linear regression, where the sample data has only one feature/input:
y = ax + b
x    y    y'₁    y'₂
1    1    0.5    4
2    2    1      3
3    3    1.5    2

J₁ = (1/m) Σᵢ₌₁ᵐ (y'₁ − y)² = (1/3) × [(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] ≈ 1.167
J₂ = (1/m) Σᵢ₌₁ᵐ (y'₂ − y)² = (1/3) × [(4 − 1)² + (3 − 2)² + (2 − 3)²] ≈ 3.667
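These costs are ordinary mean squared errors, so they are easy to verify numerically; a minimal numpy sketch (variable names are ours):

```python
import numpy as np

y  = np.array([1.0, 2.0, 3.0])   # observed targets
y1 = np.array([0.5, 1.0, 1.5])   # predictions of the first candidate line
y2 = np.array([4.0, 3.0, 2.0])   # predictions of the second candidate line

def mse(pred, obs):
    """Mean squared error J = (1/m) * sum((pred - obs)^2)."""
    return np.mean((pred - obs) ** 2)

print(mse(y1, y))   # J1 ≈ 1.167
print(mse(y2, y))   # J2 ≈ 3.667
```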
How to find a and b in a regression problem
1. Conventional analytic approach (least squares)
Time Production
20 195
30 305
50 480
60 580
J = f(p) = 3.5p² − 14p + 14
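For the univariate case, least squares has the closed form a = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b = ȳ − a·x̄. A minimal sketch applying it to the Time/Production table above (the resulting coefficients are computed here, not quoted from the slides):

```python
import numpy as np

x = np.array([20.0, 30.0, 50.0, 60.0])      # Time
y = np.array([195.0, 305.0, 480.0, 580.0])  # Production

# Closed-form least squares for y = a*x + b
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()
print(a, b)   # -> 9.45 and 12.0 for this data
```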
The gradient descent approach

y = ax + b

Minimize  J(a, b) = (1/n) Σ (ŷ − y)²

Step-by-step approach, with learning rate α:

pᵢ₊₁ = pᵢ − α ∂J(pᵢ)/∂pᵢ

a_new = a_old − α ∂J(a, b)/∂a = a_old − α (2/n) Σ (a_old xᵢ + b_old − yᵢ) xᵢ
b_new = b_old − α ∂J(a, b)/∂b = b_old − α (2/n) Σ (a_old xᵢ + b_old − yᵢ)
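A minimal sketch of iterating these two update rules, using the small x/y table from the cost-function example above (the learning rate and iteration count are our own choices):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

a, b, alpha, n = 0.0, 0.0, 0.05, len(x)
for _ in range(2000):
    residual = a * x + b - y                        # (a_old*x_i + b_old - y_i)
    a -= alpha * (2.0 / n) * np.sum(residual * x)   # update rule for a
    b -= alpha * (2.0 / n) * np.sum(residual)       # update rule for b

print(a, b)   # converges towards a = 1, b = 0 for this data
```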
Gradient descent example

J(p) = 3.5p² − 14p + 14
pᵢ = 0.5, α = 0.01, pᵢ₊₁ = ?

∂J(pᵢ)/∂pᵢ = 7p − 14 = 7 × 0.5 − 14 = −10.5

pᵢ₊₁ = pᵢ − α ∂J(pᵢ)/∂pᵢ = 0.5 − 0.01 × (−10.5) = 0.5 + 0.105 = 0.605

Repeating the update slowly approaches the minimum at p = 2.
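Iterating the same update in code makes the slow approach to p = 2 visible; a minimal sketch:

```python
# Gradient descent on J(p) = 3.5p^2 - 14p + 14, whose minimum is at p = 2
p, alpha = 0.5, 0.01
for i in range(300):
    grad = 7 * p - 14      # dJ/dp
    p -= alpha * grad      # p_{i+1} = p_i - alpha * grad
    if i + 1 in (1, 10, 100, 300):
        print(i + 1, round(p, 4))   # step 1 gives 0.605, then slowly towards 2.0
```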
Illustration of solving linear regression

Cost = J(a, b),  y = ax + b

Time    Production
20      195
30      305
50      480
60      580

With the same example, please select a learning rate and use gradient descent to update a and b of a linear regression model (a sketch with a workable learning rate follows below):

pᵢ₊₁ = pᵢ − α ∂J(pᵢ)/∂pᵢ
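One practical point for this exercise: because Time ranges from 20 to 60, the gradients are large, so the learning rate must be small for the updates to stay stable. A sketch under that assumption (α = 1e-4 is our choice; without feature scaling, b converges slowly):

```python
import numpy as np

x = np.array([20.0, 30.0, 50.0, 60.0])      # Time
y = np.array([195.0, 305.0, 480.0, 580.0])  # Production

a, b, alpha, n = 0.0, 0.0, 1e-4, len(x)
for _ in range(200_000):   # many iterations: the intercept b moves very slowly
    residual = a * x + b - y
    a -= alpha * (2.0 / n) * np.sum(residual * x)
    b -= alpha * (2.0 / n) * np.sum(residual)

print(a, b)   # approaches the least-squares solution a = 9.45, b = 12
```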
Multivariate linear regression
y = a₁x₁ + a₂x₂ + ⋯ + aₙxₙ + b
https://www.youtube.com/watch?v=2spTnAiQg4M&t=159s
Multivariate linear regression
y = a₁x₁ + a₂x₂ + ⋯ + aₙxₙ + b
We can solve it using matrix calculation approach by transforming the equation into Y = XA
X = [x₁₁ ⋯ x₁ₙ 1; ⋮ ⋱ ⋮ ⋮; xₘ₁ ⋯ xₘₙ 1],  A = [a₁, ⋯, aₙ, b]ᵀ,  Y = [y₁, ⋯, yₘ]ᵀ
• X is an m × (n+1) matrix where the last column is all ones (for the intercept b term), the
remaining columns represent the n independent variables, and m is the number of instances
in the dataset.
• A is an (n+1) × 1 matrix (column vector) of the coefficients, with the intercept b as the
last element of this vector.
• Y is an m × 1 matrix (column vector) representing the dependent variable.
A is then solved by the following procedure.
The residuals are given by the difference between the observed values and the predicted values, which
is:
R = Y - XA
The sum of squared residuals is given by the dot product of R with itself, which is:
S = RᵀR = (Y − XA)ᵀ(Y − XA)
Here ᵀ denotes the transpose of a matrix.
To find the matrix A that minimizes S, we take the derivative of S with respect to A, set it to zero, and
solve for A. The derivative of S with respect to A is:
dS/dA = -2Xᵀ (Y - XA)
Setting this to zero and solving for A gives:
0 = -2Xᵀ (Y - XA)
XᵀY = XᵀXA
(XᵀX)A = XᵀY
Assuming that (XᵀX) is invertible, we can multiply both sides by the inverse of (XᵀX) to get:
A = (XᵀX)⁻¹XᵀY
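A minimal numpy sketch of the normal-equation solution (np.linalg.solve is used instead of explicitly inverting XᵀX, which is numerically safer; the data is our own toy example):

```python
import numpy as np

def fit_normal_equation(X_raw, y):
    """Solve (X^T X) A = X^T Y, with a trailing column of ones for the intercept b."""
    X = np.column_stack([X_raw, np.ones(len(X_raw))])
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data generated from y = 2*x1 + 3*x2 (so b should come out as 0)
X_raw = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([8.0, 7.0, 18.0, 17.0])

print(fit_normal_equation(X_raw, y))   # ≈ [2.0, 3.0, 0.0] = (a1, a2, b)
```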
A similar strategy can be applied to a second-order
nonlinear regression model

y = ax² + bx + c

Here we can treat x² as an additional input variable. The equation
can then be written in matrix form as:

Y = XA

where:
• A is a column vector of the coefficients given by [a, b, c]ᵀ.
• Each row of X is [x², x, 1] for one sample, so X is an m × 3 matrix.
In class exercise: model the relationship between fuel flow rate
and output power in a gas turbine-driven generator

Fuel flow rate    Power
1.0               20
2.0               45
3.0               55
4.0               75

Please build both the linear regression model (y = ax + b) and the
second-order nonlinear model (y = ax² + bx + c) to fit the provided
data, and compare their performance using the matrix approach:

A = (XᵀX)⁻¹XᵀY
Fuel flow rate    Power    Linear model ŷ    Nonlinear model ŷ
1.0               20       22.5              21.25
2.0               45       40                41.25
4.0               75       75                73.75
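For reference, a minimal numpy sketch that reproduces the fitted values in this table via A = (XᵀX)⁻¹XᵀY (variable names are ours):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])        # fuel flow rate
y = np.array([20.0, 45.0, 55.0, 75.0])    # power

def solve(X):
    """Normal equation: A = (X^T X)^-1 X^T Y, via a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ y)

X_lin  = np.column_stack([x, np.ones_like(x)])         # rows [x, 1]      for y = a*x + b
X_quad = np.column_stack([x**2, x, np.ones_like(x)])   # rows [x^2, x, 1] for y = a*x^2 + b*x + c

A_lin, A_quad = solve(X_lin), solve(X_quad)
print(A_lin)             # -> [17.5, 5.0]              (a, b)
print(A_quad)            # -> [-1.25, 23.75, -1.25]    (a, b, c)
print(X_lin @ A_lin)     # -> [22.5, 40.0, 57.5, 75.0]
print(X_quad @ A_quad)   # -> [21.25, 41.25, 58.75, 73.75]
```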