Topic 07 - Data Modelling - Part I
Le Ngoc Thanh
[email protected]
Department of Computer Science
2
Process
3
After preprocessing
4
Data Science Process
◎ Define the question to answer
◎ Collect data
◎ Discover and preprocess the data to obtain data that can be analyzed
◎ Analyze the data (visualizations, statistics, machine learning) → answers (hypotheses) to the question
◎ Evaluate
◎ Make decisions
5
Data Science vs. Machine Learning
https://www.coursera.org/articles/data-science-vs-machine-learning
6
ML Tasks
7
Machine Learning Choice
◎ Before implementing a machine learning (ML) model, the data
scientist needs to identify the branch (or branches) of ML that can
solve the given problem.
8
The course’s focus
◎ In this course, we focus on three main groups of ML:
○ Regression
○ Classification
○ Clustering
9
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
10
After hypothesis
◎ The job of a learning algorithm is to find the most suitable
hypothesis for a given problem.
11
After hypothesis
◎ To choose a suitable hypothesis, we need to define a loss
function.
12
After loss function design
◎ We look for the parameters that produce the lowest loss for a given
dataset, so we need a process to optimize (fit) the loss function.
13
General model learning architecture
(Hypothesis)
14
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
15
Regression
◎ Consider a set of n data points:
(x_1, y_1), (x_2, y_2), (x_3, y_3), …, (x_n, y_n)
◎ Purpose:
○ Select a function f(·) and fit it to the data (curve fitting = regression)
Y = f(A, β)
16
Linear regression
◎ Assume that a line is fitted through the points (hypothesis):
f(x) = β_1 x + β_2
◎ The loss function is the MSE (mean squared error):
E(f) = (1/n) Σ_{i=1}^{n} (f(x_i) − y_i)^2 = (1/n) Σ_{i=1}^{n} (β_1 x_i + β_2 − y_i)^2
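As a minimal sketch (assuming NumPy; the data points and function names are illustrative, not from the slides), the MSE loss of a candidate line can be computed directly from the formula above:

```python
import numpy as np

# Illustrative toy data: n points (x_i, y_i) roughly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.0])

def mse(beta1, beta2, x, y):
    """E(f) = (1/n) * sum_i (beta1*x_i + beta2 - y_i)^2"""
    residuals = beta1 * x + beta2 - y
    return np.mean(residuals ** 2)

print(mse(2.0, 1.0, x, y))  # near 0: the line y = 2x + 1 fits well
print(mse(0.0, 0.0, x, y))  # a poor line gives a much larger loss
```

A good fit drives the loss toward zero; the optimizer's job is to find the (β_1, β_2) that minimize it.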
17
Linear regression
◎ The optimization method: set the partial derivatives of E with respect to β_1 and β_2 to zero.
◎ This generalizes to a 2 × 2 linear system:
Ax = b
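Setting ∂E/∂β_1 = ∂E/∂β_2 = 0 yields the 2 × 2 normal equations; a sketch of building and solving them with NumPy (the data is illustrative):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.0])
n = len(x)

# dE/dbeta1 = 0 and dE/dbeta2 = 0 give the 2x2 system A @ beta = b:
#   beta1*sum(x^2) + beta2*sum(x) = sum(x*y)
#   beta1*sum(x)   + beta2*n      = sum(y)
A = np.array([[np.sum(x * x), np.sum(x)],
              [np.sum(x),     n        ]])
b = np.array([np.sum(x * y), np.sum(y)])

beta1, beta2 = np.linalg.solve(A, b)
print(beta1, beta2)  # close to the generating line y = 2x + 1
```

The same answer can be obtained with `np.polyfit(x, y, 1)`; writing out the system makes the connection to Ax = b explicit.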
18
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
◉ Fit Function
◉ Gradient descent
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
19
Nonlinear regression
◎ What about nonlinear regression? For example:
f(x) = β_2 exp(β_1 x)
◎ The MSE loss function:
E(f) = (1/n) Σ_{i=1}^{n} (β_2 exp(β_1 x_i) − y_i)^2
20
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
◉ Fit Function
◉ Gradient descent
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
21
Going downhill
22
Going downhill
23
Going downhill
◎ What does it mean if the direction vector is:
[x, y]
= [which way is down in the x direction, which way is down in the y direction]
= [−1, 1]
◎ To actually move downhill, we step to:
⇒ (x_new, y_new) = (x, y) + [−1, 1]
24
Going downhill
◎ Generally, to move through xy space toward the
minimum point, we need to identify:
○ The direction to move (increase/decrease x and y)
○ The rate of change (based on the slope)
⇒ Together, these form a direction vector
25
Direction vector
◎ The derivative of a function at a specific
point gives the slope of the tangent line:
f′(x_0) = lim_{x_1 → x_0} (f(x_1) − f(x_0)) / (x_1 − x_0)
◎ Why is the tangent line considered a
direction vector?
26
Directional derivative
◎ If you stand at some point a = (x_0, y_0), the slope of the ground in front of you
depends on the direction you are facing.
◎ To compute the slope in an arbitrary direction, we take the derivative in that direction
⇒ called the directional derivative:
D_u f(x_0, y_0)
where u = (u_1, u_2) is a unit vector pointing in the direction in which we want
to compute the slope.
27
Gradient
◎ The gradient of f at any point tells you:
○ which direction is steepest from that point with respect to the x, y plane
○ how steep it is (the slope of the hill in that direction)
∇f(x, y) = (∂f(x, y)/∂x, ∂f(x, y)/∂y) = (∂f(x, y)/∂x) x̂ + (∂f(x, y)/∂y) ŷ
◎ The partial derivatives give the slope in the positive x direction
and the slope in the positive y direction.
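A quick numerical sketch of this idea (the function f and the test point are illustrative assumptions): the gradient's components are the two partial slopes, which we can verify with finite differences:

```python
import numpy as np

def f(x, y):
    return x**2 + 3 * y**2  # illustrative bowl-shaped surface

def grad_f(x, y):
    """Analytic gradient: (df/dx, df/dy) = (2x, 6y)."""
    return np.array([2 * x, 6 * y])

def numeric_grad(f, x, y, h=1e-6):
    """Central finite-difference approximation of the two partial derivatives."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return np.array([dfdx, dfdy])

print(grad_f(1.0, 2.0))           # [ 2. 12.]
print(numeric_grad(f, 1.0, 2.0))  # approximately the same
```

The finite-difference check is a handy way to validate a hand-derived gradient before using it in an optimizer.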
28
Gradient Descent
◎ As we update, we want the value
of f(x, y) to decrease.
○ When it stops decreasing, (x_t, y_t) has
arrived at the position giving the
minimum value of f(x, y).
◎ The next position at time step t:
x_{t+1} = x_t − ∇f(x_t)
29
Issues: Learning rate
◎ We need to restrict the size of the steps by shrinking the direction vector
with a learning rate η, whose value is less than 1:
x_{t+1} = x_t − η∇f(x_t)
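Putting the pieces together, here is a minimal gradient-descent sketch of the update x_{t+1} = x_t − η∇f(x_t) (the surface, starting point, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + 3 * y**2  # illustrative loss surface, minimum at (0, 0)

def grad(p):
    x, y = p
    return np.array([2 * x, 6 * y])  # gradient of f

eta = 0.1                     # learning rate (< 1)
p = np.array([3.0, -2.0])     # starting point
for t in range(100):
    p = p - eta * grad(p)     # x_{t+1} = x_t - eta * grad f(x_t)

print(p, f(p))  # close to the minimum at (0, 0)
```

With η too large the iterates overshoot and diverge; with η too small convergence is needlessly slow, which is why the learning rate is a key tuning knob.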
30
Issues: Starting point (non-linear function)
31
Momentum
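The slide's figure is not reproduced here; as a hedged sketch of the standard momentum variant of gradient descent (the surface, the coefficient γ, and the step count are all illustrative assumptions, not taken from the slides), a velocity term accumulates past gradients to damp oscillations:

```python
import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x, 6 * y])  # gradient of f(x, y) = x^2 + 3y^2

eta, gamma = 0.1, 0.9          # learning rate and momentum coefficient (illustrative)
p = np.array([3.0, -2.0])      # starting point
v = np.zeros(2)                # velocity
for t in range(500):
    v = gamma * v + eta * grad(p)  # accumulate gradient history
    p = p - v                      # step using the velocity

print(p)  # near the minimum (0, 0)
```

Momentum helps the iterates coast through shallow, elongated valleys where plain gradient descent zig-zags.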
32
Summary for nonlinear regression
◎ The nonlinear optimization procedure depends on:
○ The initial guess
○ The step size η
○ Computing the gradient efficiently
33
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
34
Over-determined systems
◎ Over-determined systems have
more constraints (equations) than
unknown variables.
○ In general, no solution satisfies the linear system exactly.
○ Instead, we seek approximate solutions that
minimize a given error.
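For example, a system with 3 equations and 2 unknowns generally has no exact solution, but `np.linalg.lstsq` returns the least-squares approximation that minimizes the error (the data here is illustrative):

```python
import numpy as np

# Over-determined: 3 equations, 2 unknowns -- generally no exact solution
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.2, 2.9])

# The least-squares solution minimizes ||A @ x - b||^2
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)                           # approximate solution
print(np.linalg.norm(A @ x - b))   # small but generally nonzero error
```

This is exactly the linear-regression fit from earlier, rephrased as solving an over-determined system.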
35
Under-Determined Systems
◎ Under-determined systems
have more unknowns than
constraints.
○ There are infinitely many solutions.
○ Some additional constraint must be
chosen to pick one of them.
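Conversely, a system with 1 equation and 2 unknowns has infinitely many solutions; one common extra constraint is to pick the minimum-norm solution, which `np.linalg.lstsq` returns (the data is illustrative):

```python
import numpy as np

# Under-determined: 1 equation, 2 unknowns -> infinitely many solutions
A = np.array([[1.0, 2.0]])   # the single constraint: x + 2y = 5
b = np.array([5.0])

# lstsq picks the minimum-norm solution among all exact solutions
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)       # [1. 2.]: the solution closest to the origin
print(A @ x)   # [5.]: the constraint is satisfied exactly
```

Any point on the line x + 2y = 5 solves the system; the minimum-norm choice is simply one principled way to resolve the ambiguity.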
36
Contents
◎ Data science and machine learning
◎ Machine learning architecture
◎ Regression model
○ Linear regression
○ Non-linear regression
○ Over- and Under-Determined Systems
○ Model selection
○ Overfitting
37
Model Selection
◎ Model selection is not simply
about reducing error; it is about
producing a model with a high
degree of interpretability,
generalization, and predictive
capability.
38
Overfitting
◎ The model fits a particular set of data too closely, and may
therefore fail to predict future observations reliably.
○ Overfitting prevents generalization.
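As an illustrative sketch (NumPy with a fixed random seed; the data, degrees, and noise level are assumptions): a high-degree polynomial drives the training error toward zero by fitting the noise, whereas a simple model captures only the trend:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + 1 + rng.normal(0, 0.1, size=10)  # noisy line

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    y_hat = np.polyval(coeffs, x_train)
    return np.mean((y_hat - y_train) ** 2)

print(train_mse(1))  # small: the simple model captures the trend
print(train_mse(7))  # near zero: the flexible model also fits the noise
```

A near-zero training error here is a warning sign, not a success: the degree-7 curve has memorized the noise and will typically predict new points worse than the straight line.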
39
40