CE6146 Lecture 1
• Basic Concepts
Intended Learning Outcomes
What are Learning Tasks
Algorithms aimed at a specific problem or designed with a particular objective.
• Learning tasks are specific problems or objectives that an algorithm is
designed to solve through learning from data.
• Learning tasks form the core of what algorithms aim to achieve, whether in
machine learning or deep learning.
• Learning tasks guide the design and implementation of models, determining
how algorithms are trained, validated, and deployed in real-world
applications. (Learning tasks drive the planning and design of the entire algorithm.)
Types of Learning Tasks
• Classification: the task of assigning input data into predefined categories or classes (labeled data, divided into discrete classes).
• Regression: the task of predicting a continuous numerical value based on input data (labeled data, so the numerical values can be analyzed). Classification and regression tasks can often be converted into one another.
• Clustering: the task of grouping similar instances without predefined labels (no labels at all; similar instances are grouped together automatically).
Source: https://medium.datadriveninvestor.com/problems-with-classification-examples-from-real-life-645b7b756e96
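As a concrete (toy) illustration of the three task types, the following sketch uses scikit-learn with made-up synthetic data; the library choice and all values are assumptions, not part of the lecture:

```python
# A minimal sketch of classification, regression, and clustering on toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)                     # 100 samples, 2 features (made up)

# Classification: labeled data, predict a discrete class.
y_cls = (X[:, 0] + X[:, 1] > 1).astype(int)
clf = LogisticRegression().fit(X, y_cls)

# Regression: labeled data, predict a continuous value.
y_reg = 3 * X[:, 0] - 2 * X[:, 1]
reg = LinearRegression().fit(X, y_reg)

# Clustering: no labels, group similar instances automatically.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
```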
What Are Machine Learning Paradigms
Types of Machine Learning Paradigms
• Supervised Learning: learning from labeled data (input-output pairs).
• Unsupervised Learning: learning from unlabeled data to find hidden patterns; no labels are required.
• Reinforcement Learning: learning by interacting with an environment to maximize cumulative reward; the agent learns by repeatedly interacting with a dynamic environment, without human intervention.
• Vectors are 1D arrays of numbers that represent features of data (e.g., pixel
intensities in an image or values in a dataset).
• Matrices are 2D arrays of numbers that represent transformations (e.g.,
weight matrices that map input features to outputs in a neural network layer).
(e.g., [[x1, y1], [x2, y2]] represents a 2×2 matrix.)
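A minimal NumPy sketch of these two ideas; the shapes and values below are illustrative assumptions:

```python
import numpy as np

# Vector: a 1D array of feature values (e.g., four pixel intensities).
x = np.array([0.2, 0.7, 0.1, 0.9])   # shape (4,)

# Matrix: a 2D array acting as a transformation, e.g., a weight matrix
# mapping 4 input features to 3 outputs in one neural-network layer.
W = np.random.randn(3, 4)            # shape (3, 4)

y = W @ x                            # matrix-vector product -> shape (3,)
print(x.shape, W.shape, y.shape)
```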
Matrix Operations in DL
• Matrix addition: used, for example, to merge the bias into the input (see the sketch below).
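A small NumPy sketch of this point: adding the bias via matrix (vector) addition, and the equivalent trick of folding the bias into the input. The numbers are made up for illustration:

```python
import numpy as np

W = np.array([[0.2, -0.5],
              [1.0,  0.3]])          # 2x2 weight matrix (made up)
b = np.array([0.1, -0.2])            # bias vector
x = np.array([1.0, 2.0])             # input vector

y = W @ x + b                        # matrix-vector product, then bias addition

# Equivalent form: append a constant 1 to the input and absorb b into W.
x_aug = np.append(x, 1.0)            # [x1, x2, 1]
W_aug = np.hstack([W, b[:, None]])   # [W | b], shape 2x3
y_aug = W_aug @ x_aug

assert np.allclose(y, y_aug)
```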
Calculus in DL
• Calculus is used for optimizing neural networks. Specifically, derivatives and gradients help determine how model parameters should change to minimize the loss function.
• Key Operations: derivatives (the rate of change of the loss) and gradients (the vector of partial derivatives with respect to the parameters).
Figure 4.3 in Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Figure 4.1 in Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
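As a tiny worked example of how a derivative drives parameter updates, here is plain-Python gradient descent on a made-up one-parameter loss (purely illustrative, not from the lecture):

```python
# Gradient descent on L(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
w = 0.0                    # initial parameter value
lr = 0.1                   # learning rate (step size)

for step in range(50):
    grad = 2 * (w - 3)     # derivative of the loss with respect to w
    w -= lr * grad         # step against the gradient to reduce the loss

print(w)                   # converges toward 3, the minimizer of the loss
```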
Backpropagation and Chain Rule
• Backpropagation uses the chain rule to compute gradients for all parameters
in a neural network. It allows us to efficiently compute the derivatives of the
loss function with respect to each parameter in the network.
• The chain rule states:
$$\frac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x)$$
In a neural network, this rule is applied layer by layer to propagate the error back from
the output to the input, updating the weights accordingly.
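A short PyTorch sketch of the chain rule in action; autograd composes the local derivatives exactly as the formula above states (the function and value are arbitrary):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

g = x ** 2              # inner function g(x) = x^2,    g'(x) = 2x
f = torch.sin(g)        # outer function f(g) = sin(g), f'(g) = cos(g)

f.backward()            # backpropagation applies the chain rule

# Analytical chain rule: df/dx = f'(g(x)) * g'(x) = cos(x^2) * 2x
manual = torch.cos(torch.tensor(2.0) ** 2) * 2 * 2.0
print(x.grad.item(), manual.item())   # the two values agree
```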
Probability and Statistics in DL
• A random variable is a variable that can take different values based on some random process or experiment. In deep learning, we often treat model outputs (predictions) as random variables. In classification, the predicted class probabilities from a softmax output can be treated as random variables that follow a discrete distribution.
• Types of Random Variables: discrete and continuous.
• Entropy quantifies the uncertainty of a random variable:

$$H(X) = -\sum_i P(x_i) \log P(x_i)$$
• Kullback–Leibler (KL) Divergence measures how one probability distribution differs from another reference distribution:

$$D_{KL}(P \,\|\, Q) = \sum_i P(x_i) \log \frac{P(x_i)}{Q(x_i)}$$

Often used in Variational Autoencoders (VAEs) and model regularization.
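A quick numerical sketch of both quantities for two made-up discrete distributions (the values are assumptions for illustration):

```python
import numpy as np

P = np.array([0.7, 0.2, 0.1])   # "true" distribution (made up)
Q = np.array([0.5, 0.3, 0.2])   # reference/model distribution (made up)

entropy = -np.sum(P * np.log(P))   # H(P) = -sum_i P(x_i) log P(x_i)
kl = np.sum(P * np.log(P / Q))     # D_KL(P || Q) = sum_i P(x_i) log(P(x_i)/Q(x_i))

print(entropy, kl)                 # KL >= 0, and equals 0 only when P == Q
```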
Building and Evaluating Deep Learning Models
Flow for Constructing a DL Model
• Data Collection:
‐ Collect data relevant to the problem you are solving (e.g., images, text, structured data).
‐ Ensure that the data is representative of the real-world scenarios the model will encounter.
• Data Preprocessing:
‐ Cleaning the Data: Handle missing values, remove duplicates.
‐ Normalization/Standardization: Ensure input features have the same scale (important for
neural networks).
‐ Augmentation (for images): Apply transformations like rotations, flips, and scaling to
artificially expand the dataset.
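A minimal sketch of the cleaning and standardization steps; pandas and scikit-learn are one possible toolset, and the column names and values below are made up:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical structured dataset with a missing value and a duplicate row.
df = pd.DataFrame({"height": [1.7, 1.8, None, 1.8],
                   "weight": [65, 80, 72, 80]})

df = df.drop_duplicates()      # remove duplicate rows
df = df.fillna(df.mean())      # handle missing values (mean imputation)

# Standardization: zero mean, unit variance for each feature.
X = StandardScaler().fit_transform(df[["height", "weight"]])
print(X)
```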
Dataset Splitting
Train / Validation / Test
• Why Split the Dataset?
To prevent the model from learning patterns specific to the training data and ensure it
can generalize to unseen data.
• Dataset Splitting:
‐ Training Set: Used to train the model (typically 70%-80% of the data).
‐ Validation Set: Used to tune hyperparameters and monitor overfitting (typically 10%-
15%).
‐ Test Set: Used to evaluate the model’s final performance on unseen data (typically
10%-15%).
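One common way to produce such a 70/15/15 split is to call scikit-learn's train_test_split twice; X and y below are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)   # placeholder data

# First carve out the test set, then split the remainder into train/validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.15 / 0.85, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # roughly 70% / 15% / 15%
```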
Model Design
Model Training
• Hyperparameters:
‐ Learning Rate: Controls how much to adjust the model weights with each training step.
‐ Batch Size: Determines how many samples are processed before the model updates its
weights.
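A compact PyTorch sketch showing where these two hyperparameters enter a training loop; the model, data, and numbers are placeholders:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(1000, 20)                      # placeholder inputs
y = torch.randint(0, 2, (1000,))               # placeholder labels
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))

loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)   # batch size
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)               # learning rate
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for xb, yb in loader:            # one weight update per batch
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()             # weights adjusted using the learning rate
```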
Overfitting and Underfitting
• Overfitting (over-training):
‐ Occurs when the model performs well on the training set but poorly on the
validation/test set.
‐ Symptoms: Training accuracy continues to improve, but validation accuracy stagnates
or decreases.
• Underfitting:
‐ Occurs when the model performs poorly on both the training and validation sets.
‐ Symptoms: Low training and validation accuracy.
Overfitting and Underfitting
Figure 5.2 in Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Overfitting and Underfitting
(Note: in practice, the optimal point is a range rather than a single value.)
Figure 5.3 in Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Bias-Variance Tradeoff
In general, the more complex the model, the lower (more precise) its bias, because it has more details it can adjust; however, its predictions can become too scattered (high variance).
Source: https://medium.com/@ivanreznikov/stop-using-the-same-image-in-bias-variance-trade-off-explanation-691997a94a54
Model Evaluation
Evaluating Classification Models (3/3)
• Let X denote the predicted scores for the positive-label examples and Y the predicted scores for the negative-label examples.
• TPR(c) = Pr(X > c)
• FPR(c) = Pr(Y > c)
• $AUC = \int_0^1 ROC(t)\, dt$
• $AUC = \frac{1}{n_x n_y} \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} I(X_i > Y_j)$
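The pairwise formula translates directly into code; the scores below are made up for illustration:

```python
import numpy as np

X = np.array([0.9, 0.8, 0.4])        # predicted scores of positive examples (made up)
Y = np.array([0.7, 0.3, 0.2, 0.1])   # predicted scores of negative examples (made up)

# AUC = (1 / (n_x * n_y)) * sum_i sum_j I(X_i > Y_j)
auc = np.mean(X[:, None] > Y[None, :])
print(auc)   # fraction of positive/negative pairs ranked in the correct order
```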
Evaluating Regression Models
Evaluating Unsupervised Learning Models
Preventing Overfitting
• Regularization Techniques:
‐ L2 Regularization: Adds a penalty for large weights in the model.
‐ Dropout: Randomly drops a percentage of neurons during training to reduce
dependency on specific neurons.
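A minimal Keras sketch of both techniques; the layer sizes, regularization strength, and dropout rate are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    # L2 regularization: penalizes large weights in this layer.
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    # Dropout: randomly drops 50% of activations during training.
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
```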
Frameworks for DL: Keras and PyTorch
• Keras
• PyTorch
Introduction to Keras and PyTorch
• Deep learning frameworks like Keras and PyTorch allow developers to focus
on model design and training without dealing with low-level matrix
operations or gradient computation.
• They provide pre-built functionalities for creating neural networks,
optimizing them, and handling large datasets efficiently.
Keras
Keras Key Functionalities
• Loss Functions and Optimizers: Built-in loss functions (e.g., MSE, cross-
entropy) and optimizers (e.g., SGD, Adam).
• Callbacks: Customizable training callbacks for early stopping, learning rate
scheduling, etc.
• Data Handling: In-built support for image, text, and sequence data.
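A small end-to-end sketch tying these pieces together; the data is random placeholder data and the hyperparameters are illustrative:

```python
import numpy as np
from tensorflow import keras

x, y = np.random.rand(500, 20), np.random.randint(0, 2, 500)   # placeholder data

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Built-in optimizer and loss function.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

# Callback: stop training early when the validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                           restore_best_weights=True)

model.fit(x, y, validation_split=0.15, epochs=20, batch_size=32,
          callbacks=[early_stop], verbose=0)
```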
PyTorch Overview
• Tensors are the core data structure in both Keras and PyTorch, representing
multi-dimensional arrays.
• Tensors enable matrix operations and hold gradients for backpropagation.
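A short PyTorch sketch of both points; the shapes and values are arbitrary:

```python
import torch

# A 2x3 tensor (multi-dimensional array) that tracks gradients.
w = torch.randn(2, 3, requires_grad=True)
x = torch.randn(3)

y = (w @ x).sum()       # matrix-vector product, reduced to a scalar
y.backward()            # backpropagation fills w.grad

print(w.shape, w.grad.shape)   # the gradient has the same shape as the tensor
```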
Key Differences Between Keras and PyTorch
• Keras vs PyTorch:
‐ Ease of Use: Keras is user-friendly, suited for beginners, and focuses on rapid
prototyping.
‐ Flexibility: PyTorch offers more flexibility, making it preferred for research, where
dynamic changes to the computation graph are required.
• Computation Graph:
‐ Keras uses static computation graphs (compiled before training).
‐ PyTorch uses dynamic computation graphs (created on the fly).
Q&A