
CE6146

Introduction to Deep Learning


Deep Learning Basics
Chia-Ru Chung
Department of Computer Science and Information Engineering
National Central University
2024/9/19
Outline

• Basic Concepts

• Essential Mathematics for Deep Learning

• Building and Evaluating Deep Learning Models

• Frameworks for Deep Learning: Keras and PyTorch

2
Intended Learning Outcomes

By the end of this lecture, you will be able to:


• Understand key terminologies and differentiate between Artificial Intelligence,
Machine Learning, and Deep Learning.
• Identify different learning tasks and machine learning paradigms.
• Apply essential linear algebra, probability, and numerical computation concepts in
deep learning.
• Explore basic machine learning concepts relevant to deep learning.
• Understand the basic architecture and features of Keras and PyTorch frameworks.
3
Basic Concepts
• Key Terminologies
• What is Artificial Intelligence, Machine Learning, and Deep Learning
• Learning Tasks
• Machine Learning Paradigms
Key Terminologies

• Data: Information used for analysis, training, and testing models. A single data instance may include a label and features.

• Label: The output or target value used in supervised learning (only supervised learning uses labels).
• Feature: An individual measurable property or characteristic of the data being analyzed.
• Model: An algorithm or mathematical representation that makes predictions or decisions based on data; often treated as a "black box."
• Training Dataset: The subset of data used to train the model.
• Validation Dataset: A portion of the data used to tune the model's hyperparameters and prevent overfitting.
• Test Dataset: Data that is completely unseen by the model and used to assess its final performance; also called an independent dataset.
5
What is Artificial Intelligence (AI)

• Definition:
Artificial Intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. AI includes any technique that enables computers to mimic human intelligence, including rule-based systems (where the rules can be defined explicitly), machine learning, and deep learning.
• Key Characteristics:
‐ Learning: The ability to learn from data and experiences, and improve over time.
‐ Reasoning: The ability to make decisions and solve problems based on available information.
‐ Self-Correction: The ability to recognize and correct errors to improve performance.
6
What is Machine Learning (ML)

• Definition:
Machine learning is a subset of artificial intelligence focused on building systems that learn from data to make decisions or predictions, without being explicitly programmed for each task. It is used when explicit rules cannot be clearly defined, although human intervention (e.g., feature engineering) is still required.
• Key Characteristics:
‐ Data-driven: Models learn from data rather than being explicitly programmed with rules.
‐ Adaptable: Models can adapt and improve over time with more data.
‐ Predictive: Used for making predictions or decisions based on input data.
7
What is Deep Learning (DL)

• Definition:
Deep learning is a subset of machine learning that uses neural networks with many layers (deep networks) to model complex patterns in data. These multi-layered networks automatically learn hierarchical representations from raw data, so they handle unstructured data well and often require no manual feature engineering or hand-specified rules.
• Key Characteristics:
‐ Automated Feature Extraction: Neural networks automatically learn features from raw data.
‐ Scalability: Effective with large datasets and complex problems.
‐ High Performance: Achieves state-of-the-art results in various tasks like image recognition and natural language processing.
8
Difference Between ML and DL

Aspect              | Machine Learning                                 | Deep Learning
Feature Extraction  | Manual (features engineered by hand)             | Automated (features learned from data)
Model Complexity    | Simpler models                                   | Complex neural networks
Data Requirements   | Smaller datasets                                 | Large datasets
Performance         | Good for structured data, limited scalability    | Excels with unstructured data, scalable
Example Algorithms  | Linear Regression, Decision Trees, Random Forest | Convolutional Neural Networks (CNNs)

9
10

Source: 1. https://reurl.cc/yLq3Q6 2. https://reurl.cc/8vkgpb 3. https://reurl.cc/jWAn1L


Source: https://www.youtube.com/shorts/CxT5DVZmWCU 11
Key Applications of Deep Learning

• Image Recognition: Identifying objects within images (computer vision).

• Natural Language Processing (NLP): Understanding and generating human language.

• Autonomous Systems: Self-driving cars, drones.

• Healthcare: Disease detection from medical images, personalized medicine.

• Finance: Fraud detection, algorithmic trading.

12
What are Learning Tasks

• Learning tasks are specific problems or objectives that an algorithm is designed to solve through learning from data.
• Learning tasks form the core of what algorithms aim to achieve, whether in machine learning or deep learning.
• Learning tasks guide the design and implementation of models, determining how algorithms are trained, validated, and deployed in real-world applications.

13
Types of Learning Tasks

• Classification: The task of assigning input data into predefined categories or classes (labeled data, discrete classes).
• Regression: The task of predicting a continuous numerical value based on input data (labeled data, numerical targets). Classification and regression tasks can often be converted into one another.
• Clustering: The task of automatically grouping similar instances together without predefined labels.

Source: https://medium.datadriveninvestor.com/problems-with-classification-examples-from-real-life-645b7b756e96 14
What are Machine Learning Paradigms

• Machine learning paradigms refer to the different approaches or methodologies used to train machine learning models.
• Machine learning paradigms specify how the model learns from data and how it is guided to make predictions or decisions.
• The three primary paradigms are supervised learning, unsupervised learning, and reinforcement learning.

15
Types of Machine Learning Paradigms

• Supervised Learning: Learning from labeled data (input-output pairs).
• Unsupervised Learning: Learning from unlabeled data to find hidden patterns; no labels are required.
• Reinforcement Learning: Learning by interacting with an environment to maximize cumulative reward, through repeated interaction with a dynamic environment and without human intervention.

Source: 1. https://botpenguin.com/glossary/supervised-learning 2. https://botpenguin.com/glossary/unsupervised-learning 3. https://botpenguin.com/glossary/reinforcement-learning 16


Source: https://doi.org/10.3389/fphar.2021.720694 17
Source: https://www.youtube.com/watch?v=6M5VXKLf4D4 18
Essential Mathematics for Deep Learning
• Linear Algebra
• Probability and Information Theory
• Numerical Computation
Why Mathematics is Essential in DL

• Mathematics is the foundation of deep learning models because it allows us to represent and transform data, optimize parameters, and measure performance.
• Core areas include:
‐ Linear Algebra: Handles matrix and vector operations for data manipulation,
parameter updates, and neural network layers.
‐ Calculus: Used for optimization through gradient-based methods like backpropagation.
‐ Probability & Statistics: Necessary for understanding model evaluation, predictions,
and uncertainty in deep learning.
20
The Role of Linear Algebra in DL

• Linear algebra is fundamental in deep learning because most computations


(input transformations, weight updates, activations) are expressed as matrix
or vector operations.
• Vectors, matrices, and tensors are the building blocks for representing data,
weights, and operations in neural networks.
• Weight matrices, activation vectors, gradient computations, and
transformation operations are all handled through linear algebra.
21
Vectors and Matrices in DL

• Vectors are 1D arrays of numbers, e.g., [x1, x2], that represent features of data (e.g., pixel intensities in an image or values in a dataset).
• Matrices are 2D arrays of numbers, e.g., [[x1, y1], [x2, y2]], that represent transformations (e.g., weight matrices that map input features to outputs in a neural network layer).

22
Matrix Operations in DL

• Matrix Addition: Adds corresponding elements from two matrices. Used in neural networks to incorporate biases into the linear transformation of inputs.
• Matrix Multiplication: Multiplies rows of one matrix by columns of another. In deep learning, this is used to combine inputs with weights.
• Matrix Transpose: Switches rows and columns. In backpropagation, transposed matrices are used to compute weight gradients.
• In deep learning, we apply matrix multiplications repeatedly through each layer of the network to transform input data into predictions (see the sketch below).

23
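A minimal NumPy sketch of these operations in a single dense layer (the shapes, data, and variable names are illustrative assumptions, not taken from the slides):

import numpy as np

X = np.random.randn(4, 3)   # a batch of 4 samples with 3 features each
W = np.random.randn(3, 2)   # weight matrix mapping 3 input features to 2 outputs
b = np.zeros(2)             # bias vector

# Matrix multiplication combines inputs with weights; matrix addition
# incorporates the bias into the linear transformation.
Z = X @ W + b               # shape (4, 2)

# In backpropagation, the transpose is used to compute the weight gradient.
dZ = np.ones_like(Z)        # dummy upstream gradient of the loss w.r.t. Z
dW = X.T @ dZ               # shape (3, 2), matches the shape of W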
Eigenvalues and Eigenvectors

• Eigenvalues and eigenvectors are crucial in understanding transformations in data and feature extraction techniques like Principal Component Analysis (PCA).
• An eigenvector is a direction in the data that does not change direction when a transformation (matrix) is applied; only its magnitude changes (determined by the eigenvalue).
• Eigenvector Decomposition: Given a square matrix A, the eigenvector v satisfies:
A · v = λ · v
Here, λ is the eigenvalue, representing the scaling factor, and v is the eigenvector (a numerical check appears below).

24
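The decomposition can be checked numerically; this small NumPy sketch (the example matrix is an assumption chosen for illustration) verifies A · v = λ · v for one eigenpair:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # a small symmetric matrix, e.g., a covariance matrix in PCA

eigenvalues, eigenvectors = np.linalg.eig(A)
lam, v = eigenvalues[0], eigenvectors[:, 0]

print(np.allclose(A @ v, lam * v))   # True: A·v equals λ·v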
Tensors in DL

• Tensors generalize vectors and matrices to higher dimensions.


• A tensor can be 1D (vector), 2D (matrix), or higher (3D, 4D, etc.), and they
are essential in representing multi-dimensional data (e.g., batches of images)
in deep learning.
• In frameworks like PyTorch and TensorFlow, tensors are the primary data
structure for storing inputs, weights, and intermediate activations.

25
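A short PyTorch sketch of tensors of different ranks (the shapes are illustrative assumptions):

import torch

vector = torch.tensor([1.0, 2.0, 3.0])   # 1D tensor (vector)
matrix = torch.randn(3, 2)               # 2D tensor (matrix)
images = torch.randn(8, 3, 32, 32)       # 4D tensor: a batch of 8 RGB images of size 32x32

# Tensors support matrix operations and can track gradients for backpropagation.
weights = torch.randn(3, 2, requires_grad=True)
output = vector @ weights                # shape (2,)
print(images.shape, output.shape)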
Calculus in DL

• Calculus is used for optimizing neural networks. Specifically, derivatives and gradients help determine how model parameters should change to minimize the loss function.
• Key Operations:
‐ Derivatives measure how a small change in input affects the output.
‐ Gradients (vectors of partial derivatives) are used to find the direction in which to update the model's parameters to reduce error.

26
Derivatives and Gradients

• The derivative of a function measures the rate of change of the function's


output with respect to a change in its input.
• In Deep Learning:
‐ The derivative of the loss function with respect to the model's weights tells
us how the weights should be adjusted to reduce the loss.
‐ Gradient (Vector of Derivatives):The gradient points in the direction of the
steepest increase in the loss function. By moving in the opposite direction,
we minimize the loss. 27
Gradient Descent for Optimization

• Gradient Descent Algorithm:
θ_new = θ_old − η ∇_θ J(θ),
where θ represents the model's parameters, η is the learning rate, and ∇_θ J(θ) is the gradient of the loss function with respect to θ.
• The learning rate η controls the step size of each update. If it is too high, the model may overshoot the optimal solution; if it is too low, training may be slow or get stuck in local minima.
• Gradient descent adjusts the model's weights in the direction that reduces the loss function. This is the core optimization method used in training deep learning models (a minimal sketch follows below).

28
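A minimal sketch of the update rule on a toy one-parameter loss J(θ) = (θ − 3)², whose gradient is 2(θ − 3); the learning-rate value is an illustrative assumption:

theta = 0.0    # initial parameter
eta = 0.1      # learning rate (step size)

for step in range(100):
    grad = 2 * (theta - 3)         # gradient of the loss at the current theta
    theta = theta - eta * grad     # theta_new = theta_old - eta * gradient

print(theta)   # converges toward the minimizer theta = 3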
Gradient Descent for Optimization

Figure 4.3 in Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron
Courville.
Figure 4.1 in Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron
Courville.
29
Backpropagation and Chain Rule

• Backpropagation uses the chain rule to compute gradients for all parameters
in a neural network. It allows us to efficiently compute the derivatives of the
loss function with respect to each parameter in the network.
• The chain rule states:
d/dx f(g(x)) = f′(g(x)) · g′(x)
In a neural network, this rule is applied layer by layer to propagate the error back from the output to the input, updating the weights accordingly.
30
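A small PyTorch sketch showing autograd applying the chain rule to a composite function (the function choice is an illustrative assumption):

import torch

# f(g(x)) with g(x) = x^2 and f(u) = sin(u); the chain rule gives cos(x^2) * 2x.
x = torch.tensor(1.5, requires_grad=True)
y = torch.sin(x ** 2)

y.backward()                                     # autograd applies the chain rule
manual = (torch.cos(x ** 2) * 2 * x).detach()    # the derivative computed by hand
print(torch.allclose(x.grad, manual))            # True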
Probability and Statistics in DL

• In deep learning, models often make predictions based on uncertain data. Understanding probability helps quantify and interpret these uncertainties.
• Statistics allows us to assess model performance, optimize decision-making, and understand data distributions.
• Key Applications:
‐ Probability is essential for making predictions and understanding model uncertainty.
‐ Statistics is used for model evaluation, understanding distributions, and performance measurement.

31
Understanding Random Variables

• A random variable is a variable that can take different values based on some random process or experiment. In deep learning, we often treat model outputs (predictions) as random variables. In classification, for example, the predicted class probabilities from a softmax output can be treated as random variables that follow a discrete distribution.
• Types of Random Variables:
‐ Discrete Random Variable: Takes on a countable set of values (e.g., binary classification outputs like 0 or 1).
‐ Continuous Random Variable: Takes on any value within a range (e.g., regression outputs like temperature predictions).

32
Probability Distributions in DL

• Bernoulli Distribution: Used for binary classification tasks (e.g., predicting whether an image is a cat or not).
• Gaussian (Normal) Distribution: The bell-shaped curve; assumes that the prediction errors or noise in a model are normally distributed.
• In a regression task, the output prediction might follow a normal distribution centered around the predicted value, with some variance representing the uncertainty in the prediction.

33
Bernoulli Distribution in Binary Classification

• Bernoulli Distribution: Used to model binary classification tasks, where the


output is either 0 or 1.
P(X = 1) = p, P(X = 0) = 1 − p
• In binary classification (e.g., spam detection), the model assigns a probability
p that the email is spam (1), and 1 − p that it is not spam (0).
• Real-World Example: Logistic regression outputs probabilities for binary classification tasks, and the decision is made by thresholding this probability (e.g., if p > 0.5, classify as 1), as in the sketch below.

34
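A minimal sketch of the thresholding decision and the Bernoulli log-likelihood (the probabilities and labels are made-up illustrative values):

import numpy as np

p = np.array([0.92, 0.10, 0.55, 0.48])   # predicted spam probabilities from a hypothetical model
predictions = (p > 0.5).astype(int)      # Bernoulli decision: classify as 1 (spam) when p > 0.5

y = np.array([1, 0, 1, 1])               # observed labels
log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(predictions, log_likelihood)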
Gaussian (Normal) Distribution in Regression

• Gaussian Distribution: A continuous probability distribution commonly used


to model prediction errors in regression tasks.
• The errors (residuals) in regression models often assume a normal
distribution, allowing the model to predict not just a point estimate but also a
measure of uncertainty (e.g., prediction intervals).
• Example: In a temperature prediction model, you might predict that the temperature tomorrow will be 25°C with a standard deviation of 2°C, implying that most of the time (about 68%), the actual temperature will lie between 23°C and 27°C.
35
Maximum Likelihood Estimation

• MLE is used to estimate the parameters of a model by maximizing the


likelihood of the observed data.
• In neural networks, this often translates to minimizing the negative log-
likelihood. The objective is to find the parameter θ that maximizes the
likelihood of the observed data.
• Example: For a classification task, maximizing the likelihood of the correct
class label is equivalent to minimizing the cross-entropy loss.
36
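A small numerical sketch of this equivalence for a 3-class problem (the probabilities and labels are illustrative assumptions):

import numpy as np

probs = np.array([[0.7, 0.2, 0.1],       # softmax outputs for 3 samples
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
labels = np.array([0, 1, 2])             # true class indices

# Negative log-likelihood of the correct classes ...
nll = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
# ... which is exactly the mean cross-entropy loss used to train classifiers.
print(nll)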
Information Theory in DL

• Entropy (H) measures the uncertainty of a probability distribution; it is used to quantify the unpredictability in classification tasks:
H(X) = − Σ_i P(x_i) log P(x_i)
• Kullback–Leibler (KL) Divergence measures how one probability distribution differs from another reference distribution; it is often used in Variational Autoencoders (VAEs) and model regularization:
D_KL(P || Q) = Σ_i P(x_i) log( P(x_i) / Q(x_i) )

37
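Both quantities follow directly from the formulas; a minimal NumPy sketch (the distributions are illustrative, and zero probabilities are not handled):

import numpy as np

def entropy(p):
    # H(X) = -sum_i P(x_i) log P(x_i)  (natural log; use log2 for bits)
    p = np.asarray(p)
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i P(x_i) log(P(x_i) / Q(x_i))
    p, q = np.asarray(p), np.asarray(q)
    return np.sum(p * np.log(p / q))

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(entropy(p), kl_divergence(p, q))   # KL is 0 only when P and Q are identical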
Building and Evaluating Deep Learning Models
Flow for Constructing a DL Model

1) Data Collection and Preprocessing


2) Dataset Splitting (Training, Validation, Test)
3) Model Design (Architecture Selection)
4) Model Training
5) Model Evaluation
6) Model Optimization
Following a structured model development flow ensures that the model
generalizes well to unseen data and performs reliably. 39
Data Collection and Preprocessing

• Data Collection:
‐ Collect data relevant to the problem you are solving (e.g., images, text, structured data).
‐ Ensure that the data is representative of the real-world scenarios the model will encounter.

• Data Preprocessing:
‐ Cleaning the Data: Handle missing values, remove duplicates.
‐ Normalization/Standardization: Ensure input features have the same scale (important for
neural networks).
‐ Augmentation (for images): Apply transformations like rotations, flips, and scaling to
artificially expand the dataset.
40
Dataset Splitting
• Why Split the Dataset?
To prevent the model from learning patterns specific to the training data and ensure it
can generalize to unseen data.
• Dataset Splitting:
‐ Training Set: Used to train the model (typically 70%-80% of the data).
‐ Validation Set: Used to tune hyperparameters and monitor overfitting (typically 10%-
15%).
‐ Test Set: Used to evaluate the model’s final performance on unseen data (typically
10%-15%).
41
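A minimal sketch of a 70% / 15% / 15% split using scikit-learn's train_test_split (the data here is random placeholder data):

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.randn(1000, 10), np.random.randint(0, 2, 1000)   # placeholder features and labels

# First split off the test set (15%), then carve a validation set out of the remainder.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.176, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # roughly 700 / 150 / 150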
Model Design

• Choosing a Model Architecture:


‐ Select a model architecture suitable for the problem (e.g., CNN for image
data, RNN for sequence data).
‐ Layers and Neurons: Decide how many layers and neurons each layer
should have based on the complexity of the problem.

42
Model Training

• Training the Model:


‐ Compile the Model: Choose the loss function and optimizer.
‐ Fit the Model: Train the model using the training set and monitor performance on the
validation set.

• Hyperparameters:
‐ Learning Rate: Controls how much to adjust the model weights with each training step.
‐ Batch Size: Determines how many samples are processed before the model updates its
weights.
43
Overfitting and Underfitting

• Overfitting (overtraining):
‐ Occurs when the model performs well on the training set but poorly on the
validation/test set.
‐ Symptoms: Training accuracy continues to improve, but validation accuracy stagnates
or decreases.

• Underfitting:
‐ Occurs when the model performs poorly on both the training and validation sets.
‐ Symptoms: Low training and validation accuracy.
44
Overfitting and Underfitting

Figure 5.2 in Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 45
Overfitting and Underfitting

(Note: in practice, the optimal capacity is a range rather than a single point.)

Figure 5.3 in Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 46
Bias-Variance Tradeoff

• Bias: Error introduced by simplifying assumptions in the model. High bias


models tend to underfit the data and have poor performance on both training
and test sets.
• Variance: Error introduced by the model's sensitivity to small fluctuations in the training data, i.e., how much the model's predictions change if we train it on different training datasets. High variance models tend to overfit, performing well on the training set but poorly on unseen data.
As model complexity increases, bias decreases but variance increases.


The goal is to find a model that balances these two sources of error to minimize total error.
47
Bias-Variance Tradeoff

(Illustration: in general, a more complex model can achieve lower bias because it has more details it can adjust; a low-variance model is one whose predictions do not vary much across training sets, whereas high-variance predictions are widely scattered.)

Source: https://medium.com/@ivanreznikov/stop-using-the-same-image-in-bias-variance-trade-off-explanation-691997a94a54 48
Model Evaluation

• Evaluating the Model Based on the Learning Task:


‐ Classification Tasks: Use accuracy, precision, recall, F1-score, ROC-AUC to assess
model performance.
‐ Regression Tasks: Use mean squared error (MSE), mean absolute error (MAE), R-
squared to evaluate continuous output models.
‐ Unsupervised Learning Tasks: Use silhouette score, Davies-Bouldin index, or
clustering evaluation metrics.
The performance of a model depends on how well it aligns with the task's goals
(e.g., accuracy may not be sufficient for imbalanced classification).
49
Evaluating Classification Models (1/3)

• Accuracy: Proportion of correct predictions out of all predictions.
• Precision: Proportion of true positives among predicted positives.
• Recall (Sensitivity): Proportion of actual positives correctly predicted.
• F1-Score: The harmonic mean of precision and recall, useful when you need
to balance false positives and false negatives.
• ROC-AUC: Area under the Receiver Operating Characteristic curve. It helps
measure the trade-off between true positive rate (TPR) and false positive rate
(FPR). 50
Evaluating Classification Models (2/3)

Confusion matrix (rows: true condition status, columns: test result):
                Predicted Positive    Predicted Negative
Disease (P)     True Positive (TP)    False Negative (FN)
Health (N)      False Positive (FP)   True Negative (TN)

• Sensitivity (hit rate or recall) = TP / P = TP / (TP + FN)
• Specificity = TN / N = TN / (TN + FP)
• Precision (Positive Predictive Value, PPV) = TP / (TP + FP)
• Accuracy (ACC) = (TP + TN) / (P + N) = (TP + TN) / (TP + TN + FP + FN)
• F1 score = 2TP / (2TP + FP + FN), which is less distorted by extreme class imbalance than accuracy.
• Matthews correlation coefficient (MCC) = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

51
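A minimal sketch computing these metrics from assumed confusion-matrix counts (the counts are illustrative, not from the slides):

import numpy as np

TP, FN, FP, TN = 40, 10, 5, 45            # illustrative counts

sensitivity = TP / (TP + FN)              # recall / hit rate
specificity = TN / (TN + FP)
precision   = TP / (TP + FP)              # positive predictive value
accuracy    = (TP + TN) / (TP + TN + FP + FN)
f1          = 2 * TP / (2 * TP + FP + FN)
mcc = (TP * TN - FP * FN) / np.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

print(sensitivity, specificity, precision, accuracy, f1, mcc)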
Evaluating Classification Models (3/3)

• Let X be the predicted scores of the positive-label examples, and Y those of the negative-label examples.
• TPR(c) = Pr(X > c)
• FPR(c) = Pr(Y > c)
• AUC = ∫₀¹ ROC(t) dt
• AUC = (1 / (n_x · n_y)) Σ_{i=1}^{n_x} Σ_{j=1}^{n_y} I(X_i > Y_j)
• The closer the AUC is to 1, the better the model.

52
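The pairwise formula above can be evaluated directly; a minimal sketch with made-up scores (ties counted as 1/2, a common convention):

import numpy as np

X = np.array([0.9, 0.8, 0.6, 0.55])   # scores of the n_x positive examples
Y = np.array([0.7, 0.4, 0.3])         # scores of the n_y negative examples

# AUC = (1 / (n_x * n_y)) * sum_i sum_j I(X_i > Y_j)
pairs = (X[:, None] > Y[None, :]).astype(float) + 0.5 * (X[:, None] == Y[None, :])
auc = pairs.mean()
print(auc)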
Evaluating Regression Models

• Mean Squared Error (MSE): Measures the average squared difference


between the predicted and actual values.
• Mean Absolute Error (MAE): Measures the average absolute difference
between the predicted and actual values.
• R-Squared (R²): Indicates the proportion of variance explained by the model.
An R² of 1 means the model perfectly fits the data.

53
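A minimal NumPy sketch of these three metrics on made-up values:

import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 6.5])

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))
r2  = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)   # R^2 = 1 - SS_res / SS_tot

print(mse, mae, r2)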
Evaluating Unsupervised Learning Models

• Silhouette Score: Measures how similar an object is to its own cluster


compared to other clusters. A higher score indicates better-defined clusters.
• Davies-Bouldin Index: Measures the average similarity ratio between each
cluster and its most similar cluster. A lower value indicates better clustering.
• Cluster Purity: Evaluates the extent to which each cluster contains only data
points from a single class.

54
Preventing Overfitting

• Regularization Techniques:
‐ L2 Regularization: Adds a penalty for large weights in the model.
‐ Dropout: Randomly drops a percentage of neurons during training to reduce
dependency on specific neurons.

• Early Stopping: Stop training when validation performance stops improving


to prevent overfitting.

55
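A minimal Keras sketch combining these techniques (the layer sizes, rates, and patience values are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=keras.regularizers.l2(0.01)),   # L2 penalty on large weights
    layers.Dropout(0.5),                                            # randomly drop 50% of units during training
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping: halt training once validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])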
Frameworks for DL: Keras and PyTorch
• Keras
• PyTorch
Introduction to Keras and PyTorch

• Deep learning frameworks like Keras and PyTorch allow developers to focus
on model design and training without dealing with low-level matrix
operations or gradient computation.
• They provide pre-built functionalities for creating neural networks,
optimizing them, and handling large datasets efficiently.

57
Keras

• Keras is a high-level API built on top of TensorFlow. It simplifies model


building by abstracting complex backend operations, allowing for rapid
prototyping.
• Key Features:
‐ Sequential API: Simple linear stack of layers.
‐ Functional API: For more complex models (multi-input/output).

58
Keras Key Functionalities

• Loss Functions and Optimizers: Built-in loss functions (e.g., MSE, cross-
entropy) and optimizers (e.g., SGD, Adam).
• Callbacks: Customizable training callbacks for early stopping, learning rate
scheduling, etc.
• Data Handling: In-built support for image, text, and sequence data.

59
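A minimal Keras sketch of the Sequential API with a built-in optimizer, loss, and callback (the architecture and hyperparameters are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

# Sequential API: a simple linear stack of layers for 10-class classification
# of 28x28 grayscale images.
model = keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Built-in optimizer and loss function.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Callbacks customize training (early stopping, learning-rate scheduling, ...).
callbacks = [keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)]
# model.fit(x_train, y_train, validation_split=0.15, epochs=20, batch_size=32, callbacks=callbacks)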
PyTorch Overview

• PyTorch is a low-level, flexible deep learning framework popular in academic


research. It features dynamic computation graphs, allowing more freedom in
experimenting with model structures.
• Key Features:
‐ Dynamic Computation Graphs: Graphs are built on the fly, making it
easier to debug and adjust models during training.
‐ Autograd: Automatic differentiation library that computes gradients during backpropagation.
60
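A minimal PyTorch sketch of a model definition and one training step, showing autograd computing gradients (the architecture, data, and hyperparameters are illustrative assumptions):

import torch
import torch.nn as nn

class MLP(nn.Module):
    # A small feed-forward network; with dynamic graphs, the forward pass is plain Python.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, x):
        return self.net(x)

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))   # dummy batch of 32 samples
logits = model(x)
loss = loss_fn(logits, y)

optimizer.zero_grad()
loss.backward()    # autograd computes gradients for all parameters via backpropagation
optimizer.step()   # gradient-based parameter update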
Tensors in PyTorch and Keras

• Tensors are the core data structure in both Keras and PyTorch, representing
multi-dimensional arrays.
• Tensors enable matrix operations and hold gradients for backpropagation.

61
Key Differences Between Keras and PyTorch

• Keras vs PyTorch:
‐ Ease of Use: Keras is user-friendly, suited for beginners, and focuses on rapid
prototyping.
‐ Flexibility: PyTorch offers more flexibility, making it preferred for research, where
dynamic changes to the computation graph are required.

• Computation Graph:
‐ Keras uses static computation graphs (compiled before training).
‐ PyTorch uses dynamic computation graphs (created on the fly).
62
Q&A
