
AI Essential for Business “2.0”
Jim Kyung-Soo Liew, Ph.D.
Associate Professor in Finance
President of SoKat
Spring II 2025
Lecture 2 – Neural Network Details
Assignment 2
1) Redo building a Neural Network (e.g., myFirstMNIST.ipynb), but instead of using the MNIST data as we did in class, employ the FashionMNIST data. Increase the accuracy of the NN model by adjusting the architectural structure, etc. Advanced students should re-code everything and understand each line of the code. (Submit code)
2) Read the blog “Imperial College ML – NN”; explain the concepts you found interesting and what you learned. (1/2 page)
• What is the XOR problem? Who showed that the perceptron cannot solve it, and why?
• What is the depth of a network? What about its width?
• What happens if we initialize all the parameters to zero (all weights and biases)?
• What’s the difference between the binary cross-entropy and negative log-likelihood loss functions?
• What is the vanishing gradient problem?
• Describe regularization. What is dropout?
3) Watch and summarize -- https://www.youtube.com/watch?v=ErnWZxJovaM (1/2 page)
4) Vibe Code Assignment 2 – “Elon Fired!” (Submit 3-min video narrative and code)



Table of Contents

• Perceptron to Basic Neural Network (NN)
• What’s inside NN?
• Deep Dive into NN
• In-Class Exercise: MNIST in Jupyter Notebook
• Deploying NN
• Assignment 2


Perceptron to Basic Neural Network

Source: Google Brain Map


Inspired by how our brain works



First, the inputs are the data, multiplied by the weights

[Diagram: inputs X1, X2, X3, X4, each multiplied by a weight]


Then, aggregate the weighted inputs and add the bias

[Diagram: inputs X1–X4 summed in the cell nucleus]


Next, push the result through the activation function

[Diagram: the summed value in the cell nucleus passes along the axon through the activation function]


Finally, generate the output signal

[Diagram: inputs X1–X4, cell nucleus, axon, and the resulting output signal]
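
To make the four steps concrete, here is a minimal NumPy sketch of a single perceptron; the input values, weights, bias, and the choice of sigmoid as the activation are illustrative assumptions, not values from the slides.

```python
# A minimal sketch of one perceptron: weight the inputs, aggregate,
# add the bias, push through an activation, emit the output signal.
import numpy as np

x = np.array([1.0, 0.5, -1.2, 2.0])       # inputs X1..X4 (made-up values)
w = np.array([0.4, -0.6, 0.1, 0.8])       # one weight per input (made-up values)
b = 0.2                                    # bias (made-up value)

z = np.dot(w, x) + b                       # multiply by weights, aggregate, add bias
output_signal = 1.0 / (1.0 + np.exp(-z))   # activation (sigmoid chosen as an example)
print(output_signal)
```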


Multi-Layer Perceptron to Basic Neural Network


Stack the Perceptrons to build a Multi-Layer Perceptron, or basic Neural Network

[Diagram: one Perceptron]


Stack the Perceptrons to build a Multi-Layer Perceptron, or basic Neural Network

[Diagram: two stacked Perceptrons]


Stack the Perceptrons to build a Multi-Layer Perceptron, or basic Neural Network

[Diagram: three stacked Perceptrons]


Stack the Perceptrons to build a Multi-Layer Perceptron, or basic Neural Network

[Diagram: four stacked Perceptrons]


Simple Neural Network to Deep Neural Network

[Diagram: one hidden layer (simple NN) vs. two or more hidden layers (Deep Neural Network)]

What’s Inside a Neural Network?


What’s Inside Neural Nets?
(Slides adapted from Daniel Khashabi [HKUST])

Feedforward networks

• This is a particular class called “feedforward” networks.
• Cascade neurons together


Feedforward networks

• Inputs multiplied by initial set of weights



Feedforward networks

• Intermediate “predictions” computed at first hidden layer



Feedforward networks

• Intermediate predictions multiplied by second layer of weights


• Predictions are fed forward through the network



Feedforward networks

• Compute second set of intermediate predictions



Feedforward networks

• Multiply by final set of weights



Feedforward networks

• Aggregate all the computations in the output


• e.g. probability of a particular class



Feedforward networks

• All the intermediate parameters ought to be learned.

[Diagram: each layer’s weights labeled “Weights to learn!”]


Deep Dive into Neural Network

Building Intuition of Neural Networks


Activation Functions
Loss Functions
Backpropagation
Gradient Descent



Neural Network: Building Intuition
First, try Activation = Linear

https://playground.tensorflow.org
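
One reason the linear “activation” fails in the playground exercise is that stacking linear layers adds no expressive power: the composition of linear maps is still a single linear map. A minimal NumPy sketch (with made-up weight matrices) illustrates this.

```python
# With a linear activation f(z) = z, two stacked layers are equivalent
# to one linear layer, so depth buys nothing.
import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=(4,))      # a single 4-dimensional input
W1 = rng.normal(size=(3, 4))    # "hidden layer" weights
W2 = rng.normal(size=(2, 3))    # "output layer" weights

h = W1 @ x                      # linear activation: f(z) = z
y_two_layers = W2 @ h           # two stacked linear layers

W_combined = W2 @ W1            # a single equivalent linear layer
y_one_layer = W_combined @ x

print(np.allclose(y_two_layers, y_one_layer))  # True
```
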
Activation Functions – Sigmoid

f(x) = 1 / (1 + e^(-x))


Activation Functions – Hyperbolic Tangent (tanh)

f(x) = (e^x − e^(-x)) / (e^x + e^(-x))


Activation Functions – ReLU Function

f(x) = max(0, x)


Activation Functions – Leaky ReLU Function

f(x) = x,   if x > 0
f(x) = αx,  if x ≤ 0
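
For reference, here is a minimal NumPy sketch of the four activation functions above; the Leaky ReLU slope α = 0.01 is an assumed, commonly used default, not a value from the slides.

```python
# Minimal implementations of the activation functions shown above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)             # (e^x - e^-x) / (e^x + e^-x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):    # alpha is an assumed default
    return np.where(x > 0, x, alpha * x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z), sep="\n")
```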


Loss Functions
Loss functions measure how well the neural network's predictions match the
true values. They quantify the error between the predicted output and the
actual target.

Why Are Loss Functions Important?

1. Optimization Target: they provide a single value to minimize during training
2. Problem-Specific: different problems require different loss functions
3. Gradient Properties: they must be differentiable for backpropagation


Loss Function – Mean Squared Error (MSE)

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²,  summed over i = 1..n


Loss Function – Binary Cross-Entropy

BCE = −(1/n) Σᵢ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ],  summed over i = 1..n


Loss Function – Categorical Cross-Entropy

CCE = −Σᵢ Σⱼ yᵢ,ⱼ log(ŷᵢ,ⱼ),  summed over samples i = 1..n and classes j = 1..m
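
A minimal NumPy sketch of the three loss functions above; the small eps clipping is an assumed numerical-stability detail rather than part of the formulas, and the example labels and predictions are made up.

```python
# MSE, binary cross-entropy, and categorical cross-entropy as defined above.
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels (n, m); y_pred: predicted class probabilities (n, m)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred), binary_cross_entropy(y_true, y_pred))

Y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])        # one-hot labels
Y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(categorical_cross_entropy(Y_true, Y_pred))
```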


Recognizing the digits in handwriting
• A bank check recognition system is one of the most famous (and few) early successes of Neural Networks
• Developed by Yann LeCun, then at AT&T Labs
• Now a Professor at New York University and Chief AI Scientist at Facebook

Data
• MNIST handwritten digits
• 60k examples for training
• 10k examples for testing


Neural Network structure

[Diagram: input of 28 × 28 = 784 pixels feeding the network, ending in the output layer]

Training process – Forward Propagation

[Diagram: in Epoch 1, the training data is split into Batch 1 … Batch n; each batch is forward-propagated through the network to the output layer, where predictions are compared with the truth and the loss is calculated]

Training process – Back Propagation

[Diagram: the same loop as above, with back propagation flowing from the loss at the output layer back through the network to update the weights]
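
A minimal sketch of the epoch/batch loop the two diagrams describe; the model.forward/model.backward method names and the function signature are assumed placeholders, not an API from the course notebook.

```python
# Forward pass, loss, backward pass, and weight update, batch by batch.
def train(model, batches, loss_fn, learning_rate, num_epochs):
    for epoch in range(num_epochs):              # Epoch 1, 2, ...
        for x_batch, y_batch in batches:         # Batch 1 ... Batch n
            y_pred = model.forward(x_batch)          # forward propagation
            loss = loss_fn(y_pred, y_batch)          # compare with truth, calculate the loss
            grads = model.backward(loss)             # back propagation (chain rule)
            for w, g in zip(model.weights, grads):   # gradient-descent update
                w -= learning_rate * g
```
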
Backprop intuition



How it works

[Diagram: a small network x → w₁ → σ(z) → w₂ → y]


Forward pass, let’s make a prediction first

Initial parameters:
• Weight 1 (w₁) = 0.5
• Weight 2 (w₂) = 0.5
• Input (x) = 2.0
• Target value (t) = 1.0


Forward pass, let’s make a prediction first

Step 1: Forward Pass

First, we calculate the output of our network:

1. Calculate the input to the hidden node:
   z = x × w₁ = 2.0 × 0.5 = 1.0

2. Apply the sigmoid activation function:
   h = σ(z) = σ(1.0) = 0.731059
   Formula: σ(z) = 1 / (1 + e^(-z))
   h = 1 / (1 + e^(-1.0)) = 1 / (1 + 0.367879) = 0.731059

3. Calculate the output:
   y = h × w₂ = 0.731059 × 0.5 = 0.365529


Forward pass, predicted value

3. Calculate the output:
   y = h × w₂ = 0.731059 × 0.5 = 0.365529

4. Calculate the loss using Mean Squared Error (MSE):
   Loss = (1/2)(y − t)²
        = (1/2)(0.365529 − 1.0)²
        = (1/2)(−0.634471)²
        = 0.201277
   Formula: Loss(y, t) = (1/2)(y − t)²
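
The same worked forward pass, reproduced as a short NumPy sketch; the variable names are mine, and the printed values match the numbers on the slide.

```python
# Reproduce the worked forward-pass example: z, h, y, and the MSE loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w1, w2 = 0.5, 0.5            # initial weights
x, t = 2.0, 1.0              # input and target

z = x * w1                   # input to the hidden node: 1.0
h = sigmoid(z)               # hidden activation: 0.731059
y = h * w2                   # network output: 0.365529
loss = 0.5 * (y - t) ** 2    # (1/2)(y - t)^2 = 0.201277

print(z, h, y, loss)
```
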
Backward Pass – The Chain Rule in Action



Visualizing the Chain Rule in Backprop: for each weight, get its gradient


Use the gradients to update the old weights; however, we need the learning rate (a hyper-parameter)
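
A minimal sketch of the backward pass for this two-weight example, applying the chain rule term by term and then updating the weights; the learning rate of 0.1 is an assumed value, since the slide only notes that one is needed.

```python
# Backward pass for the x -> w1 -> sigmoid -> w2 -> y example, then a
# gradient-descent update of both weights.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w1, w2, x, t, lr = 0.5, 0.5, 2.0, 1.0, 0.1   # lr = 0.1 is an assumed choice

# Forward pass (same as before)
z = x * w1
h = sigmoid(z)
y = h * w2
loss = 0.5 * (y - t) ** 2

# Backward pass: chain rule
dL_dy  = y - t                   # d/dy of (1/2)(y - t)^2
dL_dw2 = dL_dy * h               # y = h * w2  ->  dy/dw2 = h
dL_dh  = dL_dy * w2              # dy/dh = w2
dL_dz  = dL_dh * h * (1.0 - h)   # sigmoid derivative: h(1 - h)
dL_dw1 = dL_dz * x               # z = x * w1  ->  dz/dw1 = x

# Gradient-descent update
w1 -= lr * dL_dw1
w2 -= lr * dL_dw2
print(dL_dw1, dL_dw2, w1, w2)
```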


Effect of Weight Updates



Key Takeaways

• Backpropagation uses the Chain Rule from calculus to efficiently calculate how each weight contributes to the error (loss)
• The sigmoid function and its derivative are crucial for giving Neural Networks non-linear behavior
• Gradient descent uses the gradients to update weights in a direction that reduces the error (loss)
• Learning is iterative - the network gets better with each update, gradually reducing the loss
• The learning rate controls how quickly the network adapts - too small and learning is slow, too large and it might overshoot

Learning Rate Challenges

https://www.doc.ic.ac.uk/~nuric/posts/teaching/imperial-college-machine-learning-neural-networks
Gradient Descent

https://www.doc.ic.ac.uk/~nuric/posts/teaching/imperial-college-machine-learning-neural-networks
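
A toy illustration (my own example, not from the slides) of gradient descent on f(w) = (w − 3)², showing the learning-rate trade-off: too small converges slowly, too large overshoots and diverges.

```python
# Gradient descent on a 1-D quadratic; only the learning rate changes.
def gradient_descent(lr, steps=20, w=0.0):
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w -= lr * grad           # gradient-descent update
    return w

print(gradient_descent(lr=0.01))   # too small: still far from the minimum at 3
print(gradient_descent(lr=0.1))    # reasonable: converges close to 3
print(gradient_descent(lr=1.1))    # too large: overshoots and diverges
```
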
Under-Fitting vs. Over-Fitting


Mitigate Over-Fitting
Early Stopping – a Regularization Technique
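
A minimal Keras sketch of early stopping as a regularization technique; the model and training-data names in the commented-out call are assumed placeholders, since the slide only names the technique.

```python
# Stop training once the validation loss stops improving.
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    patience=3,                  # stop after 3 epochs with no improvement
    restore_best_weights=True,   # roll back to the best epoch's weights
)

# Assumed usage with a compiled model and data:
# model.fit(x_train, y_train,
#           validation_split=0.1,
#           epochs=50,
#           callbacks=[early_stop])
```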


Training and Evaluation

• .fit() to train
• .evaluate() to test
• Overfitting and regularization techniques


Deploy Concepts

• What does it mean to “deploy”?
• Inference vs. training (pre-training)
• Stochastic vs. Batch vs. Mini-Batch


Steps to Train and Employ a NN

Step 1: Setup and import libraries
Step 2: Load and explore the MNIST dataset
• Shape & visualize
Step 3: Preprocess the data
• Normalization and reshaping
Step 4: Build the NN model
Step 5: Train the NN model
Step 6: Visualize the training process
Step 7: Evaluate on the test data
Step 8: Make predictions [Inference] (see the sketch below)
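
A minimal end-to-end Keras sketch of Steps 1–8; the layer sizes, optimizer, and epoch count are illustrative assumptions, not necessarily those used in myFirstMNIST.ipynb.

```python
# Steps 1-8: load MNIST, preprocess, build, train, evaluate, predict.
import numpy as np
from tensorflow import keras

# Step 1-2: import libraries, load and explore MNIST
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape, y_train.shape)          # (60000, 28, 28), (60000,)

# Step 3: preprocess -- normalize to [0, 1] and flatten 28 x 28 = 784 pixels
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test  = x_test.reshape(-1, 784).astype("float32") / 255.0

# Step 4: build the NN model (layer sizes are assumed choices)
model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),   # 10 digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Step 5-6: train and keep the history for plotting the training process
history = model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# Step 7: evaluate on the test data
test_loss, test_acc = model.evaluate(x_test, y_test)

# Step 8: make predictions (inference)
probs = model.predict(x_test[:5])
print(np.argmax(probs, axis=1), y_test[:5])
```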
