Notes: deep learning
Saturday, May 24, 2025 9:55 PM
What is Deep Learning?
Deep Learning is a subset of machine learning that uses algorithms inspired by the human brain —
called neural networks — to model and solve complex tasks like image recognition, speech
translation, and game playing.
Key Idea: Learn patterns in data through multiple layers.
Neural Network Basics
What is a Neural Network?
A neural network is made up of neurons (nodes) arranged in layers:
• Input Layer – takes features (like pixels or measurements)
• Hidden Layers – extract patterns
• Output Layer – produces prediction
Each connection has a weight, and each neuron has a bias. The network learns by adjusting these to
reduce error.
What is a Perceptron?
A perceptron is the simplest building block of a neural network. It takes multiple inputs, multiplies
each by a weight, adds a bias, and passes the result through an activation function to produce an
output.
Formula:
z = w1·x1 + w2·x2 + … + wn·xn + b
ŷ = Activation(z)
Structure of a Perceptron
Inputs: x1, x2, x3
  │  (each multiplied by its weight w1, w2, w3)
  ▼
Weighted Sum + Bias (z)
  ▼
Activation Function (e.g. Step, ReLU)
  ▼
Output (ŷ)
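To make the computation concrete, here is a minimal NumPy sketch of the perceptron above; the specific input, weight, and bias values are arbitrary examples, not from the notes.

```python
import numpy as np

def step(z):
    # Step activation: 1 if z > 0, else 0
    return np.where(z > 0, 1, 0)

def perceptron(x, w, b):
    # Weighted sum plus bias, then activation
    z = np.dot(w, x) + b
    return step(z)

# Example values (illustrative only)
x = np.array([1.0, 0.0, 1.0])   # inputs x1, x2, x3
w = np.array([0.5, -0.6, 0.4])  # weights w1, w2, w3
b = -0.3                        # bias

print(perceptron(x, w, b))  # z = 0.5 + 0.4 - 0.3 = 0.6 > 0 -> output 1
```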
Activation Functions
These add non-linearity to the network, enabling it to learn complex patterns.
Function   Formula                          Output Range   Notes
Step       1 if z > 0, else 0               0 or 1         Used in the original perceptron
Sigmoid    1/(1 + e^(−z))                   (0, 1)         Good for probabilities
Tanh       (e^z − e^(−z))/(e^z + e^(−z))    (−1, 1)        Zero-centered
ReLU       max(0, z)                        [0, ∞)         Most common in DL
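All four rows of the table can be written in a few lines of NumPy; a quick sketch:

```python
import numpy as np

def step(z):    return np.where(z > 0, 1.0, 0.0)   # output: 0 or 1
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))    # output: (0, 1)
def tanh(z):    return np.tanh(z)                  # output: (-1, 1), zero-centered
def relu(z):    return np.maximum(0.0, z)          # output: [0, inf)

z = np.array([-2.0, 0.0, 2.0])
for f in (step, sigmoid, tanh, relu):
    print(f.__name__, f(z))
```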
Perceptron Learning Rule
We adjust weights to reduce prediction error:
wi := wi + η(y − ŷ)·xi
b := b + η(y − ŷ)
Where:
• η = learning rate (a small number like 0.01)
• y = actual output
• ŷ = predicted output
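A minimal sketch of this rule in action, training a perceptron on the AND gate (the dataset, learning rate, and epoch count are illustrative choices):

```python
import numpy as np

# AND gate: linearly separable, so a single perceptron can learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights w1, w2
b = 0.0           # bias
eta = 0.1         # learning rate (η)

for epoch in range(10):
    for xi, yi in zip(X, y):
        y_hat = 1 if np.dot(w, xi) + b > 0 else 0  # step activation
        # Perceptron learning rule: w := w + η(y - ŷ)x, b := b + η(y - ŷ)
        w += eta * (yi - y_hat) * xi
        b += eta * (yi - y_hat)

print(w, b)  # converges within a few epochs to weights that separate AND
```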
Limitation: XOR Problem
A single perceptron cannot solve XOR because XOR is not linearly separable.
Solution? ➤ Multi-Layer Perceptrons (MLPs)
What is an MLP?
A Multi-Layer Perceptron (MLP) is a neural network with:
• An input layer
• One or more hidden layers
• An output layer
Each layer is made of neurons (perceptrons), fully connected to the next layer.
Why Use Multiple Layers?
• One layer: Can only learn linear boundaries (e.g., line)
• Multiple layers: Can learn complex and non-linear functions (e.g., XOR, image patterns)
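To see concretely how a hidden layer fixes the XOR problem, here is a sketch with hand-picked weights (chosen purely for illustration): the hidden layer computes OR and AND, and the output layer combines them into XOR.

```python
import numpy as np

def step(z):
    return np.where(z > 0, 1, 0)

# Hand-picked weights (illustrative): XOR = (x1 OR x2) AND NOT (x1 AND x2)
W1 = np.array([[1.0, 1.0],    # h1 ≈ OR:  fires if x1 + x2 > 0.5
               [1.0, 1.0]])   # h2 ≈ AND: fires if x1 + x2 > 1.5
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])    # output fires if h1 - h2 > 0.5
b2 = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W1 @ np.array(x) + b1)   # hidden layer
    y = step(W2 @ h + b2)             # output layer
    print(x, "->", y)                 # prints the XOR truth table: 0, 1, 1, 0
```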
Activation Functions in Hidden Layers
Function   Why Use It?
ReLU       Fast and helps avoid vanishing gradients
Tanh       Zero-centered output
Sigmoid    Good for probabilities, but prone to vanishing gradients
Backpropagation + Gradient Descent
This is how the network learns.
Steps:
1. Forward pass: run the input through the network to get the prediction ŷ.
2. Compute the loss (how far ŷ is from the actual y).
3. Backward pass: use backpropagation to compute the gradient of the loss with respect to each weight.
4. Update the weights using gradient descent:
w := w − η · (∂Loss/∂w)
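As a minimal end-to-end sketch of these steps, here is gradient descent on a single sigmoid neuron with a squared-error loss (the toy data and learning rate are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One toy example: input x, target y (illustrative values)
x, y = np.array([1.0, 2.0]), 1.0
w, b, eta = np.zeros(2), 0.0, 0.5

for _ in range(100):
    # 1-2. Forward pass and loss: L = (ŷ - y)^2
    y_hat = sigmoid(np.dot(w, x) + b)
    loss = (y_hat - y) ** 2
    # 3. Backward pass: chain rule gives ∂L/∂w and ∂L/∂b
    dz = 2 * (y_hat - y) * y_hat * (1 - y_hat)  # ∂L/∂z
    dw, db = dz * x, dz
    # 4. Gradient descent update: w := w - η · ∂L/∂w
    w -= eta * dw
    b -= eta * db

print(loss)  # loss shrinks toward 0 as ŷ approaches y
```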
Forward Propagation
This is how input flows through the network to generate predictions.
Steps:
1. Multiply inputs by weights.
2. Add bias.
3. Apply activation.
4. Pass result to next layer.
Repeat until the output layer gives the final result ŷ.
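These four steps translate directly into a loop over layers; a minimal NumPy sketch with random example weights (the layer sizes are arbitrary):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

# Layer sizes (arbitrary example): 3 inputs -> 4 hidden -> 2 outputs
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),   # (weights, biases)
          (rng.normal(size=(2, 4)), np.zeros(2))]

a = np.array([0.5, -1.0, 2.0])  # input features
for W, b in layers:
    z = W @ a + b   # steps 1-2: multiply by weights, add bias
    a = relu(z)     # step 3: apply activation
                    # step 4: a is passed to the next layer
print(a)            # final output ŷ
```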
Vocabulary:
• Backpropagation
It’s the core of how the network learns.
• After measuring how wrong the output is (using the cost function), backpropagation works
backward through the network.
• It calculates how much each weight contributed to the error, so each weight can be adjusted properly.
• Gradient Descent
This is the method the model uses to learn.
• It adjusts the weights in small steps to reduce the error.
• Like walking downhill to reach the lowest point (minimum cost).
• Learning Rate
It controls how big the steps are in gradient descent.
• If too small, training is slow.
• If too big, the model might skip the best solution.
• Batches
We don't feed the entire dataset at once into the network. Instead, we split it into smaller parts
called batches.
• Helps with memory and speeds up training.
• Epochs
One epoch means the model has seen all the data once.
• Training usually takes many epochs so the model can learn well.
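A sketch of how batches and epochs fit together in a typical training loop (the placeholder dataset, batch size, and epoch count are illustrative; the actual forward/backward pass is elided):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))   # placeholder dataset: 1000 samples
batch_size, n_epochs = 32, 5

for epoch in range(n_epochs):          # one epoch = one full pass over the data
    idx = rng.permutation(len(X))      # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = X[idx[start:start + batch_size]]  # one batch
        # forward pass, loss, backprop, and weight update would go here
    print(f"epoch {epoch + 1} done: saw all {len(X)} samples")
```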