Deep Learning
Module: 1
Preface
groundbreaking innovations.
Learning Objectives:
2. Real-world Applications
3. Perceptron Mastery
4. Activation Exploration
Data Science
1.7 Summary
1.8 Keywords
1.11 References
1.1 What is an Artificial Neural Network?
artificial neuron.
prominence.
advent of GPUs.
Data Science:
recognition.
creditworthiness.
Data Science
The Expanding Horizons: Why Neural Networks are Integral in
Data Science
reprogramming.
speech recognition.
error or noise.
efficient.
finance sector.
art.
otherwise.
generalises to a hyperplane.
Layers:
● Input Layer: The initial layer that directly receives input data.
of input features.
Neurons:
function.
target values.
as follows:
1. Sigmoid
Characteristics:
values.
Equation: f(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Characteristics:
Equation: f(x)=max(0,x)
Characteristics:
in CNNs.
constant.
Characteristics:
training.
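As a rough illustration of these activation functions, here is a minimal NumPy sketch (the sample inputs and the 0.01 leaky slope are assumed values, not taken from the text):

import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Squashes input into (-1, 1); zero-centred, unlike sigmoid
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    # Passes positive values unchanged and zeroes out negatives
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but keeps a small constant slope for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x), sep="\n")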
a neuron can activate (or not) even if all its input weights are
zero.
For instance, consider a neuron with a sigmoid activation
boundaries.
network.
Optimal Performance
1.8 Keywords
science.
weight.
output of a neuron?
untreated.
A renowned eye hospital in Bengaluru realised that a large
retinal cameras and the deep learning model, reaching out to rural
This initiative not only streamlined the diagnostic process but also
ensured that individuals living in remote areas received timely care.
Questions:
3. How did the deep learning model benefit patients and the
1.11 References
● Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
● Neural Networks and Deep Learning: A Textbook by Charu C. Aggarwal
Deep Learning
Module: 2
Learning Objectives:
At the core of every neural network lies the artificial neuron, which
Structure:
neurons.
of the network.
Functionality:
data.
Synaptic Weights:
● Each connection, or synapse, between two neurons in a
values.
Importance:
The input layer is the initial layer in a neural network through which
data is introduced into the system. It's akin to the entry point for
Features:
image that's 28x28 pixels has 784 input features, hence 784 input neurons.
● Data Normalisation: Often, the data fed into the input layer is
normalisation.
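To make the input-layer ideas above concrete, here is a small sketch of flattening a 28x28 image into 784 features and normalising the pixel values (the random image is a stand-in for real data):

import numpy as np

# Hypothetical 28x28 greyscale image with pixel values in [0, 255]
image = np.random.randint(0, 256, size=(28, 28))

# Flatten to a vector of 784 input features, one per input-layer neuron
features = image.reshape(-1).astype("float32")

# Simple rescaling to the [0, 1] range before feeding the input layer
features = features / 255.0
print(features.shape)   # (784,)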
Hidden layers reside between the input and output layers, capturing
and refining patterns and features from the input data to aid in
decision-making.
Features:
The output layer is the terminal layer of a neural network where the
Features:
● Characteristics:
computationally expensive.
distributed.
● Applications:
things.
make predictions.
● Limitations:
● Characteristics:
o Convolutional Layers: Use filters to scan an input for
● Applications:
tasks.
● Limitations:
previous steps.
● Characteristics:
memory.
● Applications:
machine translation.
● Limitations:
their training.
of this data.
layer.
boundaries.
▪ Defined as f(x)=max(0,x).
o Sigmoid:
▪ Equation: f(x) = 1 / (1 + e^(−x)).
▪ Historically popular for its 'S' shape and the fact that its outputs are squashed between 0 and 1.
o Softmax:
multiple classes.
● Importance:
linear functions.
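Returning to the softmax output mentioned above, a minimal NumPy sketch shows how raw scores become class probabilities (the example logits are made up):

import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    z = z - np.max(z)
    exp_z = np.exp(z)
    # Normalise so the outputs sum to 1 and can be read as probabilities
    return exp_z / exp_z.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for three classes
print(softmax(logits))               # roughly [0.66, 0.24, 0.10]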
function with respect to each weight by applying the chain rule. This
is how deep learning models "learn" from the errors they make and
adapt accordingly.
error. This error, when spread across the network, is what will
quantifies how well the neural network's predictions align with the
the model.
for regression.
compute the gradient of the cost function and move in the opposite
aims to find the weight values that result in the smallest possible
error.
a mini-batch of samples.
around a minimum.
● Momentum:
rate.
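The sketch below illustrates gradient descent with momentum on a toy one-dimensional loss; the loss function, learning rate, and momentum coefficient are all assumed for illustration:

# Toy loss L(w) = (w - 3)^2, whose minimum is at w = 3
def grad(w):
    return 2 * (w - 3)      # dL/dw

w, velocity = 0.0, 0.0
learning_rate, momentum = 0.1, 0.9

for step in range(100):
    # Accumulate past gradients in the velocity, then move opposite to the gradient
    velocity = momentum * velocity - learning_rate * grad(w)
    w += velocity

print(round(w, 3))          # approaches the minimum at w = 3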
training data:
● L1 and L2 Regularization:
(Ridge regression).
● Dropout:
● Early Stopping:
dataset.
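A minimal Keras sketch combining these ideas is shown below; the layer sizes, L2 factor, dropout rate, and patience are assumed values rather than recommendations from the text:

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),  # L2 penalty
    tf.keras.layers.Dropout(0.5),              # randomly drops 50% of units during training
    tf.keras.layers.Dense(10, activation='softmax')
])

# Early stopping halts training once the validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
# model.fit(x_train, y_train, validation_split=0.2, callbacks=[early_stop])  # hypothetical data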
dependencies.
overfitting.
2.8 Keywords
simplified.
internal state.
application?
Background:
Delhi, the capital of India, has been grappling with hazardous levels
of air pollution for the past few years. The worsening air quality,
model processed the spatial patterns from the images and the
temporal patterns from historical pollution data.
Questions:
3. How can this model be scaled or adapted for other cities facing
similar environmental challenges in India?
2.11 References
● Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
● Neural Networks and Deep Learning: A Textbook by Charu C. Aggarwal
● Deep Learning for Computer Vision by Rajalingappaa Shanmugamani
Course: MSc DS
Deep Learning
Module: 3
Learning Objectives:
Structure:
Learning
the gradient of the loss with respect to the parameters, the model
(or a single point) for each update, leading to faster but noisier
convergence.
more reliably.
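As a rough sketch of mini-batch updates, the loop below fits a toy linear model by sampling mini-batches and stepping against the gradient; the data, batch size, and learning rate are all invented for illustration:

import numpy as np

X = np.random.randn(1000, 3)                  # toy inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * np.random.randn(1000)  # noisy targets

w = np.zeros(3)
lr, batch_size = 0.1, 32

for epoch in range(20):
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        gradient = 2 * Xb.T @ (Xb @ w - yb) / len(batch)   # MSE gradient on the mini-batch
        w -= lr * gradient                                  # frequent but noisier updates

print(np.round(w, 2))   # close to [2.0, -1.0, 0.5]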
Solutions:
reduced or removed.
Solutions:
about finely tuning the settings under which the model learns. These
reduced loss.
convergence at all.
● Resource Optimization: With the proper settings, a model can
memory.
● Parameters:
network.
● Hyperparameters:
Model Performance
the learning rate dictates the size of the steps taken during
optimization.
diverge.
or exploding gradients.
● Weights: Starting with weights that are too small can lead
architecture.
optimal solution.
Descent.
problem.
networks.
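Relating to the point above about weights that are too small or too large, here is a small NumPy sketch of two widely used initialisation schemes (the layer sizes are assumed purely for illustration):

import numpy as np

fan_in, fan_out = 784, 256   # assumed layer sizes

# He initialisation: variance scaled by fan_in, a common choice for ReLU layers
w_he = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

# Xavier/Glorot initialisation: variance balanced across fan_in and fan_out
w_xavier = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))

print(w_he.std(), w_xavier.std())   # neither vanishingly small nor explosively large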
● L1 Regularization (Lasso):
● L2 Regularization (Ridge):
magnitude of coefficients.
them to zero.
● Dropout:
Pros:
hyperparameters.
parallelize.
Cons:
When to Use:
When to Avoid:
Random Search:
Pros:
spaces.
Cons:
randomness.
a near-optimal solution.
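A bare-bones random search might look like the sketch below; the search space and the placeholder scoring function are assumptions, and in practice evaluate() would train and validate a model:

import random

search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64, 128],
    "dropout": [0.2, 0.3, 0.5],
}

def evaluate(config):
    # Placeholder: train a model with `config` and return its validation score
    return random.random()

best_config, best_score = None, float("-inf")
for trial in range(20):                       # sample 20 random configurations
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print(best_config, best_score)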
has become imperative. One such advanced method that has gained
Tuning
hyperparameters.
optimised.
executed in parallel.
tools and platforms optimise these steps, aiming for the best
Training
domain knowledge.
While both manual tuning and AutoML have their merits, it's
of deep learning:
● Manual Tuning:
Advantages:
Limitations:
long process.
● AutoML:
Advantages:
Limitations:
3.7 Summary
performance.
3.8 Keywords
unseen data.
performance.
learning models?
Deep Learning
Introduction:
challenges due to diseases like cotton leaf curl and bacterial blight.
extensive damage.
Background:
A team of data scientists at the Indian Institute of Technology (IIT)
cotton leaves.
The team used a dataset of 10,000 images, with a 70-20-10 split for training, validation, and testing.
photograph a cotton leaf, and the app would identify if the plant was
Questions:
performance?
3.11 References
● Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Kasam.
Course: MSc DS
Deep Learning
Module: 4
Learning Objectives:
Structure:
from input images. The name "convolutional" stems from the key
convolution.
CNN, the two functions being combined are the input data
granularity.
● Pooling Layers: Following the convolution operation, CNNs
overfitting.
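To show the convolution operation itself, here is a naive NumPy sketch that slides a filter over a single-channel input (the image and the edge-detecting kernel are made up; like most CNN libraries, it actually computes cross-correlation):

import numpy as np

def convolve2d(image, kernel):
    # 'Valid' convolution: slide the kernel over the image and take dot products
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(5, 5)                 # toy single-channel input
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])         # simple vertical-edge filter
print(convolve2d(image, edge_kernel).shape)  # (3, 3) feature map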
The roots of CNNs can be traced back to the 1970s and 1980s,
digit recognition.
and patterns.
for a task directly from the data, optimising the entire process
extracted features.
convolved feature.
relationships.
of values.
values.
changes.
grid data such as images. Given the intrinsic nature of certain data
underfitting or overfitting.
range.
Pitfalls
overfitting.
fine-grained patterns.
easy-to-use interface.
Example:
import tensorflow as tf

model = tf.keras.models.Sequential([
    # Assumed first layer: a small convolution over 28x28 single-channel images
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
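A short usage sketch for the model above; the optimiser, loss, and training call are typical choices rather than ones prescribed by the text:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=5, validation_split=0.1)  # x_train, y_train are hypothetical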
Example:
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Assumed sizes: 3-channel 32x32 input, 32 filters, 10 output classes
        self.conv1 = nn.Conv2d(3, 32, 3)          # 32x32 -> 30x30 feature maps
        self.pool = nn.MaxPool2d(2, 2)            # 30x30 -> 15x15 after pooling
        self.fc1 = nn.Linear(32 * 15 * 15, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(-1, 32 * 15 * 15)              # flatten for the dense layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
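As a quick sanity check, the network can be run on a dummy batch matching the assumed 3-channel 32x32 input:

import torch

model = SimpleCNN()
dummy = torch.randn(1, 3, 32, 32)    # one hypothetical RGB image
print(model(dummy).shape)            # torch.Size([1, 10])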
Sees
performed by CNNs.
● Early layers often detect basic features like edges and colours.
textures or shapes.
Benefits:
performing as expected.
Techniques:
made.
● Metrics:
labels.
● Techniques:
negative classifications.
Overfitting:
Underfitting:
preprocessing.
Vanishing/Exploding Gradients:
● Fine-tuning Techniques:
hyperparameters.
4.5 Introduction to RNNs
● Parameter Sharing: The same weights are used for each input,
ensuring consistent processing across different time steps and
Data
account for previous inputs, making them ill-suited for tasks where
well.
steps.
● Basic Architecture:
receives the current input but also the hidden state from
the previous time step, thereby incorporating historical
information.
sentence.
the information from one step in the sequence back into the
1. Accept an input.
hidden state).
3. Produce an output.
● Mathematical Perspective:
o At each time step t, the hidden state h_t is computed as:
h_t = σ(W_hh · h_(t−1) + W_xh · x_t + b_h), where W_hh and W_xh are the recurrent and input weight matrices, b_h is the bias, and σ is the activation function.
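A single recurrent step corresponding to this equation can be sketched in NumPy as follows; the hidden size, input size, and the choice of tanh as σ are illustrative assumptions:

import numpy as np

hidden_size, input_size = 4, 3
W_hh = np.random.randn(hidden_size, hidden_size)   # recurrent weights
W_xh = np.random.randn(hidden_size, input_size)    # input weights
b_h = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)       # hidden state from the previous time step
x_t = np.random.randn(input_size)    # current input

h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)    # new hidden state
print(h_t)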
While RNNs are powerful, they come with certain challenges, most
dependencies.
too large, leading to weight updates that are too dramatic and
minimum.
effectively.
they can only process sequences in one direction, typically from the
past to the present. This limitation might not be optimal for tasks
● Advantages:
● Drawbacks:
processing.
context.
durations.
● Advantages:
● Drawbacks:
mechanisms.
● Concept: GRUs utilise two gates: reset and update gates. The
reset gate determines how to combine new input with the
long-term dependencies.
● Advantages:
complexity.
● Drawbacks:
versa.
Considerations
preparation is essential.
● Sequence Length:
● Batch Size:
time.
training process.
3. Building RNN Architectures: From Simple RNN to LSTMs
● Simple RNN: This is the basic form where outputs from one
step are fed as inputs to the next. However, they suffer from
dependencies.
update).
performance.
● Bidirectional RNNs:
o Processes sequences from both start-to-end and
● Code Examples:
experience.
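A minimal Keras sketch of the progression from a simple recurrent layer to an LSTM and a bidirectional LSTM is given below; the vocabulary size, embedding dimension, unit counts, and the binary-classification head are assumed values:

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=32),
    # Swap this layer for tf.keras.layers.SimpleRNN(64) or tf.keras.layers.LSTM(64)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),   # reads the sequence both ways
    tf.keras.layers.Dense(1, activation='sigmoid')             # e.g. binary sequence classification
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(padded_sequences, labels, epochs=3)  # padded_sequences, labels are hypothetical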
● Use Cases:
previous compositions.
activities or anomalies.
classification.
performance.
o Regularisation Techniques:
values.
plateaus or deteriorates.
on latent biases.
4.10 Summary
outputs.
structure.
performance.
4.11 Keywords
be crucial.
application?
retinopathy.
debilitating condition.
Questions:
deep learning?
real-world setting?
4.14 References
● Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
● Neural Networks and Deep Learning: A Textbook by Charu C. Aggarwal
● Deep Learning for Computer Vision by Rajalingappaa Shanmugamani
Course: MSc DS
Deep Learning
Module: 5
Learning Objectives:
Variation Loss
The overall loss is a weighted sum of these three losses, and the
frame.
few.
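Assuming the three terms are the content, style, and total-variation losses discussed above, the combined objective can be sketched as a simple weighted sum (the weights are illustrative only):

# Hypothetical weights for the content, style, and total-variation terms
alpha, beta, gamma = 1.0, 1e-2, 1e-4

def total_loss(content_loss, style_loss, tv_loss):
    # Overall loss = weighted sum of the three component losses
    return alpha * content_loss + beta * style_loss + gamma * tv_loss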
Detection
object detection.
● Haar Cascades:
face detection.
content.
● Fast R-CNN:
coordinates.
● Faster R-CNN:
forward pass.
different resolutions.
Allows detection of objects at various scales.
● RetinaNet:
in object detection.
and resolutions.
the use of deep neural networks for tasks that involve large
amounts of data. Over the past few years, this discipline has seen
power.
Variations:
Personalised Treatments
Cities
Language Translation
5.3 Summary
located.
responsible AI development.
5.4 Keywords
structures or content.
● R-CNN and YOLO: R-CNN (Region-based Convolutional
Neural Networks) and YOLO (You Only Look Once) are both
YOLO, on the other hand, divides the image into a grid and
applications.
in Beijing
Introduction:
timings.
Background:
across the city. This vast dataset included vehicle counts, speed,
model were about 92% accurate, and the optimised traffic light
initial model.
Questions:
5.6 References
● Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
● Neural Networks and Deep Learning: A Textbook by Charu C. Aggarwal
● Deep Learning for Computer Vision by Rajalingappaa Shanmugamani