Deep Learning


What is deep learning

Deep learning is a subset of machine learning, which is a subset of artificial
intelligence. Artificial intelligence is a general term that refers to
techniques that enable computers to mimic human behavior. Machine
learning represents a set of algorithms trained on data that make all of this
possible. Deep learning is just a type of machine learning, inspired by the
structure of the human brain.

The first advantage of deep learning over classical machine learning is
that manual feature extraction becomes unnecessary.
Here’s how it works: over several layers of an artificial neural
network, the raw data is transformed into an increasingly abstract
and compressed representation. We then use this compressed
representation of the input data to produce the result. The result
can be, for example, the classification of the input data into
different classes.
❖ In a classical machine learning model, to determine whether a particular
image shows a car, we humans first need to identify the distinguishing
features of a car (shape, size, windows, wheels, etc.), then extract those
features and give them to the algorithm as input data. Only then can the
algorithm classify the images. That is, in machine learning, a programmer
must intervene directly in the process for the model to reach a conclusion.

❖ In a deep learning model, the manual feature extraction step is
unnecessary. The model learns to recognize these distinguishing
characteristics of a car from the data and makes its predictions without
human intervention.
TensorFlow
TensorFlow is an open-source platform for machine learning and a
symbolic math library that is used for machine learning applications.

Keras
Keras is an open-source neural network library that runs on top of Theano or
TensorFlow. It is designed to be fast and easy to use, and it is a useful
library for constructing deep learning models of whatever kind we want.

Difference between TensorFlow and Keras:

S.No | TensorFlow | Keras
1. | TensorFlow is written in C++, CUDA, Python. | Keras is written in Python.
2. | TensorFlow is used for large datasets and high-performance models. | Keras is usually used for small datasets.
3. | TensorFlow is a framework that offers both high and low-level APIs. | Keras is a high-level API.
4. | TensorFlow is used for high-performance models. | Keras is used for low-performance models.
5. | In TensorFlow, performing debugging leads to complexities. | In the Keras framework, there is only minimal requirement for debugging the simple networks.
6. | TensorFlow has a complex architecture and is not easy to use. | Keras has a simple architecture and is easy to use.
7. | TensorFlow was developed by the Google Brain team. | Keras was developed by François Chollet while he was working on the part of the research effort of project ONEIROS.

Processing Power
Deep learning can require significant processing power. Complex models
trained on big-data datasets can take hours, days or even longer to train. The
models we present in this chapter can be trained in minutes to just less
than an hour on computers with conventional CPUs. You’ll need only a
reasonably current personal computer. We’ll discuss the special
high-performance hardware called GPUs (Graphics Processing Units) and
TPUs (Tensor Processing Units) developed by NVIDIA and Google to meet
the extraordinary processing demands of edge-of-the-practice
deep-learning applications.
Keras Built-In Datasets
Here are some of Keras’s datasets (from the module
tensorflow.keras.datasets) for practicing deep learning.
❖​ MNIST database of handwritten digits
Used for classifying handwritten digit images, this dataset contains
28-by-28 grayscale digit images labeled as 0 through 9 with 60,000 images
for training and 10,000 for testing. We use this dataset in Section 16.6,
where we study convolutional neural networks.
❖ Fashion-MNIST database of fashion articles
Used for classifying clothing images, this dataset contains 28-by-28
grayscale images of clothing labeled in 10 categories with 60,000 for
training and 10,000 for testing.
❖ IMDb movie reviews
Used for sentiment analysis, this dataset contains
reviews labeled as positive (1) or negative (0) sentiment with 25,000
reviews for training and 25,000 for testing.
❖ CIFAR10 small image classification
Used for small-image classification, this dataset contains 32-by-32 color
images labeled in 10 categories with 50,000 images for training and 10,000
for testing.
❖ CIFAR100 small image classification
Also used for small-image classification, this dataset contains 32-by-32
color images labeled in 100 categories with 50,000 images for training and
10,000 for testing.
Neural Network
A neural network is a computational model inspired by the
human brain, composed of layers of interconnected nodes
(neurons) that learn patterns from data.
Similar to the human brain that has neurons interconnected to
one another, artificial neural networks also have neurons that
are interconnected to one another in various layers of the
networks. These neurons are known as nodes.

The given figure illustrates the typical diagram of Biological Neural Network.

The typical Artificial Neural Network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural
Networks, cell nucleus represents Nodes, synapse represents Weights, and
Axon represents Output.

Relationship between Biological neural network and artificial neural network:

Biological Neural Network | Artificial Neural Network
Dendrites | Inputs
Cell nucleus | Nodes
Synapse | Weights
Axon | Output

Various types of layers available in an artificial neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats provided
by the programmer.

Hidden Layer:
The hidden layer sits between the input and output layers. It performs all
the calculations to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer,
which finally results in output that is conveyed using this layer.

The artificial neural network takes the inputs, computes their weighted sum
and adds a bias. This computation is represented in the form of
a transfer function.

The weighted total is then passed as input to an activation function to
produce the output. Activation functions decide whether a node should fire
or not, and only the nodes that fire pass a signal on toward the output
layer. Different activation functions are available, and the choice depends
on the kind of task we are performing.

1. Basic Components
Neuron (Node): Performs a computation on inputs.
Formula:
z = sum(wi * xi) + b
a = Activation(z)
For two inputs, z = w1*x1 + w2*x2 + b, which has the same form as
A*x1 + B*x2 + C, the equation of a line.

Weights (w): Learnable parameters that scale inputs.
Bias (b): Allows shifting the activation function.
Activation Function: Adds non-linearity (e.g., ReLU, Sigmoid, Tanh, Softmax).

Layers:
Input Layer: Takes input features.
Hidden Layers: Perform computations.
Output Layer: Produces the final prediction.
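
The neuron formula and the activation functions named above can be sketched in NumPy. This is only a minimal illustration and not code from the original notes; the function names (relu, sigmoid, softmax, neuron) and the sample inputs, weights and bias are assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z), applied element-wise
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the maximum keeps the exponentials numerically stable
    e = np.exp(z - np.max(z))
    return e / e.sum()

def neuron(x, w, b, activation):
    # z = sum(wi * xi) + b, then a = Activation(z)
    z = np.dot(w, x) + b
    return activation(z)

# Hypothetical inputs, weights and bias for a two-input neuron
x = np.array([1.0, 2.0])
w = np.array([0.5, -1.0])
b = 0.25

z = np.dot(w, x) + b              # 0.5*1 + (-1)*2 + 0.25 = -1.25
a_relu = neuron(x, w, b, relu)    # negative z is clipped to 0
a_sig = neuron(x, w, b, sigmoid)  # a value between 0 and 0.5 because z < 0
a_tanh = neuron(x, w, b, np.tanh)
probs = softmax(np.array([2.0, 1.0, 0.1]))

print(z, a_relu, a_sig, a_tanh, probs)
```

Softmax is usually applied to the whole output layer rather than to a single neuron, which is why it takes a vector here.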

2. Forward Propagation
The process of passing data through the network to make a
prediction.
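
As a rough sketch of forward propagation, the snippet below pushes a small batch through a network with two hidden layers. The layer sizes (4, 10, 8, 3), the random weights and the helper names are assumptions made for illustration; a trained network would use learned weights instead.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax_rows(z):
    # Row-wise softmax so that each sample's outputs sum to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical layer sizes: 4 inputs -> 10 -> 8 -> 3 outputs
sizes = [4, 10, 8, 3]
params = [
    (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

def forward(x, params):
    a = x
    # Hidden layers: weighted sum plus bias, then ReLU
    for W, b in params[:-1]:
        a = relu(a @ W + b)
    # Output layer: weighted sum plus bias, then softmax
    W, b = params[-1]
    return softmax_rows(a @ W + b)

batch = rng.normal(size=(5, 4))   # 5 samples with 4 features each
probs = forward(batch, params)
print(probs.shape)                # one row of class probabilities per sample
```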
3. Loss Function
Measures how far the prediction is from the actual value.
Examples:
MSE (Mean Squared Error) for regression.
Cross-Entropy for classification.
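
The two loss functions can be written in a few lines of NumPy. This is a sketch under the assumption of one-hot labels for the classification case; the function names and the sample values are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared differences
    return np.mean((y_true - y_pred) ** 2)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true is one-hot; clipping avoids log(0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Regression example: errors of 1 and 2 give (1 + 4) / 2 = 2.5
mse_value = mse(np.array([3.0, 5.0]), np.array([2.0, 7.0]))

# Classification example: two samples, three classes
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
ce_value = categorical_cross_entropy(y_true, y_pred)

print(mse_value, ce_value)
```

Only the probability assigned to the true class enters the cross-entropy, so confident correct predictions give a loss close to zero.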
4. Backward Propagation
The process of passing the loss gradient backward through the network
(using the chain rule) to find how much each weight and bias
contributed to the error.

5. Optimization Algorithm
Gradient Descent: Updates weights to minimize loss.
Variants include: SGD, Adam, RMSProp.
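
The basic gradient descent update is w = w - learning_rate * gradient. The sketch below applies it to a toy one-parameter loss, (w - 3)^2, which is an assumption chosen because its minimum at w = 3 is known; SGD, Adam and RMSProp refine this same update rule.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    # Repeatedly step against the gradient
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

def grad_loss(w):
    # Derivative of the toy loss (w - 3)^2
    return 2.0 * (w - 3.0)

w_min = gradient_descent(grad_loss, w0=0.0)
print(w_min)  # approaches the minimum at w = 3
```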

6. Training Process
1. Initialize weights randomly.
2. Forward pass.
3. Compute loss.
4. Backward pass.
5. Update weights.
6. Repeat for many epochs.
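
The six steps above can be sketched for the simplest possible model, a single neuron without activation fitted by gradient descent. The toy data (y = 2x + 1), the learning rate and the number of epochs are assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data that follows y = 2x + 1 exactly (an assumption for this sketch)
X = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * X + 1.0

# 1. Initialize the weight randomly (the bias starts at zero)
w = float(rng.normal())
b = 0.0
learning_rate = 0.1
losses = []

for epoch in range(500):                       # 6. Repeat for many epochs
    y_pred = w * X + b                         # 2. Forward pass
    error = y_pred - y
    loss = float(np.mean(error ** 2))          # 3. Compute loss (MSE)
    grad_w = 2.0 * float(np.mean(error * X))   # 4. Backward pass: dLoss/dw
    grad_b = 2.0 * float(np.mean(error))       #    and dLoss/db
    w -= learning_rate * grad_w                # 5. Update weights
    b -= learning_rate * grad_b
    losses.append(loss)

print(f"w = {w:.3f}, b = {b:.3f}, final loss = {losses[-1]:.6f}")
```

In a multi-layer network the backward pass computes these gradients layer by layer with the chain rule, but the loop structure stays the same.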

7. Overfitting & Regularization
Overfitting: The model memorizes the training data and generalizes poorly to new data.
Solutions:
- Dropout
- L2 Regularization
- Early Stopping
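
Each of the three remedies can be sketched in a few lines. The helper names (l2_penalty, dropout, should_stop) and their parameters are hypothetical, and the dropout shown is the common "inverted" variant that rescales the surviving activations; this is an illustration, not how any particular library implements them.

```python
import numpy as np

def l2_penalty(weights, lam):
    # Extra loss term that grows with the squared size of the weights
    return lam * np.sum(weights ** 2)

def dropout(activations, rate, rng, training=True):
    # Randomly zero a fraction `rate` of activations during training and
    # rescale the rest so the expected value stays the same
    if not training or rate == 0.0:
        return activations
    keep = 1.0 - rate
    mask = rng.random(activations.shape) < keep
    return activations * mask / keep

def should_stop(val_losses, patience):
    # Stop when the best validation loss is at least `patience` epochs old
    best_epoch = int(np.argmin(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience

rng = np.random.default_rng(0)
penalty = l2_penalty(np.array([1.0, -2.0]), lam=0.01)   # 0.01 * (1 + 4)
dropped = dropout(np.ones(1000), rate=0.5, rng=rng)
stop_now = should_stop([1.0, 0.8, 0.9, 0.95], patience=2)
keep_going = should_stop([1.0, 0.8, 0.7], patience=2)
print(penalty, dropped.mean(), stop_now, keep_going)
```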

8. Popular Architectures
- Feedforward Neural Network (FNN)
- Convolutional Neural Network (CNN) for images
- Recurrent Neural Network (RNN) for sequences
- Transformer for NLP and more
Tensor
The main use of a tensor is to hold and manipulate data in
deep learning and machine learning.

Tensor Type | Dimensions | Example | Shape
0D Tensor (Scalar) | Just a single number | 5 | () (no dimension)
1D Tensor (Vector) | A list of numbers | [1, 2, 3, 4] | (4,)
2D Tensor (Matrix) | Table of numbers (rows × columns) | [[1, 2], [3, 4], [5, 6]] | (3, 2)
3D Tensor | A "stack" of matrices (like a cube) | [[[1,2], [3,4]], [[5,6], [7,8]]] | (2, 2, 2)
4D Tensor | Batch of 3D tensors (common in deep learning for images) | Imagine a batch of 10 RGB images, each 32×32 pixels | (10, 32, 32, 3)
5D Tensor | Batch of videos (frames): each video has multiple frames, each frame is an image | 5 videos, each with 20 frames, each frame is 64×64 with 3 color channels | (5, 20, 64, 64, 3)

Quick Visual Feel:

●​ 0D → Single point​

●​ 1D → Line of numbers​

●​ 2D → Table of numbers (rows and columns)​

●​ 3D → Cube (stack of tables)​

●​ 4D → Collection of cubes (batch of images)​


●​ 5D → Collection of moving cubes (batch of videos)​
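
The shapes listed above can be checked with NumPy arrays, which behave like tensors for this purpose. The variable names are illustrative, and the image and video batches are filled with zeros because only their shapes matter here.

```python
import numpy as np

scalar = np.array(5)                                     # 0D
vector = np.array([1, 2, 3, 4])                          # 1D
matrix = np.array([[1, 2], [3, 4], [5, 6]])              # 2D
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])    # 3D
images = np.zeros((10, 32, 32, 3))                       # 4D: batch of RGB images
videos = np.zeros((5, 20, 64, 64, 3))                    # 5D: batch of videos

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("cube", cube), ("images", images), ("videos", videos)]:
    # ndim is the number of dimensions, shape is the size along each one
    print(f"{name}: ndim={t.ndim}, shape={t.shape}")
```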

High-Performance Processors
Powerful processors are needed for real-world deep learning because
the size of tensors can be enormous and large-tensor operations can
place crushing demands on processors. The processors most commonly
used for deep learning are:
• NVIDIA GPUs (Graphics Processing Units): Originally
developed by companies like NVIDIA for computer gaming,
GPUs are much faster than conventional CPUs for processing
large amounts of data, thus enabling developers to train,
validate and test deep-learning models more efficiently, and
thus experiment with more of them. GPUs are optimized for
the mathematical matrix operations typically performed on
tensors, an essential aspect of how deep learning works “under
the hood.” NVIDIA’s Volta Tensor Cores are specifically
designed for deep learning. Many NVIDIA GPUs are
compatible with TensorFlow, and hence Keras, and can enhance
the performance of your deep-learning models.
• Google TPUs (Tensor Processing Units): Recognizing that
deep learning is crucial to its future, Google developed TPUs,
which it now uses in its Cloud TPU service, which “can provide
up to 11.5 petaflops of performance in a single pod” (that’s
11.5 quadrillion floating-point operations per second). Also,
TPUs are designed to be especially energy efficient. This is a
key concern for companies like Google with already massive
computing clusters that are growing exponentially and
consuming vast amounts of energy.
ANN
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

iris = load_iris()
# Features (4 features: sepal length, sepal width, petal length, petal width)
X = iris.data
# Labels (0, 1, 2)
y = iris.target
# Check data
print("Features shape:", X.shape)
print("Labels shape:", y.shape)
output:
Features shape: (150, 4)
Labels shape: (150,)

# Standardize features (important for ANN)


scaler = StandardScaler()
X = scaler.fit_transform(X)
# Convert labels to one-hot encoding
y = to_categorical(y)
# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)

# Initialize the model


model = Sequential()

# Add input layer separately


model.add(Input(shape=(4,))) # <-- input defined here
# Add first hidden layer
model.add(Dense(10, activation='relu')) # 10 neurons

# Add second hidden layer


model.add(Dense(8, activation='relu')) # 8 neurons

# Add output layer


model.add(Dense(3, activation='softmax')) # 3 classes
Code Explanation:
Input layer
●​ This explicitly defines the input shape to the model.
●​ shape=(4,) means the model expects an input vector of 4 features.
●​ The Input layer doesn't contain any neurons or weights — it just defines
the expected input format.
1st Hidden layer
●​ Dense(10) creates a fully connected layer with 10 neurons.
●​ Each neuron receives all 4 inputs.
● activation='relu' applies the Rectified Linear Unit activation function,
which helps the network learn non-linear patterns:
ReLU(x) = max(0, x)
2nd Hidden Layer
●​ Adds another Dense layer with 8 neurons, each connected to the 10
outputs from the previous layer.
●​ Again uses ReLU activation for non-linearity.
Output Layer
●​ This layer has 3 neurons, suitable for a 3-class classification
problem.
●​ activation='softmax' turns the outputs into probabilities that sum to
1.
●​ The neuron with the highest probability is typically selected as the
predicted class.
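
One way to check the layer descriptions above is to count the trainable parameters. Each Dense neuron has one weight per input plus one bias, so a layer with n_inputs inputs and n_units neurons has n_inputs * n_units + n_units parameters. The helper name dense_params is an assumption for this sketch.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, neuron) pair plus one bias per neuron
    return n_inputs * n_units + n_units

# Input of 4 features, then Dense(10), Dense(8), Dense(3)
layer_sizes = [4, 10, 8, 3]
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [50, 88, 27] 165
```

The Input layer contributes nothing to the total, consistent with the note that it holds no neurons or weights.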
# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train,
                    epochs=50,
                    batch_size=8,
                    validation_split=0.1,  # use 10% of training data for validation
                    verbose=1)
Output:

# Evaluate on test data


loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print("Test Accuracy: {:.2f}%".format(accuracy * 100))
Output:
Test Accuracy: 100.00%

# Plot training & validation accuracy values


plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'], loc='lower right')
plt.show()

# Plot training & validation loss values


plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.show()

Output:
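
sample = np.array([[5.1, 3.5, 1.4, 0.2]])
# Note: the model was trained on standardized features, so a new sample would
# normally be passed through scaler.transform(sample) first. The output below
# was produced from the raw, unscaled sample.
predictions = model.predict(sample)
print(predictions)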
Output:
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step
[[0.59903073 0.37916926 0.0218 ]]

Convolutional Neural Networks for Vision; Multi-Classification with the MNIST Dataset

In deep learning, particularly in Convolutional Neural
Networks (CNNs), filters (also called kernels) are crucial
components that help detect patterns in data, typically in
images. The convolution layer applies a filter (kernel) over the
input image/tensor to extract features.

Types of Filters in Deep Learning

1. Edge Detection Filters


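Used in the early layers of CNNs to detect edges (boundaries)
in images.

● Sobel Filter: Detects edges in horizontal or vertical directions.

● Horizontal Sobel (measures change along the horizontal direction, so it
responds most strongly to vertical edges):

[-1 0 1]
[-2 0 2]
[-1 0 1]

● Vertical Sobel (measures change along the vertical direction, so it
responds most strongly to horizontal edges):

[-1 -2 -1]
[ 0 0 0]
[ 1 2 1]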
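
To see what the two Sobel kernels respond to, the sketch below slides them over a small synthetic image that is dark on the left and bright on the right. The image, the helper name correlate2d_valid and the variable names are assumptions for this illustration. As in CNN layers, the kernel is applied without flipping (strictly speaking a cross-correlation).

```python
import numpy as np

sobel_horizontal = np.array([[-1, 0, 1],
                             [-2, 0, 2],
                             [-1, 0, 1]])
sobel_vertical = np.array([[-1, -2, -1],
                           [ 0,  0,  0],
                           [ 1,  2,  1]])

def correlate2d_valid(image, kernel):
    # Slide the kernel over the image with stride 1 and no padding
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

# 5x6 image: left half 0 (dark), right half 1 (bright) -> one vertical edge
image = np.zeros((5, 6))
image[:, 3:] = 1.0

gx = correlate2d_valid(image, sobel_horizontal)
gy = correlate2d_valid(image, sobel_vertical)
print(gx)  # non-zero only where the window straddles the edge
print(gy)  # all zeros: nothing changes from top to bottom
```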

2. Sharpening Filter

●​ Enhances the edges or fine details in an image.

[ 0 -1 0]
[-1 5 -1]
[ 0 -1 0]

3. Blurring / Smoothing Filter

Reduces noise and detail.

● Gaussian Filter: Uses a Gaussian function to give more
weight to center pixels.
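
A Gaussian kernel can be built by evaluating the Gaussian function on a small grid and normalizing it so the weights sum to 1. The function name gaussian_kernel and the default size and sigma are assumptions chosen for this sketch.

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    # Coordinates centered on the middle of the kernel, e.g. [-1, 0, 1]
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    # Normalize so that blurring does not change overall brightness
    return kernel / kernel.sum()

kernel = gaussian_kernel(3, 1.0)
print(kernel)  # largest weight in the center, smallest in the corners
```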

4. Learned Filters

In CNNs, most filters are learned during training, rather than
predefined. For example:
● Early layers might learn filters that detect edges or simple
textures.

● Intermediate layers may detect patterns like corners,
shapes, or parts of objects.

● Deep layers capture high-level features like faces, objects,
or scene elements.

5. Depthwise Filters (in MobileNets)

Each filter works on a single input channel, improving
computational efficiency.

6. Dilated (Atrous) Filters

Spread out across the input with gaps (dilations), useful for
capturing wider context without losing resolution (common in
segmentation tasks).
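
A dilated filter with k×k weights and dilation rate d covers an area of k + (k - 1)(d - 1) pixels per side while keeping the same number of weights. The sketch below computes this and builds the spread-out kernel by inserting zeros; both helper names are assumptions for illustration.

```python
import numpy as np

def effective_kernel_size(k, dilation):
    # Gaps of (dilation - 1) pixels sit between neighboring weights
    return k + (k - 1) * (dilation - 1)

def dilate_kernel(kernel, dilation):
    # Place the original weights `dilation` pixels apart, zeros in between
    k = kernel.shape[0]
    size = effective_kernel_size(k, dilation)
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::dilation, ::dilation] = kernel
    return out

sizes = [effective_kernel_size(3, d) for d in (1, 2, 4)]
dilated = dilate_kernel(np.ones((3, 3)), dilation=2)
print(sizes)     # [3, 5, 9]
print(dilated)   # 5x5 grid with 9 ones separated by zeros
```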

7. Separable Filters

Decompose a filter into simpler, smaller convolutions (e.g.,
depthwise separable convolutions in Xception), reducing
computational cost.
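
The saving from a depthwise separable convolution can be estimated by counting weights. The sketch assumes one depthwise filter per input channel followed by a 1×1 pointwise convolution, and it ignores biases; the function names and the example sizes are made up for illustration.

```python
def standard_conv_params(k, c_in, c_out):
    # Every output channel has its own k x k filter over all input channels
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise

standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
print(standard, separable, round(standard / separable, 2))
```

For a 3×3 filter with 32 input and 64 output channels this gives 18,432 weights versus 2,336, roughly an eightfold reduction.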
Inputs:
Input tensor (e.g., 2D image): size 𝐻×𝑊
Filter/kernel: size 𝑓×𝑓
Stride s: how many steps the filter moves
Padding p: how many zeros are added around the input

Output size formula: $\text{Output height/width} = \left\lfloor \dfrac{H + 2p - f}{s} \right\rfloor + 1$
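
The output size formula translates directly into a small helper, sketched here with the function name conv_output_size as an assumption. Floor division (//) plays the role of the floor brackets.

```python
def conv_output_size(h, f, s=1, p=0):
    # floor((H + 2p - f) / s) + 1
    return (h + 2 * p - f) // s + 1

print(conv_output_size(4, 2))             # 4x4 input, 2x2 filter -> 3
print(conv_output_size(28, 3))            # 28x28 input, 3x3 filter -> 26
print(conv_output_size(26, 2, s=2))       # 2x2 window with stride 2 -> 13
print(conv_output_size(28, 3, s=1, p=1))  # padding of 1 keeps the size at 28
```

The same formula applies to pooling layers, with the pool size taking the place of f.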

Steps:

1. Position the filter on the input tensor.

2. Compute element-wise multiplication between the filter
and the input slice.

3. Sum the results → this is a single value in the output.

4. Slide the filter based on stride and repeat.

Input image (4×4):


[[1, 2, 0, 1],
[3, 1, 2, 2],
[1, 0, 1, 3],
[2, 1, 2, 1]]

Filter/kernel (2×2):
[[1, 0],
[0, -1]]

Step-by-step:

1. Convolution (stride=1, padding=0)

Slide the 2×2 filter over the input and compute:

conv=∑(element-wise product)
Example at top-left (first 2×2 region):
Input slice:
[[1, 2],
[3, 1]]

Filter:
[[1, 0],
[0, -1]]

= (1*1 + 2*0 + 3*0 + 1*(-1)) = 1 + 0 + 0 - 1 = 0

Repeat this for each region → Output will be 3×3.

2. Apply ReLU

Replace all negatives with 0.

3. Apply 2×2 Max Pooling (stride=1) on ReLU output.

Output:
1. Convolution Output:
[[ 0. 0. -2.]
[ 3. 0. -1.]
[ 0. -2. 0.]]

2. After ReLU (Negative values replaced with 0):


[[0. 0. 0.]
[3. 0. 0.]
[0. 0. 0.]]

3. After 2×2 Max Pooling (stride=1):


[[3. 0.]
[3. 0.]]

Program:
import numpy as np

# Input and filter
input_img = np.array([
    [1, 2, 0, 1],
    [3, 1, 2, 2],
    [1, 0, 1, 3],
    [2, 1, 2, 1]
])

kernel = np.array([
    [1, 0],
    [0, -1]
])

# Convolution (no padding; as in CNN layers, the kernel is not flipped)
def convolve2d(image, kernel, stride=1):
    k = kernel.shape[0]
    output_dim = (image.shape[0] - k) // stride + 1
    output = np.zeros((output_dim, output_dim))
    for i in range(output_dim):
        for j in range(output_dim):
            # Top-left corner of the current window, moved by the stride
            r, c = i * stride, j * stride
            region = image[r:r+k, c:c+k]
            output[i, j] = np.sum(region * kernel)
    return output

# ReLU: replace negative values with 0
def relu(x):
    return np.maximum(0, x)

# Max Pooling: keep the largest value in each window
def max_pooling(image, pool_size=2, stride=1):
    output_dim = (image.shape[0] - pool_size) // stride + 1
    output = np.zeros((output_dim, output_dim))
    for i in range(output_dim):
        for j in range(output_dim):
            r, c = i * stride, j * stride
            region = image[r:r+pool_size, c:c+pool_size]
            output[i, j] = np.max(region)
    return output

# Run pipeline
conv_output = convolve2d(input_img, kernel)
relu_output = relu(conv_output)
pool_output = max_pooling(relu_output)

# Print results
print("Convolution Output:\n", conv_output)
print("ReLU Output:\n", relu_output)
print("Max Pooling Output:\n", pool_output)
Output:
Convolution Output:
[[ 0. 0. -2.]
[ 3. 0. -1.]
[ 0. -2. 0.]]
ReLU Output:
[[0. 0. 0.]
[3. 0. 0.]
[0. 0. 0.]]
Max Pooling Output:
[[3. 0.]
[3. 0.]]

Convolutional Neural Networks for Vision; Multi-Classification with the MNIST Dataset
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     Flatten, Dense, Dropout)
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(X_train, y_train), (X_test, y_test) = mnist.load_data()


print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

Output:
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)
Visualizing Digits
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set(font_scale=2)

# Pick 24 random training images without repetition
index = np.random.choice(np.arange(len(X_train)), 24, replace=False)
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(16, 9))

for ax, image, target in zip(axes.ravel(), X_train[index], y_train[index]):
    ax.imshow(image, cmap=plt.cm.gray_r)
    ax.set_xticks([])  # remove x-axis tick marks
    ax.set_yticks([])  # remove y-axis tick marks
    ax.set_title(target)
plt.tight_layout()
Output:
Data Preprocessing
# Reshape to (num_samples, height, width, channels)
X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)

print(X_train.shape, X_test.shape)

Output:
(60000, 28, 28, 1) (10000, 28, 28, 1)
# Normalize the image data to the range 0.0 to 1.0
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

# Convert labels to one-hot vectors
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

Build the CNN Model


model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),

    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')  # 10 classes for digits 0-9
])
model.summary()
Output:
Model: "sequential"

Layer (type) | Output Shape | Param #
conv2d (Conv2D) | (None, 26, 26, 32) | 320
max_pooling2d (MaxPooling2D) | (None, 13, 13, 32) | 0
conv2d_1 (Conv2D) | (None, 11, 11, 64) | 18,496
max_pooling2d_1 (MaxPooling2D) | (None, 5, 5, 64) | 0
flatten (Flatten) | (None, 1600) | 0
dense (Dense) | (None, 128) | 204,928
dropout (Dropout) | (None, 128) | 0
dense_1 (Dense) | (None, 10) | 1,290

Total params: 225,034 (879.04 KB)
Trainable params: 225,034 (879.04 KB)
Non-trainable params: 0 (0.00 B)
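
The parameter counts in the summary can be reproduced by hand. A Conv2D layer has one kernel_h × kernel_w × input_channels filter plus one bias per output filter, and a Dense layer has one weight per input plus one bias per unit. The helper names below are assumptions for this sketch.

```python
def conv2d_params(kernel_h, kernel_w, c_in, filters):
    # (weights in one filter + 1 bias) for each output filter
    return (kernel_h * kernel_w * c_in + 1) * filters

def dense_params(n_in, units):
    # (one weight per input + 1 bias) for each unit
    return (n_in + 1) * units

counts = {
    "conv2d": conv2d_params(3, 3, 1, 32),
    "conv2d_1": conv2d_params(3, 3, 32, 64),
    "dense": dense_params(5 * 5 * 64, 128),   # flattened 5x5x64 = 1600 inputs
    "dense_1": dense_params(128, 10),
}
total = sum(counts.values())
print(counts, total)
```

Pooling, Flatten and Dropout layers have no trainable parameters, which is why they show 0 in the summary.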

Compile the Model


model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # use categorical_crossentropy with one-hot labels
              metrics=['accuracy'])

Train the Model


model.fit(X_train, y_train, epochs=5, batch_size=64,
validation_split=0.1)

Output:
Epoch 1/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 41s 47ms/step - accuracy: 0.8279 - loss: 0.5348 - val_accuracy: 0.9837 - val_loss: 0.0565
Epoch 2/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 39s 46ms/step - accuracy: 0.9717 - loss: 0.0967 - val_accuracy: 0.9888 - val_loss: 0.0415
Epoch 3/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 41s 46ms/step - accuracy: 0.9808 - loss: 0.0645 - val_accuracy: 0.9893 - val_loss: 0.0332
Epoch 4/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 39s 46ms/step - accuracy: 0.9847 - loss: 0.0521 - val_accuracy: 0.9910 - val_loss: 0.0319
Epoch 5/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 41s 46ms/step - accuracy: 0.9873 - loss: 0.0428 - val_accuracy: 0.9917 - val_loss: 0.0327

<keras.src.callbacks.history.History at 0x78845d371910>
Evaluate the Model
import time # Import the time module
t1=time.time()
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
t2=time.time()
print(f"Total time taken: {t2-t1:.2f}")
Output:
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9857 - loss: 0.0390
Test accuracy: 0.9900
Total time taken: 2.63
Make Predictions
predictions = model.predict(X_test)
print(f"Prediction for first test image: {tf.argmax(predictions[0]).numpy()}")
Output:
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s
7ms/step
Prediction for first test image: 7

Dimensionality Reduction
Dimensionality reduction is the process of reducing the
number of features (dimensions) in a dataset while
preserving as much important information as possible.
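
The notes do not name a specific method here, so the sketch below uses Principal Component Analysis (PCA), one common dimensionality reduction technique, as an example. It is computed with NumPy's SVD; the function name pca and the synthetic 3-feature data are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    # Center the data, then find the directions of largest variance via SVD
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Share of the total variance carried by each direction
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained[:n_components]

# Synthetic data: 3 features, but almost all variation lies along one direction
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + rng.normal(scale=0.05, size=200),
    rng.normal(scale=0.05, size=200),
])

Z, ratio = pca(X, n_components=1)
print(Z.shape, ratio)  # 200 samples reduced from 3 features to 1
```

Here a single component retains nearly all of the variance, which is the sense in which the reduction preserves the important information.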
