
Deep learning with

TensorFlow and Keras


How to build artificial neural networks with Keras and TensorFlow

Deep learning with TensorFlow and Keras

What is deep learning?

What is an activation function?

Sigmoid function

Softmax activation function

Rectified linear unit (ReLU)

How does a neural network learn?

Gradient descent

How backpropagation works

What is TensorFlow?

Why TensorFlow?

TensorFlow vs. Keras

TensorFlow basics

Tensors

Variables

Automatic differentiation

How TensorFlow works

How TensorFlow models are defined


How to train artificial neural networks with Keras

Data pre-processing

Data transformation

How to build the artificial neural network

How to visualize model performance

Add dropout regularization to fight overfitting

How to accelerate network training with batch normalization

How to stop model training at the right time with early stopping

How to save the best model with checkpoints

Make predictions on the test set

Check the confusion matrix

Make a single prediction

How to save and load Keras models

How to evaluate the Keras model with cross-validation


How to tune model hyperparameters in Keras

How to tune the network parameters

Final thoughts

How to build CNN in TensorFlow


What is CNN?

How do CNNs work?

Convolution

Padding

Apply ReLU

Pooling

Dropout regularization

Flattening

Full connection

Activation function

Convolutional Neural Networks (CNN) in TensorFlow

How to install TensorFlow

How to confirm TensorFlow is installed

What are Keras and tf.keras?

Develop multilayer CNN models

Data preprocessing

Model definition

Compiling the model

Train the model

How to plot model learning curves


Model evaluation

How to halt training at the right time with Early Stopping

How to accelerate training with batch normalization

How to create custom callbacks for TensorFlow CNN

How to visualize a deep learning model

How to save and load your model

Running CNNs with TensorFlow in the real world


Loading the images

Generate a tf.data.Dataset

Buffered dataset prefetching

Image augmentation

Model definition

Compiling the model

Training the model

Model evaluation

Monitoring the model’s performance

Visualize CNN graph with TensorBoard

How to profile with TensorBoard


Making predictions

CNN architectures

Model without weights

Model with weights

Final thoughts

TensorFlow Recurrent Neural Networks

What is a Recurrent Neural Network?

Backpropagation through time

Types of Recurrent Neural Networks

Weaknesses of RNNs

1. Vanishing gradient problem

2. Exploding gradient problem

Long Short-Term Memory (LSTM)

Applications of LSTM

Bidirectional LSTM

Time series analysis with LSTM in TensorFlow

Imports

Data pre-processing

Create LSTM network in Keras

Compile the LSTM model



LSTM model evaluation

Intent classification with LSTM

Imports

Load dataset

Data cleaning

Label exploration

Text vectorization

Create LSTM network

LSTM model evaluation

Final thoughts

Transfer learning guide

What is transfer learning?

Advantages of using pre-trained models

Types of transfer learning

Inductive transfer learning

Unsupervised transfer learning

Transductive transfer learning

Homogeneous transfer learning


Heterogeneous transfer learning

What is the difference between transfer learning and fine-tuning?

Why use transfer learning?

When do you use transfer learning?

When does transfer learning not work?

How to implement transfer learning?

Transfer learning in 6 steps

Step 1: Obtain the pre-trained model

Step 2: Create a base model

Step 3: Freeze layers so they don’t change during training

Step 4: Add new trainable layers

Step 5: Train the new layers on the dataset

Step 6: Improve the model via fine-tuning


Where to find pre-trained models?

Keras pre-trained models

Transfer learning using TensorFlow Hub

Pretrained word embeddings

Stanford’s GloVe pre-trained word embeddings



Google’s Word2vec

Fasttext

Hugging Face

Transfer learning with PyTorch

How can you use pre-trained models?

Prediction

Feature extraction

Fine-tuning

Example of transfer learning for images with Keras

Transfer learning with image data

Getting the dataset

Loading the dataset from a directory

Data pre-processing

Create a base model from the pre-trained Inception model

Create the final dense layer

Train the model

Fine-tuning the model

Example of transfer learning with natural language processing


Pretrained word embeddings

Loading the dataset

Data pre-processing

Vectorizing the words

Using GloVe Embeddings

Create the embedding layer

Create the model


Training the model

Final thoughts

TensorBoard

Advantages of using Tensorboard

How to use TensorBoard

How to install TensorBoard

PIP installation

Conda installation

Docker installation

Using TensorBoard with Jupyter notebooks and Google Colab

How to run TensorBoard


How to use TensorBoard callback

How to launch TensorBoard

Running TensorBoard remotely

TensorBoard dashboards

TensorBoard scalars

TensorBoard images

TensorBoard graphs

TensorBoard distributions

TensorBoard histograms

Fairness indicators

What-If Tool (WIT)

Displaying data in TensorBoard

Using the TensorBoard embedding projector

Plot training examples with TensorBoard

Visualize images in TensorBoard

Displaying text data in TensorBoard

Log confusion matrix to TensorBoard

Hyperparameter tuning with TensorBoard

TensorFlow Profiler

Overview page

Trace viewer

Input pipeline analyzer

TensorFlow stats

GPU kernel stats

Memory profile page

How to enable debugging on TensorBoard

Using TensorBoard with deep learning frameworks

TensorBoard in PyTorch

TensorBoard in Keras

TensorBoard in XGBoost

TensorBoard in JAX and Flax

Download TensorBoard data as Pandas DataFrame

Tensorboard.dev

Limitations of using TensorBoard

Final thoughts

How to build TensorFlow models with the Keras Functional API

Keras Sequential models

Keras Functional models


Defining input

Connecting layers

Functional API Python syntax

Creating the model

Training and evaluation of Functional API models

Save and serialize Functional API models

How to convert a Functional model to a Sequential API model

How to convert a Sequential model to a Functional API model

Standard network models

Multilayer perceptron

Convolutional Neural Network

Recurrent Neural Network

Shared layers model


Shared input layer

Shared feature extraction layer

Multiple input and output models

Multiple input model

Multiple output model


Use the same graph of layers to define multiple models

Keras Functional API end-to-end example

Data download

Data processing

Add image path column

Create face attributes columns

Label encoding

Generate tf.data dataset

Visualize the training data

Define Keras Functional network

Plot and inspect the Keras Functional model

Compile Keras Functional network

Training the Functional network

Evaluate TensorFlow Functional network

Make predictions with Keras Functional model

Keras Functional API strengths and weaknesses

Functional API best practices

Final thoughts

How to create custom training loops in Keras

Obtain dataset
Data processing

Batch the dataset

How to create model with custom layers in Keras

Define the loss function

Define the gradients function

Create an optimizer


Create custom training loop

Visualize the loss

Evaluate model on test dataset

Use the trained model to make predictions

Final thoughts

How to train deep learning models on Apple Silicon GPU

Training deep learning models on Apple Silicon

TensorFlow

Install Tensorflow-metal PluggableDevice

Train TensorFlow model on Apple Silicon GPU

PyTorch

Final thoughts
Object detection with TensorFlow 2 Object detection API

Object detection datasets

Preparing datasets for object detection

What is TensorFlow 2 Object Detection API?

Install TensorFlow 2 Object Detection API on Google Colab

Install TensorFlow 2 Object Detection API locally

Download object detection dataset

Download Mask R-CNN model

Edit the object detection pipeline config file

Convert the images to TFRecords

Train the model

Model evaluation and visualization

Download model from Google Colab

Object detection with Mask R-CNN

Load an image from file into a NumPy array

Visualize detections

Create model from the last checkpoint

Map labels for inference decoding

Run detector on test image


Image segmentation with Mask R-CNN

Set label map

Set test image paths

Create inference function

Perform segmentation and detection

Final thoughts

Appendix

Disclaimer

Copyright

Other things to learn

GitHub repo containing all the code

https://github.com/mlnuggets/tensorflow

How to build artificial neural

networks with Keras and

TensorFlow

Building artificial neural networks

with TensorFlow and Keras requires understanding some key concepts.


After learning these concepts, you'll install

TensorFlow and start designing neural networks. This article


will cover the concepts you need to comprehend to build neural

networks in TensorFlow and Keras. Without further ado, let's

get the ball rolling.

What is deep learning?


Deep learning is a branch of machine learning that involves building


networks that try to mimic the working of the human

brain. The dendrites of a neuron represent the inputs to the network, while the axon terminals represent the output. The cell body is where computation takes place before we get the output.

The image below shows a simple network with an input,

hidden, and output layer. A network with multiple hidden layers

is called a deep neural network.

Random weights and biases are initialized when data is passed

to a network. Some computation happens in the hidden layers

leading to output.

This computation involves multiplying the input by the weights


and adding the bias. This is what gives the output. The bias

ensures that there is no zero output in case the input and

weights are zero.
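For instance, a single neuron with three inputs computes its output as in the hedged sketch below; the input, weight, and bias values are hypothetical.

import numpy as np

# Hypothetical values for a single neuron: multiply the inputs by the
# weights, sum the results, and add the bias.
inputs = np.array([0.5, 0.3, 0.2])
weights = np.array([0.4, 0.7, 0.1])
bias = 0.05

output = np.dot(inputs, weights) + bias
print(output)  # 0.5*0.4 + 0.3*0.7 + 0.2*0.1 + 0.05 = 0.48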


There are various ways of initializing the weights and biases.

The common ones include:

Initialize with ones.

Initialize with zeros.

Use a uniform distribution.

Apply a normal distribution.
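In Keras, these choices map to the initializer arguments of a layer. A minimal sketch using the tf.keras API; the layer size here is arbitrary:

from tensorflow.keras import layers, initializers

# Illustrative only: a Dense layer with a normal-distribution kernel
# initializer and a zeros bias initializer.
layer = layers.Dense(
    units=64,
    kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.05),
    bias_initializer=initializers.Zeros(),
)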

What is an activation function?

The desired output of a neural network depends on the problem

being solved. For instance, in a regression problem, the output

should be a number predicting the quantity in question.


However, in classification problems, it's more desirable to

output a probability that is used to determine the category of

the prediction. Therefore, to make sure the network outputs the

desired results, we pass the computed result through a function

that ensures that the result is within a specific range. For

example, for probabilities, this number is between 0 and 1. We

can't have negative probabilities. The function responsible for

capping the result is known as an activation function.


From the above image, we can conclude that the activation

function determines the neural network's output. Let's,

therefore, mention some of the common activation functions in

the deep learning realm.

Sigmoid function
The sigmoid activation function caps the output to a number between 0 and 1 and is mainly used for binary classification tasks. Sigmoid is also appropriate where the classes are non-exclusive.

For example, an image can have a car, a building, a tree, etc.

Just because there is a car in the image doesn’t mean a tree

can’t be in the picture. Use the sigmoid function when there is

more than one correct answer.

Softmax activation function

The softmax activation function is a variant of the sigmoid

function used in multi-class problems where labels are mutually

exclusive. For example, a picture is either grayscale or color.

Use the softmax activation when there is only one correct

answer.


Rectified linear unit (ReLU)

The Rectified Linear Unit (ReLU) activation function limits the output to 0 and above. It is typically used in the hidden layers of neural networks and therefore ensures that there are no negative outputs from the hidden layers.
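A quick way to see how these three functions behave is to apply them to a small tensor of raw scores. This is an illustrative sketch only; the values are arbitrary:

import tensorflow as tf

logits = tf.constant([-2.0, 0.5, 3.0])

print(tf.nn.sigmoid(logits))  # each value squashed independently into (0, 1)
print(tf.nn.softmax(logits))  # values are non-negative and sum to 1 across the classes
print(tf.nn.relu(logits))     # negative values are clipped to 0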


How does a neural network

learn?

A neural network learns by evaluating predictions against the

true values and adjusting the weights. The objective is to obtain the weights that minimize the error, which is measured by a loss function (also called a cost function). The choice of a loss function,

therefore, depends on the problem. Classification tasks require

classification loss functions, while regression problems require

regression loss functions. As the network learns, the loss

functions should decrease.

You might see nans in the loss function while training the

network. This means that the network is not learning. In most

cases, nans will be developer errors, meaning that there is

something you have done or failed to do that is causing the

nans. For example:

The training data contains nans.

You have not scaled the data.

Performing operations that lead to nans, for example,

division by zero or the square root of a negative number.


Choosing the wrong optimizer function.
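A quick sanity check on the training data can catch several of these causes early. A minimal sketch with a hypothetical DataFrame:

import numpy as np
import pandas as pd

# Hypothetical training data with one missing value, used only to illustrate the checks.
df = pd.DataFrame({"delay": [4.0, np.nan, 12.0], "distance": [460.0, 235.0, 1142.0]})

print(df.isnull().sum())      # count missing values per column
df = df.fillna(df.mean())     # one simple way to remove the nans
print(df.max() - df.min())    # large ranges suggest the data still needs scaling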

Gradient descent

We have just mentioned that choosing the wrong optimizer

could result in nans. So what is an optimizer function?

During the training process, errors are reduced by the optimizer

function. This optimization is done via gradient descent.

Gradient descent reduces the errors by minimizing the cost function. It does this by iteratively moving toward the point where the error is at its minimum, commonly known as a local minimum. You can

think of this as descending on a slope where the goal is to get

to the bottom of the hill, that is, the global minimum. This

process involves computing the slope of a specific point on the

"hill" via differentiation.

How backpropagation works

The computed errors are passed to the network, and the

weights are adjusted. This process is known

as backpropagation.

There are several variants of gradient descent. They include:

Batch Gradient Descent that uses the entire dataset to

compute the gradient of the cost function. It is slow since

you have to compute the gradient of the entire dataset to

perform a single update.

Stochastic Gradient Descent where the gradient of the

cost function is computed from a single training example in

every iteration. It is faster.

Mini-Batch Gradient Descent that uses a sample of the


training data to compute the gradient of the cost function.
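In Keras, the variant you get is effectively selected through the batch_size argument of fit. The sketch below uses a tiny random dataset and model purely for illustration:

import numpy as np
import tensorflow as tf

X = np.random.rand(100, 4).astype("float32")
y = np.random.randint(0, 2, size=(100,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

model.fit(X, y, batch_size=len(X), epochs=1)  # batch gradient descent: whole dataset per update
model.fit(X, y, batch_size=1, epochs=1)       # stochastic gradient descent: one example per update
model.fit(X, y, batch_size=32, epochs=1)      # mini-batch gradient descent: a sample per update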


What is TensorFlow?

TensorFlow is an open-source deep learning framework that

enables us to design and train deep learning

networks. TensorFlow can be installed from the Python Package Index (PyPI) via the pip command. TensorFlow is already installed on Google Colab, so you do not need to install it when working in that environment.

# Requires the latest pip

pip install --upgrade pip

# Current stable release for CPU and GPU

pip install tensorflow

# Or try the preview build (unstable)

pip install tf-nightly

You can also install TensorFlow using Docker. Docker is the easiest way to
install TensorFlow on Linux if GPU support is

desired.

docker pull tensorflow/tensorflow:latest  # Download latest stable image
docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter  # Start Jupyter server

Follow these instructions to install TensorFlow on Apple

arm64 machines. This will enable you to train models with

GPUs on Mac.


Why TensorFlow?

There are a couple of reasons why you would choose

TensorFlow:

Has a high-level API that makes it easy to build networks.

Large ecosystem of tools and libraries.

Large community that makes it easy to find solutions to

common problems.

Well documented.

Supports deployment of models on the browser, mobile

devices, edge devices, and the cloud.

Simple and flexible architecture to make research work

faster.

TensorFlow vs. Keras


As of TensorFlow 2, Keras is the high-level API

for TensorFlow. Keras makes it simple to design and train deep

learning networks.

TensorFlow basics

In a moment, we'll be designing neural networks with

TensorFlow. However, getting some TensorFlow basics out of

the way is essential before we get there.


Tensors

TensorFlow Tensors are multi-dimensional arrays similar

to NumPy arrays. Tensors are immutable, meaning they cannot be updated once created. Tensors can contain integers,

floats, strings, and even complex numbers.

import tensorflow as tf

import numpy as np

x = tf.constant([[7., 8., 9.],[10., 11., 12.]])

print(x)

print(x.shape)

print(x.dtype)
# tf.Tensor(

# [[ 7. 8. 9.]

# [10. 11. 12.]], shape=(2, 3), dtype=float32)

# (2, 3)

# <dtype: 'float32'>

You can perform indexing and operations on Tensors like in

NumPy arrays.

x[0]
# <tf.Tensor: shape=(3,), dtype=float32, numpy=array([7., 8., 9.], dtype=float32)>

x[1:4]
# <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[10., 11., 12.]], dtype=float32)>

x**2
# <tf.Tensor: shape=(2, 3), dtype=float32, numpy=
# array([[ 49.,  64.,  81.],
#        [100., 121., 144.]], dtype=float32)>

x @ tf.transpose(x)  # matrix multiplication


TensorFlow Tensors can also be converted to NumPy arrays.

np.array(x)

x.numpy()

# array([[ 7., 8., 9.],

# [10., 11., 12.]], dtype=float32)

Tensors can contain a different number of elements along a

certain axis. These kinds of tensors are known as ragged

tensors.

Ragged tensors are created using

the tf.ragged.constant function.

# A nested Python list with rows of different lengths (matches the printed output below)
ragged_list = [[0, 1, 2, 3], [4, 5], [6, 7, 8], [9]]

try:
    tensor = tf.constant(ragged_list)
except Exception as e:
    print(f"{type(e).__name__}: {e}")
# ValueError: Can't convert non-rectangular Python sequence to Tensor.

ragged_tensor = tf.ragged.constant(ragged_list)
print(ragged_tensor)
# <tf.RaggedTensor [[0, 1, 2, 3], [4, 5], [6, 7, 8], [9]]>

print(ragged_tensor.shape)
# (4, None)

TensorFlow also supports tensors that have a lot of zeros.

These tensors are known as sparse tensors. They can be

created using the tf.sparse.SparseTensor function that stores

them in a memory-efficient way. For example, converting text

data to numerical representation in natural language

processing tasks usually results in sparse tensors.

# Sparse tensors store values by index in a memory-efficient manner
sparse_tensor = tf.sparse.SparseTensor(indices=[[0, 0], [1, 2]],
                                       values=[1, 2],
                                       dense_shape=[3, 4])
print(sparse_tensor, "\n")
# SparseTensor(indices=tf.Tensor(
# [[0 0]
#  [1 2]], shape=(2, 2), dtype=int64), values=tf.Tensor([1 2], shape=(2,), dtype=int32),
#  dense_shape=tf.Tensor([3 4], shape=(2,), dtype=int64))


A sparse tensor can also be converted to a dense tensor.

print(tf.sparse.to_dense(sparse_tensor))
# tf.Tensor(
# [[1 0 0 0]
#  [0 0 2 0]
#  [0 0 0 0]], shape=(3, 4), dtype=int32)

Variables
A TensorFlow variable is used to represent state in

TensorFlow programs. Keras stores model parameters in a

TensorFlow variable. A TensorFlow variable is also a tensor. A

variable is created using tf.Variable .

my_tensor = tf.constant([[8.0, 8.0], [6.0, 5.0]])

my_variable = tf.Variable(my_tensor)

my_variable

# <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=

# array([[8., 8.],

# [6., 5.]], dtype=float32)>

We can check the type and shape of the TensorFlow variable

because it's a tensor. It can also be converted to a NumPy

array. Tensor operations can also be performed on variables,

except that variables cannot be reshaped. By default, TensorFlow places variables on a GPU, if one is available, to improve performance. You can, however, override this.

print("Shape: ", my_variable.shape)

print("DType: ", my_variable.dtype)

Deep learning with TensorFlow and Keras

25
print("As NumPy: ", my_variable.numpy())

# Shape: (2, 2)

# DType: <dtype: 'float32'>

# As NumPy: [[8. 8.]

# [6. 5.]]
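For example, you can pin a variable to the CPU with a device scope. A minimal sketch:

# Override the default placement by creating the variable inside a device scope.
with tf.device("CPU:0"):
    cpu_variable = tf.Variable([[1.0, 2.0], [3.0, 4.0]])

print(cpu_variable.device)  # shows the device the variable was placed on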

Automatic differentiation

Automatic differentiation is applied at the backpropagation

stage of training neural networks. Automatic differentiation in

TensorFlow is done using tf.GradientTape. Inputs to this function are usually tf.Variable objects.

x = tf.Variable(47.0)

with tf.GradientTape() as tape:
    y = x**2

# dy = 2x * dx
dy_dx = tape.gradient(y, x)
dy_dx.numpy()
# 94.0

You can use GradientTape to define custom training functions.

The GradientTape will track all trainable variables automatically, and tape.gradient computes the gradients with respect to the weights and biases.

# Given a callable model, inputs, outputs, and a learning rate...
def train(model, x, y, learning_rate):
    with tf.GradientTape() as t:
        # Trainable variables are automatically tracked by GradientTape
        current_loss = loss(y, model(x))

    # Use GradientTape to calculate the gradients with respect to W and b
    dw, db = t.gradient(current_loss, [model.w, model.b])

    # Subtract the gradient scaled by the learning rate
    model.w.assign_sub(learning_rate * dw)
    model.b.assign_sub(learning_rate * db)

How TensorFlow works

When TensorFlow is executed eagerly, operations are done


in Python, and the results are sent back to Python. The alternative is graph
execution, where the operations are

executed as a TensorFlow Graph. A graph is a data structure that represents a set of operations.

The fact that graphs are data structures makes it possible to

save and restore them without the original Python code. As a result, these
graphs can be used in non-Python environments such as mobile devices,
servers, edge devices, and

embedded devices. Saved models in TensorFlow are exported

as graphs. Graphs enable TensorFlow to run on multiple

devices, run in parallel and be fast.

TensorFlow graph representing a two-layer neural network

Creating a TensorFlow graph is done via tf.function . It expects

a normal function and returns a callable function which creates


the TensorFlow graph from the Python function.

def simple_relu(x):
    if tf.greater(x, 0):
        return x
    else:
        return 0

# `tf_simple_relu` is a TensorFlow `Function` that wraps `simple_relu`.
tf_simple_relu = tf.function(simple_relu)

print("First branch, with graph:", tf_simple_relu(tf.constant(1)).numpy())
print("Second branch, with graph:", tf_simple_relu(tf.constant(-1)).numpy())
# First branch, with graph: 1
# Second branch, with graph: 0

How TensorFlow models are defined

A TensorFlow model is made up of layers. Certain

mathematical computations occur in the layers. Layers have


trainable variables, meaning that these variables are updated

as the network is fitted to the training data. In TensorFlow,

layers and models are built by creating a class that

inherits tf.Module .

class SimpleModule(tf.Module):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.a_variable = tf.Variable(5.0, name="train_me")
        self.non_trainable_variable = tf.Variable(5.0, trainable=False, name="do_not_train_me")

    def __call__(self, x):
        return self.a_variable * x + self.non_trainable_variable

simple_module = SimpleModule(name="simple")
simple_module(tf.constant(5.0))
# <tf.Tensor: shape=(), dtype=float32, numpy=30.0>

How to train artificial neural networks

with Keras

With the basics out of the way, we can now embark on a

journey to learn how to design and train artificial neural

networks in TensorFlow and Keras. We'll use a dataset from

Kaggle to train a classification network. The aim is to predict

the satisfaction of an airline passenger. Let's start by printing a

sample of this dataset.

import pandas as pd

df = pd.read_csv("train.csv")

df.head()

Data pre-processing

Ensure the data is clean before passing it to the neural

network. For instance, we need to check for null values and

deal with them. The occurrence of null values in the training


data leads to nans in the training loss. The Arrival Delay in

Minutes column has null values. There are various ways of

dealing with null values, but in this case, we'll replace them with

the mean of the column.


df['Arrival Delay in Minutes'] = df['Arrival Delay in Minutes'].fillna(df['Arrival Delay in Minutes'].mean())

The target column is a string.


We can't pass strings to the neural network. Convert this

column to a numerical representation. This can be done

using Scikit-learn's label encoder.

from sklearn.preprocessing import LabelEncoder

labelencoder = LabelEncoder()
df = df.assign(satisfaction=labelencoder.fit_transform(df["satisfaction"]))

Data transformation

Apart from the target column, other columns are also in text

form. They need to be converted to a numerical format.

categories = df.select_dtypes(include=['object']).columns.tolist()
categories
# ['Gender', 'Customer Type', 'Type of Travel', 'Class']

Another thing we need to do is to scale the dataset. The


weights and biases of neural networks are initialized to small

numbers, usually between 0 and 1. Scaling makes training

easier by forcing all values to be within a certain range. Failure

to scale can lead to nans in the training loss because of large differences in magnitude between feature values. After doing all this, we'll

split the data into a training and testing set.

Let's use the ColumnTransformer from Scikit-learn to apply the


transformations we have mentioned above. The transformer


enables us to apply more than one transformation to multiple

columns. In this case, we apply the following steps:

Transform categorical columns to numerical form via one-

hot encoding.

Scale numerical columns using MinMaxScaler to ensure that all values are
between 0 and 1.

After obtaining the transformed data, we split it into a training


and testing set.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

X = df.drop(["Unnamed: 0", "id", "satisfaction"], axis=1)
y = df["satisfaction"]

random_state = 13
test_size = 0.3

transformer = ColumnTransformer(
    transformers=[('cat', OneHotEncoder(handle_unknown='ignore', drop="first"), categories)],
    remainder=MinMaxScaler())
X = transformer.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)

The transformed data


How to build the artificial neural

network

Keras makes it easy to design neural networks via

the Sequential API. We can stack the layers we want in our network using
this API. In this case, let's define a network with
the following layers:

Input layer with the number of units similar to the number

of features in the training data.

Two dense layers. We randomly define the units in these

layers, but we'll look at how to select the best units later.

A final dense layer with 1 unit and the sigmoid activation function because it's a binary classification problem.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

model = Sequential([
    Input(shape=(X_train.shape[1],)),
    Dense(64, activation="relu", kernel_initializer="glorot_uniform", name="layer1"),
    Dense(32, activation="relu", kernel_initializer="glorot_uniform", name="layer2"),
    Dense(1, activation="sigmoid", name="layer3"),
])

Apart from the number of units, the dense layer has other

parameters:
Activation function, usually ReLU.


The kernel initializer that determines how the weights will be initialized.

name for naming each layer.

The next step is to compile the neural network. This is

where gradient descent is applied. This is the optimization

strategy that reduces the errors as the network is learning.

There are various optimization strategies but adam is a common

approach. It applies the Adam algorithm.

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

The next step is to train the network. Training is done using

the fit method. It expects the following parameters:

The training data.

The validation data.

The number of samples to be passed to the network at one

time. This is declared using batch_size .

The number of epochs to train the network.

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=32, epochs=100)


How to visualize model performance

Notice that we assigned the training function to

the history variable. This variable now contains the training

and validation metrics. We can use these variables to visualize

the neural network's performance using Matplotlib.

metrics_df = pd.DataFrame(history.history)

metrics_df[["loss","val_loss"]].plot();

metrics_df[["accuracy","val_accuracy"]].plot();

You can tell the network is learning if the training and validation

loss decreases gradually. If the network is performing


significantly worse on the test data compared to the training

data, then it means that it's overfitting on the training

data. Overfitting means that the network has not learned generalizable patterns from the training data but has memorized it instead. As a result, it

performs worse on data it hasn't seen during training. You can

avoid overfitting by adding a dropout layer.


Add dropout regularization to fight

overfitting

A dropout layer ensures that some connections are "dropped"

during the training process. This forces the network to learn the

patterns in the data instead of memorizing the data. Therefore,

the network performs well even on data it hasn't seen. In this

case, we add a Dropout layer and specify that 10% of the connections should be dropped.

from tensorflow.keras.layers import Dropout

model = Sequential([
    Input(shape=(X_train.shape[1],)),
    Dense(64, activation="relu", kernel_initializer="glorot_uniform", name="layer1"),
    Dropout(rate=0.1),
    Dense(32, activation="relu", kernel_initializer="glorot_uniform", name="layer2"),
    Dense(1, activation="sigmoid", name="layer3"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=32, epochs=10)

How to accelerate network training with

batch normalization

As the name suggests, batch normalization (batchnorm) involves normalizing the inputs within the network. Batch normalization ensures that the mean output is close to 0 and

the output standard deviation is close to 1. It normalizes input

using the mean and standard deviation of the current training


batch. When making predictions, batch normalization

normalizes its output using the moving average of the mean

and standard deviation of the batches computed during

training. Batch normalization is primarily applied in deep neural

networks to make training faster.

from tensorflow.keras.layers import BatchNormalization

model = Sequential([
    Input(shape=(X_train.shape[1],)),
    Dense(64, activation="relu", kernel_initializer="glorot_uniform", name="layer1"),
    BatchNormalization(),
    Dense(32, activation="relu", kernel_initializer="glorot_uniform", name="layer2"),
    Dense(1, activation="sigmoid", name="layer3"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=32, epochs=10)

How to stop model training at the right

time with early stopping

When training neural networks, it's often good practice to stop

the training when the model's performance is no longer

improving. In TensorFlow, this is achieved using

the EarlyStopping callback. By default, the callback will monitor the loss and halt training when the loss has not improved for the specified number of epochs. In this case, we stop

training if the loss is not decreasing for three consecutive

epochs.

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

callbacks = [tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)]

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=32, epochs=100, callbacks=callbacks)

How to save the best model with

checkpoints
Apart from halting the training, you may want to save the best model as the network is training. You can do this with a checkpoint callback. The checkpoint callback expects:

The path where the model will be saved.

The metric to monitor.

save_best_only to dictate how the model will be saved. If true, the best model is saved.

save_weights_only to determine if the entire model will be saved or just the weights. In this case, we save the weights only.

mode is set to max here because we are monitoring the validation accuracy.

checkpoint_filepath = "model_checkpoint"

model.compile(optimizer="adam", loss="binary_crossentropy", met rics=


["accuracy"])

callbacks = [

tf.keras.callbacks.EarlyStopping(monitor="loss", p

atience=3),

tf.keras.callbacks.ModelCheckpoint(

filepath=checkpoint_filepath,

save_weights_only=True,

monitor="val_accuracy",

mode="max",

save_best_only=True)]

history = model.fit(X_train, y_train, validation_data =(X_test,

y_test), batch_size = 32, epochs = 10,callbacks=callbacks)

Checkpoint files

When training is complete, we can load the model with the

weights saved by the checkpoint.

model.load_weights(checkpoint_filepath)

Make predictions on the test set

Let's now use this model to make predictions on the test set.

We set the threshold for a positive prediction at 50%.

y_pred = model.predict(X_test)

y_pred = (y_pred > 0.5)

Check the confusion matrix

We can use the above predictions to compute the confusion

matrix.

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

cm

# array([[17396, 387],

# [ 856, 12533]])

Make a single prediction

Let's demonstrate how to make a single prediction by selecting

a sample from the test set. We use NumPy to expand the

dimensions to include the batch size; in this case, it's 1.

import numpy as np

test_data = np.expand_dims(X_test[0], axis=0)

model.predict(test_data) > 0.5


# array([[ True]])


How to save and load Keras models

Apart from saving models using a checkpoint callback, you can

also save them once training is complete. This is done using

the save_weights function. The function expects the path where

the model should be saved. This path should not

be checkpoint as it will conflict with TensorFlow's default setting.

Using checkpoint results in the error below.

model.save_weights('checkpoint')
# RuntimeError: Save path 'checkpoint' conflicts with path used for
# checkpoint state. Please use a different save path.

The save_weights function will only save the model weights. To

save the entire model, use model.save and pass the folder

where the model should be stored. This saves the model

weights, architecture, and training configuration. The

optimizer and the training state are also saved, making it

possible to restart training at the point where it stopped. By


default, the model will be saved using the SavedModel format,

but you can also use the HDF5 format.

model.save_weights('./checkpoints/my_checkpoint')

model.save("saved_model")

The directory where the entire model is saved contains the

following items:


saved_model.pb that stores the training configuration and model architecture. The training configuration includes the metrics, losses, and optimizer.


variables that store the model weights.

You can then load the model and check its architecture.

new_model = tf.keras.models.load_model('saved_model')

new_model.summary()


TensorFlow model summary

You can also save the model in HDF5 format.

# pip install pyyaml h5py  # Required to save models in HDF5 format
model.save('my_model.h5')
new_model = tf.keras.models.load_model('my_model.h5')

How to evaluate the Keras model with

cross-validation
Next, let's look at how we can evaluate the Keras model with cross-validation. To achieve this, we apply the cross_val_score from Scikit-learn. The function expects:

The training data.

The scoring method.

The cross-validation strategy to be used. A 5-fold cross-validation is used by default. The Stratified KFold strategy is applied if you specify an integer.


The SciKeras library enables wrapping Keras models as Scikit-learn estimators, making it possible to operate on the networks as if they were Scikit-learn models.

Install the package and import the KerasClassifier wrapper.

# pip install scikeras[tensorflow]

# https://www.adriangb.com/scikeras/stable/

from scikeras.wrappers import KerasClassifier

from sklearn.model_selection import cross_val_score

KerasClassifier expects a Keras model. We, therefore, create a

function that returns a Keras model.

def make_model():
    model = Sequential([
        Input(shape=(X_train.shape[1],)),
        Dense(64, activation="relu", kernel_initializer="glorot_uniform", name="layer1"),
        Dense(32, activation="relu", kernel_initializer="glorot_uniform", name="layer2"),
        Dense(1, activation="sigmoid", name="layer3"),
    ])
    return model

Next, we instantiate the classifier using this function.

model = KerasClassifier(model=make_model, batch_size=32, optimizer="adam",
                        metrics=["accuracy"], loss="binary_crossentropy",
                        validation_split=0.2, epochs=1)

Apart from the model function, the classifier expects:


The metrics.

The loss.

The validation split.

The optimizer.

The number of epochs.


By default, the KerasClassifier will compile the model. You can,

however, compile the model in the make_model function.

Let's now apply cross_val_score and obtain the mean of the

accuracy and standard deviation.

accuracies = cross_val_score(estimator=model, X=X_train, y=y_train, cv=10, n_jobs=-1)

mean = accuracies.mean()
mean
# 0.9180552089621082

variance = accuracies.var()
variance
# 8.615227293449488e-05

How to tune model hyperparameters in

Keras

One common strategy for hyperparameter tuning is grid

search which performs an exhaustive search on the given

parameters. Random search is an alternative that performs a

randomized search on the given parameters. In this case, let's

apply grid search.

The first step is to define the parameters to search over.



params = {
    "batch_size": [10, 20, 32, 64],
    "epochs": [2, 3, 4],
    "optimizer": ["adam", "rmsprop"]
}

Next, create an instance of GridSearchCV. It expects:

The estimator.

The parameters.

The scoring criteria.

The cross-validation method to be applied. By default, it applies K-Fold cross-validation. When an integer is passed, it applies the Stratified K-Fold cross-validation.

from sklearn.model_selection import GridSearchCV

grid_search = GridSearchCV(estimator=model,
                           param_grid=params,
                           scoring="accuracy",
                           cv=2)

The next step is to fit the grid search to the training data.

grid_search = grid_search.fit(X_train,y_train)

When it's complete, we can check the best parameters.


best_param = grid_search.best_params_

best_param

# {'batch_size': 20, 'epochs': 4, 'optimizer': 'adam'}


best_accuracy = grid_search.best_score_

best_accuracy

# 0.9392839465434747

How to tune the network parameters

It is also possible to tune the parameters of the network, for

example, the number of hidden layers and the dropout rate.

The first step is to define the model function

with hidden_layer_sizes and dropout as parameters.

def make_clf(hidden_layer_sizes, dropout):
    model = Sequential()
    model.add(Input(shape=(X_train.shape[1],)))
    for hidden_layer_size in hidden_layer_sizes:
        model.add(Dense(hidden_layer_size, activation="relu"))
        model.add(Dropout(dropout))
    model.add(Dense(1, activation="sigmoid"))
    return model
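The grid search further below refers to a wrapped estimator named my_model, which is not shown in the text. One plausible way to construct it with SciKeras, assuming the make_clf function above; the default values are illustrative only:

from scikeras.wrappers import KerasClassifier

# Hypothetical wrapper; the model__ prefix routes these defaults to make_clf,
# matching the parameter names used in the grid below.
my_model = KerasClassifier(
    model=make_clf,
    loss="binary_crossentropy",
    optimizer="adam",
    model__hidden_layer_sizes=(100,),
    model__dropout=0.5,
    epochs=1,
    verbose=0,
)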

The next step is to define the parameters to be tested and

perform the search as you have done before. Thereafter, print

the best_score_ and the best_params_. By default,

the refit parameter of GridSearchCV is True meaning that the

model will be re-trained with the best parameters. Therefore,

once GridSearchCV is over, the model is ready for making

predictions.

params = {
    'optimizer__learning_rate': [0.05, 0.1],
    'model__hidden_layer_sizes': [(100, ), (50, 50, )],
    'model__dropout': [0, 0.5],
}

gs = GridSearchCV(my_model, params, scoring='accuracy', n_jobs=-1, verbose=True)
gs.fit(X_train, y_train)
print(gs.best_score_, gs.best_params_)
# 0.9317082873776641 {'model__dropout': 0, 'model__hidden_layer_sizes': (100,), 'optimizer__learning_rate': 0.05}

Final thoughts

This article has been an in-depth guide to deep learning and

building neural networks with TensorFlow and Keras. We have

covered some core concepts to get you started with deep

learning in TensorFlow. You have also learned:

Which activation function to apply in a deep learning

network.

Which activation function to apply in the hidden and output

layers.

Selecting the appropriate loss function.


How neural networks learn through gradient descent and

backpropagation.
TensorFlow basics.

How TensorFlow works.

Training a neural network with Keras and TensorFlow.

Performing hyperparameter tuning and cross-validation on

the neural network, among other topics.

How to build CNN in

TensorFlow

In the artificial neural networks with TensorFlow chapter, we

saw how to build deep learning models with TensorFlow and

Keras. We covered various concepts that are foundational in

training neural networks with TensorFlow. In that article, we

used a Pandas DataFrame to build a classification model in Keras. This


article will focus on solving image-related problems

with TensorFlow. You will learn how to create image

classification models with Keras and TensorFlow.

Let's dive in!

What is CNN?

A Convolutional Neural Network (CNN) is a special type of artificial neural network that processes image data and detects complex features from it. CNNs are primarily used in image tasks but are also applied to other problems such as natural language processing.

How do CNNs work?

The internal working of CNNs is a little different from that of

regular artificial neural networks. In this section, let's explore

how CNNs work.

Convolution

Image data is usually large. We, therefore, can't pass entire images to a neural network. This is because:

Passing the entire image requires more compute power

and processing time.

The network doesn't require the entire image but only

features that are important in identifying the image.

The process of reducing the size of the image is known

as convolution. The convolution operation results in a feature map, also


known as a convolved feature or activation

map. The convolution process works by passing a feature

detector over the input image. The feature detector also goes

by other names such as kernel or filter.

In most cases, the kernel is a 3 by 3 matrix. However, different


kernel sizes can be used. The feature map is obtained through

an element-wise multiplication of the kernel with input images

and summing the values.


A 3 by 3 kernel reduces a 5 by 5 input to a 3 by 3 output

Given the above input image and filter, the convolution operation for one position looks like this:

3x1 + 5x0 + 2x0 + 9x1 + 7x1 + 5x0 + 2x0 + 0x0 + 6x1 = 3 + 0 + 0 + 9 + 7 + 0 + 0 + 6 = 25

Slide the kernel over the entire input image to obtain all the values as we have done above.

Element-wise multiplication of a 5 by 5 input with a 3 by 3 filter.
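The same element-wise multiply-and-sum can be reproduced in a few lines of NumPy. The 3 by 3 patch and kernel below are consistent with the worked arithmetic above:

import numpy as np

patch = np.array([[3, 5, 2],
                  [9, 7, 5],
                  [2, 0, 6]])
kernel = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [0, 0, 1]])

# Element-wise multiplication followed by a sum gives one value of the feature map.
print(np.sum(patch * kernel))  # 3 + 9 + 7 + 6 = 25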


The kernel moves over the input images through steps known

as strides. The number of strides is defined while designing

the network.

A 3 by 3 convolution operation.

In this example, the feature map happens to be the same size as the kernel; in general, its size depends on the input size, kernel size, stride, and padding.

Padding

Applying the kernel reduces the size of the output relative to the input image. However, keeping the same image size after applying the

kernel might be desirable in specific scenarios. This is

important, for instance, when the edges of the images have

information that may be critical in classifying the image.

Maintaining the size of the feature map as the input image is

achieved via padding. Padding increases the size of the input

image by adding zeros around the image such that when the


kernel is applied, the output has the same size as the input

image. The type of padding is also defined when creating the


network. The options are:

Same to pad such that the size of the input image and the

feature map are the same.

Valid to apply no padding.

The uncolored area represents the padded area.
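In Keras, the choice is made through the padding argument of the convolution layer. An illustrative sketch on a random 28 by 28 input:

import tensorflow as tf

inputs = tf.random.normal((1, 28, 28, 1))

same = tf.keras.layers.Conv2D(32, kernel_size=3, padding="same")(inputs)
valid = tf.keras.layers.Conv2D(32, kernel_size=3, padding="valid")(inputs)

print(same.shape)   # (1, 28, 28, 32): the output keeps the input size
print(valid.shape)  # (1, 26, 26, 32): the output shrinks with no padding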

Apply ReLU


The Rectified Linear Unit (ReLU) is applied during the convolution operation to introduce non-linearity. This forces all

values below zero to zero while the others are returned as the

actual values.

Pooling
At this point, we have a feature map. It is desirable to reduce

the size of the feature map further. This is done via a process

known as pooling. Like in the convolution operation, another

filter is applied to reduce the size of the feature map. This filter

is referred to as a pooling filter. The pooling filter is usually a 2

by 2 matrix. There are various pooling strategies, including:

Max pooling where the filter slides over the feature map

picking the largest value in each box.

Applying a 2 by 2 pooling filter to a 4 by 4 feature map.


Average pooling that computes the average of the values in

a given box.

Pooling results in a pooled feature map.
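Both strategies are available as Keras layers. A minimal sketch applying a 2 by 2 pooling filter to a 4 by 4 feature map with arbitrary values:

import tensorflow as tf

feature_map = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))

max_pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(feature_map)
avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=(2, 2))(feature_map)

print(max_pooled.shape)  # (1, 2, 2, 1): each 2 by 2 box reduced to its largest value
print(avg_pooled.shape)  # (1, 2, 2, 1): each 2 by 2 box reduced to its average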

Dropout regularization

It is usually good practice to drop some connections between

layers in CNNs to prevent overfitting. This forces the network to

identify essential features needed to identify an image and not

memorize the training data.

Flattening
It's time to pass the pooled feature map to a fully connected

layer. However, before we can do that, we have to convert it to

a single column. This is done by flattening the pooled feature

map. This results in a flattened feature map.

Full connection

A CNN can have several fully connected layers after the

flattening operation. However, the last fully connected layer is

responsible for generating the neural network's output.

Activation function

An activation function is applied on the last fully connected

layer depending on the number of categories in the images.


The sigmoid activation function is used in a binary problem,

while the softmax activation function is applied in a multiclass task.

Convolutional Neural Networks

(CNN) in TensorFlow

With the basics out of the way, let's build CNNs with

TensorFlow. First, we need to ensure that TensorFlow is

installed.
How to install TensorFlow

TensorFlow is an open-source deep learning framework that

enables us to build and train CNNs. TensorFlow can be

installed from the Python Package Index (PyPI) via the pip command. TensorFlow is already installed on Google Colab, so you do not need to install it when working in that environment.

# Requires the latest pip

pip install --upgrade pip

# Current stable release for CPU and GPU

pip install tensorflow

# Or try the preview build (unstable)

pip install tf-nightly

You can also install TensorFlow using Docker. Docker is the easiest way to install TensorFlow on Linux if GPU support is desired.

docker pull tensorflow/tensorflow:latest  # Download latest stable image
docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter  # Start Jupyter server

Follow these instructions to install TensorFlow on Apple

Silicon machines. This will enable you to train models with

GPUs on Mac.

How to confirm TensorFlow is

installed

You can confirm that TensorFlow has been installed by printing

the version. If TensorFlow is installed, the version will be

printed.

import tensorflow as tf

print(tf.__version__)


What are Keras and tf.keras?

In TensorFlow 1, Keras and TensorFlow were two separate packages. Keras


was being used as the high-level API for

TensorFlow. Due to its ease of use and popularity, Keras was


included as part of TensorFlow 2. Keras is the official high-level

API for building deep learning models in TensorFlow. You'll

import it into your programs as tf.keras .

Develop multilayer CNN models

Let's use the Fashion MNIST dataset to illustrate how to build multilayer
CNN models with TensorFlow. The dataset contains

60,000 grayscale images for training and 10,000 for testing.

Like the digits MNIST dataset, the image size is 28 by 28.

https://github.com/zalandoresearch/fashion-mnist

Data preprocessing
First, load the dataset. We use Layer to achieve this.

# !pip install layer -U  # install Layer to load the dataset
import layer

mnist_train = layer.get_dataset('layer/fashion_mnist/datasets/fashion_mnist_train').to_pandas()
mnist_test = layer.get_dataset('layer/fashion_mnist/datasets/fashion_mnist_test').to_pandas()

We can visualize some samples from this dataset.

mnist_train["images"][17]

mnist_test["images"][23]

Let's convert these images to NumPy arrays.

import numpy as np

def images_to_np_array(image_column):
    return np.array([np.array(im.getdata()).reshape((im.size[1], im.size[0])) for im in image_column])

train_images = images_to_np_array(mnist_train.images)
test_images = images_to_np_array(mnist_test.images)
train_labels = mnist_train.labels
test_labels = mnist_test.labels

Model definition


Now that the dataset is ready, define the CNN network. The

network contains the following layers:

An input layer with the shape similar to the size of the

input image. The last parameter, 1, indicates that the

images are grayscale.

Convolution layer with 32 units, a 3 by 3 kernel size, and

a ReLu activation function.

Pooling layer with a 2 by 2 pooling filter.

Flatten layer to flatten the pooled feature map.

Dropout to add dropout regularization to prevent


overfitting.

Fully connected layer–Dense layer– with 10 units

representing the number of categories in the dataset and

the softmax activation function.

parameters = {"shape":28, "activation": "relu", "classes": 10,

"units":12, "optimizer":"adam", "epochs":1,"kernel_size":3,"po ol_size":2,


"dropout":0.5}

# Setup the layers

model = keras.Sequential(

keras.Input(shape=(parameters["shape"], parameters["shap

e"], 1)),

layers.Conv2D(32, kernel_size=(parameters["kernel_size"],

parameters["kernel_size"]), activation=parameters["activatio

n"]),

layers.MaxPooling2D(pool_size=(parameters["pool_size"], p

arameters["pool_size"])),

layers.Conv2D(64, kernel_size=(parameters["kernel_size"],

parameters["kernel_size"]), activation=parameters["activatio

n"]),
Deep learning with TensorFlow and Keras

63

layers.MaxPooling2D(pool_size=(parameters["pool_size"], p
arameters["pool_size"])),

layers.Flatten(),

layers.Dropout(parameters["dropout"]),

layers.Dense(parameters["classes"], activation="softma

x"),

Compiling the model

The next step is to compile the neural network. This is

where gradient descent is applied. This is the optimization

strategy that reduces the errors as the network is learning.

There are various optimization strategies but adam is a common

approach. It applies the Adam algorithm. In the compile stage, we also


define the loss function and the metrics. We

use sparse categorical cross-entropy because the labels are integers. The
categorical cross-entropy is used when the labels

are one-hot encoded.

# Compile the model
model.compile(optimizer=parameters["optimizer"],
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

Train the model

We are ready to train this network using the Fashion MNIST

dataset. In TensorFlow, training is done by calling


the fit method. Apart from the training and validation data,

the fit function expects the number of training iterations–

epochs.

history = model.fit(x=train_images, y=train_labels, validation_data=(test_images, test_labels),
                    epochs=parameters["epochs"])

How to plot model learning


curves

When training the model, we assigned that process to

the history variable. This variable holds the training and

validation metrics. We can use that to plot the training and

validation metrics.

metrics_df = pd.DataFrame(history.history)
metrics_df[["loss","val_loss"]].plot();
metrics_df[["accuracy","val_accuracy"]].plot();  # The semicolon prevents certain matplotlib items from being printed.

Model evaluation

Let's now evaluate the performance of the network on the


testing set. This is done using the evaluate method.

# And finally evaluate the accuracy
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

predictions = model.predict(test_images)
df = pd.DataFrame(predictions, columns=["0","1","2","3","4","5","6","7","8","9"])

How to halt training at the right time with Early Stopping

CNN models can take a long time to train, especially when the

training images are in the thousands. Often, it's good practice

to stop training when the network is no longer improving. To

achieve this, we apply a built-in function in TensorFlow

called EarlyStoppingCallback . The function expects: The metrics to


monitor.

The mode , whether to check for the minimum or maximum


of the metrics.

patience to determine how long the network should wait

before halting the training if the metric is not improving.

The callback is passed using the callbacks parameter of

the fit method.

callbacks = [tf.keras.callbacks.EarlyStopping(monitor='accuracy', mode="max", patience=3)]

# Compile the model
model.compile(optimizer=parameters["optimizer"],
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

history = model.fit(x=train_images, y=train_labels, validation_data=(test_images, test_labels),
                    epochs=parameters["epochs"], callbacks=callbacks)

How to accelerate training with

batch normalization

As the name suggests, batch normalization (batchnorm) involves normalizing the inputs within the network. Batch normalization ensures that the mean output is close to 0 and

the output standard deviation is close to 1. It normalizes input

using the mean and standard deviation of the current training

batch. When making predictions, batch normalization

normalizes its output using the moving average of the mean

and standard deviation of the batches computed during

training. Batch normalization is primarily applied in deep neural

networks to make training faster.
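Concretely, for each feature in a training batch the layer computes normalized = (x - batch_mean) / sqrt(batch_var + epsilon) and then outputs gamma * normalized + beta, where gamma and beta are parameters learned during training (this is the standard batch normalization formulation, stated here for clarity rather than taken from the code below).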

model = keras.Sequential([
    keras.Input(shape=(parameters["shape"], parameters["shape"], 1)),
    layers.Conv2D(32, kernel_size=(parameters["kernel_size"], parameters["kernel_size"]),
                  activation=parameters["activation"]),
    layers.MaxPooling2D(pool_size=(parameters["pool_size"], parameters["pool_size"])),
    layers.Conv2D(64, kernel_size=(parameters["kernel_size"], parameters["kernel_size"]),
                  activation=parameters["activation"]),
    layers.MaxPooling2D(pool_size=(parameters["pool_size"], parameters["pool_size"])),
    layers.Flatten(),
    layers.Dropout(parameters["dropout"]),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(parameters["classes"], activation="softmax"),
])

model.compile(optimizer=parameters["optimizer"],
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

history = model.fit(x=train_images, y=train_labels,
                    validation_data=(test_images, test_labels),
                    epochs=parameters["epochs"])

How to create custom callbacks for TensorFlow CNN

TensorFlow also enables you to define custom callbacks. This is handy


when you want to track items not supported by built-in

callbacks. The example below prints the keys at the end of

every epoch.

from tensorflow.keras.callbacks import Callback

class CustomCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        keys = list(logs.keys())
        print("End epoch {} of training; got log keys: {}".format(epoch, keys))

# Compile the model
model.compile(optimizer=parameters["optimizer"],
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(x=train_images, y=train_labels,
          validation_data=(test_images, test_labels),
          epochs=parameters["epochs"], callbacks=[CustomCallback()])


How to visualize a deep learning model

A quick way to visualize a deep learning model is to call

the summary function.

model.summary()

On the summary, you will see the following:

Network layers and their type.

Output shape for each layer.

Number of parameters for each layer.

Total number of parameters.

Total number of trainable and non-trainable parameters.

Alternatively, you can also plot the network as an image using

the plot_model function.

tf.keras.utils.plot_model(
    model,
    to_file="model.png",
    show_shapes=True,
    show_layer_names=True,
    rankdir="TB",
    expand_nested=True,
    dpi=96,
)
You can follow this plot from the top to see how the shapes

change until the last output layer.


How to save and load your model

A deep learning model can be saved and loaded later. For

example, you may want to save it and deploy it. TensorFlow

enables the saving of a network's weights or the entire model.

model.save_weights('./checkpoints/my_checkpoint')

model.save("saved_model")

new_model = tf.keras.models.load_model('saved_model')

new_model.summary()

You can then load the model and use it for predictions or re-

train it.

Running CNNs with TensorFlow in the real world

To run CNNs in the real world, we need the ability to load and

process image data from a folder. In this part of the article, we'll

use the food images dataset available on Kaggle to build an image


classification network.

Loading the images

We start by downloading and extracting the data.

import wget  # pip install wget
import tarfile

wget.download("http://data.vision.ee.ethz.ch/cvl/food-101.tar.gz")

food_tar = tarfile.open('food-101.tar.gz')
food_tar.extractall('.')
food_tar.close()

Generate a tf.data.Dataset

Next, let's load these images using image_dataset_from_directory from TensorFlow. The function returns a tf.data.Dataset and takes the following arguments:

The directory containing the images.

The batch size.

The desired width and height of the images.

Percentage of the images that should be used for validation

declared via the validation_split parameter.

Whether this will be a training or validation split, in this


case, training.

label_mode that determines how the labels will be

encoded. int encodes them as integers


while categorical encodes them as a categorical vector.

A random seed that controls shuffling and other

transformations.

base_dir = 'food-101/images'

batch_size = 32

img_height = 128

img_width = 128

import tensorflow as tf

training_set = tf.keras.utils.image_dataset_from_directory(
    base_dir,
    label_mode="int",
    validation_split=0.2,
    subset="training",
    seed=100,
    image_size=(img_height, img_width),
    batch_size=batch_size)

TensorFlow will infer the labels of the images from the directory

structure.
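For instance, a layout like the following (the folder and file names shown here are only illustrative; food-101 stores one sub-folder per food class) yields one class per sub-folder:

food-101/images/
    apple_pie/
        1005649.jpg
        ...
    baby_back_ribs/
        ...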

Next, we do the same to load the validation set.

validation_set = tf.keras.utils.image_dataset_from_directory(
    base_dir,
    validation_split=0.2,
    subset="validation",
    seed=100,
    image_size=(img_height, img_width),
    batch_size=batch_size)

Let's check the class names as inferred by the data loader.

class_names = training_set.class_names

print(class_names)

We can use Matplotlib to visualize a few images.

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in training_set.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")

Buffered dataset prefetching

It's important to prefetch data when working with large

datasets. Prefetching ensures that the data is available even

before it is requested. The number of items to prefetch should

be greater than the batch size. You can set this manually or

use tf.data.AUTOTUNE to let TensorFlow handle this dynamically.



AUTOTUNE = tf.data.AUTOTUNE

training_ds = training_set.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
validation_ds = validation_set.cache().prefetch(buffer_size=AUTOTUNE)

Image augmentation

Image augmentation involves performing various

transformations on training data to ensure that the network

sees variations of the same data. Augmentation strategies in

image classifications include:

Flipping the images randomly.

Random rotations.

Random zoom.

In general, data augmentation helps to prevent overfitting by

exposing the network to images in various aspects.

from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal",
                          input_shape=(img_height, img_width, 3)),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
    ]
)


When defining the network, we will use the above augmentation layer as the first layer in the network. Let's look at what an image would look like after the augmentation. We can augment some images and plot them using Matplotlib.

plt.figure(figsize=(10, 10))
for images, _ in training_set.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")


Model definition

We define a neural network with the following layers:

The image augmentation layer.

A layer to scale the images.

A convolution layer with 32 filters, a kernel size of 3 by 3, and the ReLU activation function.

A MaxPooling2D layer with a 2 by 2 pool size.

A dropout layer that "drops" 25% of the connections.

The Flatten layer.

The final fully connected layer.

model = keras.Sequential([
    data_augmentation,
    layers.Rescaling(1./255),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2,2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.25),
    layers.Dense(len(class_names), activation='softmax')])

Compiling the model

Let's compile the network to prepare it for training.

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

Training the model

Train the model while applying the Early Stopping callback.

callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)

epochs = 100
history = model.fit(training_set, validation_data=validation_set,
                    epochs=epochs, callbacks=[callback])

Model evaluation

Evaluate the trained model using the evaluate function.

loss, accuracy = model.evaluate(validation_set)

print('Accuracy on test dataset:', accuracy)

Monitoring the model’s performance

Let's visualize the performance of the model using Matplotlib.

import pandas as pd

metrics_df = pd.DataFrame(history.history)

loss, accuracy = model.evaluate(validation_set)

metrics_df[["loss","val_loss"]].plot();
metrics_df[["accuracy","val_accuracy"]].plot();

Visualize CNN graph with TensorBoard

We can visualize the CNN graph using the TensorBoard


callback. The TensorBoard callback takes the following

parameters:

The folder where the logs will be saved.

histogram_freq determines the frequency at which the

weight histograms will be computed. This requires

validation split or validation data to be provided.

Setting write_graph to true is what shows the graph of the

network.

write_images as true writes the model weights so that they

can be visualized on TensorBoard.

Setting update_freq as epoch writes the losses and metrics

to TensorBoard after each epoch. Writing too often to

TensorBoard may slow down the training.

log_folder ="logs"

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=l

og_folder,

histogram

_freq=1,

write_graph=True,

write_images=True,
Deep learning with TensorFlow and Keras

85

update_freq='epoch'

# Compile the model

model.compile(optimizer='adam',

loss=tf.keras.losses.SparseCategoricalCrossentrop
y(),

metrics=['accuracy'])

model.fit(training_set,validation_data=validation_set,epochs=2,

callbacks=[tensorboard_callback])

CNN Graph from TensorBoard


On the Histogram dashboard, we see the weight and biases

histogram of the network. Histograms are a great way to

visualize the activations of certain layers in the network. You

can also use it to check changes in the weights and biases as

the network is trained.

How to profile with TensorBoard

Another thing we can do with TensorBoard is to profile the

training of the CNN. This is done by including

the profile_batch argument in the TensorBoard callback. In this

case, we profile batches 2 to 5. Using the update_freq as 1

means that losses and metrics will be written to TensorBoard at

every batch.

Ensure that the profile plugin is installed:


pip install -U tensorboard_plugin_profile

Next, define the TensorBoard callback and pass it to the model

training function. Compile and train the network again. Run

TensorBoard and select the Profile dashboard to see the

profile analysis.

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_folder,
                                                      profile_batch='2,5',
                                                      update_freq=1)

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# model.fit(training_set, validation_data=validation_set, epochs=2, callbacks=[tensorboard_callback])
history = model.fit(image_batch, labels_batch, validation_split=0.2,
                    epochs=1, callbacks=[tensorboard_callback])

%tensorboard --logdir={log_folder}

On the Overview page, we see the execution summary and

some recommendations for improving the model's

performance.

The TensorFlow Stats tool shows the performance of all

TensorFlow operations executed during the profiling session.

The lower sections of the TensorFlow stats tools

show TensorFlow operations. For example, in the image

below, we can see familiar items such as Conv2D and MaxPool .

The Trace Viewer under the tools section shows performance

bottlenecks in the input pipeline. It shows a timeline of events

as they occur in the CPU and GPU. The colored rectangular

boxes on the timeline represent individual events. Clicking an

event shows more information about it in the section below the

Trace Viewer. For example, in the image below, we see the


start time and duration of the clicked event.


Making predictions

Let's look at how to use the trained model to make predictions

on a new image. We start by loading a new image and adding

the batch dimension.

image_url = "https://upload.wikimedia.org/wikipedia/commons/b/b

1/Buttermilk_Beignets_%284515741642%29.jpg"

image_path = tf.keras.utils.get_file('Sample_Food', origin=imag

e_url)

test_image = tf.keras.utils.load_img(

image_path, target_size=(img_height, img_width)

img_array = tf.keras.utils.img_to_array(test_image)

img_array = tf.expand_dims(img_array, 0)


Next, we scale the image and run predictions on it.


img_array = img_array / 255.0

prediction = model.predict(img_array)

prediction


We need to interpret this output to understand the type of food in the image. To do that, we pass the output through the softmax function, whose outputs sum to 1 and can be read as a probability for each of the food categories. We then take the maximum value to determine the category of the food.

import tensorflow as tf
import numpy as np

scores = tf.nn.softmax(prediction[0])
scores = scores.numpy()

f"{class_names[np.argmax(scores)]} with a {(100 * np.max(scores)).round(2)} percent confidence."
# 'mussels with a 1.14 percent confidence.'


CNN architectures

So far, we have been designing our own CNN networks.

However, we can use various CNN architectures to hasten this

process. These networks typically deliver better performance on image tasks, especially when you use pre-trained models. The

pre-trained networks can be used immediately to run

predictions on new images or fine-tuned via transfer learning to

be specific to a task.

Popular CNN architectures include:

Xception
ResNet50

InceptionV3

MobileNetV2

DenseNet121

NASNetLarge

EfficientNetB1

Model with weights

We can load any of the above CNN architectures using Keras

applications. Let's look at loading ResNet152 architecture. We

pass the weights argument as imagenet to load a network that

has been trained on the ImageNet dataset.


model = tf.keras.applications.ResNet152(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

We can use this network to run predictions on new images

immediately. For instance, let's run prediction on the image we

used with the CNN network.

To do that, we ensure that the image size is 224 by 224. This is

the image size used to train this ResNet network. We also need

to process the image the same way the training images were

processed. Each of the Keras applications provides

a preprocess_input for doing this.

from tensorflow.keras.applications.resnet import preprocess_input, decode_predictions

test_image = tf.keras.utils.load_img(
    image_path, target_size=(224, 224))

img_array = tf.keras.utils.img_to_array(test_image)
img_array = tf.expand_dims(img_array, 0)

x = preprocess_input(img_array)
preds = model.predict(x)

# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [('n07836838', 'chocolate_sauce', 0.4584937), ('n07693725', 'bagel', 0.2302542), ('n07695742', 'pretzel', 0.18161112)]

The network has determined the food image to either be a

chocolate sauce, bagel or pretzel.

When loading the ResNet152 network, we

included include_top as True. This means that the network will

be downloaded with the final fully-connected layer. This is ideal

when you want to use the network to make predictions

immediately. However, when you want to fine-tune the network

on custom data, you set this to false and then include another

final fully-connected layer that is specific to your task.
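As a rough sketch of that fine-tuning setup (the pooling layer and the Dense head below are illustrative additions rather than code from the original text; class_names comes from the food dataset loaded earlier):

base = tf.keras.applications.ResNet152(include_top=False,
                                       weights="imagenet",
                                       input_shape=(224, 224, 3))
base.trainable = False                                  # freeze the pre-trained layers

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(len(class_names), activation="softmax")(x)
custom_model = tf.keras.Model(inputs, outputs)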


Model without weights

You may also want to load the CNN architecture without the

weights. Doing this means that you will start the training from

scratch. In most cases, you'll want to load the networks with the

weights to take advantage of the training that has already been

done.

model = tf.keras.applications.ResNet152(
    include_top=True,
    weights=None,
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

Final thoughts

We have seen how to build Convolutional Neural Networks with

Keras and TensorFlow. We have also covered:


The steps of training a CNN.

Adding dropout regularization in CNNs.

Using batch normalization to speed up training.

Applying early stopping to train the network for fewer

epochs.

Plotting the learning curves of CNNs.

Visualizing the graph and histogram of CNN in

TensorBoard.

How to profile the training of the CNN with TensorBoard.

Data augmentation strategies for image tasks.

...among other topics.


TensorFlow Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a class of neural

networks that form associations between sequential data

points. For example, the average sales made per month over a

certain period. The data has a natural progression from month

to month, meaning that the sales for the first month are the only
independent sales. The rest are dependent on the sales made

prior.

Such deep learning techniques have found use in the fields of

natural language processing, time series analysis & prediction,

speech recognition, and image captioning, among others.

What is a Recurrent Neural Network?

Recurrent Neural Networks are an improvement on feedforward

networks.

Feedforward neural networks are a form of neural networks

where the nodes form a strictly serial connection. They do not

have any cycles. Information flows sequentially from the input

node to the hidden layers until the output layer. These types of

neural networks have no recall ability. They do not store any

previously used information in memory and have difficulty

making predictions.

Traditional Feed-Forward Network

In contrast, information in a Recurrent Neural Network cycles

through a loop. Recurrent Neural Networks have a hidden

input, which is the previous input from the earlier layers. Thus,

when making predictions, each layer considers the current

input and the lessons it learned from the previous inputs.



Standard Recurrent Neural Network

Feedforward and Recurrent Neural Networks apply weights to

their inputs. Recurrent Neural Networks, however, have to

apply their weights to both the current and the previous input.

They also tweak the weights for the gradient descent during

backpropagation through time, which is the next concept to

uncover.

Backpropagation through time

Let's start with a definition of the concept of rudimentary

backpropagation.

The steps for training a neural network are as follows:

1. A forward pass through the layers from input to hidden

layers and the output layer to generate a prediction.

2. Comparison of the prediction with the actual value using

a loss function. The loss function gives us the marginal

error of the algorithm to determine how accurate it is.

3. Using the error, a second traversal is made backward from

the output layer to calculate gradients for each node


using the loss function and error.

Step three is what is referred to as backpropagation.

Gradients essentially define the learning ability of a particular

layer. A higher gradient results in a higher adjustment to the

weights in a specific layer. Each node within a neural network

usually has its gradient determined by the effects of the


gradients made to the prior layer; therefore, the adjustments made to a


particular layer will be smaller than those of the

previous layer.

Backpropagation through time is an algorithm that adjusts

weights in neural networks with recall ability.

The backpropagation steps through time are as follows:

1. Present a sequence of timesteps of input and output pairs

to the network.

2. Unroll the network, then calculate and accumulate errors

across each timestep.

3. Roll up the network and update weights.

Unrolling or unfolding is a method of simplifying an RNN by


visualizing the different steps as a graph with no cycles. It is a

requirement for Recurrent Neural Networks because each

consecutive timestep requires the previous one to determine its

output.
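As a rough illustration of unrolling, here is a minimal vanilla-RNN forward pass in NumPy with made-up shapes (the variable names and sizes are illustrative, not taken from the original text):

import numpy as np

np.random.seed(0)
timesteps, input_dim, hidden_dim = 4, 3, 5
x = np.random.randn(timesteps, input_dim)        # one input vector per timestep

W_x = np.random.randn(hidden_dim, input_dim)     # input-to-hidden weights
W_h = np.random.randn(hidden_dim, hidden_dim)    # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                         # initial hidden state
for t in range(timesteps):                       # the "unrolled" loop over timesteps
    h = np.tanh(W_x @ x[t] + W_h @ h + b)        # each step reuses the previous hidden state
print(h.shape)  # (5,)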

Types of Recurrent Neural Networks

There are four main types of Recurrent Neural Networks:

One to One: This neural network takes in one input and

produces a single output. It is sometimes referred to as


a vanilla neural network.

One to many: One to many neural networks have several

outputs for a single input. A typical example is an image

captioning RNN.

Many to one: A many to one RNN requires a sequence of

inputs to generate a single output. This type of RNN is

applicable in sentiment analysis. A sentence could

contain several tokens whose combination can be

determined to either be positive or negative.


Many to many: Many to many RNN models take a

sequence of inputs and produce a sequence of outputs. A

typical application is in machine translation.

Weaknesses of RNNs

Let's now talk about some of the challenges you will encounter

when using RNNs.


1. Vanishing gradient problem
The vanishing gradient problem is the Short-Term Memory

problem faced by standard RNNs:

1. The gradient determines the learning ability of the neural

network. The gradient, in turn, is set during

backpropagation.

2. A larger gradient means more ability to learn from specific

inputs. So with decreasing gradients, the learning ability is


depleted until it reaches zero.

An activation function converts the (input * weight) + bias into

an output for the next layer. There are different activation

functions. For this illustration, let us take into account the

Sigmoid activation function.

The sigmoid activation function outputs a value between 0 and

1. If a series of layers were stacked with the sigmoid activation

function, they would result in an exponential gradient reduction

due to the chain rule of derivatives.
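For intuition, the derivative of the sigmoid, sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), is at most 0.25. By the chain rule, backpropagating through n stacked sigmoid layers scales the gradient by a product of such terms, so it is bounded by 0.25^n; with 10 layers, for example, the factor is at most roughly 9.5e-7, which is effectively vanishing (this numerical illustration is added here for clarity and is not part of the original text).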


Backpropagation results in the neural network only being able

to learn from a specific range of inputs towards the end.

Meaning that the set of input values from the start would

eventually hold little or no value in determining the overall

prediction.

Take this example whereby we are trying to classify user intent

in a particular text:

If the text was, "Please go and get me a very big glass of water

now!".

As the input size increases, the learning ability of the model

from the initial values decreases. For demonstration, we could


say in our case that only the text that comes after 'very' would be useful in
the text classifier neural network.
2. Exploding gradient problem
In exploding gradients, the gradients accumulate and become

so big that the updates made to the neural network weights are

very large during training. This occurs when the gradients of the

consecutive nodes are larger than 1.0.

Since the weights are updated to values larger than the

previous ones, the weights could grow to become so large that

they result in NaN values. At best, the neural network's

gradients would be so large that it cannot learn from the input

data. At worst, the weights could result in NaN values. Either

way, the result is an unstable neural network that does not give

accurate outputs.

Here's how to identify exploding gradient problem in a

neural network:

1. The loss function could always have poor results. The

neural network cannot learn to give accurate predictions.

2. If the changes made to the loss are very large after each

update.

3. If the loss reaches a NaN value.


4. If the weights grow large and quickly.


5. If the weights go to a NaN value.

6. If the error gradients are consistently above 1.0.

Long-Short Term Memory (LSTM)

Considering the RNN weaknesses mentioned above, an

improvement was necessary to overcome them. These

weaknesses are essentially due to the rapid decay or rapid

increase in the gradients due to the chain rule of derivatives

being applied from node to node. These account for standard

RNNs failing to learn for large time steps (Around 5 - 10

discrete time steps).

Opensource LSTM image by Wikimedia

LSTMs overcome the issues of vanishing gradients and

exploding gradients. They contain special units known as cells.

Each cell comprises one or more memory units and three

multiplicative units. These are referred to as the gates of the

cells.

Let us break down the functionality of the cells.

1. Input - This is the read gate. It retrieves relevant inputs to

allow the adjustment of weights in a particular node.

2. Output - This is the write gate. It allows for information in

the cell to adjust the weights of a particular node based on


the relevance of the information within.


3. Reset - This is the forget gate. It gets rid of information within a cell that
is no longer necessary.

The memory units can be referred to as the remember gate.

This allows the LSTM network to retain information. The

memory units are what account for the long-term recall ability of

the LSTM neural network.

Applications of LSTM

LSTMs have a wide range of applications. Let's mention a

couple:

1. Handwriting recognition and generation.

2. Language modeling and translation.

3. Acoustic modeling and speech.

4. Speech synthesis.

5. Protein secondary structure prediction.

6. Analysis of audio and video data.

Bidirectional LSTM

LSTMs are built on the logic of standard RNNs. So to define


Bidirectional LSTMs, it only makes sense to start with

Bidirectional RNNs. These represent each training sequence

forward and backward to two RNNs. The two RNNs are


connected to the same output layer. The implication here is that the nodes in
the Bidirectional RNN have sequential information

about the points before and after them.

Nonetheless, they still face the same issues of exploding and

vanishing gradients. The solution would be to create

Bidirectional LSTMs that can access long-range contextual

information in both input directions.

Applications of Bi-LSTMs include:

1. Text classification.

2. Speech classification.

3. Forecasting models.

Time series analysis with LSTM in TensorFlow

There are different ways to perform time series analysis. For

example, one could use statistics using the ARIMA, SARIMA,


and SARIMAX models.

In this example, we will keep the theme of this article and

implement a time series model using Recurrent Neural

Networks. This project aims to predict the total loan amount a

company could give out in a day.

The assumption is that the company makes a profit from the

loans it gives. If true, then there is a positive correlation


between the amount of the loans given and the revenue

generated. Hence, by predicting future loans, we could predict

how much the company could make.

Imports

First, let us go through this project's imports and their

functionality.

import pandas as pd
import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

Pandas is used to load the dataset as a DataFrame, along with other pre-processing steps.

Numpy is used to manipulate arrays and matrices.

TensorFlow is used during the creation and evaluation of the

LSTM neural network.

We use the Keras Sequential API, which will stack layers. So

the data will move from the first layer, the input layer, to the

hidden layers up to the output layer.

Dense is used to make sure we have a fully connected neural

network.


LSTM is the specific type of Recurrent Neural Network that we will be


using.

Dropout is used to ensure that we do not have an overfitted

model.

MinMaxScaler is used to normalize the dataset. This means that

the range of data will be reduced from 0 to 1.


Matplotlib is used to visualize our data.

Data pre-processing

Load the data and perform a couple of pre-processing steps.

loans = pd.read_csv("loans.csv")

loans = loans[['created_at','amount']]

loans['created_at'] = pd.DatetimeIndex(loans['created_at'])

loans = loans.groupby(['created_at']).amount.sum().reset_index

()

loans.sort_values(by=['created_at'], inplace=True)

loans = loans.set_index('created_at')

The steps involved here are as follows:

1. Load the data from a .csv file.

2. Retrieve the created_at and amount fields from the dataset.

We are doing a univariate analysis, so we only require the

date and value we want to predict.

3. Get the cumulative sum of the loans given out on a

particular day by getting the sum of the loans.

4. Since we are attempting to get sequential data, it is

paramount that we ensure the data is stored in the proper

order. We sort the values by the created_at date.

5. Set the created_at field as the index.

Reduce the variance of the data by scaling it. Large variance

can lead to unwanted trends being caught in data.

scaler = MinMaxScaler(feature_range=(0,1))

scaled_loans = scaler.fit_transform(loans)

Next, prepare the data for loading to the LSTM Model. We feed

the neural network with enough data that it can predict the next

steps. y_train is the target variable.

In this case, y_train is the value after each 60th

interval. x_train is each consecutive 60 values. Essentially, we

obtain the loans given for sixty days and use the value to

predict the loan on the sixty-first day.

x_train = []
y_train = []

for i in range(60, 1955):
    x_train.append(scaled_loans[i-60:i, 0])
    y_train.append(scaled_loans[i, 0])


Next, convert x_train into a 3D array. This is because LSTM

takes in three-dimensional matrices as the input. The

original x_train was only two dimensions.

y_train = np.array(y_train)
x_train = np.array(x_train)

x_train = np.reshape(x_train, (np.shape(x_train)[0], np.shape(x_train)[1], 1))

Create LSTM network in Keras

Let's design the LSTM network. We store the model in a

variable known as regressor . Next, define the layers of the

Sequential model. The first layer is the input layer. We can stack LSTM layers to increase the capacity of the model. This is done by setting the return_sequences parameter to True.

We define units=50 for the LSTM layer to ensure that we have

50 LSTM cells in a layer. The input layer has

the input_shape defined as the shape of one value of x_train .

regressor = Sequential()
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(np.shape(x_train)[1], 1)))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))
regressor.add(Dense(units=1))
regressor.add(Dropout(0.2))
regressor.add(Dense(units=1))

regressor.summary()

For each successive LSTM layer in the hidden layer, you will

find that it does not have the input shape defined since it takes

the input of the preceding layer. Also, other than the last LSTM

layer of the hidden layer, return_sequences is set to True.

We use the Dense layer to ensure that we have a fully

connected neural network.

The summary() function of Sequential gives details on the neural

network, including:

The layers of the models.

The input shape at each layer.

The number of parameters at each layer.

It also gives us the total trainable and non-trainable


parameters.


To convert this into a Bidirectional LSTM, wrap each layer with

Bidirectional so they can have previous and future information.

from keras.layers import Bidirectional

regressor.add(Bidirectional(LSTM(units=50, return_sequences=True)))

Compile the LSTM model


The next step is to compile the model.

The first line sets the initial value for the learning rate. This

value will, in turn, be altered during the data fitting.

While compiling a Keras model, one of the parameters required

is the optimizer. We set the optimizer function as Adam

optimizer.

The EarlyStopping callback acts as a sort of brake on model training. We plan on using 150 epochs, and it may be unnecessary to iterate through all of them if the model does not improve at some point. We monitor the loss and exit training if there is no improvement after 20 epochs.

ModelCheckpoint is used to save the model with the best

performance.

lr_schedule = keras.callbacks.LearningRateScheduler(lambda epoch: 1e-7 * 10**(epoch/20))
opt = tf.keras.optimizers.Adam(learning_rate=1e-7)

regressor.compile(optimizer=opt, loss='mse', metrics=['mae','mape'])

early_stopping = tf.keras.callbacks.EarlyStopping(monitor='loss', mode='min', patience=20)

mc = tf.keras.callbacks.ModelCheckpoint('best_model.h5', monitor='loss', mode='min',
                                        verbose=0, save_best_only=True)

hist = regressor.fit(x_train, y_train, epochs=150, batch_size=32,
                     callbacks=[mc, lr_schedule, early_stopping])

The last step here is to fit the x_train and y_train into the

regressor model.

To make predictions on one single set of sixty values:

prediction = regressor.predict( np.array( [x_train[0],]))

We have chosen to predict the first 60 values. If, for example,

we wanted to predict the next 30 steps, we would need to do


the following:

1. Make a prediction on the last 60 values.

2. Take the last 59 digits of the x list and append the

prediction.

def predictSteps(x, steps):
    if steps == 0:
        prediction = regressor.predict(np.array([x,]))
        print(prediction)
        return prediction
    else:
        prediction = regressor.predict(np.array([x,]))
        pred = x[1:]
        pred = np.append(pred, prediction)
        steps = steps - 1
        print(prediction)
        predictSteps(pred, steps)

This is a recursive function meant to predict n steps into the future. x is the input window of values to predict from, and steps is the number of steps into the future.

predictSteps(x_copy[-1], 10)

LSTM model evaluation

To see the metrics tracked during training, access the keys of the history object:

print(hist.history.keys())

We can visualize the progression of the loss of our model

using Matplotlib.

plt.plot(hist.history['loss'])

This will output the following line graph:


We can also try to determine the error in predictions made by

the model.

def getError(actual, prediction):
    m = keras.metrics.MeanAbsolutePercentageError()
    n = keras.metrics.MeanAbsoluteError()
    m.update_state(actual, prediction)
    n.update_state(actual, prediction)
    err = m.result().numpy()
    err_1 = n.result().numpy()
    return {'MAE': err_1, 'MAPE': err}

Here we want to get both the MeanAbsoluteError and the MeanAbsolutePercentageError.

To use this, make predictions on the test dataset and store them in a variable called y_preds. Then pass y_test and y_preds into the getError() function.

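For instance, assuming hold-out windows x_test and y_test were built the same way as x_train and y_train (they are not defined in the snippets above, so treat them as placeholders):

y_preds = regressor.predict(x_test)  # x_test: assumed hold-out windows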

test_errors = getError(y_test, y_preds)
print(test_errors)
Intent classification with LSTM

Intent classification is a type of natural language processing

problem that involves determining the aim of a particular text.

For example, a person saying, "Please help me out." The intent

here can be stated as "Making a request."

Intent classification can be used by a company trying to keep

track of the products being referred to on their social media

accounts. For example, a bank offering mortgages, business

loans, personal loans, and savings accounts. When tracking

the posts and comments of their user base, they may need to

use intent classification to determine the product being

addressed and direct the post or comment to the appropriate

department.


Other classification algorithms

include KNearestNeighbors , RandomForest , and SGDClassifier .

These, however, would only classify intent on statistics rather

than meaning.

LSTM Recurrent Neural Networks can memorize important


information. Therefore, sequences of words are taken into

account rather than just the word itself. This enables the word's

meaning within a particular context to be considered. We

accomplish this using embedding and encoding layers.

In this example, we will explore customer complaints about a

company. We only want to determine the particular aspect of

the company that the customers were complaining about.

Hence, we will be doing a bivariate analysis. We will analyze

the product and the customer complaint.

Imports

Let's start by making standard imports.

We added an Embedding layer to draw meaning from the words

in the sentences.

SpatialDropout1D is used to avoid overfitting. It works similarly to

Dropout but drops entire one-dimensional feature maps rather


than the individual elements.

NLTK (Natural Language Processing ToolKit) is used to identify stopwords in text. Stopwords are common words in a given language that do not add any value to the classification task.

import pandas as pd
import re
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, Bidirectional, Embedding, SpatialDropout1D
import matplotlib.pyplot as plt
import nltk
nltk.download("stopwords")
from nltk.corpus import stopwords

Load dataset

Load the dataset into a variable and retrieve the required

columns. In this case, Consumer complaint

narrative and Product . Finally, we remove the null values.

complaints = pd.read_csv("complaints.csv")

complaints = complaints[['Consumer complaint narrative','Produc

t']]

complaints.dropna(inplace=True)

Data cleaning

We need to remove unnecessary symbols from the text

data. symbols_regex contains a list of characters that need to be

replaced with a space. bad_symbols_regex comprises regex for

digits and other symbols combined with text data and hence

need to be removed without adding a space.

symbols_regex = re.compile('[/(){}\[\]\|@,;]')
bad_symbols_regex = re.compile('[^0-9a-z #+_]')

def clean_text(text):
    text = re.sub(r'\d+', '', text)      # remove digits (str.replace does not handle regex)
    text = text.lower()
    text = symbols_regex.sub(' ', text)
    text = bad_symbols_regex.sub('', text)
    text = text.replace('x', '')         # drop the 'x' characters used to mask personal data
    return text


complaints['Consumer complaint narrative'] = complaints['Consumer complaint narrative'].apply(clean_text)

The clean_text function performs the following operations:

1. Convert the data to lowercase. Since "list" in the middle of

a sentence and "List" at the beginning of a sentence should

not be considered different words.

2. Remove all the digits.

3. Replaces symbols with space.

4. In the text, we see several instances of combinations

of X being used to mask specific data such as phone

numbers. These are not words with meaning; hence also


need to be removed.

Label exploration

Let's look at the number of complaints in each category.

complaints['Product'].value_counts().sort_values(ascending=False)


Text vectorization

Let's tokenize the sentences into individual words. We set the

maximum number of words used by the TextVectorizer using

the max_tokens parameter.

complaints_text = complaints['Consumer complaint narrative'].values  # assumed: the cleaned complaint text, not defined in the original excerpt

vectorize_layer = tf.keras.layers.TextVectorization(
    standardize='lower_and_strip_punctuation',
    max_tokens=5000,
    output_mode='int',
    output_sequence_length=512)

vectorize_layer.adapt(complaints_text, batch_size=None)

X_train_padded = vectorize_layer(complaints_text)
X_train_padded = X_train_padded.numpy()

The vectorization layer converts the text into integer sequences and pads (or truncates) them so that they all have a length of 512. Since the neural network can only take numbers as input, we use LabelEncoder to transform the target labels into numbers.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
complaints['Product'] = le.fit_transform(complaints['Product'])
y = complaints['Product']


Next, separate the dataset into training and testing datasets. The testing dataset is 30 percent of the data. random_state is set to 42 so that the split is reproducible between runs. Keeping the split fixed and separate also prevents data leakage, where part of our testing dataset is used for training.

from sklearn.model_selection import train_test_split

X = X_train_padded  # the padded sequences from the vectorization step
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Create LSTM network

Other than the Embedding layer, the other layers have

functionality detailed earlier in this article.

classifier = Sequential()

classifier.add(Embedding(50000, 100, input_length=X.shape[1]))

classifier.add(SpatialDropout1D(0.2))

classifier.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))

classifier.add(Dense(17, activation='softmax'))

The Embedding layer represents the tokens as a dense vector. A

word's position within a vector space depends on the words

surrounding it. This is how we assign meaning to a word

depending on the context in which it is used.


The final layer of the model has 17 units, one for each of the 17 output classes. We use the softmax activation function because this is a multiclass labeling problem.
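Before the evaluation below, the classifier has to be compiled and trained; a minimal sketch (the optimizer, epoch count, batch size, and validation split are illustrative choices, not taken from the original):

classifier.compile(loss='sparse_categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])

history = classifier.fit(X_train, y_train,
                         epochs=5, batch_size=64,
                         validation_split=0.1)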

LSTM model evaluation

Let's evaluate the model by making predictions on the test set.

classifier.evaluate(X_test,y_test)

Final thoughts

We have explored the natural progression of concepts from

traditional feed forward networks to Recurrent Neural Networks.

The difference, in this case, was the looping mechanism in an

RNN that allows it to have recall ability. Hence it can use

previous information to come up with its predictions.

Feedforward networks take in the current input and use it to

generate a prediction. RNNs, on the other hand, use both the

current and previous inputs to come up with predictions. Hence

they are more suited to predicting progressive data, such as in

time series analysis.


We also went through the weaknesses of standard RNNs and

how they impact their performance with regard to their short-

term memory. This is solved by making use of LSTM RNNs.


These contain four gates that enable them to have long-term

recall ability.

Transfer learning guide

Training computer vision (CV) or natural language processing

(NLP) models can be expensive and requires large datasets. If

labeling is done manually, the process will take a longer training

time and requires expensive hardware.

For instance, the Generative Pre-trained Transformer 2 (GPT-2), a benchmark-setting language model created by OpenAI in 2019, is estimated to have cost around $1.6m to train. Such a cost would make it difficult for individuals or small organizations conducting research in NLP to compete with OpenAI.

Luckily, you do not have to train models such as GPT-2. You can obtain a copy of such models free of charge from the

internet, fine-tune them to match your requirements and

specific dataset, and obtain results faster.


In this guide, we will explore the concept of transfer

learning and its applications in computer vision and natural

language processing.
What is transfer learning?

Transfer learning is the process where a model built for a

problem is reused for a different or similar task. This

technique is commonly used in c omputer vision and

n atural language processing, where previously trained

models are used as the base for new related problems to

save time.

The pre-trained base models are trained on large benchmark

datasets in computer vision and natural language processing.

The resulting weights are used on another task with a smaller

set of data.

This approach not only reduces training time but also lowers

the generalization error. The figure below depicts the transfer

learning idea.

Transfer learning

Advantages of using pre-trained models

The main advantage of using pre-trained models is their

general adaptability for use in other real-world applications. For

example:

For problems that require researchers to build models but

face downstream requirements including latency constraints or specific data


domains, models such

as GAIA and COCO trained on ImageNet can be used

effectively.

In NLP, classifying text data requires knowledge of word

representations in some vector space. Training custom

representations is feasible, except that you might not have

sufficient data to train the embeddings. In such cases, pre-


trained word embeddings like Word2Vec and GloVe, are

used to speed up the modeling process.


Types of transfer learning

Transfer learning can be categorized into three groups

depending on the machine learning model involved.

Inductive transfer learning

In inductive transfer learning, the source and target domains

are the same. However, the source and target problems vary.

The models use the knowledge and inductive biases of the

source (base) model to improve the target problem.

Inductive transfer learning can be grouped into two

subcategories, i.e., multitask learning and self-taught

learning. The pre-trained model already has domain features

and is a better starting point than training the model from

scratch.

Inductive transfer learning involves extending well-known

classification and inference models, including neural networks,

Bayesian networks, and Markov Logic Networks.


Unsupervised transfer learning

Unsupervised transfer learning focuses on unsupervised

tasks in the target domain where the source and target

domains are similar, but the problems differ. In this case, the

data in either domain is not labeled.


In unsupervised and semi-supervised settings, transfer learning assumes that


a reasonably sized dataset exists in the target

task, but it is generally unlabeled, given the expense of having an individual assign the labels manually.

Transductive transfer learning

In transductive transfer learning, the domains of the source

and target problems are not exactly similar but have

interrelated uses. One can derive similarities between the

source and target tasks.

These settings generally use a lot of labeled data in the source

domain, whereas the target domain contains only unlabeled

data. Transductive transfer learning can be grouped into

subcategories, depending on whether the feature spaces are


different or the marginal probabilities.

Further, transfer learning can be categorized depending on the

similarity between the domain and independence of the type of

data samples used for training, i.e., homogeneous and

heterogeneous transfer learning.

Homogeneous transfer learning

Homogeneous transfer learning techniques are created and

used to handle scenarios where the domains originate from the

same feature space. In this transfer learning technique, the

domains slightly differ in marginal distributions.


These techniques adapt the domains by correcting the sample

selection bias or covariate shift. Examples of homogeneous

transfer learning include instance, parameter, and relational-knowledge transfer.

Heterogeneous transfer learning

Heterogeneous transfer learning techniques derive

representations from a pre-trained network to obtain meaningful

features from new samples for a closely related task.


Nonetheless, these techniques do not account for the existing

difference in the feature spaces of source and target domains.

It is time-consuming and expensive to collect labeled source

data with the same feature space as the target domain. In such

cases, heterogeneous transfer learning techniques address

such constraints.

Heterogeneous transfer learning addresses the problem of the

source and target domains containing different feature spaces

and other issues such as different data distributions and label

spaces. This type of transfer learning is prevalent in cross-

domain problems like cross-language text categorization, text-

to-image classification, etcetera.

What is the difference between transfer learning and fine-tuning?

Transfer learning is a setting where weights obtained in one

problem– e.g., on a large-scale image classification task– are

exploited to improve generalization in another problem (say,

fruit classification) without having to train the models from

scratch.

Fine-tuning is an optional step in transfer learning used to

improve the model's performance. This step usually involves

adapting the pre-trained model to a particular task or problem.



However, since the entire model has to be re-trained, it is likely to overfit.


Overfitting fine-tuned models can be solved by retraining the model or part
of it using a low learning rate to

prevent significant updates to the gradient, which results in

poor predictive performance.

Applying a callback to stop the training process when the

model's performance is not improving is also helpful for

obtaining better weights.

Other approaches to prevent overfitting include:

Increasing the batch size.

Using different regularization techniques.

Why use transfer learning?

Training neural network models require a lot of data which is

not always available. On top of that, you need resources such

as infrastructure to train the models.

Transfer learning offers numerous advantages. Reducing

training time because training large models can take several

days or weeks is key among them.

When do you use transfer learning?


Transfer learning is mainly used when:

There isn't enough labeled data to train the model.


An existing pre-trained model has already been trained on similar data


and problems– avoid reinventing the wheel.

If you have trained an initial model, you might restore it

and re-train some layers for a new problem.

When does transfer learning not work?

Even though transfer learning has transformed model

development in computer vision and natural language

processing domains, the process might fail sometimes. In other

cases, you might end up with modest results. Such cases occur

when there is dissimilarity in datasets and domains.

There is a significant relationship between domain divergence

and transfer learning performance. Different transfer learning

strategies and techniques are applied based on the domain of

the application, the problem at hand, and the available data.

Therefore, using a pre-trained model from a different domain on


an unrelated problem or unrelated data might underperform or

fail altogether because features transfer poorly.

For instance, training a DenseNet model on ImageNet and

using it to fine-tune on a medical dataset. Another case is using

a pre-trained GPT model on WikiText 103 for the medical


articles dataset. To solve this, do domain adaptation first and then train the models on the target task to improve performance.

Another dissimilarity could be related to variations in

the dimension of the inputs from these datasets. For

instance, using a pre-trained ResNet for CIFAR-10 (32x32px) to

fine-tune on MNIST Handwritten Digit Classification (28x28px).

How to implement transfer learning?

Transfer learning involves taking features from one task and

using them on a similar task. For instance, features from a

model trained to identify rats may be useful as a base model

trained to identify mice. This section will go through the process

of implementing transfer learning.


Transfer learning in 6 steps

The process of implementing transfer learning follows six major

steps, as shown below:


A pre-trained model can be used directly to classify new

images as one of the 1,000 known classes included in the

image classification task in the ILSVRC (ImageNet).

A sample transfer learning using a model trained on the


ImageNet dataset and used on a smaller data set, i.e., the food

dataset, is shown below.


Step 1: Obtain the pre-trained model

After determining the suitable model for your problem, the next

step involves acquiring the model. More about this later in the

article.

Most pre-trained models have different architectures. Neural


network architectures fall into two main groups, i.e., supervised

learning, where the data used for training is labeled, and

unsupervised learning, where the data is unlabeled. The main

architectures for supervised learning include Convolutional


Neural Networks (CNNs) and Recurrent Neural Networks

(RNNs).

Step 2: Create a base model

The initial step in each transfer learning task involves

instantiating the base model using a preferred architecture


such as VGG or ResNet with or without pre-trained weights.

Failure to download the weights means that the model will be

trained from scratch.

Since the dataset used to train the base model contains a

superset of the labels in your dataset, it tends to have more

units in the final output layer than you need. You will have to

drop the final output layer and incorporate a final output layer

compatible with your task.

Step 3: Freeze layers so they don’t change during training

In most cases, your dataset will be small. In this scenario, you

can freeze the initial (let's say k) layers of the pre-trained model and train a
new model with the remaining (n - k) layers.

Freezing layers prevents the weights in those layers from

being re-initialized since they will lose all the learned

information, which will be the same as training a new model

from scratch.

To prevent re-initialization of weights, specify the base model to

be non-trainable.


base_model.trainable = False
Or using:

base_model.trainable = 0

Step 4: Add new trainable layers

We are only using the feature extraction layers from the base

model. It is important to include additional layers on top of the

frozen layers to learn new weights and features for the new

specialized tasks. The added layers are generally the final

output layers excluded from the base model.


Step 5: Train the new layers on the dataset

Include a dense output layer that contains units

corresponding to the number of outputs expected by the

model. For example, the final dense layer should represent the

class scores in the food classification problem.
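A minimal sketch of Steps 4 and 5 might look like the following, assuming the frozen base_model from the previous step and a hypothetical training dataset train_ds with ten classes:

from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10  # hypothetical number of classes in the new task

# stack new trainable layers on top of the frozen base model
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)   # keep the frozen layers in inference mode
x = layers.GlobalAveragePooling2D()(x)   # pool the extracted features
outputs = layers.Dense(num_classes, activation="softmax")(x)  # new output layer
model = keras.Model(inputs, outputs)

# only the new layers are updated; the base model's weights stay fixed
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=5)   # train_ds is a hypothetical tf.data.Dataset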

Step 6: Improve the model via fine-tuning


Fine-tuning aims to improve the performance of the model. It

is done by unfreezing the entire base model or part of it and

then re-training the resulting model on the whole dataset using

a relatively low learning rate.

Setting a low learning rate for the model will prevent overfitting

while improving the model's performance on the new problem.

Recall that the original model was defined with many

parameters fit for the large dataset. Therefore, it is essential to

have a very low learning rate because you are training the

original model on a smaller dataset. Failure to do this will lead

to overfitting.

When you compile a model that uses a base model with frozen layers, the trainable state of those layers is captured at compile time. As such, you will need to call the compile function again whenever you change which layers are trainable, so that the model's behavior reflects the new configuration.

After making such changes, compile the resulting model so they take effect. Subsequently, train the model again while monitoring it with callbacks to prevent overfitting.
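Putting this together, a rough sketch of the fine-tuning step (continuing the hypothetical model above) is to unfreeze, re-compile with a very low learning rate, and train again:

# unfreeze the base model (or a subset of its layers)
base_model.trainable = True

# re-compile so the change in trainable state takes effect
model.compile(optimizer=keras.optimizers.Adam(1e-5),   # very low learning rate
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# re-train for a few epochs while monitoring with a callback
# model.fit(train_ds, epochs=3,
#           callbacks=[keras.callbacks.EarlyStopping(patience=2)])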


Where to find pre-trained

models?

Pre-trained models can be obtained from the internet through

various sources, including:

Keras applications

TensorFlow Hub

PyTorch Hub

In this section, we will explore how to fetch pre-trained models.

Keras pre-trained models

Keras provides access to approximately 35 fully-trained convolutional neural networks through its applications module. Currently, the EfficientNetV2L architecture has the highest reported top accuracy, i.e., 97.5%, while MobileNet has the lowest, i.e., 89.5%.

You can select any of these models for your problem. When you download a pre-trained model, its pre-trained weights are downloaded as well; both are cached in the ~/.keras/models/ directory.

The following example demonstrates how to instantiate the small version of the MobileNetV3 architecture (MobileNetV3Small) trained on ImageNet.


import tensorflow as tf

IMAGE_SIZE = 224 # define image size

pretrained_model = tf.keras.applications.MobileNetV3Small(
    input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
    alpha=1.0,
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

pretrained_model.trainable = False

#summary of the architecture
pretrained_model.summary()

Transfer learning using

TensorFlow Hub


You can also obtain pre-trained models from TensorFlow Hub, which lets you search and access hundreds of trained, ready-to-deploy machine learning models in one place. Sticking with the small version of the MobileNetV3 architecture, you can obtain it from TensorFlow Hub, as shown below:

import tensorflow as tf
import tensorflow_hub as hub

#link to the pre-trained model
mobilenet_v3 = "https://tfhub.dev/google/imagenet/mobilenet_v3_small_100_224/classification/5"

#define the model name you want to acquire
classifier_model = mobilenet_v3

IMAGE_SHAPE = 224

classifier = tf.keras.Sequential([
    hub.KerasLayer(classifier_model,
                   input_shape=(IMAGE_SHAPE, IMAGE_SHAPE, 3))
])

classifier.summary()

Pretrained word embeddings

In NLP applications, the goal is to generate representations of words that capture their meanings, semantic associations, and the different contexts in which they are used. These representations are referred to as word embeddings.

Some sources for pre-trained word embeddings include:

Stanford’s GloVe pre-trained word embeddings

Google’s Word2vec

Fasttext

Stanford’s GloVe pre-trained word embeddings

Below is an example of an implementation for

the GloVe pre-trained word embeddings.

# download glove and unzip it in Notebook.
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove*.zip

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

x = {'the', 'match', 'score', 'prime',
     'player', 'manager', 'league'}

# create the dictionary.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(x)

# define a utility function for embedding using glove
def embedding_for_vocab(file_path, word_index, embedding_dimension):
    # Adding 1 because of the reserved 0 index
    vocabulary_size = len(word_index) + 1
    VocabEmbeddingMatrix = np.zeros((vocabulary_size, embedding_dimension))
    with open(file_path, encoding="utf8") as f:
        for line in f:
            word, *vector = line.split()
            if word in word_index:
                idx = word_index[word]
                VocabEmbeddingMatrix[idx] = np.array(
                    vector, dtype=np.float32)[:embedding_dimension]
    return VocabEmbeddingMatrix

# matrix for vocab: word_index
embedding_dimension = 50
VocabEmbeddingMatrix = embedding_for_vocab(
    'glove.6B.50d.txt', tokenizer.word_index, embedding_dimension)

print("Dense vector for first entry is => ",
      VocabEmbeddingMatrix[1])


Google’s Word2vec

Google's Word2Vec contains models such as DBPedia vectors

(wiki2vec) and Google News.

#download the model
!wget http://vectors.nlpl.eu/repository/20/51.zip

#unzip
!unzip 51.zip

#gzip the model for loading
!gzip model.bin

In this example, we use the model to obtain words with high

cosine similarity with a list of words.

import gensim
from gensim.models import word2vec
from gensim.models import KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity

EMBEDDING_FILE = 'model.bin.gz'
word_vectors = KeyedVectors.load_word2vec_format(EMBEDDING_FILE, binary=True)

#get most similar words in the word vector
result = word_vectors.most_similar(positive=['player', 'league'], negative=['man'])
most_similar_key, similarity = result[0]  # look at the first match
print(f"{most_similar_key}: {similarity:.4f}")

Fasttext

Install fasttext , mxnet , and gluonnlp :

pip install fasttext
pip install mxnet
pip install gluonnlp

You can access fastText pre-trained models using gluonnlp .

Below is an implementation of the fastText embeddings trained

on the wiki.simple dataset.

import re
import gluonnlp as nlp

def tokenizer(source_str, token_delim=' ', seq_delim='\n'):
    '''Utility function for tokenizing'''
    tokens = filter(None, re.split(token_delim + '|' + seq_delim, source_str))
    return tokens

sentence = "The player scored twice during the match "
counter = nlp.data.count_tokens(tokenizer(sentence))

#create vocabulary
vocab = nlp.Vocab(counter)

#load the fastText embeddings trained on the wiki.simple dataset
fasttext_model = nlp.embedding.create('fasttext', source='wiki.simple')

#attach embedding
vocab.set_embedding(fasttext_model)

#check the embedding vector
vocab.embedding['player'][:5]


Hugging Face

Our TensorBoard guide will demonstrate how to use Hugging

Face for NLP tasks. You can use HuggingFace for problems

related to:

Question answering

Summarization

Translation

Text generation

For instance, we might be interested in determining the named entities in a given text. To do so, we will use a BERT model with Hugging Face.

Install transformers:

pip install transformers sentencepiece


from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
sentence = "The player scored twice during the match in Moscow and helped Brendan Rodgers manager win the league"

ner_results = nlp(sentence)
print(ner_results)


Transfer learning with PyTorch


PyTorch offers a range of deep learning functionalities,

including pre-trained models. You can access a pre-trained

model in PyTorch, as shown below.

import torchvision

model_conv = torchvision.models.resnet18(pretrained=True)

Check out additional implementations of transfer learning using

PyTorch in the official documentation.

How can you use pre-trained

models?

Pre-trained neural networks can be used for prediction, feature

extraction, and fine-tuning.

Let's explore these functions further.

Prediction

You can obtain a pre-trained model and use it for predictions

without modification. The example below demonstrates the use

of VGG16 for predicting the category of a rabbit.

Photo by Satyabratasm on Unsplash



from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import load_img

# load the pre-trained VGG16 model with its ImageNet classifier
model = VGG16()

# load an image from path
path = 'satyabratasm-u_kMWN-BWyU-unsplash.jpg'
img = load_img(path, target_size=(224, 224))

from keras.preprocessing.image import img_to_array
# convert the pixels to a numpy array
img = img_to_array(img)

# reshape data for the pre-trained VGG model
img = img.reshape((1, img.shape[0], img.shape[1], img.shape[2]))

from keras.applications.vgg16 import preprocess_input
# transform the img for the pre-trained VGG model
img = preprocess_input(img)

# predict the probability for the output classes used in ImageNet
yhat = model.predict(img)

from keras.applications.vgg16 import decode_predictions
# convert the probabilities to discrete class labels
label = decode_predictions(yhat, top=5)

# Get the most likely output with the highest probability
label = label[0][0]

# Show the predicted class
print('%s (%.2f%%)' % (label[1], label[2]*100))

Feature extraction

Consider the case of a VGG16 model with 16 layers and a VGG19 model with 19 layers. The figure below illustrates the architecture of a VGG16 model, where the input layer accepts an image of dimensions (224 x 224 x 3) and the output layer produces softmax predictions over the 1,000 ImageNet classes.

In the VGG16 model, everything from the input layer up to the last max pooling layer (with output of size 7 x 7 x 512) is considered the feature extraction part of the network. The remainder of the model is known as the classification part of the model.

Here's an implementation of the feature extraction process

using a pre-trained VGG16 model.

import tensorflow as tf
from keras.applications.vgg16 import VGG16, preprocess_input
import numpy as np

#pre-trained model
model = VGG16(weights='imagenet', include_top=False)

#image for feature extraction
image_path = 'satyabratasm-u_kMWN-BWyU-unsplash.jpg'
image = tf.keras.utils.load_img(image_path, target_size=(224, 224))

from keras.preprocessing.image import img_to_array
image_data = img_to_array(image)
image_data = np.expand_dims(image_data, axis=0)
image_data = preprocess_input(image_data)

extracted_features = model.predict(image_data)
print(extracted_features.shape)

The extracted features can be used in additional machine

learning tasks like clustering similar images and principal

component analysis (PCA).
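As a small sketch of such a downstream task (assuming features have been extracted for a whole batch of images rather than a single one), the extracted features can be flattened and passed to scikit-learn's PCA:

import numpy as np
from sklearn.decomposition import PCA

# hypothetical stack of VGG16 features for 100 images, shape (100, 7, 7, 512)
features = np.random.rand(100, 7, 7, 512).astype("float32")
flat = features.reshape(len(features), -1)   # one row per image

pca = PCA(n_components=50)
reduced = pca.fit_transform(flat)            # compact representation per image
print(reduced.shape)                         # (100, 50)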


Fine-tuning

Fine-tuning is a transfer learning technique where we modify the output of a pre-trained model so that it learns the new problem, training only part of the model on that problem's data.

The process involves unfreezing the base model you have acquired (or part of it) and re-training the resulting model on the new problem's data with a very low learning rate.


Fine-tuning is aimed at improving the performance of the

model. Since there are more parameters in the base model,

some of which are related to other tasks, the process allows

you to create feature representations from the base model such

that the features are more relevant to your specified task.

# begin by unfreezing all layers of the base model
model.trainable = True

# Apart from the 10 last layers, freeze all the other layers
for layer in model.layers[:-10]:
    layer.trainable = False

# compile and retrain with a very low learning rate
learning_rate = 1e-4
low_learning_rate = learning_rate / 100

# recompile the model with the new learning rate
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=low_learning_rate),
              metrics=['acc'])

You can adapt the weights from the pre-trained model as the initial weights for your model. There are no standard hyper-parameters that work for all transfer learning problems.

The best choice here depends on the new task. You might need to experiment with different approaches before settling on your task's optimal weights and hyper-parameters.


Example of transfer learning for

images with Keras


We now know that the process of utilizing pre-trained models

for similar tasks follows five general steps:

1. Obtain weights from a previously trained model.

2. Freeze the layers in the base model to retain the

information learned during the original training of the base

model.

3. Include additional trainable layers specific to the task on

top of the frozen layers.

4. Train the new model on your dataset.

5. Fine-tune the base model using a low learning rate for

potential improvement of the model's performance

(optional).

Let's implement transfer learning using image and text data to

practice what we have learned.

Transfer learning with image

data

The theory behind transfer learning with images follows the

argument that if a model is trained on a large and general

enough dataset, the model can act as a generic model for other

similar sub-tasks.

This approach is popular in computer vision and image

detection. As we demonstrate in this section, it helps achieve

relatively good results while reducing the time taken to train the

model compared to training the model from scratch.

We will be working with the cat-dog dataset.

Getting the dataset

The data can be obtained from TensorFlow datasets or downloaded from the
dataset's repository. We will demonstrate

both approaches.

#if the link below is broken, go to https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765
#to obtain a new download link
!wget --no-check-certificate \
    "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip"

#remove previous files
!rm -rf PetImages

#unzip the dataset
!unzip -qq kagglecatsanddogs_5340.zip

Loading the dataset from a directory

Load the images

using image_dataset_from_directory from TensorFlow.


from tensorflow.keras.preprocessing import image_dataset_from_directory

dir = "PetImages/"
data = image_dataset_from_directory(dir,
                                    shuffle=True,
                                    batch_size=32,
                                    image_size=(150, 150))

Alternatively, using TensorFlow datasets.

import tensorflow_datasets as tfds

#tfds.disable_progress_bar()
train_data, validation_data, test_data = tfds.load(
    "cats_vs_dogs",
    # Use 40% for training, 10% for validation and 10% for test
    split=["train[:40%]", "train[40%:50%]", "train[50%:60%]"],
    as_supervised=True,  # Include labels
)

print("There are %d training samples" % tf.data.experimental.cardinality(train_data))
print("There are %d validation samples" % tf.data.experimental.cardinality(validation_data))
print("There are %d test samples" % tf.data.experimental.cardinality(test_data))

Plot sample images using Matplotlib.


import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for i, (img, label) in enumerate(train_data.take(4)):
    ax = plt.subplot(2, 2, i + 1)
    plt.imshow(img)
    plt.title(int(label))
    plt.axis("off")
plt.suptitle("Sample images (Cat :0, Dog:1)")
plt.show()

Data pre-processing

Data pre-processing is an essential step in any machine learning pipeline. We have a small dataset, so it is advisable to increase sample diversity by applying random but realistic transformations to the training data. Some useful transformations for image data include:


1. Random horizontal flipping or small random
rotations

2. Gray-scaling

3. Shifts

4. Flips

5. Brightness
6. Zoom

These transformations are implemented through

augmentation. Data augmentation helps to expose the model

to different aspects of the training data, which helps to prevent

overfitting. Keras provides various augmentation layers.

from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential(
    [layers.RandomFlip("horizontal"),   # flips images
     layers.RandomRotation(0.1),        # randomly rotates images
     layers.RandomZoom(.5, .2),         # randomly zooms images
     layers.RandomFlip(
         mode="horizontal_and_vertical", seed=None)  # randomly flips images
    ]
)

The augmentation will be done during training. Below is a

sample transformation on an image.


import numpy as np

for images, labels in train_data.take(1):
    plt.figure(figsize=(10, 10))
    first_image = images[7]
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        augmented_image = data_augmentation(
            tf.expand_dims(first_image, 0), training=True
        )
        plt.imshow(augmented_image[0].numpy().astype("int32"))
        plt.axis("off")
    plt.suptitle("Sample preprocessed image")
    plt.show();


Create a base model from the pre-trained

Inception model

Obtain the Inception pre-trained model. We will use the InceptionV3 version of the Inception model. We initialize the model with saved weights and exclude the top layer ( include_top=False ) since it contains the classification head for the full ImageNet label set, which is not relevant to this task. The input shape is (150, 150, 3) to match the size of the images in our dataset.

base_model = keras.applications.InceptionV3(
    weights="imagenet",   # Load weights pre-trained on ImageNet.
    input_shape=(150, 150, 3),
    include_top=False,    # Exclude the ImageNet classifier at the top
)

Remember, we want to use the knowledge saved in the base model. Therefore, we will freeze all of its layers so they are not updated during training. Otherwise, we'd be training the model from scratch.

# Freeze the base_model

base_model.trainable = False

Create the final dense layer

We excluded the top layer when acquiring

the InceptionV3 model. We will instead include a new final

dense layer for the model.


Start by applying the augmentation strategies to the input

images.

#standardize the input

inputs = keras.Input(shape=(150, 150, 3))

x = data_augmentation(inputs) # Apply random data augmentation

Pre-trained Inception weights expect the input to be scaled from the (0, 255) range to (-1., +1.). For this, we will use the Rescaling layer.

#rescale
scale_layer = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)
x = scale_layer(x)

Finally, we get to define the model:

1. Define the base model to run in inference mode to prevent

updating the batch normalization layers ( training=False ).


2. Generate features from the base model
using GlobalAveragePooling2D .

3. Apply dropout regularization.

4. Add a final dense layer.

x = base_model(x, training=False)

x = keras.layers.GlobalAveragePooling2D()(x)

x = keras.layers.Dropout(0.2)(x) # Regularize with dropout

outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs, outputs)

model.summary()

The new dense layer has 1,281 parameters which we will be

training in the next step.

The model is defined using the Keras Functional API.

Train the model

Time to train the model using the pre-processed data. As

expected, the base model allows us to obtain relatively high

accuracy.

First, compile the model to include the additional layer and train

it over a few epochs. Define:

An EarlyStopping callback to stop training if there is no

improvement after three epochs.


A TensorBoard callback to track the model's performance.

from tensorflow.keras.callbacks import EarlyStopping, TensorBoard

!rm -rf image_logs
%load_ext tensorboard
log_folder = 'image_logs'

callbacks = [
    EarlyStopping(patience=3),
    TensorBoard(log_dir=log_folder)
]

# compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=keras.metrics.BinaryAccuracy())

hist = model.fit(train_data,
                 epochs=5,
                 validation_data=validation_data,
                 callbacks=callbacks)

The model attains an impressive 95.96% classification

accuracy on the test data with a loss of approximately 0.1127.

#evaluate performance on test data

loss, accuracy = model.evaluate(test_data)

print("Fine-tuned model accuracy:", round(accuracy, 4)*100)

print("Fine-tuned model loss:", round(loss, 4))

%reload_ext tensorboard

%tensorboard --logdir {'image_logs/'}

Fine-tuning the model

Fine-tuning is an optional step aiming to improve the model's

performance. However, fine-tuning has to be done with care

because it is easy to overfit. Overfitting is prevented by setting

a low learning rate.

First, unfreeze the layers you would like to re-train. In this

case, unfreeze the last five layers.

#unfreeze the base model
base_model.trainable = True

#Apart from the 5 last layers, freeze all the other layers
for layer in base_model.layers[:-5]:
    layer.trainable = False  # keep earlier layers frozen

model.summary()

Now we have more trainable parameters. The next step is to

compile the model to update the parameters.

learning_rate = 1e-5

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate),  # Low learning rate
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy()],
)

Next, train the model for a few epochs with a low learning rate.

!rm -rf fine_tune_logs
%load_ext tensorboard
log_folder = 'fine_tune_logs'

callbacks = [
    EarlyStopping(patience=5),
    TensorBoard(log_dir=log_folder)
]

epochs = 5
hist1 = model.fit(train_data,
                  epochs=epochs,
                  validation_data=validation_data,
                  callbacks=callbacks)
To prevent overfitting, monitor the training loss using a callback so that training stops when there is no improvement in model performance for the number of epochs set by the patience argument.

The model's performance improved slightly, with an accuracy of

approximately 96% and a loss of 0.1049 on the test data.

#evaluate performance on test data

loss, accuracy = model.evaluate(test_data)

print("Fine-tuned model accuracy:", round(accuracy, 4)*100)

print("Fine-tuned model loss:", round(loss, 4))

%reload_ext tensorboard

%tensorboard --logdir {'fine_tune_logs/'}

Example of transfer learning

with natural language

processing

There are various types of transfer learning in natural language

processing (NLP), including inductive and transductive learning.

In this section, we will see transfer learning in action for NLP.


Pretrained word embeddings

Pre-trained embeddings are embeddings learned from one


problem and are subsequently used to solve a different but

similar task. A word embedding is a learned representation of

textual data in a vector space.

In word embeddings, words with the same meaning have

similar representations. Therefore, you can use word

embeddings from a different task on a new but similar task.

Training word embeddings on large datasets can take a lot of

time besides consuming a lot of resources. Hence the need for

using pre-trained word embeddings.

Some of the popular pre-trained word embeddings include:

1. Google’s Word2vec

2. Stanford’s GloVe

In this guide, we will demonstrate the application of

Stanford’s GloVe in NLP problems and, more specifically,

detecting sentiments.

Loading the dataset

First, let us acquire the sentiment dataset:

!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00462/drugsCom_raw.zip

!unzip drugsCom_raw.zip

Next, import the packages used in this task.

import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional, Dropout, SpatialDropout1D, GlobalAveragePooling1D
from tensorflow.keras.models import Sequential
import numpy as np
from sklearn.model_selection import train_test_split
import re
from tensorflow.keras.utils import to_categorical


Load the obtained data using Pandas and select the relevant variables for
the task, i.e., review and sentiment category.

#read the data

df = pd.read_csv('drugsComTrain_raw.tsv', sep='\t')


#create sentiment column

df['category'] = [1 if int(x)>5 else 0 for x in df['rating']]

#get relevant variables

df = df[['review', 'category']].copy()

df.head()

The objective is to use learned embeddings to predict the

category of the sentiments (ratings) of the reviews by various

drug users. We will split the data and use 70% for training and

the rest for model evaluation and testing.

First, let us pre-process the data.

Data pre-processing

Data pre-processing facilitates extracting the most relevant features from textual data, a step that is essential in every machine learning workflow regardless of the data type.

Vectorizing the words

The next step is to map the text data to a corresponding vector

of real numbers. The resulting units are called token indices.


These are numerical representations of the text data,

i.e., reviews in this task.

Create a vectorization layer containing the vectors and the

resulting vocabulary that will be used to generate a word index

for the embedding matrix.

A quick examination of the data reveals that the sentences have varying lengths. This will be handled by padding the sentences to the same length ( max_len ) during vectorization.


df['words in sentence'] = [len(item.split()) for item in df.review]

df.head()

Create a vectorization layer that generates integer

representations of the reviews.

import tensorflow as tf

max_features = 10000  # Maximum vocabulary size.
max_len = 100  # Sequence length to pad the outputs to.

vectorize_layer = tf.keras.layers.TextVectorization(
    standardize='lower_and_strip_punctuation',
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=max_len)

vectorize_layer.adapt(list((df['review'].values)), batch_size=None)
Below is a sample integer representation of the first sentence in

the train data

X_train[0]

Apply the vectorization layer to train and test sets

#split the data into train and test sets
from sklearn.model_selection import train_test_split

X_t = list((df['review'].values))
y = df['category'].values

X_train, X_test, y_train, y_test = train_test_split(X_t, y, test_size=0.30)

#apply vectorization layer to train and test sets
X_train = vectorize_layer(X_train)
X_test = vectorize_layer(X_test)

Using GloVe Embeddings


GloVe embeddings are generally used in NLP tasks. These

embeddings can be obtained as shown below.

#download glove embeddings
!wget http://nlp.stanford.edu/data/glove.6B.zip

Unzip the embeddings into your workspace.

# unzip it in Notebook

!unzip glove*.zip

Create your embedding index using the collected word

embeddings.

#load your embeddings
embeddings_index = {}
emb = open('glove.6B.100d.txt')
for sentence in emb:
    values = sentence.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
emb.close()

print('There are %s word vectors.' % len(embeddings_index))

Create the embedding layer

First, create a custom word_index from the vectorizer's

vocabulary.

#get vocabulary

voc = vectorize_layer.get_vocabulary()

#create a word index

word_index = dict(zip(voc, range(len(voc))))

Using the resulting dictionary, create an embedding matrix for

all the words in the training set using the embedding_index .


num_tokens = len(voc) + 2
embedding_dim = 100
hits = 0
misses = 0

# Prepare embedding matrix
embedding_matrix = np.zeros((num_tokens, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # Words not found in embedding index will be all-zeros.
        embedding_matrix[i] = embedding_vector
        hits += 1
    else:
        misses += 1

Now you can define an embedding layer using the pre-trained

embeddings by applying the following settings:

input_dim : Size of the vocabulary.

embeddings_initializer : The embedding matrix that you

defined.
output_dim : Length of the vector for each word.

trainable : Set to false to avoid losing the information in

the pre-trained embedding.

from tensorflow.keras.layers import Embedding
from tensorflow import keras

embedding_layer = Embedding(
    input_dim=num_tokens,
    output_dim=embedding_dim,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False,
)

Create the model

You can define your model using the resulting embedding layer.

We use the Bidirectional LSTMs to pass information backward

and forward.

Next, create the model.

# define model
from tensorflow.keras.layers import Flatten

model = Sequential()
vocab_size = 10002

#use the embedding_matrix
e = Embedding(vocab_size, 100, weights=[embedding_matrix], input_length=100, trainable=False)
model.add(e)
model.add(Bidirectional(LSTM(10, return_sequences=True, dropout=0.1, recurrent_dropout=0.1)))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))  # single unit for the binary sentiment label

# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# summarize the model
print(model.summary())

Training the model

Time to compile and train the model. You can use your

preferred metric to evaluate the performance of the model. In

this implementation, we are using the accuracy metric.

After compiling, we use callbacks to monitor the model's

performance and stop the training process if it fails to improve

after three epochs.

%load_ext tensorboard
!rm -rf embed_logs
log_folder = 'embed_logs'

from tensorflow.keras.callbacks import EarlyStopping, TensorBoard

#apply callbacks
callbacks = [
    EarlyStopping(patience=3),
    TensorBoard(log_dir=log_folder)
]

#compile
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])


num_epochs = 10
history = model.fit(X_train, y_train, epochs=num_epochs,
                    validation_data=(X_test, y_test),
                    callbacks=callbacks, batch_size=2560)
The model starts with a validation accuracy of 70.63% and reaches approximately 79.21% accuracy on the test set, which is arguably decent performance given that we did not have to train the embeddings from scratch.

You can visualize your events using TensorBoard, as shown

below.

%reload_ext tensorboard

%tensorboard --logdir {'embed_logs/'}


Final thoughts

With the advent of technologies like big data and deep learning,
there is a pressing need to adopt new methodologies that

facilitate the fast development of optimally performing models.

Transfer learning, as demonstrated in this guide, provides the

right resources to implement better models in image

classification and natural language processing tasks with

relative ease compared to training the models from scratch. In

particular, you have learned:

What is transfer learning?

Types of transfer learning.

Steps of implementing transfer learning.

Where to find transfer learning models.


How to use pre-trained models for transfer learning.

How to implement transfer learning in an image

classification task.

Using transfer learning to solve a natural language

processing task.

TensorBoard

TensorBoard is a visualization library that enables data science


practitioners to visualize various aspects of their machine

learning modeling. For instance, you can use TensorBoard to:

Visualize the performance of the model.

Tune model parameters.

Profile the execution of the program, for example, to check the utilization of GPUs.

Debug machine learning code.

TensorBoard can be used with various machine learning

libraries such as TensorFlow, PyTorch, Flax, and XGBoost.

Let's dive in and see how to use TensorBoard with all these

packages.

Advantages of using

Tensorboard


The main advantages of using Tensorboard include:

1. Allows data scientists to visualize the construction of

neural networks, thus driving better problem-solving.

2. Enables tracking of the performance of machine learning

models using metrics such as accuracy and log loss on


training or validation sets.

3. Easier debugging of the neural network nodes.

How to use TensorBoard

Let's look at how you can start using TensorBoard.

How to install TensorBoard

To get started, install TensorBoard, which can be done

using pip or conda .

PIP installation

Run the following command on the terminal or command

prompt:

pip install tensorboard

Alternatively, in Jupyter Notebook:

!pip install tensorboard


Conda installation

Open the Anaconda command prompt and run the following command:

conda install tensorboard

Docker installation
If you use a Docker image of the Jupyter Notebook server,

expose the notebook's and TensorBoard's ports. To do so, run

the following command:

docker run -it -p 8888:8888 -p 6006:6006 tensorflow/tensorflow:nightly-py3-jupyter

Using TensorBoard with

Jupyter notebooks and Google

Colab

To install Jupyter Notebook, either install it using Anaconda or

through pip:

pip install notebook

After installing the Jupyter Notebook, start an instance of a

notebook:


jupyter notebook

If you prefer using Google Colab, go to https://colab.research.google.com/ and create a new notebook instance.

You are now set to use TensorBoard. Run the following

command in a notebook instance (Jupyter or Google Colab):


%load_ext tensorboard

To reload a TensorBoard that had been previously loaded, run:

%reload_ext tensorboard

Next, set the log directory where all the logs will be stored.

Logs refer to the data that will be used to generate

visualizations. If you are using Jupyter Notebook in a Linux

distribution, remove the existing logs:

rm -rf ./logs/

If you are using Google Colab.

!rm -rf ./logs/


For users running TensorBoard from a Jupyter Notebook on a

Windows machine, run the following code:

#rm -rf ./logs/
#for windows
import shutil
try:
    shutil.rmtree('logs')
except:
    pass

Now create a directory where you can store the logs.

import datetime

log_dir = "logs/model_fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

Adding a datetime enables the storage and comparison of logs

at different run times.

How to run TensorBoard

To demonstrate model visualization in TensorBoard, including metrics, consider the iris data classification problem, which involves classifying iris plants into three classes.


First, load the TensorBoard extension:

%load_ext tensorboard
Then define the model:

import datetime
from sklearn.preprocessing import normalize
import numpy as np
from sklearn import datasets
import tensorflow as tf

# load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# normalize
X = normalize(X, axis=0)

# Neural network modules
from keras.models import Sequential
import keras
from keras.layers import Dense, Activation, Dropout
import tensorflow
from tensorflow.keras.layers import BatchNormalization
from keras.utils import np_utils
import os
from sklearn.model_selection import train_test_split

# Create training and test split: 70% train, 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
X_train1, X_test1, y_train1, y_test1 = X_train, X_test, y_train, y_test

# Create categorical labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

def create_model():
    # Create the model
    model = keras.models.Sequential()
    model.add(Dense(512, activation='relu', input_shape=(4,)))
    model.add(Dense(3, activation='softmax'))
    # Compile the model
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

create_model()

Next, create the TensorBoard callback and a utility function that compiles and trains the model.

logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

def train_model():
    '''
    utility function for training the model
    '''
    model = create_model()
    tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
    model.fit(x=X_train,
              y=y_train,
              epochs=10,
              validation_data=(X_test, y_test),
              callbacks=[tensorboard_callback])
    # Get the accuracy on the test data set
    test_loss, test_acc = model.evaluate(X_test, y_test)
    # Print the test accuracy
    print('Test Accuracy: ', test_acc, '\nTest Loss: ', test_loss)

tf.debugging.experimental.enable_dump_debug_info(
    "/tmp/tfdbg2_logdir",
    tensor_debug_mode="FULL_HEALTH",
    circular_buffer_size=-1)

# train the model
train_model()

Now load the TensorBoard notebook extension and define a

variable log_folder that points to the logs folder that you had

created.

%load_ext tensorboard
log_folder = 'logs'

How to use TensorBoard

callback

A callback is an object that carries out operations over various

stages of training, such as:


At the end of an epoch.

Before or after a specified number of batches.

on_batch_begin – when a batch begins.

on_batch_end – when a batch ends.

on_train_begin – when training begins.

on_train_end – when training ends.
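For illustration, a minimal custom callback (a sketch, separate from the TensorBoard callback itself) can hook into these stages by subclassing keras.callbacks.Callback :

from tensorflow import keras

class StageLogger(keras.callbacks.Callback):
    """Minimal callback that prints a message at a few training stages."""

    def on_train_begin(self, logs=None):
        print("Training has started")

    def on_epoch_end(self, epoch, logs=None):
        print(f"Epoch {epoch} ended, loss: {logs.get('loss'):.4f}")

    def on_train_end(self, logs=None):
        print("Training has finished")

# usage: model.fit(X_train, y_train, callbacks=[StageLogger()])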


TensorBoard callback creates a log for the TensorBoard,

including:

Plots summarizing metrics.

Training graph visualization.

Weight histograms.

Sampled profiling.

When the callback is used with Model.evaluate , in addition to epoch summaries there will be summaries that show the distribution of the evaluation metrics vs. Model.optimizer.iterations . The metric names are prepended with evaluation , and model.optimizer.iterations is the step shown in the visualized TensorBoard.

Import TensorBoard.

from tensorflow.keras.callbacks import TensorBoard


Create the TensorBoard callback.

logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1,
                                                      write_graph=False, write_images=False)

You can include other parameters such as:

write_graph – specifies whether to visualize a graph in

TensorBoard. It results in a larger log file if set to True .

write_images – specifies whether to write model weights to

visualize as an image in TensorBoard.

write_steps_per_second – specifies whether to log training

steps per second into TensorBoard. Can be used with

either epoch or batch frequency logging.

update_freq – batch or epoch or integer . When

using batch , losses and metrics are written to TensorBoard

after each batch. Similar to when epoch is specified. If you

specify integer , for example, 1000, the metrics and losses

are saved to TensorBoard every 1000 batches.

profile_batch – a non-negative integer or tuple of integers

that profiles a batch(es) to sample compute characteristics.

Profiling is disabled by default.

embeddings_freq – frequency (in epochs) at which

embedding layers are visualized. If set to 0, there is no

visualization of the embeddings.



embeddings_metadata – a dictionary that maps embedding

layer names to the filename where the metadata for the

embedding layer is saved. A single filename can be passed

if the same metadata file is used for all embedding layers.

histogram_freq – the frequency at which to compute

activation and weight histograms for layers of the trained

model.
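As an illustrative sketch, several of these arguments can be combined in a single callback; the values below are arbitrary examples rather than recommendations:

tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logdir,
    histogram_freq=1,        # compute weight and activation histograms every epoch
    write_graph=True,        # visualize the model graph
    write_images=False,      # do not log model weights as images
    update_freq='epoch',     # write losses and metrics after each epoch
    profile_batch=(2, 5),    # profile batches 2 through 5
    embeddings_freq=0,       # no embedding visualization
)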

The next step involves compiling and fitting the model using the

callbacks, which will store information in the logs.

from tensorflow import keras

logdir = "logs/scalars/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
file_writer = tf.summary.create_file_writer(logdir + "/metrics")
file_writer.set_as_default()
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir, histogram_freq=1)

def train_model():
    '''
    utility function for training the model
    '''
    model = create_model()
    tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
    model.fit(x=X_train,
              y=y_train,
              epochs=20,
              validation_data=(X_test, y_test),
              callbacks=[tensorboard_callback])
    # Get the accuracy on the test data set
    test_loss, test_acc = model.evaluate(X_test, y_test)
    # Print the test accuracy
    print('Test Accuracy: ', test_acc, '\nTest Loss: ', test_loss)

train_model()
Once the model is trained, the next step is to visualize it using

TensorBoard. For that, data from the logs stored through

callbacks will be used.

How to launch TensorBoard


You can launch the TensorBoard extension via the command

prompt.

tensorboard --logdir logs

Alternatively, you can launch TensorBoard in Jupyter Notebook

or Google Colab:

%tensorboard --logdir logs

TensorBoard can also be accessed through the local host at http://localhost:6006 or http://127.0.0.1:6006/

If you have set everything right, you will see a window with

interactive functionality like the one shown below.

Running TensorBoard remotely

It is common practice to experiment remotely on a server with

GPUs, especially when the Tensorflow model requires a lot of

computational resources. To use TensorBoard on a remote

server:

1. Initiate an SSH to access the TensorBoard web user

interface. On the command prompt, run:

ssh -L 6006:127.0.0.1:6006 username@server_ip



If you are using PuTTY , you will need to replace ssh in the command with
PuTTY to create an ssh tunnel on port 6006 from

the local machine to port 6006 on the server that you

connected to with SSH. The tunnel you have created will stay

open while the SSH connection is active.

2. Next, from the browser, you can access TensorBoard through http://localhost:6006 or http://127.0.0.1:6006/

However, sometimes you need to contact the server and then

use the contact to connect to the server GPU. In such a case,

you will add an extra step to the transfer port:

1. Transfer port from contact server to the local machine using

SSH. In your local machine:

ssh -L 6006:127.0.0.1:6006 username@contact_server_ip

2. Transfer the port from the GPU server to the contact. Your

server:

ssh -L 6006:127.0.0.1:6006 username@GPU_server_ip

3. Now start the TensorBoard on the GPU server.

tensorboard --logdir='./tensorboard_dirs' --port=6006


TensorBoard dashboards


As shown in the Sample dashboard earlier, various

components are included in a single dashboard. These

components include:

TensorBoard scalars.

Images.

Graphs.

Distributions.

Histograms.

Fairness indicators.

What-If Tool (WIT).

Each of these components provides information regarding the

model, as illustrated below.

TensorBoard scalars

The TensorBoard scalars dashboard visualizes scalar

statistics such as classification accuracy, model loss, or

learning rate.

from tensorflow import keras

logdir = "logs/scalars/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
file_writer = tf.summary.create_file_writer(logdir + "/metrics")
file_writer.set_as_default()
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir, histogram_freq=1)

def train_model():
    '''
    utility function for training the model
    '''
    model = create_model()
    tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
    model.fit(x=X_train,
              y=y_train,
              epochs=20,
              validation_data=(X_test, y_test),
              callbacks=[tensorboard_callback])
    # Get the accuracy on the test data set
    test_loss, test_acc = model.evaluate(X_test, y_test)
    # Print the test accuracy
    print('Test Accuracy: ', test_acc, '\nTest Loss: ', test_loss)

train_model()

Load TensorBoard.

%tensorboard --logdir logs/scalars

You can also include custom scalars. For instance, if you want

to have a custom learning rate that decreases as epochs

increase, you can define a function as shown below.

def lr_schedule(epoch):
    """
    Returns a custom learning rate that decreases as epochs progress.
    """
    learning_rate = 0.2
    if epoch > 10:
        learning_rate = 0.02
    if epoch > 20:
        learning_rate = 0.01
    if epoch > 50:
        learning_rate = 0.005
    tf.summary.scalar('learning rate', data=learning_rate, step=epoch)
    return learning_rate

lr_callback = keras.callbacks.LearningRateScheduler(lr_schedule)
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)

def train_model(epochs=20):
    '''
    utility function for training the model
    '''
    model = create_model()
    tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
    model.fit(x=X_train,
              y=y_train,
              epochs=epochs,
              validation_data=(X_test, y_test),
              callbacks=[tensorboard_callback, lr_callback])
    # Get the accuracy on the test data set
    test_loss, test_acc = model.evaluate(X_test, y_test)
    # Print the test accuracy
    print('Test Accuracy: ', test_acc, '\nTest Loss: ', test_loss)

train_model(4)

Next, load TensorBoard.


%tensorboard --logdir logs/scalars

Notice that you now have a new scalar output: the learning rate.

TensorBoard images

TensorBoard allows you to display images using tf.summary and tf.summary.image . Consider the case of the popular Fashion MNIST dataset. You can display an image as shown below.

#import libraries
import itertools
import datetime
import io
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
import sklearn.metrics

import shutil
try:
    shutil.rmtree('logsx')
except:
    pass

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Names of the integer classes
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Reshape the image for the Summary API.
img = np.reshape(train_images[100], (-1, 28, 28, 1))

# Sets up a timestamped log/images directory.
logdir = "logsx/images/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# Creates a file writer for the log directory.
file_writer = tf.summary.create_file_writer(logdir)

# Using the file writer, log the reshaped image
with file_writer.as_default():
    tf.summary.image("Image data", img, step=0)

%reload_ext tensorboard

%tensorboard --logdir logsx/images

You can also display multiple images using the max_outputs .

The max_outputs argument specifies the number of images you

want to visualize.

with file_writer.as_default():
    # Don't forget to reshape.
    images = np.reshape(train_images[50:53], (-1, 28, 28, 1))
    tf.summary.image("Plotting multiple images", images, max_outputs=3, step=0)


%tensorboard --logdir logsx/images

TensorBoard graphs

The graphing component of TensorBoard can be helpful in

model debugging. To see the graph from TensorBoard, click on

the GRAPHS tab in the upper pane. From the upper left corner,

select your preferred run. You can view the model and align it
with your desired design.


You will notice options like an op-level graph that gives you

insight into how to change your model. Turning on the trace

inputs node option shows the upstream dependencies of that

node.

TensorBoard distributions

Deep neural network (DNN) models are made up of many layers, and each layer comprises weights and biases. The Distributions dashboard displays the distribution of these weights and biases.


TensorBoard histograms

TensorBoard histograms are a collection of values

aggregated by frequency. TensorBoard histograms visualize

weights over time. Hence, they help establish whether there is

something wrong with weights initialization or the learning rate.

Histograms are located in the HISTOGRAM tab.

You can specify the histogram mode as either OVERLAY :

Overlay histogram mode

or OFFSET histogram:

Offset histogram mode

As shown, Histograms display similar information as

the Distributions but as a 3-D histogram changing across

various iterations.

Fairness indicators

Regardless of how much care has been taken during the model

implementation and evaluation process, bias can happen at

various stages in the model pipeline.

Therefore, it is essential to evaluate the model for human bias

across all the steps. In Tensorboard, the Fairness

Indicators enable developers to evaluate fairness metrics,

such as False Positive Rate (FPR) and False Negative Rate

(FNR), for binary and multi-class classification and regression

models.

Install the Fairness Indicators plugin:

pip install --upgrade pip
pip install fairness_indicators
pip install tensorboard-plugin-fairness-indicators

You will need to restart the kernel for the plugin to be included

in TensorBoard. The Fairness Indicators widget can be

accessed from the dialog box:

What-If Tool (WIT)

When building machine learning models, developers are often

concerned with understanding when the model underperforms

or performs well. The What-If Tool (WIT) comes in handy when

you are interested in:

Counterfactual reasoning.

Investigating decision boundaries.

Explore how general changes to data points affect

predictions.

Simulating various realities to determine how a

model behaves from the tool's widget visual interface.

In Tensorboard, the What-If Tool can be configured from the

dialog box. After opening the What If widget, you need to

provide:

The host and port of the model server.

The name of the model being served.

The type of model.

The path to where you stored the TFRecords file to load.

Next, click Accept. The tool will do the rest and return the

results.

Displaying data in TensorBoard

Various data formats are supported for logging and

visualization in Tensorboard, including scalars, images, audio,

histograms, and graphs.
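For instance, arbitrary scalar values can be logged with the Summary API and a file writer; a small sketch (the metric values here are made up) looks like this:

import tensorflow as tf

writer = tf.summary.create_file_writer("logs/custom_scalars")
with writer.as_default():
    for step in range(100):
        # made-up value standing in for any quantity you want to track
        tf.summary.scalar("my_metric", 0.99 ** step, step=step)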

Using the TensorBoard

embedding projector

TensorBoard's projector facilitates easy interpretation and understanding of embeddings. By visualizing the high-dimensional embeddings, you can understand the relationships learned by embedding layers. This guide will consider a simple example of vectors and metadata. You will use the SummaryWriter to write the embedding by creating an instance and adding an embedding to it.

Delete previous logs.

!rm -rf runs

Create some vectors and metadata.

%load_ext tensorboard

import numpy as np

import tensorflow as tf

import tensorboard as tb

tf.io.gfile = tb.compat.tensorflow_stub.io.gfile

#install pytorch

#!pip install torch

from torch.utils.tensorboard import SummaryWriter

vectors = np.array([[0,0,1], [0,1,0], [1,0,0], [1,1,1], [1,0,

1]])

metadata = ['001', '010', '100', '111', '101'] # labels


writer = SummaryWriter()

writer.add_embedding(vectors, metadata)

writer.close()

%tensorboard --logdir=runs

Load the TensorBoard dashboard and navigate to

the Projector window.




Plot training examples with

TensorBoard

Before fitting the training model, you can visualize training data

as shown below.

from tensorflow import keras

#clear previous logs

!rm -rf logs/train_data


# Download the mnist data. The data is already divided into train and test.
# The labels are integers representing classes.
handwriting_mnist = keras.datasets.mnist

(train_images, train_labels), (test_images, test_labels) = \
    handwriting_mnist.load_data()

logdir = "logs/train_data/"
file_writer = tf.summary.create_file_writer(logdir)

import numpy as np
with file_writer.as_default():
    images = np.reshape(train_images[50:53], (-1, 28, 28, 1))
    tf.summary.image("3 Digits", images, max_outputs=3, step=0)


Load TensorBoard.

%tensorboard --logdir logs/train_data


Visualize images in

TensorBoard

Instead of tensors, you might consider plotting arbitrary images

in TensorBoard. To demonstrate this, consider the MNIST

dataset.
# Clear out prior logging data.
!rm -rf logs/plots

logdir = "logs/plots/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
file_writer = tf.summary.create_file_writer(logdir)


# class names
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

def plot_to_image(figure):
    """Converts the matplotlib plot specified by 'figure' to a PNG image and
    returns it. The supplied figure is closed and inaccessible after this call."""
    # Save the plot to a PNG in memory.
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    # Closing the figure prevents it from being displayed directly inside
    # the notebook.
    plt.close(figure)
    buf.seek(0)
    # Convert PNG buffer to TF image
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    # Add the batch dimension
    image = tf.expand_dims(image, 0)
    return image

def image_grid():
    """
    Return a 5x5 grid of the MNIST images as a matplotlib figure.
    """
    # Create a figure to contain the plot.
    figure = plt.figure(figsize=(10, 10))
    for i in range(25):
        # Start next subplot.
        plt.subplot(5, 5, i + 1, title=class_names[train_labels[i]])
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
    return figure


# Prepare the plot
figure = image_grid()

# Convert to image and log
with file_writer.as_default():
    tf.summary.image("Image data", plot_to_image(figure), step=0)

%tensorboard --logdir logs/plots

Displaying text data in

TensorBoard

Using the TensorFlow Text Summary API, you can log textual

data and visualize it in TensorBoard.

#define text to log
your_text = "This is some text in TensorBoard!"

# Remove prior log data.
!rm -rf logs

# Set up a timestamped log directory.
logdir = "logs/text_basics/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# Create a file writer for the log directory.
file_writer = tf.summary.create_file_writer(logdir)

# Using the file writer, log the text.
with file_writer.as_default():
    tf.summary.text("TensorBoard Text", your_text, step=0)


Reload TensorBoard from the logs in the logs directory.

%tensorboard --logdir logs


Log confusion matrix to

TensorBoard

You can log a confusion matrix and display the results as images. Using the Fashion-MNIST dataset, log the confusion matrix as follows.

Clear previous logs

!rm -rf logs

Download and prepare the data.

# Importing the dataset
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# all the classes
class_names = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')

# define and compile the model
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Create the function to log the confusion matrix using

the LambdaCallback .

from tensorflow import keras
import io
import datetime
import itertools
import numpy as np
import matplotlib.pyplot as plt
import sklearn.metrics

# Clearing out prior logging data.
!rm -rf logs/image

def plot_confusion_matrix(cm, class_names):
    figure = plt.figure(figsize=(8, 8))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title("Confusion Matrix of the Results")
    plt.colorbar()
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names, rotation=90)
    plt.yticks(tick_marks, class_names)
    # Normalize the matrix row-wise so each cell shows a proportion.
    labels = np.around(cm.astype('float') / cm.sum(axis=1)[:, np.newaxis], decimals=2)
    threshold = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        color = "white" if cm[i, j] > threshold else "black"
        plt.text(j, i, labels[i, j], horizontalalignment="center", color=color)
    plt.tight_layout()
    plt.ylabel('Real Class')
    plt.xlabel('Predicted Class')
    return figure

logdir = "logs/image/" + datetime.datetime.now().strftime("%Y%

m%d-%H%M%S")

# Defining the basic TensorBoard callback.

tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logd

ir)

file_writer_cm = tf.summary.create_file_writer(logdir + '/cm')


def log_confusion_matrix(epoch, logs):

# Using the model to predict the values from the validation

dataset.

test_pred_raw = model.predict(test_images)

test_pred = np.argmax(test_pred_raw, axis=1)

# Calculating the confusion matrix.

cm = sklearn.metrics.confusion_matrix(test_labels, test_pre

d)

figure = plot_confusion_matrix(cm, class_names=class_names)

cm_image = plot_to_image(figure)

with file_writer_cm.as_default():

tf.summary.image("Confusion Matrix", cm_image, step=epo

ch)

# Defining the per-epoch callback.

cm_callback = keras.callbacks.LambdaCallback(on_epoch_end=log_c

onfusion_matrix)

Deep learning with TensorFlow and Keras

227

Train the model with the TensorFlow callback.

# Training the classifier.
model.fit(
    train_images,
    train_labels,
    epochs=2,
    verbose=0,
    callbacks=[tensorboard_callback, cm_callback],
    validation_data=(test_images, test_labels),
)

Load TensorBoard with the confusion matrix logs.

# Starting TensorBoard.

%tensorboard --logdir logs/image

Hyperparameter tuning with

TensorBoard

Models are built with hyperparameters that influence the

functionality of the model. You select the hyperparameters for

optimization during modeling before settling for the 'best'

model.

Some of these hyperparameters include the number of epochs, the dropout rate, and the learning rate. Optimizing the selected hyperparameters is known as hyperparameter optimization or tuning. The goal is to improve the performance of the model.

To conduct hyperparameter tuning in TensorBoard, use the hparams plugin from TensorBoard. Consider the Iris data classification problem.

Clear earlier logs.

#for Linux/Colab
#!rm -rf ./logs/

#for Windows
import shutil
try:
    shutil.rmtree('logs')
except:
    pass

Reload TensorBoard.

%reload_ext tensorboard

Define the hyperparameters you want to optimize and the data

to train the model.

## Create hyperparameters
from tensorboard.plugins.hparams import api as hp

HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([5, 10]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_LEARNING_RATE = hp.HParam('learning_rate', hp.Discrete([0.001, 0.0005, 0.0001]))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd', 'rmsprop']))
METRIC_ACCURACY = 'accuracy'
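The listings below assume that X_train, y_train, X_test, and y_test already hold the Iris data. As a minimal sketch (an assumption of this guide, not part of the original listing), you could prepare them with scikit-learn, one-hot encoding the labels because the model below is compiled with categorical_crossentropy:

import datetime
import tensorflow as tf
from tensorflow import keras
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target
# One-hot encode the three Iris classes for categorical_crossentropy.
y = tf.keras.utils.to_categorical(y, num_classes=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)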

Set configuration files and store them in the logs directory.

'''Set configuration log files'''
log_dir = 'logs/fit/' + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')

with tf.summary.create_file_writer(log_dir).as_default():
    hp.hparams_config(
        hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER, HP_LEARNING_RATE],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
    )

Fit the models and include the log for metrics and

hyperparameters.

from tensorflow.keras.layers import Dense, Dropout

def create_model(hparams):
    # Create the model using the number of units and dropout rate being tuned
    model = keras.models.Sequential()
    model.add(Dense(hparams[HP_NUM_UNITS], activation='relu', input_shape=(4,)))
    model.add(Dropout(hparams[HP_DROPOUT]))
    model.add(Dense(3, activation='softmax'))
    # setting the optimizer and learning rate
    optimizer = hparams[HP_OPTIMIZER]
    learning_rate = hparams[HP_LEARNING_RATE]
    if optimizer == "adam":
        optimizer = tf.optimizers.Adam(learning_rate=learning_rate)
    elif optimizer == "sgd":
        optimizer = tf.optimizers.SGD(learning_rate=learning_rate)
    elif optimizer == 'rmsprop':
        optimizer = tf.optimizers.RMSprop(learning_rate=learning_rate)
    else:
        raise ValueError("unexpected optimizer name: %r" % (optimizer,))
    # Compile the model with the optimizer and learning rate specified in hparams
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # Fit the model
    model.fit(X_train, y_train, epochs=1, callbacks=[
        tf.keras.callbacks.TensorBoard(log_dir),  # log metrics
        hp.KerasCallback(log_dir, hparams),  # log hparams
    ])  # Run with 1 epoch to speed things up for demo purposes
    _, accuracy = model.evaluate(X_test, y_test)
    return accuracy

def run(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the values used in this trial
        accuracy = create_model(hparams)
        # converting to a tf scalar
        accuracy = tf.reshape(tf.convert_to_tensor(accuracy), []).numpy()
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

All that remains is running the experiments and logging the

metrics and the hyperparameters.

session_num = 0

for num_units in HP_NUM_UNITS.domain.values:
    for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
        for optimizer in HP_OPTIMIZER.domain.values:
            for learning_rate in HP_LEARNING_RATE.domain.values:
                hparams = {
                    HP_NUM_UNITS: num_units,
                    HP_DROPOUT: dropout_rate,
                    HP_OPTIMIZER: optimizer,
                    HP_LEARNING_RATE: learning_rate,
                }
                run_name = "run-%d" % session_num
                print('--- Starting trial: %s' % run_name)
                print({h.name: hparams[h] for h in hparams})
                run('logs/hparam_tuning/' + run_name, hparams)
                session_num += 1

Now load TensorBoard. All the model runs and their

performance can be accessed from the HPARAMS tab in the

upper pane.

%tensorboard --logdir logs/hparam_tuning

You can view the results from Table View which shows the

experiment runs. Each row shows the value of the underlying


hyper-parameter that was being optimized and the

corresponding accuracy.


You can also view the results as Parallel Coordinates

View which shows each experiment as a line moving through an

axis for each hyper-parameter and the accuracy metric. You

can hover over a coordinate to view the hyper-parameters and

the accuracy metric.



The Scatter Plot View shows the distribution of the hyper-

parameters vs. the metrics.

TensorFlow Profiler

The TensorFlow Profiler records CPU operations and CUDA kernel launches on the GPU. The information can be visualized in TensorBoard and provides a quick analysis of performance bottlenecks.

To get started, install the plugin.


pip install -U tensorboard-plugin-profile

Next, create a TensorBoard callback specifying the batches that

will be profiled using the profile_batch argument. Going back to

the Iris classification problem.

#directory to store profiles
log_dir = 'logs/profile/' + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')

#set profile batches
callbacks = [tf.keras.callbacks.TensorBoard(log_dir=log_dir, profile_batch='10,50')]

''' set X_train and y_train from the iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target '''

model = create_model()
model.fit(X_train, y_train, epochs=10, validation_split=0.2, callbacks=callbacks)
Load TensorBoard and go to Profile in the dialog box to view

the captured profile.

%tensorboard --logdir logs/profile


Overview page

The overview page provides a performance summary of the GPU and CPU, the Run Environment, a Step-time Graph showing how the step time is distributed during training and testing of the model across categories such as Compilation, Output, and Input, and Recommendations for Next Steps.

Performance Summary shows the information on:

The time taken during various processes, including Compilation, Output, Input, Kernel Launch, Host Compute, Device Collective Communication, Device to Device, and Device Compute.

TF Op Placement.

Op time spent on eager execution.

Device compute precision.

Run Environment provides information on the system where

Profiling was conducted. For instance, the environment used

for this guide includes one host on a CPU device.

The Step Time graph displays the device step time over all the

steps that have been sampled. It shows all the time

components included in the performance summary but over

different train and test processes.

Another vital tool included in the profiler is

the recommendations for the next step, which contain

suggestions on how to improve your pipeline. The

recommendations depend on the kind of model that you have

implemented.

Trace viewer

Selecting Trace Viewer from the Tools drop-down dialogue

should return a dashboard similar to the one shown below. It

shows a timeline for different events on the GPU or CPU during

the profiling process.

The Trace Viewer is designed such that:

To the left (vertical grey column), you can see two major sections: /device and /host. This shows which TensorFlow op was executed on which device (GPU or CPU, respectively).

To the right, the colored bars denote the duration for which the respective TensorFlow ops were executed.
The Trace Viewer makes it easy to understand performance bottlenecks in the input pipeline. It also provides interactive functionality: use the keyboard shortcuts A and D to move to the left and right, respectively, or use the navigation widget included in the Trace Viewer window.

To analyze an individual event, use the selection option and

click on a TensorFlow Op.

You can select multiple events with your mouse, or by holding the Ctrl key and clicking the desired events, and then analyze the traces based on the selection.

Input pipeline analyzer

The input pipeline analyzer checks the input pipeline and

shows whether there is a performance bottleneck in the

pipeline. It also tells us whether the model is input bound. The

tool contains information related to:

The Summary of input-pipeline analysis.

Recommendations for the next step.

Device-side analysis details.

Host-side analysis details.



Input Op statistics.

Summary of input-pipeline analysis includes information on

the overall input pipeline. The information shows whether the

application is input bound and, if so, to what extent.

Recommendation for the next step provides suggestions on


what steps to take next.


Device-side analysis details show the device step-time

summary statistics and the graph of time taken during various

processes, including:

Compilation

Output

Input

Kernel launch

Host compute

Device collection communication

Device to device

Device computation time

Host-side analysis details provide information on the

breakdown of the input processing time on the host. Information

contained includes:

Enqueuing data

Data preprocessing

Data reading (both in advance and on demand)

Other data reading or processing

This guide's processes mainly involved data preprocessing, as

shown below.

The Host-side analysis details also include a section for

recommendations on what can be done based on the host-side

statistics.

Lastly, the Input Op statistics shows details of various input operations,


including:

Input Op – the name of the underlying TensorFlow input

operation.
Count – number of instances of the operation execution

during the profiling session.


Total Time – the cumulative sum of time spent on each

corresponding instance.

Total Time % – total time spent on an operation as a

percentage of the total time spent on processing the input.

Total Self Time – the cumulative sum of the self-time spent

on each instance.

Total Self Time % – total self-time as a percentage of the

total time spent on input processing.

Category – processing category of the input operation.

TensorFlow stats

The TensorFlow Stats dashboard displays the performance of every

TensorFlow operation that the host device has executed. The

graphs shown might vary depending on the host device and

TensorFlow processes. For instance, in this case, there are two

pie charts.

The plot on the left shows the distribution of the total self-execution time of each operation on the host, while the plot on the right shows the distribution of self-execution time by operation type on the host.

The TensorFlow statistics can be filtered by IDLE time from the

dialog box. IDLE time refers to the portion of the total

execution time on a device (or host) that is idle.

The TensorFlow Stats dashboard also includes a table of TensorFlow operations with various details about each operation.

GPU kernel stats

If the host runs with a GPU (or TPU), you can view the performance statistics and the originating operation for each GPU-accelerated kernel through the kernel_stats window. The figure below provides a sample overview of a GPU-accelerated kernel.

Memory profile page

The Memory Profile page profiles GPU memory usage while TensorFlow ops execute. You can use this tool to analyze and debug OOM (Out of Memory) errors, which are raised whenever the GPU's memory is exhausted. A programmatic way of capturing such a profile is sketched after the list below.

Components included in the Memory profile page include:

Memory Profile Summary shows a summary of the

memory profile of the TensorFlow application.

Memory Timeline Graph is a plot of the memory usage in

GiBs and the percentage of fragmentation versus time in

milliseconds.

Memory Breakdown Table shows active memory

allocations at the point of the highest memory usage in the

profiling interval.
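TensorBoard reads these memory statistics from a captured profile. As a minimal sketch (an assumption of this guide, not shown in the original workflow), a profile can also be captured programmatically instead of through the Keras callback:

import tensorflow as tf

# Capture a profile into an assumed logs/profile directory; point TensorBoard at it afterwards.
tf.profiler.experimental.start("logs/profile")
# ... run the training or inference steps you want to profile ...
tf.profiler.experimental.stop()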


How to enable debugging on

TensorBoard

With the TensorBoard debugger, you can:

Select particular nodes and debug them.

Graphically control the execution of the model.

Visualize the tensors and their values.

To enable debugging, add the following code before the model

begins training.

import os

logdir = os.path.join("logs/debugg", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tf.debugging.experimental.enable_dump_debug_info(
    logdir, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1
)


Load TensorBoard.

%tensorboard --logdir logs/debugg

Using TensorBoard with deep

learning frameworks

TensorBoard allows integration with other machine learning

frameworks.
TensorBoard in PyTorch


PyTorch is a popular open-source machine learning framework.

You can log PyTorch events using TensorBoard to track loss,

RMSE, and accuracy metrics.

First, define a SummaryWriter instance. You will log the events

in ./runs/ so delete any prior logs.

#install pytorch
#!pip install torch
import torch
#summary writer instance
from torch.utils.tensorboard import SummaryWriter

#delete prior logs
!rm -rf ./runs/

writer = SummaryWriter()

Next, define the data and model, and write the metrics to

the SummaryWriter instance.

#data
x = torch.arange(-5, 5, 0.1).view(-1, 1)
y = -5 * x + 0.1 * torch.randn(x.size())

#model
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_model(iter):
    for epoch in range(iter):
        y1 = model(x)
        loss = criterion(y1, y)
        writer.add_scalar("Loss/train", loss, epoch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

train_model(10)
writer.flush()


#close writer

writer.close()

To avoid cluttering, especially in cases where you have a large

sample, you can arrange the results in


the SummaryWriter instance as shown below.

from torch.utils.tensorboard import SummaryWriter
import numpy as np

writer = SummaryWriter()
for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)
    writer.add_scalar('Loss/test', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/test', np.random.random(), n_iter)

Load TensorBoard.

%tensorboard --logdir=runs

TensorBoard in Keras

To add Keras models to the TensorBoard, first, create a Keras

callback object of TensorBoard whose logs will be saved in

the experiment folder inside the folder containing the main logs.

tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs/experiment", histogram_freq=1)
model = create_model()
model.fit(X_train, y_train, epochs=10, callbacks=[tb_callback])


Now run and visualize the Keras model in TensorBoard.


%tensorboard --logdir logs/experiment

TensorBoard in XGBoost

XGBoost is another popular ML package used for classification and regression problems. To log events from XGBoost modeling, you need the tensorboardX package, which can be installed using pip install tensorboardX. To work with XGBoost, install the package using:

conda install -c anaconda py-xgboost from your command prompt or


!conda install -c anaconda py-xgboost in Google Colab

notebook.

This example logs the events for an XGBoost model trained on

the popular Ames housing dataset.

Remove prior logs rm -rf ./runs/ and define the XGBoost

model.

import datetime
import os
#conda install -c anaconda py-xgboost
import xgboost as xgb

#set some xgboost attributes that are missing in version 1.6.x
new_attrs = ['grow_policy', 'max_bin', 'eval_metric', 'callbacks',
             'early_stopping_rounds', 'max_cat_to_onehot', 'max_leaves',
             'sampling_method']
for attr in new_attrs:
    setattr(xgb, attr, None)

from tensorboardX import SummaryWriter
from sklearn.model_selection import train_test_split

class TensorBoardCallback(xgb.callback.TrainingCallback):
    '''
    Run experiments while scoring the model and saving the error
    to train or test folders
    '''
    def __init__(self, experiment: str = None, data_name: str = None):
        self.experiment = experiment or "logs"
        self.data_name = data_name or "test"
        self.datetime_ = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        #save the logs to the 'runs/' folder
        self.log_dir = f"runs/{self.experiment}/{self.datetime_}"
        self.train_writer = SummaryWriter(log_dir=os.path.join(self.log_dir, "train/"))
        if self.data_name:
            self.test_writer = SummaryWriter(
                log_dir=os.path.join(self.log_dir, f"{self.data_name}/")
            )

    def after_iteration(
        self, model, epoch: int, evals_log: xgb.callback.TrainingCallback.EvalsLog
    ) -> bool:
        if not evals_log:
            return False
        for data, metric in evals_log.items():
            for metric_name, log in metric.items():
                score = log[-1][0] if isinstance(log[-1], tuple) else log[-1]
                if data == "train":
                    self.train_writer.add_scalar(metric_name, score, epoch)
                else:
                    self.test_writer.add_scalar(metric_name, score, epoch)
        return False

from sklearn.datasets import fetch_openml

X, y = fetch_openml(name="house_prices", return_X_y=True)

#subset numerical variables
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
X = X.select_dtypes(include=numerics)

#split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)

dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

params = {'objective': 'reg:squarederror', 'eval_metric': 'rmse'}
bst = xgb.train(params, dtrain, num_boost_round=100,
                evals=[(dtrain, 'train'), (dtest, 'test')],
                callbacks=[TensorBoardCallback(experiment='exp_1', data_name='test')])

Next, load the TensorBoard using the logs saved

to SummaryWriter .

%tensorboard --logdir runs/

TensorBoard in JAX and Flax

You can log evaluation metrics during JAX model training, and you can profile JAX programs for TensorBoard with jax.profiler.start_trace() and jax.profiler.stop_trace(), which start and stop JAX tracing, respectively.

from torch.utils.tensorboard import SummaryWriter

log_folder = "runs"
writer = SummaryWriter(log_folder)

for epoch in range(1, num_epochs + 1):
    train_state, train_metrics = train_one_epoch(state, train_loader)
    training_loss.append(train_metrics['loss'])
    training_accuracy.append(train_metrics['accuracy'])
    print(f"Train epoch: {epoch}, loss: {train_metrics['loss']}, accuracy: {train_metrics['accuracy'] * 100}")

    test_metrics = evaluate_model(train_state, test_images, test_labels)
    testing_loss.append(test_metrics['loss'])
    testing_accuracy.append(test_metrics['accuracy'])

    writer.add_scalar('Loss/train', train_metrics['loss'], epoch)
    writer.add_scalar('Loss/test', test_metrics['loss'], epoch)
    writer.add_scalar('Accuracy/train', train_metrics['accuracy'], epoch)
    writer.add_scalar('Accuracy/test', test_metrics['accuracy'], epoch)
    print(f"Test epoch: {epoch}, loss: {test_metrics['loss']}, accuracy: {test_metrics['accuracy'] * 100}")
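The loop above assumes helpers such as train_one_epoch and evaluate_model defined earlier in a Flax training workflow. To capture a trace of a JAX program for TensorBoard's Profile tab, a minimal sketch (the logs/jax-trace directory is an assumption for illustration) looks like this:

import jax
import jax.numpy as jnp

# Trace a small JAX computation so it shows up in TensorBoard's profiler.
jax.profiler.start_trace("logs/jax-trace")
x = jnp.ones((1000, 1000))
y = (x @ x).block_until_ready()  # force execution so it is captured in the trace
jax.profiler.stop_trace()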

The figure below shows a manual sample profiling of JAX.


Download TensorBoard data as

Pandas DataFrame

After you have finished modeling, you might be interested in

conducting post-hoc analyses and creating custom

visualizations based on log data. TensorBoard allows you to access your log data using the tb.data.experimental.ExperimentFromDev() function.


Consider the Iris classification problem. You can access the

data for a given experiment using:

import tensorboard as tb

experiment_id = "c1KCv3X3QvGwaXfgX1c4tg"
experiment = tb.data.experimental.ExperimentFromDev(experiment_id)
df = experiment.get_scalars()
df.head()

You can also obtain the DataFrame as a wide format since, in the
experiment, the two tags ( epoch_loss and epoch_accuracy )

are present at the same set of steps in each run.

try:
    experiment_id = "c1KCv3X3QvGwaXfgX1c4tg"
    experiment = tb.data.experimental.ExperimentFromDev(experiment_id)
    df = experiment.get_scalars()
    df_wide = experiment.get_scalars(pivot=True)
    display(df_wide.head())
except:
    print("There is only a single tag")
    df_wide = experiment.get_scalars(pivot=False)
    display(df_wide.head())

Finally, you can save the Pandas DataFrame as a CSV file.

#path

import pandas as pd

csv_path = 'tensor_experiment_1.csv'

df_wide.to_csv(csv_path, index=False)

df_wide_roundtrip = pd.read_csv(csv_path)

pd.testing.assert_frame_equal(df_wide_roundtrip, df_wide)

You can now visualize the data using a visualization package

such as Matplotlib.
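For instance, a minimal sketch of plotting the accuracy curves from the wide DataFrame (assuming it contains run, step, and epoch_accuracy columns, as in the Iris experiment) might look like this:

import matplotlib.pyplot as plt

# Plot one accuracy curve per run from the exported TensorBoard data.
for run_name, run_df in df_wide.groupby("run"):
    plt.plot(run_df["step"], run_df["epoch_accuracy"], label=run_name)
plt.xlabel("Step")
plt.ylabel("Accuracy")
plt.legend()
plt.show()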

Tensorboard.dev

With TensorBoard.dev, you can easily host, track, and share ML experiments. All you need to do is upload logs to TensorBoard.dev. Sharing logs is possible in Google Colab or from the command prompt.

In your working directory, open the command prompt and run:

tensorboard dev upload --logdir logs

On Google Colab notebook:

%tensorboard dev upload --logdir logs

On Jupyter Notebook:

!tensorboard dev upload --logdir logs

You will be prompted to continue with the upload by entering

y/yes; otherwise, abort the operation. After supplying Yes , an

authorization window for www.google.com will be opened for

you to complete the process. Upon successful completion, a

unique link for the experiment will be created. The

following link shows an example of an uploaded TensorBoard.

To stop uploading, interrupt the execution in Jupyter and

Google Colab notebooks or press Ctrl-C if you are using the

command prompt.

Limitations of using

TensorBoard

TensorBoard has its share of limitations. Some of the limitations

of TensorBoard include:

1. Lacks private hosting. All experiments shared using TensorBoard.dev are public, so be careful not to upload sensitive information to TensorBoard.dev.

2. TensorBoard supports only specific data formats, which limits the logging and visualization of other formats such as audio/video or custom HTML.

3. Lack of user and workspace management features often

necessary for larger organizations.

4. Scalability issues. TensorBoard starts getting

performance issues as the number of runs increases.

5. Does not offer functionality for team collaboration which

disadvantages users who work on ML products as a team.

Final thoughts

The process of machine learning engineering, which every data

scientist interacts with from time to time, requires extensive

modeling using different frameworks to optimize the underlying

models' predictive ability. However, the process of model

optimization, debugging, and deployment can present its fair

share of challenges. Tools like TensorBoard provide developers

with resources to build better machine learning models and


produce quality results with less effort.

From setting up TensorBoard to debugging and visualizing logs

from other libraries, this guide delves into the functionality of

TensorBoard in visualizing the machine learning modeling

process.


How to build TensorFlow

models with the Keras

Functional API

The Keras Functional API provides a way to build flexible and

complex neural networks in TensorFlow. The Functional API is

used to design networks that are not linear. In this article, you

will discover that the Keras Functional API is used to create

networks that:

Are non-linear.

Share layers.

Have multiple inputs and outputs.

Keras Sequential models


We used the Sequential API in the CNN tutorial to build an image
classification model with Keras and TensorFlow. The

Sequential API involves stacking layers. One layer is followed

by another layer until the final dense layer. This makes

designing networks with the Sequential API easy and

straightforward.

parameters = {"shape":28, "activation": "relu", "classes": 10,

"units":12, "optimizer":"adam", "epochs":1,"kernel_size":3,"po ol_size":2,


"dropout":0.5}

# Setup the layers

model = keras.Sequential(

Deep learning with TensorFlow and Keras

274

layers.Conv2D(32, kernel_size=(parameters["kernel_size"],
parameters["kernel_size"]), input_shape =(parameters["shape"],
parameters["shape"], 1),activation=parameters["activation"]),
layers.MaxPooling2D(pool_size=(parameters["pool_size"], p

arameters["pool_size"])),

layers.Conv2D(64, kernel_size=(parameters["kernel_size"],

parameters["kernel_size"]), activation=parameters["activatio

n"]),
layers.MaxPooling2D(pool_size=(parameters["pool_size"], p

arameters["pool_size"])),

layers.Flatten(),

layers.Dropout(parameters["dropout"]),

layers.Dense(parameters["classes"], activation="softma

x"),

The Sequential API limits you to one input and one output. However, you may want to design neural networks with multiple inputs and outputs in certain scenarios. For example, given an image of a person, you can design a network to predict several attributes such as gender, age, and hair color. This is a network with one input but multiple outputs. To achieve this, the Functional API is required. Plotting the network shows that the layers are arranged in a linear manner.

keras.utils.plot_model(model, "model.png",show_shapes=True)

Keras Functional models

Designing Functional models is a little different from developing


Sequential models. Let's look at those differences.

Defining input


The first difference is the requirement to create an input layer.

With the Sequential API, you don't have to define the input

layer. Defining the input shape in the first layer is sufficient.

The inputs layer contains the shape and type of data to be

passed to the network.

inputs = keras.Input(shape=(parameters["shape"], parameters["shape"], 1))
inputs.shape
# TensorShape([None, 28, 28, 1])
inputs.dtype
# tf.float32

The input layer is defined without the batch size if the data is

one-dimensional.

inputs = keras.Input(shape=(784,))

Connecting layers

The next difference is how layers are connected using the

Functional API. To create the connection, we create another


layer and pass the inputs layer to it. This is best understood by

considering each of the layers as a function. Since the layers

are functions, they can be called with parameters. For

example, let's pass the inputs to a Conv2D layer.


conv2D = layers.Conv2D(32)
x = conv2D(inputs)
# <KerasTensor: shape=(None, 26, 26, 32) dtype=float32 (created by layer 'conv2d_7')>

In the above example, we create a Conv2D layer, call it as a

function and pass the inputs. The resulting output's shape is

different from the initial inputs shape as a result of being

passed to the convolution layer.

Functional API Python syntax

The above example shows how to define and connect the

networks verbosely. However, the syntax can be simplified. The

simplified version looks like this:

conv2D = Conv2D(...)(inputs)


conv2D() is similar to conv2D.__call__(self,....) . Python objects

implement the __call__() method. Keras layers also implement

this method. The method returns the output given an input

tensor.
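As a minimal plain-Python illustration of the same mechanism (not Keras code), any object whose class defines __call__ can be invoked like a function:

# A toy "layer": calling the instance invokes __call__ with the given input.
class Double:
    def __call__(self, x):
        return 2 * x

layer = Double()
print(layer(21))  # 42 -- the same mechanism Keras layers use when you write layer(inputs)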

inputs = keras.Input(shape=(parameters["shape"], parameters["shape"], 1))
conv2D = layers.Conv2D(32, kernel_size=(parameters["kernel_size"], parameters["kernel_size"]),
                       input_shape=(parameters["shape"], parameters["shape"], 1),
                       activation=parameters["activation"])(inputs)
conv2D
# <KerasTensor: shape=(None, 26, 26, 32) dtype=float32 (created by layer 'conv2d_8')>

Creating the model

Let's add a few more layers to the network to demonstrate how

to create a Keras model when layers are defined using the

Functional API.

parameters = {"shape":28, "activation": "relu", "classes": 10,

"units":12, "optimizer":"adam", "epochs":1,"kernel_size":3,"po ol_size":2,


"dropout":0.5}

inputs = keras.Input(shape=(parameters["shape"], parameters["sh ape"], 1))


conv2D = layers.Conv2D(32, kernel_size=(parameters["kernel_siz

e"], parameters["kernel_size"]), input_shape =(parameters["shap e"],


parameters["shape"], 1),activation=parameters["activatio n"])(inputs)

maxPooling2D = layers.MaxPooling2D(pool_size=(parameters["pool_

size"], parameters["pool_size"]))(conv2D)

conv2D_2 =layers.Conv2D(64, kernel_size=(parameters["kernel_siz

e"], parameters["kernel_size"]), activation=parameters["activat ion"])


(maxPooling2D)

maxPooling2D_2 = layers.MaxPooling2D(pool_size=(parameters["poo

l_size"], parameters["pool_size"]))(conv2D_2)

flatten = layers.Flatten()(maxPooling2D_2)

dropout = layers.Dropout(parameters["dropout"])(flatten)

ouputs = layers.Dense(parameters["classes"], activation="softma x")


(dropout)

Deep learning with TensorFlow and Keras

279

A Keras model is created using the keras.Model function while passing the
inputs and outputs .

model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")

We can plot the model to confirm that it's similar to the one we
defined using the Sequential API.

keras.utils.plot_model(model, "model.png",show_shapes=True)

Training and evaluation of

Functional API models


Training and evaluating models are the same in the Functional

API and the Sequential API. keras.Model avails

the fit and evaluate methods.

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.RMSprop(),
    metrics=["accuracy"],
)

history = model.fit(x_train, y_train, batch_size=64, epochs=2, validation_split=0.2)

test_scores = model.evaluate(x_test, y_test, verbose=2)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])

Save and serialize Functional

API models

Model saving and serialization work the same in the

Functional API and the Sequential API. For instance, we can

save the entire model using model.save() .

model.save("saved_model")

del model

Deep learning with TensorFlow and Keras

282
model = keras.models.load_model("saved_model")

model.summary()

How to convert a Functional

model to a Sequential API

model

A Functional model with linear layers can be converted into a

Sequential model by creating an instance of Sequential and

adding the layers.


seq_model = keras.models.Sequential()
for layer in model.layers:
    seq_model.add(layer)
seq_model.summary()

How to convert a Sequential

model to a Functional API

model

Similarly, we can convert Sequential networks to Functional

models.
inputs = keras.Input(batch_shape=seq_model.layers[0].input_shape)
x = inputs
for layer in seq_model.layers:
    x = layer(x)
outputs = x
func_model = keras.Model(inputs=inputs, outputs=outputs, name="func_mnist_model")
func_model.summary()

Standard network models

Let's look at how to define standard neural networks using the

Functional Keras API.

Multilayer perceptron

We start by defining a neural network with multiple hidden

layers and plot the model.

inputs = keras.Input(shape=(parameters["shape"], parameters["shape"], 1))
dense1 = layers.Dense(128)(inputs)
dropout = layers.Dropout(parameters["dropout"])(dense1)
dense2 = layers.Dense(128)(dropout)
dropout1 = layers.Dropout(parameters["dropout"])(dense2)
outputs = layers.Dense(parameters["classes"], activation="softmax")(dropout1)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
keras.utils.plot_model(model, "model.png", show_shapes=True)

Convolutional Neural Network

Next, we look at how to define Convolutional Neural

Networks using the Functional API. The network has


convolution, pooling, flatten, and dense layers.

inputs = keras.Input(shape=(parameters["shape"], parameters["shape"], 1))
conv2D = layers.Conv2D(32, kernel_size=(parameters["kernel_size"], parameters["kernel_size"]),
                       input_shape=(parameters["shape"], parameters["shape"], 1),
                       activation=parameters["activation"])(inputs)
maxPooling2D = layers.MaxPooling2D(pool_size=(parameters["pool_size"], parameters["pool_size"]))(conv2D)
conv2D_2 = layers.Conv2D(64, kernel_size=(parameters["kernel_size"], parameters["kernel_size"]),
                         activation=parameters["activation"])(maxPooling2D)
maxPooling2D_2 = layers.MaxPooling2D(pool_size=(parameters["pool_size"], parameters["pool_size"]))(conv2D_2)
flatten = layers.Flatten()(maxPooling2D_2)
dropout = layers.Dropout(parameters["dropout"])(flatten)
outputs = layers.Dense(parameters["classes"], activation="softmax")(dropout)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
keras.utils.plot_model(model, "model.png", show_shapes=True)
Recurrent Neural Network

Let's look at the definition of a bidirectional LSTM using the

Functional API. The network contains an Embedding layer.

inputs = keras.Input(shape=(784,))
embedding = layers.Embedding(512, 64, input_length=1024)(inputs)
bidirectional1 = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(embedding)
bidirectional2 = layers.Bidirectional(layers.LSTM(64))(bidirectional1)
dense1 = layers.Dense(32, activation='relu')(bidirectional2)
outputs = layers.Dense(1, activation='sigmoid')(dense1)
model = keras.Model(inputs=inputs, outputs=outputs, name="lstm_model")
keras.utils.plot_model(model, "model.png", show_shapes=True)

Shared layers model

Defining layers with the Functional API enables the creation of

networks that share certain layers. Shared layers are used

several times in a network.

Shared input layer

This example defines a CNN with one input layer shared by two

convolution blocks. We then join the outputs from these blocks

using the concatenate layer. After that, we pass the result to

a DropOut layer and finally to fully connected layer.

inputs = keras.Input(shape=(parameters["shape"], parameters["shape"], 1))
conv2D = layers.Conv2D(32, kernel_size=(parameters["kernel_size"], parameters["kernel_size"]),
                       input_shape=(parameters["shape"], parameters["shape"], 1),
                       activation=parameters["activation"])(inputs)
maxPooling2D = layers.MaxPooling2D(pool_size=(parameters["pool_size"], parameters["pool_size"]))(conv2D)
flatten1 = layers.Flatten()(maxPooling2D)
conv2D_2 = layers.Conv2D(64, kernel_size=(parameters["kernel_size"], parameters["kernel_size"]),
                         activation=parameters["activation"])(inputs)
maxPooling2D_2 = layers.MaxPooling2D(pool_size=(parameters["pool_size"], parameters["pool_size"]))(conv2D_2)
flatten2 = layers.Flatten()(maxPooling2D_2)
# merge layers
merged_layers = layers.concatenate([flatten1, flatten2])
dropout = layers.Dropout(parameters["dropout"])(merged_layers)
outputs = layers.Dense(parameters["classes"], activation="softmax")(dropout)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
keras.utils.plot_model(model, "model.png", show_shapes=True)

Plotting the network shows the connection between the

different layers.

Shared feature extraction layer

In this example, we create an embedding layer shared by two

bidirectional LSTMs. A shared feature extraction layer enables

sharing the same feature extractor multiple times in the

network. For example, sharing this information between two

inputs can make it possible to train a network with less data.


inputs = keras.Input(shape=(784,))
embedding = layers.Embedding(512, 64, input_length=1024)(inputs)
bidirectional1 = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(embedding)
bidirectional2 = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(embedding)
# merge layers
merged_layers = layers.concatenate([bidirectional1, bidirectional2])
dense1 = layers.Dense(32, activation='relu')(merged_layers)
outputs = layers.Dense(1, activation='sigmoid')(dense1)
model = keras.Model(inputs=inputs, outputs=outputs, name="lstm_model")
keras.utils.plot_model(model, "model.png", show_shapes=True)

Next, let's discuss the multiple inputs and outputs scenario.

Multiple input and output

models


Networks with multiple inputs and outputs can also be defined

using the Functional API. This is not possible with the

Sequential API.
Multiple input model

In this example, we define a network that takes two inputs of

different lengths. We pass the inputs to dense layers and sum

them using the add layer.

input1 = keras.Input(shape=(16,))
x1 = layers.Dense(8, activation='relu')(input1)
input2 = layers.Input(shape=(32,))
x2 = layers.Dense(8, activation='relu')(input2)
# equivalent to added = tf.keras.layers.add([x1, x2])
added = layers.Add()([x1, x2])
out = layers.Dense(4)(added)
model = keras.Model(inputs=[input1, input2], outputs=out)
keras.utils.plot_model(model, "model.png", show_shapes=True)
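A sketch of how such a two-input model could be trained, assuming two random NumPy arrays with matching first dimensions and a regression-style target (these arrays are illustrative, not part of the original example):

import numpy as np

# Illustrative data for the two inputs and the four-unit output defined above.
data1 = np.random.random((100, 16))
data2 = np.random.random((100, 32))
targets = np.random.random((100, 4))

model.compile(optimizer="adam", loss="mse")
model.fit([data1, data2], targets, epochs=2, batch_size=16)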


Multiple output model

The Functional API enables the definition of models with multiple outputs. The example below defines a convolutional neural network with two output layers. For instance, given an image of a person, this network can predict gender and age.


image_input = keras.Input(shape=(parameters["shape"], parameters["shape"], 3), name="images")
x = layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu')(image_input)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu')(x)
x = layers.Dropout(0.25)(x)
x = layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu')(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.25)(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.25)(x)
gender_prediction = layers.Dense(3, activation='softmax')(x)
age_prediction = layers.Dense(3, activation='softmax')(x)
model = keras.Model(
    inputs=image_input,
    outputs=[gender_prediction, age_prediction],
)
keras.utils.plot_model(model, "model.png", show_shapes=True)
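When compiling a multi-output model, Keras accepts one loss (and metric) per output, passed in the same order as the outputs list. A minimal sketch for the two softmax heads above, assuming integer-encoded labels, could be:

# Illustrative compile call with one loss per output head.
model.compile(
    optimizer="adam",
    loss=["sparse_categorical_crossentropy", "sparse_categorical_crossentropy"],
    metrics=["accuracy"],
)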

Use the same graph of layers to

define multiple models


The Functional API also enables the definition of multiple

models using the same layers. This is possible because

creating a model using the Functional API requires only the

inputs and outputs. For example, this is applicable in the encoder-decoder network architecture.


encoder_input = keras.Input(shape=(28, 28, 1), name="img")
x = layers.Conv2D(16, 3, activation="relu")(encoder_input)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
encoder_output = layers.GlobalMaxPooling2D()(x)
encoder = keras.Model(encoder_input, encoder_output, name="encoder")
encoder.summary()

x = layers.Reshape((4, 4, 1))(encoder_output)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu")(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation="relu")(x)
autoencoder = keras.Model(encoder_input, decoder_output, name="autoencoder")
autoencoder.summary()

Keras Functional API end-to-

end example

Let's design a TensorFlow network that predicts various

characteristics given an image of a face. The characteristics we

will be predicting are:

Age

Hair color

Beard color

Mustache color

Eye color

We will use the Face Classification Open Annotated

Dataset from Ango AI.

Data download

We start by downloading the images. We'll use the download

script provided by Ango AI. The images will be downloaded into

an assets folder.

!git clone https://github.com/mlnuggets/tensorflow.git

!mv tensorflow/data/labels-en.json labels-en.json

!mv tensorflow/data/dataloader.py dataloader.py

!python dataloader.py -j labels-en.json

Data processing

Next, we load the JSON file using Pandas. The file contains the
classification for each image.

import pandas as pd

df = pd.read_json("labels-en.json")

df.head()

Add image path column

We will load the images using the Pandas DataFrame. To do that, we need
to provide a path for each image. Let's add the

image path column to the DataFrame.

def image_names(externalId):
    return f"{externalId}.png"

df["image_path"] = df["externalId"].map(image_names)
df.tail()

Create face attributes columns

The tasks column contains the labels for the face

characteristics. We, therefore, define a function to get that

information and create a column for each face attribute.

Let's start by obtaining all the labels and appending them to a

list.

age = []

hair = []

beard = []
mustache = []

eye = []

def get_answers(tasks):
    for all_tasks in tasks:
        all_it = all_tasks[0]
        for item in all_it['classifications']:
            if item['title'] == 'Age':
                age.append(item['answer'])
            if item['title'] == 'Hair Color':
                hair.append(item['answer'])
            if item['title'] == 'Beard Color':
                beard.append(item['answer'])
            if item['title'] == 'Mustache Color':
                mustache.append(item['answer'])
            if item['title'] == 'Eye Color':
                eye.append(item['answer'])

Next, we use these lists to create a new column for each face

attribute.
get_answers(df['tasks'])

df['age'] = age

df['hair_color'] = hair

df['beard_color'] = beard

df['mustache_color'] = mustache

df['eye_color'] = eye


Label encoding

We now have the labels for each face attribute. The next step is

to convert them to an integer representation. We use Scikit-learn's LabelEncoder to achieve that.

from sklearn.preprocessing import LabelEncoder

age_labelencoder = LabelEncoder()
hair_labelencoder = LabelEncoder()
beard_labelencoder = LabelEncoder()
mustache_labelencoder = LabelEncoder()
eye_labelencoder = LabelEncoder()

df = df.assign(age = age_labelencoder.fit_transform(df["age"]))
df = df.assign(hair_color = hair_labelencoder.fit_transform(df["hair_color"]))
df = df.assign(beard_color = beard_labelencoder.fit_transform(df["beard_color"]))
df = df.assign(mustache_color = mustache_labelencoder.fit_transform(df["mustache_color"]))
df = df.assign(eye_color = eye_labelencoder.fit_transform(df["eye_color"]))

Generate tf.data dataset

The next step is to create a TensorFlow dataset using

this DataFrame and the image data. We achieve this using


ImageDataGenerator . This function generates a batch of tensor image data
with augmentation. Since the image

information is a DataFrame, we load the images

using flow_from_dataframe . The first step is to define

the ImageDataGenerator with the desired configuration. In this

case:

Rescaling the images.

Specifying the validation split.

Defining the augmentation strategy, including horizontal flip

and shear_range .

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   validation_split=0.2)

validation_gen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

Next, use the flow_from_dataframe function to load the images.

This requires the following parameters:

The image size. The images will be resized to the defined

size.

The batch size.

base_dir indicating the folder containing the face images.

target_columns to specify the target columns.


class_mode set to multi_output because the network will

produce output for the 5 target columns.

image_size = (128, 128)
batch_size = 32
base_dir = 'assets'
target_columns = ['age', 'hair_color', 'beard_color', 'mustache_color', 'eye_color']

training_set = train_datagen.flow_from_dataframe(df, base_dir,
                                                 seed=101,
                                                 target_size=image_size,
                                                 batch_size=batch_size,
                                                 x_col='image_path',
                                                 y_col=target_columns,
                                                 subset='training',
                                                 class_mode='multi_output')

validation_set = validation_gen.flow_from_dataframe(df, base_dir,
                                                    target_size=image_size,
                                                    batch_size=batch_size,
                                                    x_col='image_path',
                                                    y_col=target_columns,
                                                    subset='validation',
                                                    class_mode='multi_output')

Visualize the training data

Let's use Matplotlib to visualize a sample of the TensorFlow dataset.

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in training_set:
    for i in range(25):
        ax = plt.subplot(5, 5, i + 1)
        plt.imshow(images[i])
        plt.axis("off")
    break


Define Keras Functional

network

Given an image, the network should predict each of the five

attributes of the face. To achieve that, we define a network with

five output layers. This cannot be achieved using the

Sequential API. We, therefore, use the Functional API

knowledge we learned at the beginning of this article.

Define a Convolutional Neural Network with three blocks

followed by a DropOut and Dense layer. We use layers.add to

sum the output from each block. After that, we create five

dense layers that will produce the prediction for each of the

face attributes. The number of units in each dense layer is the

number of unique items in each of the five columns. We use the

softmax activation for each layer because the classes are more

than two.

We name each layer of the network to make it easier to inspect

the summary and plot of the network. It will also make it easier
to evaluate the performance of the network.

from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout, Resizing
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping

image_input = keras.Input(shape=(image_size[0], image_size[0], 3), name="images_input")

x = Conv2D(filters=32, kernel_size=(3, 3), activation="relu", name="first_block_conv2d")(image_input)
x = MaxPooling2D(pool_size=(2, 2), name="first_block_maxpool2d")(x)
first_block_output = Flatten(name="first_block_flatten")(x)

x = Conv2D(filters=32, kernel_size=(3, 3), activation='relu', name="second_block_conv2d")(image_input)
x = MaxPooling2D(pool_size=(2, 2), name="second_block_maxpool2d")(x)
x = Flatten(name="second_block_flatten")(x)
second_block_output = layers.add([x, first_block_output], name="second_block_add")

x = Conv2D(filters=32, kernel_size=(3, 3), activation='relu', name="third_block_conv2d")(image_input)
x = MaxPooling2D(pool_size=(2, 2), name="third_block_maxpool2d")(x)
x = Flatten(name="third_block_flatten")(x)
third_block_output = layers.add([x, second_block_output], name="third_block_add")

x = Dropout(0.25, name="dropout1")(third_block_output)
x = Dense(128, activation="relu", name="dense1")(x)

age_prediction = Dense(df["age"].nunique(), activation="softmax", name="dense_age")(x)
hair_prediction = Dense(df["hair_color"].nunique(), activation="softmax", name="dense_hair")(x)
beard_prediction = Dense(df["beard_color"].nunique(), activation="softmax", name="dense_beard")(x)
mustache_prediction = Dense(df["mustache_color"].nunique(), activation="softmax", name="dense_mustache")(x)
eye_prediction = Dense(df["eye_color"].nunique(), activation="softmax", name="dense_eye")(x)


After that, we pass the inputs and outputs to keras.Model to

create the model. There are five outputs in the model as

required.

model = keras.Model(
    inputs=image_input,
    outputs=[age_prediction, hair_prediction, beard_prediction,
             mustache_prediction, eye_prediction],
)
Plot and inspect the Keras

Functional model

Plot the network to confirm that the connections are expected.

It also helps to check if the shapes at various stages are as

intended.

keras.utils.plot_model(model, "model.png",show_shapes=True)


Compile Keras Functional

network

Specify the loss for each prediction layer when compiling a

Keras Functional network. This is passed as a list. In this case,

we pass one loss, which will be applied to the five layers. We

use the SparseCategoricalCrossentropy loss

and SparseCategoricalAccuracy because the labels are integers.

model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=keras.metrics.SparseCategoricalAccuracy())

Training the Functional network

Let's train this network with the EarlyStopping callback and 100
epochs. We stop training if the network doesn't improve for 10

consecutive iterations. You will see the training and validation

metrics for the five items we want to predict as the network

trains.

callback = EarlyStopping(monitor='loss', patience=10)
epochs = 100
history = model.fit(training_set, validation_data=validation_set, epochs=epochs, callbacks=[callback])


Evaluate TensorFlow Functional

network

Check the performance of the network using

the evaluate method.


model.evaluate(validation_set)

This prints the validation metrics for the five prediction

attributes.


We can also plot the performance metrics of the network.

The history variable contains the metrics for the five attributes.

metrics_df = pd.DataFrame(history.history)

The metric names have the layer names we defined when


creating the network. This makes it easy to identify the layers in

this metrics DataFrame.

Using this DataFrame, we can plot some of these metrics. Let's

plot the hair accuracy.


metrics_df[["dense_hair_sparse_categorical_accuracy","val_dense
_hair_sparse_categorical_accuracy"]].plot()

You can tune the network or change the architecture to get

better results.

Make predictions with Keras

Functional model

We download, process, and resize an image to make

predictions with the Keras Functional model. The image size


should be the same as the one used during training.


image_url = "https://storage.googleapis.com/ango-covid-dataset/

ffhq-dataset/batch2/25000.png"

image_path = keras.utils.get_file('Face', origin=image_url)

test_image = keras.utils.load_img(

image_path, target_size=(image_size[0], image_size[1])


)

import matplotlib.pyplot as plt

plt.axis("off")

plt.imshow(test_image);

Next, convert the image to an array and expand the dimensions

to include the batch dimension.


img_array = tf.keras.utils.img_to_array(test_image)

img_array = tf.expand_dims(img_array, 0)

img_array = img_array / 255.0

Next, pass this image to the model to get predictions.

predictions = model.predict(img_array)
predictions

We get five outputs, one for each of the face attributes.

We need to map the above numerical outputs to the various face attributes to get the actual predictions. To achieve that, we use the classes_ attribute of each label encoder to get the categories.

age_labelencoder.classes_

# array(['Blue', 'Brown', 'Green', 'Not sure', 'Not visible', 'Other'], dtype=object)


Let's bundle all this into a function that will receive an image URL and do
the following:

Download the image.

Convert it to NumPy.

Rescale the image.

Expand the dimensions and add a batch dimension.

Run predictions on the image.

Obtain the predictions for each face attribute.

Use tf.nn.softmax to process the prediction and map it to a category.

Print the five predictions for the image.

def make_face_prediction(image_url):
    import tensorflow as tf
    import numpy as np
    image_path = keras.utils.get_file('Face', origin=image_url)
    test_image = keras.utils.load_img(
        image_path, target_size=(image_size[0], image_size[1]))
    img_array = tf.keras.utils.img_to_array(test_image)
    img_array = tf.expand_dims(img_array, 0)
    img_array = img_array / 255.0
    predictions = model.predict(img_array)

    age_predictions = predictions[0][0]
    hair_predictions = predictions[1][0]
    beard_predictions = predictions[2][0]
    mustache_predictions = predictions[3][0]
    eye_predictions = predictions[4][0]

    age_scores = tf.nn.softmax(age_predictions).numpy()
    hair_scores = tf.nn.softmax(hair_predictions).numpy()
    beard_scores = tf.nn.softmax(beard_predictions).numpy()
    mustache_scores = tf.nn.softmax(mustache_predictions).numpy()
    eye_scores = tf.nn.softmax(eye_predictions).numpy()

    print(f"Age: {list(age_labelencoder.classes_)[np.argmax(age_scores)]} with a {(100 * np.max(age_scores)).round(2)} percent confidence.")
    print(f"Hair Color: {list(hair_labelencoder.classes_)[np.argmax(hair_scores)]} with a {(100 * np.max(hair_scores)).round(2)} percent confidence.")
    print(f"Beard Color: {list(beard_labelencoder.classes_)[np.argmax(beard_scores)]} with a {(100 * np.max(beard_scores)).round(2)} percent confidence.")
    print(f"Mustache Color: {list(mustache_labelencoder.classes_)[np.argmax(mustache_scores)]} with a {(100 * np.max(mustache_scores)).round(2)} percent confidence.")
    print(f"Eye Color: {list(eye_labelencoder.classes_)[np.argmax(eye_scores)]} with a {(100 * np.max(eye_scores)).round(2)} percent confidence.")

Let's run the function on an image.

make_face_prediction('https://storage.googleapis.com/ango-covid-dataset/ffhq-dataset/batch2/25000.png')


Keras Functional API strengths

and weaknesses

Keras Functional API is handy when designing non-linear

networks. If you think that you may need to convert a network

to a non-linear structure, then you should use the Functional

API. Some of the strengths of the Functional API include:

It is less verbose compared to subclassing

the Model class.

The requirement to create an Input means that shape mismatches are caught immediately with an error, so a Functional network that builds successfully will run.


A Functional model is easier to plot and inspect.

Easy to serialize and save Functional models because

they are data structures.

However, one drawback of using the Functional API is that it

doesn't support dynamic architectures such as recursive

networks or Tree RNNs.

Functional API best practices

Keep best practices in mind while working with the Keras

Functional API:


Always print a summary of the network to confirm that the

shapes of the various layers are as expected.

Plot the network to ensure the layers are connected as you expect.

Name the layers to make it easy to identify them in the

network plot and summary. For

example Conv2D(...,name="first_conv_layer") .

Use variable names that are related to the layers, for

example, conv1 and conv2 for convolution layers. This will


clarify the type of layer when inspecting plots and network

summary.

Instead of creating a custom training loop, use keras.Model to create models because it makes it easier to train models via the fit method and evaluate them with the evaluate method.
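Here is a minimal sketch of these practices, assuming a simple two-layer network (the layer names and shapes are illustrative, not from the face dataset used above):

from tensorflow import keras
from tensorflow.keras import layers

# Illustrative Functional model with descriptive layer and variable names.
inputs = keras.Input(shape=(28, 28, 1), name="image_input")
conv1 = layers.Conv2D(32, 3, activation="relu", name="first_conv_layer")(inputs)
flatten1 = layers.Flatten(name="flatten")(conv1)
outputs = layers.Dense(10, activation="softmax", name="dense_output")(flatten1)

model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()  # confirm the shapes of the various layers
keras.utils.plot_model(model, "model.png", show_shapes=True)  # confirm the connections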

Final thoughts

In this article, you have discovered that you can design neural networks in Keras using the Functional API. In particular, we have covered:

How to define Functional models in Keras.

How to train and evaluate Keras Functional networks.

Defining Keras networks with multiple inputs and outputs.

How to plot and inspect Keras Functional models.

Feature extraction with Keras Functional networks.

Building an end-to-end face classification Convolutional Neural Network with the Keras Functional API.

How to create custom training loops in Keras

Training models in Keras is usually done using the fit method.

However, you may want more control over the training process.

To do that, you'll need to create a custom training loop. This

involves setting up a custom function to compute the loss and

gradient.

This article will walk you through the process of doing that.

Let's get to it.

Obtain dataset

We'll use the Fashion MNIST dataset for this illustration and load it using
the Layer data loader.

# pip install layer
import layer

mnist_train = layer.get_dataset('layer/fashion_mnist/datasets/fashion_mnist_train').to_pandas()
mnist_test = layer.get_dataset('layer/fashion_mnist/datasets/fashion_mnist_test').to_pandas()

# Successfully logged into https://app.layer.ai as guest
# ⠴ fashion_mnist_train ━━━━━━━━━━ LOADED [0:00:10]
# ⠦ fashion_mnist_test ━━━━━━━━━━ LOADED [0:00:04]



Here's how the dataset looks:

mnist_train["images"][17]

mnist_test["images"][23]

Data processing

Next, convert the clothing images to NumPy arrays.

import numpy as np

def images_to_np_array(image_column):
    return np.array([np.array(im.getdata()).reshape((im.size[1], im.size[0])) for im in image_column])

train_images = images_to_np_array(mnist_train.images)
test_images = images_to_np_array(mnist_test.images)
train_labels = mnist_train.labels
test_labels = mnist_test.labels


Scaling data in deep learning is a common practice because

weights and biases of the network are initialized to small

numbers between 0 and 1. We, therefore, have to scale the

image data.

train_images = train_images / 255.0

test_images = test_images / 255.0

train_images.shape

# (60000, 28, 28)

The neural network expects the above dataset to be in a

specific shape. When training models with Keras, we pass the

shape as image_width , image_height , number_of_channels . In the

shape printed above, we see that the number_of_channels is

missing. We need to add that. Failure to do this will result in an

error similar to:

ValueError: Exception Input 0 of layer "conv2d" is incompatible with the layer: expected min_ndim=4, found ndim=3.

To avoid that, expand the dimensions.

# Make sure images have shape (28, 28, 1)

train_images = np.expand_dims(train_images, -1)

test_images = np.expand_dims(test_images, -1)

train_images.shape

# (60000, 28, 28, 1)

Batch the dataset


Next, let's define the number of images that will be passed to the network in each batch. 32 is a common choice, but this number can be changed. Let's create batches out of the training images. Passing images in batches also makes training faster. We start by creating a tf.data.Dataset with the from_tensor_slices method, then add the batch size.

ds_train_batch = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
training_data = ds_train_batch.batch(32)

ds_test_batch = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
testing_data = ds_test_batch.batch(32)

How to create a model with custom layers in Keras

Custom layers in TensorFlow are created by inheriting tf.keras.layers.Layer and implementing __init__, build, and call.

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        self.kernel = self.add_weight("kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_outputs])

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

layer = MyDenseLayer(10)

A better way to create a custom block of layers is to inherit keras.Model because it provides the Model.fit, Model.evaluate, and Model.save methods. Let's create a custom block with the following layers:

Convolution.

Max pooling.

Flatten.

Dropout.

Dense.

parameters = {"shape":28, "activation": "relu", "classes": 10,

"units":12, "optimizer":"adam", "epochs":100,"kernel_size":


3,"pool_size":2, "dropout":0.5}

class CustomBlock(tf.keras.Model):

def __init__(self, filters):

super(CustomBlock, self).__init__(name='')

filters1, filters2 = filters

self.conv2a = layers.Conv2D(filters=filters1,input_shape=(2

8,28,1), kernel_size=(parameters["kernel_size"], parameters["ke


rnel_size"]), activation=parameters["activation"])

self.maxpool1a = layers.MaxPooling2D(pool_size=(parameters

["pool_size"], parameters["pool_size"]))

self.conv2b = layers.Conv2D(filters2, kernel_size=(paramete


rs["kernel_size"], parameters["kernel_size"]), activation=param
eters["activation"])

Deep learning with TensorFlow and Keras

323

self.maxpool2b = layers.MaxPooling2D(pool_size=(parameters

["pool_size"], parameters["pool_size"]))

self.flatten1a = layers.Flatten()

self.dropout1a = layers.Dropout(parameters["dropout"])

self.dense1a = layers.Dense(parameters["classes"], activati

on="softmax")

def call(self, input_tensor):

x = self.conv2a(input_tensor)

x = tf.nn.relu(x)

x = self.maxpool1a(x)

x = self.conv2b(x)

x = tf.nn.relu(x)

x = self.maxpool2b(x)

x = self.flatten1a(x)

x = self.dropout1a(x)

x = self.dense1a(x)
return tf.nn.softmax(x)

Let's initialize the model and check the layers and variables.

model = CustomBlock([32,64])

input_shape = (1, 28, 28, 1)

x = tf.random.normal(input_shape)

_ = model(x)

x.shape

# TensorShape([1, 28, 28, 1])

model.layers

len(model.variables)


We can also visualize the model's summary.


model.summary()

The model can be used to make predictions even before

training. Obviously, the results won't be good.
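As a quick sanity check, here is a minimal sketch (assuming the test_images array created above); the untrained model produces essentially random guesses across the 10 classes:

sample = test_images[:3]  # shape (3, 28, 28, 1)
untrained_preds = model(sample, training=False)
print(untrained_preds.numpy().argmax(axis=1))  # predicted class indices, not yet meaningful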


Define the loss function

The next step is to define the loss function. We use

SparseCategoricalCrossentropy because the labels are integers. If

labels are one-hot encoded, CategoricalCrossentropy is used

instead. The goal is to reduce the errors between the true and

predicted values. The SparseCategoricalCrossentropy function

takes probability predictions and returns the average loss.

loss_object = tf.keras.losses.SparseCategoricalCrossentropy()

def loss(model, x, y, training):
    # training=training is needed only if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    y_ = model(x, training=training)
    return loss_object(y_true=y, y_pred=y_)

l = loss(model, test_images, test_labels, training=False)
print("Loss test: {}".format(l))

Define the gradients function


The gradient is computed using tf.GradientTape. It calculates the gradient of the loss with respect to the model's trainable variables. The tape records operations in the forward pass and uses this information to compute gradients in the backward pass.

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets, training=True)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

Create an optimizer

An optimizer function uses the computed gradients to adjust the


model weights and biases to minimize the loss. This iterative

process aims to find the model parameters that result in the

least error. We apply the common Adam optimizer function.

optimizer = tf.keras.optimizers.Adam()

Create custom training loop

The training loop feeds the training images to the network while computing the metrics. We use the SparseCategoricalAccuracy to compute the accuracy because the labels are integers. If labels are one-hot encoded, the CategoricalAccuracy is used. We use tqdm to display a progress bar of the training process. The training process involves the following steps:

Pass the training data to the network for one epoch.

Obtain the training images and labels for each batch.

Run predictions using the network and compare the result

with the true values.

Update model parameters using the Adam optimizer.

Track the training metrics for visualization later.

Repeat the process for the specified number of epochs.


from tqdm.notebook import trange

## Note: Rerunning this cell uses the same model parameters

# Keep results for plotting
train_loss_results = []
train_accuracy_results = []

num_epochs = 10

for epoch in trange(num_epochs):
    epoch_loss_avg = tf.keras.metrics.Mean()
    epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

    # Training loop - using batches of 32
    for x, y in training_data:
        # Optimize the model
        loss_value, grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        # Track progress
        epoch_loss_avg.update_state(loss_value)  # Add current batch loss

        # Compare predicted label to actual label
        # training=True is needed only if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        epoch_accuracy.update_state(y, model(x, training=True))

    # End epoch
    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())

    print("Epoch {}: Loss: {:.3f}, Accuracy: {:.3%}".format(epoch + 1,
                                                            epoch_loss_avg.result(),
                                                            epoch_accuracy.result()))

Visualize the loss

Next, visualize the training loss and accuracy with Matplotlib.

fig, axes = plt.subplots(2, sharex=True, figsize=(12, 8))


fig.suptitle('Training Metrics')

axes[0].set_ylabel("Loss", fontsize=14)

axes[0].plot(train_loss_results)

axes[1].set_ylabel("Accuracy", fontsize=14)

axes[1].set_xlabel("Epoch", fontsize=14)

axes[1].plot(train_accuracy_results)

plt.show()


Evaluate model on test dataset


To evaluate the network's performance, we loop through the test data, make predictions, and compare them with the true values. tf.math.argmax returns the index of the largest predicted value along the given axis.

test_accuracy = tf.keras.metrics.Accuracy()

for (x, y) in testing_data:
    # training=False is needed only if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    logits = model(x, training=False)
    prediction = tf.math.argmax(logits, axis=1, output_type=tf.int64)
    test_accuracy(prediction, y)

print("Test set accuracy: {:.3%}".format(test_accuracy.result()))
# Test set accuracy: 87.870%

Use the trained model to make

predictions

Let's use the trained model to make predictions on new clothing images and print the prediction. The model outputs logits, which we pass to tf.nn.softmax. This ensures that the outputs sum to 1. We, therefore, take the maximum value as the predicted value. Obtaining the index of the maximum and mapping it to the categories gives the predicted class.

# training=False is needed only if there are layers with different
# behavior during training versus inference (e.g. Dropout).
predictions = model(test_images[0:5], training=False)

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

for i, logits in enumerate(predictions):
    class_idx = tf.math.argmax(logits).numpy()
    p = tf.nn.softmax(logits)[class_idx]
    name = class_names[class_idx]
    print("Image {} prediction: {} ({:4.1f}%)".format(i, name, 100*p))

Final thoughts

You have now learned to create custom layers and training

loops in Keras. This helps understand the underlying

processing that happens when you call the fit method from

Keras. It's also important if you want more fine-grained control

of the network's training process. Specifically, we have seen

that creating custom training loops involves:

Designing the network using custom layers or the Keras built-in layers.

Creating custom loss functions.

Building a custom function to compute model gradients.

Defining the optimizer function.

Creating the custom loop function that utilizes the loss and

gradient functions.

How to train deep learning

models on Apple Silicon GPU

Training deep learning models on GPUs is faster than training

on the CPU, especially when training large image or language

models. You can train deep learning models with GPUs on

Apple Silicon devices.

To train deep learning models with GPU on Apple Silicon


devices, you need to install the TensorFlow or PyTorch

versions that are compatible with Apple Silicon devices.


The next time you train a deep learning model, the training will be on
the GPU by default.

Training deep learning models

on Apple Silicon

On Apple Silicon, rendering of advanced graphics on the GPU is done by Metal. The Metal framework gives apps direct access to the GPU on Mac computers. You can leverage this framework to:

Render games in complex environments.

Process large datasets.

Edit video.

Train deep learning models on the GPU.


The Metal Performance Shaders framework provides a large library of optimized compute and rendering shaders that take advantage of each GPU's unique hardware. The Metal Performance Shaders framework contains a collection of highly optimized compute and graphics shaders that are designed to integrate easily and efficiently into your Metal app. These data-parallel primitives are specially tuned to take advantage of the unique hardware characteristics of each GPU family to ensure optimal performance. https://developer.apple.com/documentation

TensorFlow introduced the PluggableDevice architecture to enable running TensorFlow on new devices without changing the TensorFlow core code.

TensorFlow

Training TensorFlow deep learning networks on Apple Silicon is

done through the PluggableDevice architecture. We, therefore,

install the compatible TensorFlow version and the Tensorflow-

metal PluggableDevice.

Install Tensorflow-metal PluggableDevice

Let's walk through the installation instructions. Start by

installing Conda and creating a virtual environment where

TensorFlow will be installed. For this to work, ensure that you

are using Python 3.8 or 3.9.

chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh

sh ~/Downloads/Miniforge3-MacOSX-arm64.sh

source ~/miniforge3/bin/activate

Next, install the TensorFlow dependencies in this environment:

conda install -c apple tensorflow-deps

Install TensorFlow:

python -m pip install tensorflow-macos

Finally, install the tensorflow-metal plugin.

python -m pip install tensorflow-metal

You are now ready to train deep learning models on Apple

Silicon with TensorFlow.

Train TensorFlow model on

Apple Silicon GPU


After performing the above instructions, deep learning models

will be trained by default on the GPU.

You can check the activity monitor to confirm that Python is

using the GPU.
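You can also confirm from Python that TensorFlow sees the Apple GPU; a minimal check (the exact device string printed may differ on your machine):

import tensorflow as tf

# After installing tensorflow-macos and tensorflow-metal, the Apple GPU
# should appear as a visible device.
print(tf.config.list_physical_devices("GPU"))
# e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]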

PyTorch

As of this writing, you must install the Preview (Nightly) build to train the
PyTorch model on Apple Silicon GPUs. This will be

stable in the PyTorch v1.12 release.

conda install pytorch torchvision -c pytorch-nightly

To train PyTorch models on GPUs on Apple Silicon, set Metal

Performance Shaders (MPS) as the backend.

device = torch.device("mps")

print(f"Using {device} device")

# Using mps device
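As a minimal usage sketch (the model and tensor here are illustrative and assume an MPS-enabled PyTorch build), move the model and the data to the MPS device so the computation runs on the Apple GPU:

import torch
import torch.nn as nn

device = torch.device("mps")
model = nn.Linear(10, 2).to(device)      # any nn.Module moves to the GPU the same way
x = torch.randn(32, 10, device=device)   # create the batch directly on the MPS device
out = model(x)
print(out.device)  # mps:0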



Final thoughts

The repo below has example TensorFlow and PyTorch notebooks showing how to train deep learning models on Apple Silicon. Leave a comment below if you have any challenges setting up PyTorch and TensorFlow on Apple Silicon.

Object detection with the TensorFlow 2 Object Detection API

Building object detection and image segmentation models is slightly different from building other models, mainly because you have to use specialized models and prepare the data in a particular way. This article will examine how to perform object detection and image segmentation on a custom dataset using the TensorFlow 2 Object Detection API.

Let's dive right in!

Object detection datasets

In this article, we'll use the Coco Car Damage Detection

Dataset available on Kaggle. It contains car images with

damages. It can be used to train a model to detect damages on

cars and car parts. The dataset has already been annotated,

and the corresponding COCO files are provided.

Preparing datasets for object

detection

If you have a custom dataset you'd like to use, then you have to

do the labeling and annotation yourself. There are many tools

and online platforms that can help you achieve this. If you

would like to stick to open source, Labelme is an excellent alternative.

The video below shows how to create polygons on the car

dataset. After completing an annotation, you will have to save

it. Once you save it, Labelme will store the resulting JSON file

in the same folder as the data.

If you are looking for an online tool, here are some platforms

that I have interacted with:

Roboflow Universe provides numerous object detection and image segmentation datasets. You can search the platform for the car images dataset. If you choose that route, download the TFRecord format from the platform. If you have a custom dataset, you can also perform the annotation on Roboflow.

Ango AI provides some public datasets to kickstart your classification and


object detection projects. They also offer

a platform that you can use to label and annotate the

images.
Segments AI lists some object detection and image segmentation datasets
that you can clone into your

projects. You can also perform annotation on their platform.


What is TensorFlow 2 Object

Detection API?

The TensorFlow Object Detection API is an open-source

computer vision framework for building object detection and

image segmentation models that can localize multiple objects in

the same image. The framework works for both TensorFlow 1

and 2. Users are, however, encouraged to use the TF 2 version

because it contains new architectures.

Some of the architectures and models that TensorFlow 2 Object

Detection API supports include:

CenterNet

EfficientDet

SSD MobileNet

SSD ResNet

Faster R-CNN
ExtremeNet

Mask RCNN

The models can be downloaded from the TensorFlow 2

Detection Model Zoo. You need their corresponding config files

to train one of the object detection models from scratch. In this

project, we'll use the Mask RCNN model, but you can also try

the other models.


Install TensorFlow 2 Object

Detection API on Google Colab

At this point, you now have an object detection dataset. Either

the car images data and the corresponding COCO JSON files

or a dataset you have created yourself or downloaded

somewhere.

We will run this project on Google Colab to utilize free GPU

resources for training the model. Let's install the TensorFlow 2

Object Detection API on Colab. The first step is to clone the TF

2 Object Detection GitHub repo:

!git clone https://github.com/tensorflow/models.git


Next, run these commands to install TF 2 Object Detection API

on Colab:

%%bash

cd models/research

# Compile protos.

protoc object_detection/protos/*.proto --python_out=.

# Install TensorFlow Object Detection API.

cp object_detection/packages/tf2/setup.py .

python -m pip install --use-feature=2020-resolver .


Install TensorFlow 2 Object

Detection API locally

If you'd like to use the API locally, the developers recommend

that you install it using Docker:

# From the root of the git repository
docker build -f research/object_detection/dockerfiles/tf2/Dockerfile -t od .
docker run -it od

Next, import the Object Detection API plus a couple of other


common data science packages. If you are able to import the

Object Detection package, it means that the installation ran

successfully.

import matplotlib
import matplotlib.pyplot as plt
import os
import random
import io
import imageio
import glob
import scipy.misc
import numpy as np
from six import BytesIO
from PIL import Image, ImageDraw, ImageFont
from IPython.display import display, Javascript
from IPython.display import Image as IPyImage
import tensorflow as tf
from google.colab import files
import pathlib
import itertools

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.utils import config_util
from object_detection.utils import colab_utils
from object_detection.builders import model_builder

%matplotlib inline

Download object detection

dataset

The dataset and the config file for the model we'll be training

can be downloaded from this GitHub repo. You have to make

some changes after you download the config from the object

detection repo. We'll discuss those changes in a moment.

!git clone https://github.com/mlnuggets/maskrcnn.git

Download Mask R-CNN model

The next step is to download the Mask R-CNN model that we'll fine-tune. Extract the file to get the trained model checkpoint.

import wget

model_link = "http://download.tensorflow.org/models/object_detection/tf2/20200711/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8.tar.gz"
wget.download(model_link)


import tarfile

tar = tarfile.open('/content/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8.tar.gz')
tar.extractall('.')
tar.close()

The compressed file also contains the model's configuration

file. You will always have to edit this file after downloading each

model.

Let's look at the items in the configuration file that you need to

update.


Edit the object detection

pipeline config file

The config file you'll get after cloning this repo has been edited to run smoothly on Google Colab. If you are running this project elsewhere, you'll need to update the file paths. In a nutshell, here are the items you need to update after downloading the Mask R-CNN config file from the TensorFlow 2 Object Detection API repo (a sketch of editing these fields programmatically follows the list):

num_classes to 5 because the dataset has 5 classes: headlamp, front_bumper, hood, door, and rear_bumper.

image_resizer to 512 from 1024, reducing the size of the images and hence the training time.

num_steps to 1000 from 200000 to reduce the training time. The more steps, the longer it will take to train the model. You can increase the steps if the loss is still decreasing and validation metrics are going up.

batch_size = 1 to dictate the number of images to be fed into memory while training.

fine_tune_checkpoint to point to the path of the Mask R-CNN model downloaded above. This ensures that we are not training the model from scratch.

fine_tune_checkpoint_type to detection from classification since we are training an object detection model.

train_input_reader to point to the label_map_path and the path to the TFRecords. More on TFRecords later.

eval_input_reader is the same as train_input_reader but for the test data.
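If you need to adapt the config yourself, here is a minimal, hedged sketch of making the same edits programmatically with the config_util module imported earlier. The checkpoint and TFRecord paths below are assumptions based on this project's Colab layout; verify them against the files on your machine before running.

from object_detection.utils import config_util

pipeline_file = '/content/maskrcnn/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8-colab.config'
configs = config_util.get_configs_from_pipeline_file(pipeline_file)

# Model and training settings described above.
configs['model'].faster_rcnn.num_classes = 5
configs['train_config'].batch_size = 1
configs['train_config'].num_steps = 1000
# Assumed path of the extracted checkpoint from the downloaded tarball.
configs['train_config'].fine_tune_checkpoint = '/content/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8/checkpoint/ckpt-0'
configs['train_config'].fine_tune_checkpoint_type = 'detection'

# Input readers: label map and TFRecord paths (check the exact file names
# produced by the conversion script in the tf_records folder).
configs['train_input_config'].label_map_path = '/content/maskrcnn/data/labelmap.pbtxt'
configs['train_input_config'].tf_record_input_reader.input_path[:] = ['/content/maskrcnn/data/tf_records/train.record']
configs['eval_input_config'].label_map_path = '/content/maskrcnn/data/labelmap.pbtxt'
configs['eval_input_config'].tf_record_input_reader.input_path[:] = ['/content/maskrcnn/data/tf_records/val.record']

# Write the updated pipeline.config back to disk.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, '/content/maskrcnn/')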

Convert the images to

TFRecords

The object detection models expect the images to be in TFRecord format. Fortunately, the TensorFlow 2 Object Detection API repo provides a script for performing the conversion.
The script takes the following arguments:

Directory of the training images.

Folder containing the test images.

File containing training image annotations.

File containing test image annotation.

Directory where the generated TFRecords should be

stored.

!python /content/maskrcnn/create_coco_tf_record.py --logtostderr \
      --train_image_dir=/content/maskrcnn/data/train \
      --test_image_dir=/content/maskrcnn/data/val \
      --train_annotations_file=/content/maskrcnn/data/train/COCO_mul_train_annos.json \
      --test_annotations_file=/content/maskrcnn/data/val/COCO_mul_val_annos.json \
      --output_dir=/content/maskrcnn/data/tf_records


Train the model

You now have everything you need to train this Mask R-CNN

object detection model. The next step is to run the training

script. The model training script takes the following arguments:

pipeline_config_path the path to the updated model

configuration file.

model_dir the directory where the trained model will be

saved.
!python /content/models/research/object_detection/model_main_tf2.py \
      --pipeline_config_path=/content/maskrcnn/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8-colab.config \
      --model_dir=/content/training \
      --alsologtostderr


You might get an OpenCV error on Colab. This error can be fixed by
installing the right version of

OpenCV.

pip uninstall opencv-python-headless==4.5.5.62

pip install opencv-python-headless==4.5.2.52

If you get a cuDNN error, you can fix it by installing the right version of
cuDNN.

!apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2

Model evaluation and

visualization

When training is complete, you can run TensorBoard to visualize the training and testing metrics, such as the localization loss.


%load_ext tensorboard

%tensorboard --logdir '/content/training/train'

Run conversion script

The next step is to export the model for inference. The

conversion script expects:

trained_checkpoint_dir the last checkpoint of the trained

model.

output_directory where the exported model will be saved.

pipeline_config_path the path to the pipeline configuration

file.
The conversion script will output checkpoint files,

a SavedModel, and the model config file.
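As a sketch of this step, the Object Detection API provides exporter_main_v2.py for the conversion; the paths below follow the Colab layout used in this project and should be adjusted if yours differ.

!python /content/models/research/object_detection/exporter_main_v2.py \
      --input_type=image_tensor \
      --pipeline_config_path=/content/maskrcnn/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8-colab.config \
      --trained_checkpoint_dir=/content/training \
      --output_directory=/content/finetuned-maskrcnn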


Download model from Google

Colab
You may want to download the converted model or trained

model. That can be done by zipping the files and using Colab

utilities to download the compressed file.


!zip -r /content/maskrcnn.zip /content/training
!zip -r /content/fine_tuned_model.zip /content/finetuned-maskrcnn

files.download("/content/maskrcnn.zip")
files.download("/content/fine_tuned_model.zip")

Object detection with Mask R-CNN

It's now time to use the trained Mask R-CNN model to perform

object detection on test car images. Luckily, the TensorFlow 2

Object Detection API provides all the utilities needed to do this.

The first one is a function that loads an image and converts it

into a NumPy array.

Load an image from file into a

NumPy array

The function expects a path to an image file and returns


a NumPy array.

def load_image_into_numpy_array(path):
    """Load an image from file into a numpy array.

    Puts image into numpy array to feed into tensorflow graph.
    Note that by convention we put it into a numpy array with shape
    (height, width, channels), where channels=3 for RGB.

    Args:
      path: a file path.

    Returns:
      uint8 numpy array with shape (img_height, img_width, 3)
    """
    img_data = tf.io.gfile.GFile(path, 'rb').read()
    image = Image.open(BytesIO(img_data))
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

Visualize detections
The next utility is a function for plotting the detections

using Matplotlib.

def plot_detections(image_np,
                    boxes,
                    classes,
                    scores,
                    category_index,
                    figsize=(12, 16),
                    image_name=None):
    """Wrapper function to visualize detections.

    Args:
      image_np: uint8 numpy array with shape (img_height, img_width, 3)
      boxes: a numpy array of shape [N, 4]
      classes: a numpy array of shape [N]. Note that class indices are 1-based,
        and match the keys in the label map.
      scores: a numpy array of shape [N] or None. If scores=None, then
        this function assumes that the boxes to be plotted are groundtruth
        boxes and plot all boxes as black with no classes or scores.
      category_index: a dict containing category dictionaries (each holding
        category index `id` and category name `name`) keyed by category indices.
      figsize: size for the figure.
      image_name: a name for the image file.
    """
    image_np_with_annotations = image_np.copy()
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image_np_with_annotations,
        boxes,
        classes,
        scores,
        category_index,
        use_normalized_coordinates=True,
        min_score_thresh=0.8)
    if image_name:
        plt.imsave(image_name, image_np_with_annotations)
    else:
        plt.imshow(image_np_with_annotations)

Create model from the last

checkpoint

Let's now create a detection model from the last saved model

checkpoint.

filenames = list(pathlib.Path('/content/training/').glob('*.index'))
filenames.sort()
print(filenames)

# Recover our saved model
model_dir = '/content/training/'
# Generally you want to put the last ckpt from training in here
configs = config_util.get_configs_from_pipeline_file(pipeline_file)
model_config = configs['model']
detection_model = model_builder.build(
    model_config=model_config, is_training=False)

# Restore checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(str(filenames[-1]).replace('.index', '')))

def get_model_detection_function(model):
    """Get a tf.function for detection."""

    @tf.function
    def detect_fn(image):
        """Detect objects in image."""
        image, shapes = model.preprocess(image)
        prediction_dict = model.predict(image, shapes)
        detections = model.postprocess(prediction_dict, shapes)
        return detections, prediction_dict, tf.reshape(shapes, [-1])

    return detect_fn

detect_fn = get_model_detection_function(detection_model)
Map labels for inference

decoding

Next, we declare variables that are important for decoding the model output, for instance, the categories and the file containing the training categories.

# Map labels for inference decoding
label_map_path = configs['eval_input_config'].label_map_path
label_map = label_map_util.load_labelmap(label_map_path)
categories = label_map_util.convert_label_map_to_categories(
    label_map,
    max_num_classes=label_map_util.get_max_label_map_index(label_map),
    use_display_name=True)
category_index = label_map_util.create_category_index(categories)
label_map_dict = label_map_util.get_label_map_dict(label_map, use_display_name=True)

Run detector on test image


The next step is to run the Mask R-CNN object detection model

on some test images.

# Run detector on a test image.
# It takes a little longer on the first run and then runs at normal speed.
TEST_IMAGE_PATHS = glob.glob('/content/maskrcnn/data/test/*.jpg')
image_path = random.choice(TEST_IMAGE_PATHS)
image_np = load_image_into_numpy_array(image_path)

input_tensor = tf.convert_to_tensor(
    np.expand_dims(image_np, 0), dtype=tf.float32)
detections, predictions_dict, shapes = detect_fn(input_tensor)

label_id_offset = 1
image_np_with_detections = image_np.copy()

viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np_with_detections,
    detections['detection_boxes'][0].numpy(),
    (detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
    detections['detection_scores'][0].numpy(),
    category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=200,
    min_score_thresh=.5,
    agnostic_mode=False,
)

plt.figure(figsize=(12, 16))
plt.imshow(image_np_with_detections)
plt.axis("off")
plt.show()

Image segmentation with Mask R-CNN


The Mask R-CNN object detection model can be used for both object detection and image segmentation. Let's start by loading the fine-tuned model.

def load_model(model_dir):

model = tf.saved_model.load(str(model_dir))

return model

model_dir = '/content/finetuned-maskrcnn/saved_model'

masking_model = load_model(model_dir)

Set label map

The model also needs labels for decoding the output.

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = '/content/maskrcnn/data/labelmap.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

Set test image paths

The next step is to define the path to the test images. In this

case, we'll use all the test images because they aren't that

many.

PATH_TO_TEST_IMAGES_DIR = pathlib.Path('/content/maskrcnn/data/test')
TEST_IMAGE_PATHS = sorted(list(PATH_TO_TEST_IMAGES_DIR.glob("*.jpg")))

Create inference function

The segmentation utility is also provided on the TensorFlow 2

Object Detection API repo.

def run_inference_for_single_image(model, image_np):
    print("Running inference for : ", image_path)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image_np)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis, ...]

    detections = model(input_tensor)

    # All outputs are batch tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(detections.pop("num_detections"))
    detections = dict(itertools.islice(detections.items(), num_detections))
    detections["num_detections"] = num_detections

    image_np_with_detections = image_np.copy()

    # Handle models with masks:
    if "detection_masks" in detections:
        # Reframe the bbox mask to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detections["detection_masks"][0], detections["detection_boxes"][0],
            image_np.shape[0], image_np.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                           tf.uint8)
        detections["detection_masks_reframed"] = detection_masks_reframed.numpy()

    boxes = np.asarray(detections["detection_boxes"][0])
    classes = np.asarray(detections["detection_classes"][0]).astype(np.int64)
    scores = np.asarray(detections["detection_scores"][0])
    mask = np.asarray(detections["detection_masks_reframed"])

    # Visualizing the results
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image_np_with_detections,
        boxes,
        classes,
        scores,
        category_index,
        instance_masks=mask,
        use_normalized_coordinates=True,
        line_thickness=3)

    # Display image with detections and segmented parts
    display(Image.fromarray(image_np_with_detections))

Perform segmentation and

detection

The next step is to load an image as a NumPy array and use

the above function to start detecting objects.

def show_inference(model, image_path):
    # Load image
    image_np = np.array(Image.open(image_path))
    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)

for image_path in TEST_IMAGE_PATHS:
    show_inference(masking_model, image_path)

Final thoughts

In this article, we have seen how you can train an object

detection model using the TensorFlow 2 Object Detection API.

More specifically, we covered:



Dataset preparation for object detection tasks.

The TensorFlow 2 Object Detection API.

How to install the TensorFlow Object Detection API locally

and on Google Colab.

Setting the configurations for the Mask R-CNN object

detection model.

Converting images into TFRecord format.

Training an object detection model.

Evaluating the object detection model.

Using the Mask R-CNN for object detection.

Image segmentation with the Mask R-CNN model.

Click the Colab link to try the project from start to finish. You can also
replace the dataset with another one. If you change

the model, remember to edit the model configuration file.

Always ensure that the paths in the configuration file point to

the right locations.

Appendix

This book is provided in line with our terms and privacy policy.
Disclaimer

The information in this eBook is not meant to be applied as is in a production environment. By applying it to a production environment, you take full responsibility for your actions.

The author has made every effort to ensure that the information within this book was correct at the time of publication. The author does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause.

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic or mechanical, recording or by any information storage and retrieval system, without written permission from the author.

Copyright

Deep learning with TensorFlow and Keras

© Copyright Derrick Mwiti. All Rights Reserved.

Other things to learn


Learn Python

Learn data science

Learn Streamlit

