Deep Learning Project for Computer Vision with Python 2022

About the Authors

Tony Snake received a Bachelor of Computer Science and a Bachelor of Business
Administration from the American University, USA.
He is a Ph.D. candidate in the Department of Data Informatics,
(National) Korea Maritime and Ocean University, Busan 49112, Republic of
Korea (South Korea).
His research interests are social network analysis, big data, AI, and robotics.
He received the Best Paper Award at the 15th International Conference on
Multimedia Information Technology and Applications (MITA 2019).

Table of Contents
About the Authors
Table of Contents
Deep Learning Project for Computer Vision with Python
PART 1: Satellite Image Classification using TensorFlow in Python
Getting Started
Preparing the Dataset
Building the Model
Fine-tuning the Model
Model Evaluation
Final Thoughts
Source Code:
PART 2: Age and Gender Detection using OpenCV in Python
Source Code:
PART 3: Gender Detection using OpenCV in Python
Pre-requisites
Conclusion
Source Code:
PART 4: Age Detection using OpenCV in Python
Wrap-up
Source Code:
PART 5: SIFT Feature Extraction using OpenCV in Python
Scale-space Extrema Detection
Keypoint Localization
Orientation Assignment
Keypoint Description
Python Implementation
Conclusion
Source Code:
PART 6: How to Apply HOG Feature Extraction in Python
Resizing the Image
Calculating Gradients
Calculating the Magnitude
Calculating the Orientation
Python Code
Conclusion
Source Code:
PART 7: Image Transformations using OpenCV in Python
Introduction
The Use of Image Transformation
Image Translation
Image Scaling
Image Shearing
Shearing in the x-axis Direction
Shearing in the y-axis Direction
Image Reflection
Image Rotation
Image Cropping
Conclusion
Source Code:
PART 8: How to Make a Barcode Reader in Python
Conclusion
Source Code:
PART 9: How to Perform Malaria Classification using TensorFlow 2 and Keras in Python
Downloading the Dataset
Image Preprocessing with OpenCV
Preparing and Normalizing the Dataset
Implementing the CNN Model Architecture
Model Evaluation
Saving the model
Source Code:
PART 10: Skin Cancer Detection using TensorFlow in Python
Preparing the Dataset
Building the Model
Training the Model
Model Evaluation
Sensitivity
Specificity
Receiver Operating Characteristic
Conclusion
Source Code:
PART 11: Use K-Means Clustering for Image Segmentation using OpenCV in Python
Want to Learn More?
Source Code:
PART 12: Detect Contours in Images using OpenCV in Python
Source Code:
PART 13: Optical Character Recognition (OCR) in Python
Source Code:
PART 14: Detect Shapes in Images in Python using OpenCV
Detecting Lines
Detecting Circles
Source Code:
PART 15: Perform Edge Detection in Python using OpenCV
Source Code:
PART 16: Use Transfer Learning for Image Classification using TensorFlow in Python
What is Transfer Learning
Loading & Preparing the Dataset
Constructing the Model
Training the Model
Testing the Model
Conclusion
Source Code:
PART 17: Generate and Read QR Code in Python
Generate QR Code
Read QR Code
Source Code:
PART 18: Make an Image Classifier in Python using Tensorflow 2 and Keras
Hyper Parameters
Understanding and Loading CIFAR-10 Dataset
Constructing the Model
Training the Model
Testing the Model
Conclusion
Source Code:
PART 19: Face Detection using OpenCV in Python
Face Detection using Haar Cascades
Face Detection using SSDs
Source Code:
Summary

Deep Learning Project for Computer Vision with Python

PART 1: Satellite Image Classification using TensorFlow in Python
Learn how to fine-tune the state-of-the-art EfficientNetV2 model to perform image
classification on satellite data (EuroSAT) using TensorFlow in Python.

Satellite image classification is crucial for many applications in agriculture,
environmental monitoring, urban planning, and more. Applications such as crop
monitoring and land and forest cover mapping are increasingly being adopted by
governments, companies, and labs for real-world use.

In this tutorial, you will learn how to build a satellite image classifier using
the TensorFlow framework in Python.

We will be using the EuroSAT dataset, which is based on Sentinel-2 satellite images
covering 13 spectral bands. It consists of 27,000 labeled samples across 10
different classes: annual crop, permanent crop, forest, herbaceous vegetation,
highway, industrial, pasture, residential, river, and sea/lake.

The EuroSAT dataset comes in two varieties:

rgb (default): contains only the optical R, G, B frequency bands, encoded as JPEG images.
all: contains all 13 bands in the original value range.
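Below is a minimal sketch of loading either variety by its TensorFlow Datasets configuration name; the printed shapes are what the dataset documentation leads you to expect, so treat them as an assumption rather than verified output:

import tensorflow_datasets as tfds

# default RGB variety (3 bands, JPEG-encoded)
rgb_ds, rgb_info = tfds.load("eurosat/rgb", with_info=True)
print(rgb_info.features["image"].shape)    # expected: (64, 64, 3)

# full 13-band variety in the original value range (much larger download)
all13_ds, all13_info = tfds.load("eurosat/all", with_info=True)
print(all13_info.features["image"].shape)  # expected: (64, 64, 13)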
Getting Started
To get started, let's install TensorFlow and some other helper tools:

$ pip install tensorflow tensorflow_addons tensorflow_datasets tensorflow_hub numpy matplotlib seaborn scikit-learn

We use tensorflow_addons to calculate the F1 score during the training of the model.

We will use the EfficientNetV2 model, which is among the current state of the art on
many image classification tasks. We use tensorflow_hub to load this pre-trained
CNN model for fine-tuning.

Preparing the Dataset


Importing the necessary libraries:

import os

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub
import tensorflow_addons as tfa

Downloading and loading the dataset:

# load the whole dataset, for data info
all_ds = tfds.load("eurosat", with_info=True)
# load training, testing & validation sets, splitting by 60%, 20% and 20% respectively
train_ds = tfds.load("eurosat", split="train[:60%]")
test_ds = tfds.load("eurosat", split="train[60%:80%]")
valid_ds = tfds.load("eurosat", split="train[80%:]")

We split our dataset into 60% training, 20% validation, and 20% testing. The code
below sets some variables we will use later:

# the class names


class_names = all_ds[1].features["label"].names
# total number of classes (10)
num_classes = len(class_names)
num_examples = all_ds[1].splits["train"].num_examples

We grab the list of classes from the all_ds dataset, since it was loaded
with with_info set to True; we also get the total number of samples from it.

Next, I'm going to make a bar plot to see the number of samples in each class:

# make a plot for number of samples on each class
fig, ax = plt.subplots(1, 1, figsize=(14, 10))
labels, counts = np.unique(np.fromiter(all_ds[0]["train"].map(lambda x: x["label"]), np.int32),
                           return_counts=True)

plt.ylabel('Counts')
plt.xlabel('Labels')
sns.barplot(x=[class_names[l] for l in labels], y=counts, ax=ax)
for i, x_ in enumerate(labels):
    ax.text(x_-0.2, counts[i]+5, counts[i])
# set the title
ax.set_title("Bar Plot showing Number of Samples on Each Class")
# save the image
# plt.savefig("class_samples.png")
Output:

Half of the classes have 3,000 samples each, the others have 2,500 samples, while
pasture has only 2,000 samples.

Now let's take our training and validation sets and prepare them before
training:

def prepare_for_training(ds, cache=True, batch_size=64, shuffle_buffer_size=1000):
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()
    ds = ds.map(lambda d: (d["image"], tf.one_hot(d["label"], num_classes)))
    # shuffle the dataset
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    # Repeat forever
    ds = ds.repeat()
    # split to batches
    ds = ds.batch(batch_size)
    # `prefetch` lets the dataset fetch batches in the background while the model
    # is training.
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds

Here is what this function does:

cache(): This method saves the preprocessed dataset into a local cache
file. This will only preprocess it the very first time (in the first epoch
during training).
map(): We map our dataset so each sample will be a tuple of an image
and its corresponding label one-hot encoded with tf.one_hot().
shuffle(): To shuffle the dataset so the samples are in random order.
repeat(): Every time we iterate over the dataset, it'll repeatedly generate
samples for us; this will help us during the training.
batch(): We batch our dataset into batch_size (64 by default) samples per training step.
prefetch(): This will enable us to fetch batches in the background while
the model is training.

Let's run it for the training and validation sets:

batch_size = 64

# preprocess training & validation sets


train_ds = prepare_for_training(train_ds, batch_size=batch_size)
valid_ds = prepare_for_training(valid_ds, batch_size=batch_size)
Let's see what our data looks like:

# validating shapes
for el in valid_ds.take(1):
    print(el[0].shape, el[1].shape)
for el in train_ds.take(1):
    print(el[0].shape, el[1].shape)

Output:

(64, 64, 64, 3) (64, 10)


(64, 64, 64, 3) (64, 10)

Fantastic! Both the training and validation batches have the same shape: the
batch size is 64, and the image shape is (64, 64, 3). The targets have the shape
(64, 10), as there are 64 samples, each with 10 one-hot encoded classes.

Let's visualize the first batch from the training dataset:

# take the first batch of the training set
batch = next(iter(train_ds))

def show_batch(batch):
    plt.figure(figsize=(16, 16))
    for n in range(min(32, batch_size)):
        ax = plt.subplot(batch_size//8, 8, n + 1)
        # show the image
        plt.imshow(batch[0][n])
        # and put the corresponding label as title above the image
        plt.title(class_names[tf.argmax(batch[1][n].numpy())])
        plt.axis('off')
    plt.savefig("sample-images.png")

# showing a batch of images along with labels


show_batch(batch)

Output:

Building the Model


Right. Now that we have our data prepared for training, let's build our model.
First, downloading EfficientNetV2 and loading it as a hub.KerasLayer :

model_url = "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_l/feature_vector/2"

# download & load the layer as a feature vector
keras_layer = hub.KerasLayer(model_url, output_shape=[1280], trainable=True)

We pass the model_url to hub.KerasLayer so we get EfficientNetV2 as an image
feature extractor. We also set trainable to True so we're adjusting the pre-trained
weights a bit for our dataset (i.e., fine-tuning).

Building the model:

m = tf.keras.Sequential([
    keras_layer,
    tf.keras.layers.Dense(num_classes, activation="softmax")
])
# build the model with input image shape as (64, 64, 3)
m.build([None, 64, 64, 3])
m.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy", tfa.metrics.F1Score(num_classes)]
)

m.summary()

We use Sequential(): the first layer is the pre-trained CNN model, and we add a
fully connected layer whose size is the number of classes as the output layer.

Finally, the model is built and compiled with categorical cross-entropy, the Adam
optimizer, and accuracy and F1 score as metrics. Output:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
================================================================
keras_layer (KerasLayer) (None, 1280) 117746848

dense (Dense) (None, 10) 12810

================================================================
Total params: 117,759,658
Trainable params: 117,247,082
Non-trainable params: 512,576
_________________________________________________________________

Fine-tuning the Model

Now that we have the data and the model ready, let's begin fine-tuning:

model_name = "satellite-classification"
model_path = os.path.join("results", model_name + ".h5")
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(model_path,
save_best_only=True, verbose=1)

# set the training & validation steps since we're using .repeat() on our dataset
# number of training steps
n_training_steps = int(num_examples * 0.6) // batch_size
# number of validation steps
n_validation_steps = int(num_examples * 0.2) // batch_size

# train the model


history = m.fit(
train_ds, validation_data=valid_ds,
steps_per_epoch=n_training_steps,
validation_steps=n_validation_steps,
verbose=1, epochs=5,
callbacks=[model_checkpoint]
)

The training will take several minutes, depending on your GPU. Here is the
output:

Epoch 1/5
253/253 [==============================] - ETA: 0s - loss: 0.3780
- accuracy: 0.8859 - f1_score: 0.8832
Epoch 00001: val_loss improved from inf to 0.16415, saving model to
results/satellite-classification.h5
253/253 [==============================] - 158s 438ms/step -
loss: 0.3780 - accuracy: 0.8859 - f1_score: 0.8832 - val_loss: 0.1641 -
val_accuracy: 0.9513 - val_f1_score: 0.9501
Epoch 2/5
253/253 [==============================] - ETA: 0s - loss: 0.1531
- accuracy: 0.9536 - f1_score: 0.9525
Epoch 00002: val_loss improved from 0.16415 to 0.12853, saving model to
results/satellite-classification.h5
253/253 [==============================] - 106s 421ms/step -
loss: 0.1531 - accuracy: 0.9536 - f1_score: 0.9525 - val_loss: 0.1285 -
val_accuracy: 0.9568 - val_f1_score: 0.9559
Epoch 3/5
253/253 [==============================] - ETA: 0s - loss: 0.1092
- accuracy: 0.9660 - f1_score: 0.9654
Epoch 00003: val_loss improved from 0.12853 to 0.12095, saving model to
results/satellite-classification.h5
253/253 [==============================] - 107s 424ms/step -
loss: 0.1092 - accuracy: 0.9660 - f1_score: 0.9654 - val_loss: 0.1210 -
val_accuracy: 0.9619 - val_f1_score: 0.9605
Epoch 4/5
253/253 [==============================] - ETA: 0s - loss: 0.1042
- accuracy: 0.9692 - f1_score: 0.9687
Epoch 00004: val_loss did not improve from 0.12095
253/253 [==============================] - 100s 394ms/step -
loss: 0.1042 - accuracy: 0.9692 - f1_score: 0.9687 - val_loss: 0.1435 -
val_accuracy: 0.9565 - val_f1_score: 0.9572
Epoch 5/5
253/253 [==============================] - ETA: 0s - loss: 0.1003
- accuracy: 0.9700 - f1_score: 0.9695
Epoch 00005: val_loss improved from 0.12095 to 0.09841, saving model to
results/satellite-classification.h5
253/253 [==============================] - 107s 423ms/step -
loss: 0.1003 - accuracy: 0.9700 - f1_score: 0.9695 - val_loss: 0.0984 -
val_accuracy: 0.9702 - val_f1_score: 0.9687

As you can see, the model improved to about 97% accuracy on the validation
set on epoch 5. You can increase the number of epochs to see whether it can
improve further.

Model Evaluation
Up until now, we have only been evaluating on the validation set during training. This
section uses our model to predict satellite images that it has never
seen before. Loading the best weights:

# load the best weights
m.load_weights(model_path)

Extracting all the testing images and labels individually from test_ds:

# number of testing steps


n_testing_steps = int(all_ds[1].splits["train"].num_examples * 0.2)
# get all testing images as NumPy array
images = np.array([ d["image"] for d in test_ds.take(n_testing_steps) ])
print("images.shape:", images.shape)
# get all testing labels as NumPy array
labels = np.array([ d["label"] for d in test_ds.take(n_testing_steps) ])
print("labels.shape:", labels.shape)

Output:

images.shape: (5400, 64, 64, 3)


labels.shape: (5400,)

As expected, we have 5,400 images and labels. Let's use the model to predict these
images and then compare the predictions with the true labels:

# feed the images to get predictions


predictions = m.predict(images)
# perform argmax to get class index
predictions = np.argmax(predictions, axis=1)
print("predictions.shape:", predictions.shape)

Output:

predictions.shape: (5400,)

from sklearn.metrics import f1_score

accuracy = tf.keras.metrics.Accuracy()
accuracy.update_state(labels, predictions)
print("Accuracy:", accuracy.result().numpy())
print("F1 Score:", f1_score(labels, predictions, average="macro"))

Output:

Accuracy: 0.9677778
F1 Score: 0.9655686619720163

That's good accuracy! Let's draw the confusion matrix for all the classes:

# compute the confusion matrix
cmn = tf.math.confusion_matrix(labels, predictions).numpy()
# normalize the matrix so each row (true class) sums to 1
cmn = cmn.astype('float') / cmn.sum(axis=1)[:, np.newaxis]
# make a plot for the confusion matrix
fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(cmn, annot=True, fmt='.2f',
            xticklabels=[f"pred_{c}" for c in class_names],
            yticklabels=[f"true_{c}" for c in class_names],
            # cmap="Blues"
            cmap="rocket_r"
)
plt.ylabel('Actual')
plt.xlabel('Predicted')
# plot the resulting confusion matrix
plt.savefig("confusion-matrix.png")
# plt.show()

Output:

As you can see, the model is accurate for most of the classes, especially on
forest images, where it achieved 100%. However, it's down to 91% for pasture,
and the model sometimes predicts pasture as permanent crop or as
herbaceous vegetation. Most of the confusion is between crop, pasture, and
herbaceous vegetation, as they all look similar and, most of the time, green
from the satellite.
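
If you prefer the per-class numbers printed rather than read off the heatmap, here is a minimal sketch that reuses the row-normalized cmn and class_names defined above:

# per-class accuracy (recall) is the diagonal of the row-normalized confusion matrix
for i, name in enumerate(class_names):
    print(f"{name}: {cmn[i, i] * 100:.1f}% of its true samples classified correctly")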

Let's show some examples that the model predicted:

def show_predicted_samples():
    plt.figure(figsize=(14, 14))
    for n in range(64):
        ax = plt.subplot(8, 8, n + 1)
        # show the image
        plt.imshow(images[n])
        # and put the corresponding label as title above the image
        if predictions[n] == labels[n]:
            # correct prediction
            ax.set_title(class_names[predictions[n]], color="green")
        else:
            # wrong prediction
            ax.set_title(f"{class_names[predictions[n]]}/T:{class_names[labels[n]]}", color="red")
        plt.axis('off')
    plt.savefig("predicted-sample-images.png")

# showing a batch of images along with predicted labels
show_predicted_samples()

Output:
Of all 64 images, only one (the red label in the image above) failed to predict the
actual class: it was predicted as pasture when it should be permanent crop.

Final Thoughts
Alright! That's it for the tutorial. If you want further improvement, I highly
advise you to explore TensorFlow Hub, where you can find state-of-the-art
pre-trained CNN models and feature extractors.

I also suggest you try out different optimizers and increase the number of
epochs to see if you can improve the results. You can use TensorBoard to track the
accuracy of each change you make; make sure you include the changed variables in
the model name so the runs are easy to tell apart. A minimal sketch of that idea follows.
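
As a minimal sketch of that suggestion (the logs directory layout and the run-name format here are assumptions, not part of the original code), a TensorBoard callback can be added alongside the existing ModelCheckpoint and the training re-run:

import os
import tensorflow as tf

# hypothetical run name encoding the settings being compared
run_name = f"{model_name}-adam-bs{batch_size}-ep5"
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=os.path.join("logs", run_name))

history = m.fit(
    train_ds, validation_data=valid_ds,
    steps_per_epoch=n_training_steps,
    validation_steps=n_validation_steps,
    verbose=1, epochs=5,
    callbacks=[model_checkpoint, tensorboard_cb]
)
# then inspect the runs with: tensorboard --logdir logs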

If you want more in-depth information, I encourage you to check the EuroSAT paper,
where the authors achieved 98.57% accuracy with the 13-band version of the dataset
(1.93GB). You can also use this version of the dataset by passing "eurosat/all"
instead of the standard "eurosat" to the tfds.load() method.

Source Code:

satellite_image_classification.py
# -*- coding: utf-8 -*-
"""Satellite-Image-Classification-with-TensorFlow_PythonCode.ipynb

Automatically generated by Colaboratory.

Original file is located at


https://colab.research.google.com/drive/1SVpaW9HSebpHNYf6LXTm7elnHOSdQA5i
"""

!pip install tensorflow tensorflow_addons tensorflow_datasets tensorflow_hub numpy matplotlib seaborn

import os

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub
import tensorflow_addons as tfa

# load the whole dataset, for data info


all_ds = tfds.load("eurosat", with_info=True)
# load training, testing & validation sets, splitting by 60%, 20% and 20% respectively
train_ds = tfds.load("eurosat", split="train[:60%]")
test_ds = tfds.load("eurosat", split="train[60%:80%]")
valid_ds = tfds.load("eurosat", split="train[80%:]")

# the class names


class_names = all_ds[1].features["label"].names
# total number of classes (10)
num_classes = len(class_names)
num_examples = all_ds[1].splits["train"].num_examples

# make a plot for number of samples on each class


fig, ax = plt.subplots(1, 1, figsize=(14,10))
labels, counts = np.unique(np.fromiter(all_ds[0]["train"].map(lambda x: x["label"]), np.int32),
return_counts=True)

plt.ylabel('Counts')
plt.xlabel('Labels')
sns.barplot(x = [class_names[l] for l in labels], y = counts, ax=ax)
for i, x_ in enumerate(labels):
ax.text(x_-0.2, counts[i]+5, counts[i])
# set the title
ax.set_title("Bar Plot showing Number of Samples on Each Class")
# save the image
# plt.savefig("class_samples.png")

def prepare_for_training(ds, cache=True, batch_size=64, shuffle_buffer_size=1000):


if cache:
if isinstance(cache, str):
ds = ds.cache(cache)
else:
ds = ds.cache()
ds = ds.map(lambda d: (d["image"], tf.one_hot(d["label"], num_classes)))
# shuffle the dataset
ds = ds.shuffle(buffer_size=shuffle_buffer_size)
# Repeat forever
ds = ds.repeat()
# split to batches
ds = ds.batch(batch_size)
# `prefetch` lets the dataset fetch batches in the background while the model
# is training.
ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
return ds

batch_size = 64

# preprocess training & validation sets


train_ds = prepare_for_training(train_ds, batch_size=batch_size)
valid_ds = prepare_for_training(valid_ds, batch_size=batch_size)

# validating shapes
for el in valid_ds.take(1):
print(el[0].shape, el[1].shape)
for el in train_ds.take(1):
print(el[0].shape, el[1].shape)

# take the first batch of the training set


batch = next(iter(train_ds))

def show_batch(batch):
plt.figure(figsize=(16, 16))
for n in range(min(32, batch_size)):
ax = plt.subplot(batch_size//8, 8, n + 1)
# show the image
plt.imshow(batch[0][n])
# and put the corresponding label as title upper to the image
plt.title(class_names[tf.argmax(batch[1][n].numpy())])
plt.axis('off')
plt.savefig("sample-images.png")

# showing a batch of images along with labels


show_batch(batch)
model_url = "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_l/feature_vector/2"

# download & load the layer as a feature vector


keras_layer = hub.KerasLayer(model_url, output_shape=[1280], trainable=True)

m = tf.keras.Sequential([
keras_layer,
tf.keras.layers.Dense(num_classes, activation="softmax")
])
# build the model with input image shape as (64, 64, 3)
m.build([None, 64, 64, 3])
m.compile(
loss="categorical_crossentropy",
optimizer="adam",
metrics=["accuracy", tfa.metrics.F1Score(num_classes)]
)

m.summary()

model_name = "satellite-classification"
model_path = os.path.join("results", model_name + ".h5")
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(model_path, save_best_only=True,
verbose=1)

n_training_steps = int(num_examples * 0.6) // batch_size


n_validation_steps = int(num_examples * 0.2) // batch_size

history = m.fit(
train_ds, validation_data=valid_ds,
steps_per_epoch=n_training_steps,
validation_steps=n_validation_steps,
verbose=1, epochs=5,
callbacks=[model_checkpoint]
)

# number of testing steps


n_testing_steps = int(all_ds[1].splits["train"].num_examples * 0.2)
m.load_weights(model_path)

# get all testing images as NumPy array


images = np.array([ d["image"] for d in test_ds.take(n_testing_steps) ])
print("images.shape:", images.shape)

# get all testing labels as NumPy array


labels = np.array([ d["label"] for d in test_ds.take(n_testing_steps) ])
print("labels.shape:", labels.shape)

# feed the images to get predictions


predictions = m.predict(images)
# perform argmax to get class index
predictions = np.argmax(predictions, axis=1)
print("predictions.shape:", predictions.shape)

from sklearn.metrics import f1_score

accuracy = tf.keras.metrics.Accuracy()
accuracy.update_state(labels, predictions)
print("Accuracy:", accuracy.result().numpy())
print("F1 Score:", f1_score(labels, predictions, average="macro"))

# compute the confusion matrix


cmn = tf.math.confusion_matrix(labels, predictions).numpy()
# normalize the matrix so each row (true class) sums to 1
cmn = cmn.astype('float') / cmn.sum(axis=1)[:, np.newaxis]
# make a plot for the confusion matrix
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(cmn, annot=True, fmt='.2f',
xticklabels=[f"pred_{c}" for c in class_names],
yticklabels=[f"true_{c}" for c in class_names],
# cmap="Blues"
cmap="rocket_r"
)
plt.ylabel('Actual')
plt.xlabel('Predicted')
# plot the resulting confusion matrix
plt.savefig("confusion-matrix.png")
# plt.show()

def show_predicted_samples():
plt.figure(figsize=(14, 14))
for n in range(64):
ax = plt.subplot(8, 8, n + 1)
# show the image
plt.imshow(images[n])
# and put the corresponding label as title upper to the image
if predictions[n] == labels[n]:
# correct prediction
ax.set_title(class_names[predictions[n]], color="green")
else:
# wrong prediction
ax.set_title(f"{class_names[predictions[n]]}/T:{class_names[labels[n]]}", color="red")
plt.axis('off')
plt.savefig("predicted-sample-images.png")

# showing a batch of images along with predicted labels
show_predicted_samples()
PART 2: Age and Gender Detection using OpenCV in Python
Learn how to perform age and gender detection using the OpenCV library in Python with camera or image input.

In this tutorial, we will combine the gender detection and age detection tutorials
to come up with a single script that detects both.

Let's get started. If you don't have OpenCV installed already, make sure to do so:

$ pip install opencv-python numpy

Open up a new file. Importing the libraries:

# Import Libraries
import cv2
import numpy as np

Next, defining the variables of weights and architectures for face, age, and
gender detection models:

# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
# The gender model architecture
# https://drive.google.com/open?id=1W_moLzMlGiELyPxWiYQJ9KFaXroQ_NFQ
GENDER_MODEL = 'weights/deploy_gender.prototxt'
# The gender model pre-trained weights
# https://drive.google.com/open?id=1AW3WduLk1haTVAxHOkVS_BEzel1WXQHP
GENDER_PROTO = 'weights/gender_net.caffemodel'
# Each Caffe model imposes the shape of the input image, and some preprocessing is required, like mean
# subtraction to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the gender classes
GENDER_LIST = ['Male', 'Female']
# The age model architecture
# download from: https://drive.google.com/open?id=1kiusFljZc9QfcIYdU2s7xrtWHTraHwmW
AGE_MODEL = 'weights/deploy_age.prototxt'
# The age model pre-trained weights
# download from: https://drive.google.com/open?id=1kWv0AjxGSN0g31OeJa02eBGM0R_jcjIl
AGE_PROTO = 'weights/age_net.caffemodel'
# Represent the 8 age classes of this CNN probability layer
AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)',
                 '(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']

Below are the necessary files to be included in the project directory:

gender_net.caffemodel :
It is the pre-trained model weights for gender
detection. You can download it here.
deploy_gender.prototxt : is the model architecture for the gender detection
model (a plain text file with a JSON-like structure containing all the
neural network layer’s definitions). Get it here.
age_net.caffemodel : It is the pre-trained model weights for age detection.
You can download it here.
deploy_age.prototxt : is the model architecture for the age detection model (a
plain text file with a JSON-like structure containing all the neural
network layer’s definitions). Get it here.
res10_300x300_ssd_iter_140000_fp16.caffemodel : The pre-trained model weights
for face detection, download here.
deploy.prototxt.txt : This is the model architecture for the face detection
model, download here.

Next, loading the models:

# Initialize frame size
frame_width = 1280
frame_height = 720
# load face Caffe model
face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load age prediction model
age_net = cv2.dnn.readNetFromCaffe(AGE_MODEL, AGE_PROTO)
# Load gender prediction model
gender_net = cv2.dnn.readNetFromCaffe(GENDER_MODEL, GENDER_PROTO)

Before trying to detect age and gender, we need a function to detect faces
first:

def get_faces(frame, confidence_threshold=0.5):
    # convert the frame into a blob to be ready for NN input
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
    # set the image as input to the NN
    face_net.setInput(blob)
    # perform inference and get predictions
    output = np.squeeze(face_net.forward())
    # initialize the result list
    faces = []
    # Loop over the faces detected
    for i in range(output.shape[0]):
        confidence = output[i, 2]
        if confidence > confidence_threshold:
            box = output[i, 3:7] * \
                np.array([frame.shape[1], frame.shape[0],
                          frame.shape[1], frame.shape[0]])
            # convert to integers
            start_x, start_y, end_x, end_y = box.astype(np.int)
            # widen the box a little
            start_x, start_y, end_x, end_y = start_x - \
                10, start_y - 10, end_x + 10, end_y + 10
            start_x = 0 if start_x < 0 else start_x
            start_y = 0 if start_y < 0 else start_y
            end_x = 0 if end_x < 0 else end_x
            end_y = 0 if end_y < 0 else end_y
            # append to our list
            faces.append((start_x, start_y, end_x, end_y))
    return faces

The get_faces() function was grabbed from the face detection tutorial, so check
it out if you want more information.
Below is a function for simply displaying an image:

def display_img(title, img):
    """Displays an image on screen and maintains the output until the user presses a key"""
    # Display Image on screen
    cv2.imshow(title, img)
    # Maintain output until user presses a key
    cv2.waitKey(0)
    # Destroy windows when user presses a key
    cv2.destroyAllWindows()

Below is a function for dynamically resizing an image; we're going to
need it to resize the input images when they exceed a certain width:

# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    # initialize the dimensions of the image to be resized and
    # grab the image size
    dim = None
    (h, w) = image.shape[:2]
    # if both the width and height are None, then return the
    # original image
    if width is None and height is None:
        return image
    # check to see if the width is None
    if width is None:
        # calculate the ratio of the height and construct the
        # dimensions
        r = height / float(h)
        dim = (int(w * r), height)
    # otherwise, the height is None
    else:
        # calculate the ratio of the width and construct the
        # dimensions
        r = width / float(w)
        dim = (width, int(h * r))
    # resize the image
    return cv2.resize(image, dim, interpolation=inter)

Now that everything is ready, let's define our two functions for age and
gender detection:

def get_gender_predictions(face_img):
    blob = cv2.dnn.blobFromImage(
        image=face_img, scalefactor=1.0, size=(227, 227),
        mean=MODEL_MEAN_VALUES, swapRB=False, crop=False
    )
    gender_net.setInput(blob)
    return gender_net.forward()

def get_age_predictions(face_img):
    blob = cv2.dnn.blobFromImage(
        image=face_img, scalefactor=1.0, size=(227, 227),
        mean=MODEL_MEAN_VALUES, swapRB=False
    )
    age_net.setInput(blob)
    return age_net.forward()

The get_gender_predictions() and get_age_predictions() functions run
the gender_net and age_net models to infer the gender and age of the input face image,
respectively.

Finally, we write our main function:

def predict_age_and_gender(input_path: str):
    """Predict the age and gender of the faces showing in the image"""
    # Initialize frame size
    # frame_width = 1280
    # frame_height = 720
    # Read Input Image
    img = cv2.imread(input_path)
    # resize the image, uncomment if you want to resize the image
    # img = cv2.resize(img, (frame_width, frame_height))
    # Take a copy of the initial image and resize it
    frame = img.copy()
    if frame.shape[1] > frame_width:
        frame = image_resize(frame, width=frame_width)
    # predict the faces
    faces = get_faces(frame)
    # Loop over the faces detected
    # for idx, face in enumerate(faces):
    for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
        face_img = frame[start_y: end_y, start_x: end_x]
        age_preds = get_age_predictions(face_img)
        gender_preds = get_gender_predictions(face_img)
        i = gender_preds[0].argmax()
        gender = GENDER_LIST[i]
        gender_confidence_score = gender_preds[0][i]
        i = age_preds[0].argmax()
        age = AGE_INTERVALS[i]
        age_confidence_score = age_preds[0][i]
        # Draw the box
        label = f"{gender}-{gender_confidence_score*100:.1f}%, {age}-{age_confidence_score*100:.1f}%"
        # label = "{}-{:.2f}%".format(gender, gender_confidence_score*100)
        print(label)
        yPos = start_y - 15
        while yPos < 15:
            yPos += 15
        box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
        cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
        # Label processed image
        font_scale = 0.54
        cv2.putText(frame, label, (start_x, yPos),
                    cv2.FONT_HERSHEY_SIMPLEX, font_scale, box_color, 2)

    # Display processed image
    display_img("Gender Estimator", frame)
    # uncomment if you want to save the image
    cv2.imwrite("output.jpg", frame)
    # Cleanup
    cv2.destroyAllWindows()

The main function does the following:


First, it reads the image using the cv2.imread() method.
After the image is resized to the appropriate size, we use
our get_faces() function to get all the detected faces from the image.
We iterate on each detected face image and call
our get_age_predictions() and get_gender_predictions() to get the predictions.
We print the age and gender.
We draw a rectangle surrounding the face and also put the label that
contains the age and gender text along with confidence on the image.
Finally, we show the image.

Let's call it:

if __name__ == "__main__":
    import sys
    input_path = sys.argv[1]
    predict_age_and_gender(input_path)

Done, let's run the script now (testing on this image):

$ python age_and_gender_detection.py images/girl.jpg

Output in the console:

Male-99.1%, (4, 6)-71.9%


Female-96.0%, (4, 6)-70.9%

The resulting image:


Here is another example:
Or this:
Awesome! If the text drawn on the image appears too large or too small, make sure to
tweak the font_scale floating-point variable in the predict_age_and_gender() function
for your image.

For more detail on how the gender and age prediction works, I suggest you
check the individual tutorials:

Age Detection using OpenCV in Python


Gender Detection using OpenCV in Python

If you want to use your camera, I made a Python script to read images from
your webcam and perform inference in real-time.

Finally, I've collected some useful resources and courses for further learning;
I highly recommend the following:

Machine Learning Specialization on Coursera.


Deep Learning Specialization on Coursera.
Introduction to Computer Vision and Image Processing
Source Code:

age_and_gender_detection.py
# Import Libraries
import cv2
import numpy as np

# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
# The gender model architecture
# https://drive.google.com/open?id=1W_moLzMlGiELyPxWiYQJ9KFaXroQ_NFQ
GENDER_MODEL = 'weights/deploy_gender.prototxt'
# The gender model pre-trained weights
# https://drive.google.com/open?id=1AW3WduLk1haTVAxHOkVS_BEzel1WXQHP
GENDER_PROTO = 'weights/gender_net.caffemodel'
# Each Caffe model imposes the shape of the input image, and some preprocessing is required, like mean
# subtraction to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the gender classes
GENDER_LIST = ['Male', 'Female']
# The model architecture
# download from: https://drive.google.com/open?id=1kiusFljZc9QfcIYdU2s7xrtWHTraHwmW
AGE_MODEL = 'weights/deploy_age.prototxt'
# The model pre-trained weights
# download from: https://drive.google.com/open?id=1kWv0AjxGSN0g31OeJa02eBGM0R_jcjIl
AGE_PROTO = 'weights/age_net.caffemodel'
# Represent the 8 age classes of this CNN probability layer
AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)',
'(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']
# Initialize frame size
frame_width = 1280
frame_height = 720
# load face Caffe model
face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load age prediction model
age_net = cv2.dnn.readNetFromCaffe(AGE_MODEL, AGE_PROTO)
# Load gender prediction model
gender_net = cv2.dnn.readNetFromCaffe(GENDER_MODEL, GENDER_PROTO)

def get_faces(frame, confidence_threshold=0.5):


# convert the frame into a blob to be ready for NN input
blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
# set the image as input to the NN
face_net.setInput(blob)
# perform inference and get predictions
output = np.squeeze(face_net.forward())
# initialize the result list
faces = []
# Loop over the faces detected
for i in range(output.shape[0]):
confidence = output[i, 2]
if confidence > confidence_threshold:
box = output[i, 3:7] * \
np.array([frame.shape[1], frame.shape[0],
frame.shape[1], frame.shape[0]])
# convert to integers
start_x, start_y, end_x, end_y = box.astype(np.int)
# widen the box a little
start_x, start_y, end_x, end_y = start_x - \
10, start_y - 10, end_x + 10, end_y + 10
start_x = 0 if start_x < 0 else start_x
start_y = 0 if start_y < 0 else start_y
end_x = 0 if end_x < 0 else end_x
end_y = 0 if end_y < 0 else end_y
# append to our list
faces.append((start_x, start_y, end_x, end_y))
return faces

def display_img(title, img):


"""Displays an image on screen and maintains the output until the user presses a key"""
# Display Image on screen
cv2.imshow(title, img)
# Mantain output until user presses a key
cv2.waitKey(0)
# Destroy windows when user presses a key
cv2.destroyAllWindows()

# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)
def get_gender_predictions(face_img):
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False, crop=False
)
gender_net.setInput(blob)
return gender_net.forward()

def get_age_predictions(face_img):
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False
)
age_net.setInput(blob)
return age_net.forward()

def predict_age_and_gender(input_path: str):


"""Predict the gender of the faces showing in the image"""
# Initialize frame size
# frame_width = 1280
# frame_height = 720
# Read Input Image
img = cv2.imread(input_path)
# resize the image, uncomment if you want to resize the image
# img = cv2.resize(img, (frame_width, frame_height))
# Take a copy of the initial image and resize it
frame = img.copy()
if frame.shape[1] > frame_width:
frame = image_resize(frame, width=frame_width)
# predict the faces
faces = get_faces(frame)
# Loop over the faces detected
# for idx, face in enumerate(faces):
for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
face_img = frame[start_y: end_y, start_x: end_x]
age_preds = get_age_predictions(face_img)
gender_preds = get_gender_predictions(face_img)
i = gender_preds[0].argmax()
gender = GENDER_LIST[i]
gender_confidence_score = gender_preds[0][i]
i = age_preds[0].argmax()
age = AGE_INTERVALS[i]
age_confidence_score = age_preds[0][i]
# Draw the box
label = f"{gender}-{gender_confidence_score*100:.1f}%, {age}-{age_confidence_score*100:.1f}%"
# label = "{}-{:.2f}%".format(gender, gender_confidence_score*100)
print(label)
yPos = start_y - 15
while yPos < 15:
yPos += 15
box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
# Label processed image
font_scale = 0.54
cv2.putText(frame, label, (start_x, yPos),
cv2.FONT_HERSHEY_SIMPLEX, font_scale, box_color, 2)

# Display processed image


display_img("Gender Estimator", frame)
# uncomment if you want to save the image
cv2.imwrite("output.jpg", frame)
# Cleanup
cv2.destroyAllWindows()

if __name__ == "__main__":
import sys
input_path = sys.argv[1]
predict_age_and_gender(input_path)
age_and_gender_detection_live.py
# Import Libraries
import cv2
import numpy as np

# The gender model architecture


# https://drive.google.com/open?id=1W_moLzMlGiELyPxWiYQJ9KFaXroQ_NFQ
GENDER_MODEL = 'weights/deploy_gender.prototxt'
# The gender model pre-trained weights
# https://drive.google.com/open?id=1AW3WduLk1haTVAxHOkVS_BEzel1WXQHP
GENDER_PROTO = 'weights/gender_net.caffemodel'
# Each Caffe model imposes the shape of the input image, and some preprocessing is required, like mean
# subtraction to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the gender classes
GENDER_LIST = ['Male', 'Female']
# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
# The model architecture
# download from: https://drive.google.com/open?id=1kiusFljZc9QfcIYdU2s7xrtWHTraHwmW
AGE_MODEL = 'weights/deploy_age.prototxt'
# The model pre-trained weights
# download from: https://drive.google.com/open?id=1kWv0AjxGSN0g31OeJa02eBGM0R_jcjIl
AGE_PROTO = 'weights/age_net.caffemodel'
# Represent the 8 age classes of this CNN probability layer
AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)',
'(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']
# Initialize frame size
frame_width = 1280
frame_height = 720
# load face Caffe model
face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load age prediction model
age_net = cv2.dnn.readNetFromCaffe(AGE_MODEL, AGE_PROTO)
# Load gender prediction model
gender_net = cv2.dnn.readNetFromCaffe(GENDER_MODEL, GENDER_PROTO)

def get_faces(frame, confidence_threshold=0.5):


# convert the frame into a blob to be ready for NN input
blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
# set the image as input to the NN
face_net.setInput(blob)
# perform inference and get predictions
output = np.squeeze(face_net.forward())
# initialize the result list
faces = []
# Loop over the faces detected
for i in range(output.shape[0]):
confidence = output[i, 2]
if confidence > confidence_threshold:
box = output[i, 3:7] * \
np.array([frame.shape[1], frame.shape[0],
frame.shape[1], frame.shape[0]])
# convert to integers
start_x, start_y, end_x, end_y = box.astype(np.int)
# widen the box a little
start_x, start_y, end_x, end_y = start_x - \
10, start_y - 10, end_x + 10, end_y + 10
start_x = 0 if start_x < 0 else start_x
start_y = 0 if start_y < 0 else start_y
end_x = 0 if end_x < 0 else end_x
end_y = 0 if end_y < 0 else end_y
# append to our list
faces.append((start_x, start_y, end_x, end_y))
return faces

# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)

def get_gender_predictions(face_img):
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False, crop=False
)
gender_net.setInput(blob)
return gender_net.forward()

def get_age_predictions(face_img):
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False
)
age_net.setInput(blob)
return age_net.forward()
def predict_age_and_gender():
"""Predict the gender of the faces showing in the image"""
# create a new cam object
cap = cv2.VideoCapture(0)

while True:
_, img = cap.read()
# Take a copy of the initial image and resize it
frame = img.copy()
# resize if higher than frame_width
if frame.shape[1] > frame_width:
frame = image_resize(frame, width=frame_width)
# predict the faces
faces = get_faces(frame)
# Loop over the faces detected
# for idx, face in enumerate(faces):
for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
face_img = frame[start_y: end_y, start_x: end_x]
# predict age
age_preds = get_age_predictions(face_img)
# predict gender
gender_preds = get_gender_predictions(face_img)
i = gender_preds[0].argmax()
gender = GENDER_LIST[i]
gender_confidence_score = gender_preds[0][i]
i = age_preds[0].argmax()
age = AGE_INTERVALS[i]
age_confidence_score = age_preds[0][i]
# Draw the box
label = f"{gender}-{gender_confidence_score*100:.1f}%, {age}-{age_confidence_score*100:.1f}%"
# label = "{}-{:.2f}%".format(gender, gender_confidence_score*100)
print(label)
yPos = start_y - 15
while yPos < 15:
yPos += 15
box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
# Label processed image
cv2.putText(frame, label, (start_x, yPos),
cv2.FONT_HERSHEY_SIMPLEX, 0.54, box_color, 2)

# Display processed image


cv2.imshow("Gender Estimator", frame)
if cv2.waitKey(1) == ord("q"):
break
# uncomment if you want to save the image
# cv2.imwrite("output.jpg", frame)
# Cleanup
cv2.destroyAllWindows()

if __name__ == "__main__":
predict_age_and_gender()
PART 3: Gender Detection using OpenCV in Python
Learn how to perform gender detection on detected faces in images using the OpenCV library in Python.

Automatic prediction of gender from face images has drawn a lot of attention
recently, due to its wide application in various facial analysis problems.
However, due to the large variations in face images (such as variation in
lighting, scale, and occlusion), existing models are still behind the desired
accuracy level that is necessary for exploiting these models in real-world
applications.

The goal of this tutorial is to develop a lightweight command-line utility, built
from Python modules, that automatically detects faces in a static image and
predicts the gender of the detected persons using a deep learning-based gender
detection model.

Please note that if you want to detect both gender and age in the same code at
the same time, check this tutorial for it.

Pre-requisites

The following components come into play:

OpenCV: an open-source library for computer vision, machine
learning, and image processing. OpenCV supports a wide variety of
programming languages such as Python, C++, and Java, and it is used for all
sorts of image and video analysis: face detection and recognition,
photo editing, optical character recognition, and a whole heap more.
Using OpenCV comes with many benefits, among which:
OpenCV is an open-source library and it is free of cost.
OpenCV is fast since it is written in C/C++.
OpenCV supports most operating systems, such as Windows, Linux, and macOS.
Suggestion: Check our computer vision tutorials for more OpenCV use cases.

filetype: a small and dependency-free Python package to infer file
and MIME types (a short usage sketch follows this list).
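
The scripts shown in this part don't end up calling filetype, so the following is just a minimal sketch (assuming the filetype package is installed with pip install filetype) of how it could be used to verify that an input path really points to an image before running detection:

import filetype

def is_image_file(path: str) -> bool:
    # guess the type from the file's magic bytes rather than its extension
    kind = filetype.guess(path)
    return kind is not None and kind.mime.startswith("image/")

# hypothetical usage before calling predict_gender(input_path)
# if not is_image_file(input_path):
#     raise SystemExit(f"{input_path} does not look like an image file")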

For the purpose of this article, we will use pre-trained Caffe models: one for
face detection, taken from the face detection tutorial, and another model for
gender detection. Below is the list of necessary files to include in our project
directory:

gender_net.caffemodel : It is the pre-trained model weights for gender


detection. You can download it here.
deploy_gender.prototxt : is the model architecture for the gender detection
model (a plain text file with a JSON-like structure containing all the
neural network layer’s definitions). Get it here.
res10_300x300_ssd_iter_140000_fp16.caffemodel : The pre-trained model weights
for face detection, download here.
deploy.prototxt.txt : This is the model architecture for the face detection
model, download here.

After downloading the 4 necessary files, put them in the weights folder:

To get started, let's install OpenCV and NumPy:

$ pip install opencv-python numpy

Open up a new Python file and follow along. First, let's import the necessary
modules and initialize the needed variables:
# Import Libraries
import cv2
import numpy as np

# The gender model architecture
# https://drive.google.com/open?id=1W_moLzMlGiELyPxWiYQJ9KFaXroQ_NFQ
GENDER_MODEL = 'weights/deploy_gender.prototxt'
# The gender model pre-trained weights
# https://drive.google.com/open?id=1AW3WduLk1haTVAxHOkVS_BEzel1WXQHP
GENDER_PROTO = 'weights/gender_net.caffemodel'
# Each Caffe model imposes the shape of the input image, and some preprocessing is required, like mean
# subtraction to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the gender classes
GENDER_LIST = ['Male', 'Female']
# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"

Next, let's load our models:

# load face Caffe model
face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load gender prediction model
gender_net = cv2.dnn.readNetFromCaffe(GENDER_MODEL, GENDER_PROTO)

As in the age detection tutorial, before going into detecting gender, we need a
way to detect faces; the function below is mostly taken from the face detection
tutorial:

def get_faces(frame, confidence_threshold=0.5):
    # convert the frame into a blob to be ready for NN input
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
    # set the image as input to the NN
    face_net.setInput(blob)
    # perform inference and get predictions
    output = np.squeeze(face_net.forward())
    # initialize the result list
    faces = []
    # Loop over the faces detected
    for i in range(output.shape[0]):
        confidence = output[i, 2]
        if confidence > confidence_threshold:
            box = output[i, 3:7] * \
                np.array([frame.shape[1], frame.shape[0],
                          frame.shape[1], frame.shape[0]])
            # convert to integers
            start_x, start_y, end_x, end_y = box.astype(np.int)
            # widen the box a little
            start_x, start_y, end_x, end_y = start_x - \
                10, start_y - 10, end_x + 10, end_y + 10
            start_x = 0 if start_x < 0 else start_x
            start_y = 0 if start_y < 0 else start_y
            end_x = 0 if end_x < 0 else end_x
            end_y = 0 if end_y < 0 else end_y
            # append to our list
            faces.append((start_x, start_y, end_x, end_y))
    return faces

Next, making a utility function to display an image:

def display_img(title, img):
    """Displays an image on screen and maintains the output until the user presses a key"""
    # Display Image on screen
    cv2.imshow(title, img)
    # Maintain output until user presses a key
    cv2.waitKey(0)
    # Destroy windows when user presses a key
    cv2.destroyAllWindows()

Next, let's make two utility functions, one for finding the appropriate font
size to write in the image, and another for correctly resizing the image:

def get_optimal_font_scale(text, width):
    """Determine the optimal font scale based on the hosting frame width"""
    for scale in reversed(range(0, 60, 1)):
        textSize = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_DUPLEX, fontScale=scale/10, thickness=1)
        new_width = textSize[0][0]
        if (new_width <= width):
            return scale/10
    return 1

# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    # initialize the dimensions of the image to be resized and
    # grab the image size
    dim = None
    (h, w) = image.shape[:2]
    # if both the width and height are None, then return the
    # original image
    if width is None and height is None:
        return image
    # check to see if the width is None
    if width is None:
        # calculate the ratio of the height and construct the
        # dimensions
        r = height / float(h)
        dim = (int(w * r), height)
    # otherwise, the height is None
    else:
        # calculate the ratio of the width and construct the
        # dimensions
        r = width / float(w)
        dim = (width, int(h * r))
    # resize the image
    return cv2.resize(image, dim, interpolation=inter)
Now that we know how to detect faces, let's make our core function to predict the
gender of each detected face:

def predict_gender(input_path: str):
    """Predict the gender of the faces showing in the image"""
    # Read Input Image
    img = cv2.imread(input_path)
    # resize the image, uncomment if you want to resize the image
    # img = cv2.resize(img, (frame_width, frame_height))
    # Take a copy of the initial image and resize it
    frame = img.copy()
    if frame.shape[1] > frame_width:
        frame = image_resize(frame, width=frame_width)
    # predict the faces
    faces = get_faces(frame)
    # Loop over the faces detected
    # for idx, face in enumerate(faces):
    for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
        face_img = frame[start_y: end_y, start_x: end_x]
        # image --> Input image to preprocess before passing it through our dnn for classification.
        # scale factor = After performing mean subtraction we can optionally scale the image by some factor. (if 1 -> no scaling)
        # size = The spatial size that the CNN expects. Options are = (224*224, 227*227 or 299*299)
        # mean = mean subtraction values to be subtracted from every channel of the image.
        # swapRB = OpenCV assumes images in BGR whereas the mean is supplied in RGB. To resolve this we set swapRB to True.
        blob = cv2.dnn.blobFromImage(image=face_img, scalefactor=1.0, size=(227, 227),
                                     mean=MODEL_MEAN_VALUES, swapRB=False, crop=False)
        # Predict Gender
        gender_net.setInput(blob)
        gender_preds = gender_net.forward()
        i = gender_preds[0].argmax()
        gender = GENDER_LIST[i]
        gender_confidence_score = gender_preds[0][i]
        # Draw the box
        label = "{}-{:.2f}%".format(gender, gender_confidence_score*100)
        print(label)
        yPos = start_y - 15
        while yPos < 15:
            yPos += 15
        # get the font scale for this image size
        optimal_font_scale = get_optimal_font_scale(label, ((end_x - start_x) + 25))
        box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
        cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
        # Label processed image
        cv2.putText(frame, label, (start_x, yPos),
                    cv2.FONT_HERSHEY_SIMPLEX, optimal_font_scale, box_color, 2)

    # Display processed image
    display_img("Gender Estimator", frame)
    # uncomment if you want to save the image
    # cv2.imwrite("output.jpg", frame)
    # Cleanup
    cv2.destroyAllWindows()

Here is the process of the predict_gender() function:

We read the input image using the cv2.imread() function.


We resize the image if its width exceeds the frame_width variable; feel free to
adjust this to fit your needs.
We use our previously defined get_faces() function to detect faces in the
image.
We iterate over each face, draw a rectangle around it, and pass it to the
gender detection model to perform inference on the gender.
Finally, we print the gender both in the console and in the image. After
that, we simply display the image and save it to disk if we want.

Alright, let's call our function now:

if __name__ == '__main__':
    # Parsing command line arguments entered by user
    import sys
    predict_gender(sys.argv[1])

We simply use the sys module to get the image path from the command line.
Let's test this out, I'm testing on this stock image:

$ python predict_gender.py images\\pexels-karolina-grabowska-8526635.jpg

Here is the output in the console:

Female-97.36%
Female-98.34%
And the resulting image:

Here is another example:


Or this:

Conclusion

And there you go: you now have Python code for detecting gender in any
image using the OpenCV library. The gender model performed well on the test images above, though like any pre-trained model it can still misclassify harder inputs.
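If you want to run the detector over a whole folder of images, here is a minimal sketch; it assumes it is appended to the same script (so predict_gender() is in scope) and uses an illustrative images/ folder name:

# Hypothetical batch run over a folder of images
import glob

for image_path in glob.glob("images/*.jpg"):
    predict_gender(image_path)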
Source Code:

predict_gender.py
# Import Libraries
import cv2
import numpy as np

# The gender model architecture


# https://drive.google.com/open?id=1W_moLzMlGiELyPxWiYQJ9KFaXroQ_NFQ
GENDER_MODEL = 'weights/deploy_gender.prototxt'
# The gender model pre-trained weights
# https://drive.google.com/open?id=1AW3WduLk1haTVAxHOkVS_BEzel1WXQHP
GENDER_PROTO = 'weights/gender_net.caffemodel'
# Each Caffe model imposes the shape of the input image; image preprocessing is also required, like mean
# subtraction to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the gender classes
GENDER_LIST = ['Male', 'Female']
# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"

# load face Caffe model


face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load gender prediction model
gender_net = cv2.dnn.readNetFromCaffe(GENDER_MODEL, GENDER_PROTO)

# Initialize frame size


frame_width = 1280
frame_height = 720

def get_faces(frame, confidence_threshold=0.5):


# convert the frame into a blob to be ready for NN input
blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
# set the image as input to the NN
face_net.setInput(blob)
# perform inference and get predictions
output = np.squeeze(face_net.forward())
# initialize the result list
faces = []
# Loop over the faces detected
for i in range(output.shape[0]):
confidence = output[i, 2]
if confidence > confidence_threshold:
box = output[i, 3:7] * \
np.array([frame.shape[1], frame.shape[0],
frame.shape[1], frame.shape[0]])
# convert to integers
start_x, start_y, end_x, end_y = box.astype(int)
# widen the box a little
start_x, start_y, end_x, end_y = start_x - \
10, start_y - 10, end_x + 10, end_y + 10
start_x = 0 if start_x < 0 else start_x
start_y = 0 if start_y < 0 else start_y
end_x = 0 if end_x < 0 else end_x
end_y = 0 if end_y < 0 else end_y
# append to our list
faces.append((start_x, start_y, end_x, end_y))
return faces

def display_img(title, img):


"""Displays an image on screen and maintains the output until the user presses a key"""
# Display Image on screen
cv2.imshow(title, img)
# Maintain output until user presses a key
cv2.waitKey(0)
# Destroy windows when user presses a key
cv2.destroyAllWindows()

def get_optimal_font_scale(text, width):


"""Determine the optimal font scale based on the hosting frame width"""
for scale in reversed(range(0, 60, 1)):
textSize = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_DUPLEX, fontScale=scale/10,
thickness=1)
new_width = textSize[0][0]
if (new_width <= width):
return scale/10
return 1

# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)

def predict_gender(input_path: str):


"""Predict the gender of the faces showing in the image"""
# Read Input Image
img = cv2.imread(input_path)
# resize the image, uncomment if you want to resize the image
# img = cv2.resize(img, (frame_width, frame_height))
# Take a copy of the initial image and resize it
frame = img.copy()
if frame.shape[1] > frame_width:
frame = image_resize(frame, width=frame_width)
# predict the faces
faces = get_faces(frame)
# Loop over the faces detected
# for idx, face in enumerate(faces):
for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
face_img = frame[start_y: end_y, start_x: end_x]
# image --> Input image to preprocess before passing it through our dnn for classification.
# scale factor = After performing mean subtraction we can optionally scale the image by some factor. (if 1 -> no scaling)
# size = The spatial size that the CNN expects. Options are = (224*224, 227*227 or 299*299)
# mean = mean subtraction values to be subtracted from every channel of the image.
# swapRB = OpenCV assumes images in BGR whereas the mean is supplied in RGB. To resolve this we set swapRB to True.
blob = cv2.dnn.blobFromImage(image=face_img, scalefactor=1.0, size=(
227, 227), mean=MODEL_MEAN_VALUES, swapRB=False, crop=False)
# Predict Gender
gender_net.setInput(blob)
gender_preds = gender_net.forward()
i = gender_preds[0].argmax()
gender = GENDER_LIST[i]
gender_confidence_score = gender_preds[0][i]
# Draw the box
label = "{}-{:.2f}%".format(gender, gender_confidence_score*100)
print(label)
yPos = start_y - 15
while yPos < 15:
yPos += 15
# get the font scale for this image size
optimal_font_scale = get_optimal_font_scale(label,((end_x-start_x)+25))
box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
# Label processed image
cv2.putText(frame, label, (start_x, yPos),
cv2.FONT_HERSHEY_SIMPLEX, optimal_font_scale, box_color, 2)

# Display processed image


display_img("Gender Estimator", frame)
# uncomment if you want to save the image
# cv2.imwrite("output.jpg", frame)
# Cleanup
cv2.destroyAllWindows()

if __name__ == '__main__':
# Parsing command line arguments entered by user
import sys
predict_gender(sys.argv[1])

predict_gender_live.py
# Import Libraries
import cv2
import numpy as np

# The gender model architecture


# https://drive.google.com/open?id=1W_moLzMlGiELyPxWiYQJ9KFaXroQ_NFQ
GENDER_MODEL = 'weights/deploy_gender.prototxt'
# The gender model pre-trained weights
# https://drive.google.com/open?id=1AW3WduLk1haTVAxHOkVS_BEzel1WXQHP
GENDER_PROTO = 'weights/gender_net.caffemodel'
# Each Caffe model imposes the shape of the input image; image preprocessing is also required, like mean
# subtraction to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the gender classes
GENDER_LIST = ['Male', 'Female']
# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"

# load face Caffe model


face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load gender prediction model
gender_net = cv2.dnn.readNetFromCaffe(GENDER_MODEL, GENDER_PROTO)

# Initialize frame size


frame_width = 1280
frame_height = 720

def get_faces(frame, confidence_threshold=0.5):


# convert the frame into a blob to be ready for NN input
blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
# set the image as input to the NN
face_net.setInput(blob)
# perform inference and get predictions
output = np.squeeze(face_net.forward())
# initialize the result list
faces = []
# Loop over the faces detected
for i in range(output.shape[0]):
confidence = output[i, 2]
if confidence > confidence_threshold:
box = output[i, 3:7] * \
np.array([frame.shape[1], frame.shape[0],
frame.shape[1], frame.shape[0]])
# convert to integers
start_x, start_y, end_x, end_y = box.astype(int)
# widen the box a little
start_x, start_y, end_x, end_y = start_x - \
10, start_y - 10, end_x + 10, end_y + 10
start_x = 0 if start_x < 0 else start_x
start_y = 0 if start_y < 0 else start_y
end_x = 0 if end_x < 0 else end_x
end_y = 0 if end_y < 0 else end_y
# append to our list
faces.append((start_x, start_y, end_x, end_y))
return faces

def get_optimal_font_scale(text, width):


"""Determine the optimal font scale based on the hosting frame width"""
for scale in reversed(range(0, 60, 1)):
textSize = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_DUPLEX, fontScale=scale/10,
thickness=1)
new_width = textSize[0][0]
if (new_width <= width):
return scale/10
return 1

# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)

def predict_gender():
"""Predict the gender of the faces showing in the image"""
# create a new cam object
cap = cv2.VideoCapture(0)

while True:
_, img = cap.read()
# resize the image, uncomment if you want to resize the image
# img = cv2.resize(img, (frame_width, frame_height))
# Take a copy of the initial image and resize it
frame = img.copy()
if frame.shape[1] > frame_width:
frame = image_resize(frame, width=frame_width)
# predict the faces
faces = get_faces(frame)
# Loop over the faces detected
# for idx, face in enumerate(faces):
for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
face_img = frame[start_y: end_y, start_x: end_x]
# image --> Input image to preprocess before passing it through our dnn for classification.
# scale factor = After performing mean subtraction we can optionally scale the image by some factor. (if 1 -> no scaling)
# size = The spatial size that the CNN expects. Options are = (224*224, 227*227 or 299*299)
# mean = mean subtraction values to be subtracted from every channel of the image.
# swapRB = OpenCV assumes images in BGR whereas the mean is supplied in RGB. To resolve this we set swapRB to True.
blob = cv2.dnn.blobFromImage(image=face_img, scalefactor=1.0, size=(
227, 227), mean=MODEL_MEAN_VALUES, swapRB=False, crop=False)
# Predict Gender
gender_net.setInput(blob)
gender_preds = gender_net.forward()
i = gender_preds[0].argmax()
gender = GENDER_LIST[i]
gender_confidence_score = gender_preds[0][i]
# Draw the box
label = "{}-{:.2f}%".format(gender, gender_confidence_score*100)
print(label)
yPos = start_y - 15
while yPos < 15:
yPos += 15
# get the font scale for this image size
optimal_font_scale = get_optimal_font_scale(label,((end_x-start_x)+25))
box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
# Label processed image
cv2.putText(frame, label, (start_x, yPos),
cv2.FONT_HERSHEY_SIMPLEX, optimal_font_scale, box_color, 2)

# Display processed image

# frame = cv2.resize(frame, (frame_height, frame_width))


cv2.imshow("Gender Estimator", frame)
if cv2.waitKey(1) == ord("q"):
break
# uncomment if you want to save the image
# cv2.imwrite("output.jpg", frame)
# Cleanup
cv2.destroyAllWindows()

if __name__ == '__main__':
predict_gender()
PART 4: Age Detection using
OpenCV in Python
Learn how to predict someone's age from their front face picture using the OpenCV library in Python

Recently, the field of computer vision has attracted wide attention, especially
in face recognition, detection, and facial landmark localization. Many
significant attributes can be derived directly from the human face, such as age,
gender, and emotions.

Age estimation can be defined as the automatic process of classifying the


facial image into the exact age or to a specific age range. Basically, age
estimation from the face is still a challenging problem, and guessing an exact
age from a single image is very difficult due to factors like makeup, lighting,
obstructions, and facial expressions.

Inspired by ubiquitous applications spread across multiple platforms, like
"AgeBot" on Android and "Age Calculator" on iPhone, we are going to build
a simple age estimator using OpenCV in Python.

The primary goal of this tutorial is to develop a lightweight command-line-based
utility built from Python modules. It describes the steps to automatically
detect faces in a static image and to predict the age of the spotted persons
using a deep learning-based age detection model.

Please note that if you want to detect both age and gender at the same time,
check this tutorial for it.

Learn also: Gender Detection using OpenCV in Python.

The following components come into play:

OpenCV: is an open-source library for computer vision, machine


learning, and image processing. OpenCV supports a wide variety of
programming languages like Python, C++, Java and it is used for all
sorts of image and video analysis like facial detection and recognition,
photo editing, optical character recognition, and a whole heap more.
Using OpenCV comes with many benefits among which:
OpenCV is an open-source library and it is free of cost.
OpenCV is fast since it is written in C/C++.
OpenCV supports most Operating Systems such as Windows,
Linux, and macOS.
Suggestion: Check our computer vision tutorials for more
OpenCV use cases.
filetype: is a small and dependency-free Python package to infer file
and MIME types.

For the purpose of this article, we will use pre-trained Caffe models, one for
face detection taken from the face detection tutorial, and another model for
age detection. Below is the list of necessary files to include in our project
directory:

age_net.caffemodel : It is the pre-trained model weights for age detection.


You can download it here.
deploy_age.prototxt : is the model architecture for the age detection model (a
plain text file with a JSON-like structure containing all the neural
network layer’s definitions). Get it here.
res10_300x300_ssd_iter_140000_fp16.caffemodel : The pre-trained model weights
for face detection, download here.
deploy.prototxt.txt : This is the model architecture for the face detection
model, download here.

After downloading the 4 necessary files, put them in a folder and call
it "weights":
To get started, let's install OpenCV and NumPy:

$ pip install opencv-python numpy

Open up a new Python file:

# Import Libraries
import cv2
import os
import filetype
import numpy as np

# The model architecture


# download from: https://drive.google.com/open?id=1kiusFljZc9QfcIYdU2s7xrtWHTraHwmW
AGE_MODEL = 'weights/deploy_age.prototxt'
# The model pre-trained weights
# download from: https://drive.google.com/open?id=1kWv0AjxGSN0g31OeJa02eBGM0R_jcjIl
AGE_PROTO = 'weights/age_net.caffemodel'
# Each Caffe model imposes the shape of the input image; image preprocessing is also required, like mean
# subtraction to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the 8 age classes of this CNN probability layer
AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)',
'(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']
# download from: https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deplo
FACE_PROTO = "weights/deploy.prototxt.txt"
# download from: https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_2
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
# Initialize frame size
frame_width = 1280
frame_height = 720
# load face Caffe model
face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load age prediction model
age_net = cv2.dnn.readNetFromCaffe(AGE_MODEL, AGE_PROTO)

Here we initialize the paths of our models' weights and architecture, the
frame size we are going to resize to, and finally load the models.

The variable AGE_INTERVALS is a list of the age classes of the age detection
model.

Next, let's make a function that takes an image as input, and returns a list of
detected faces:

def get_faces(frame, confidence_threshold=0.5):


"""Returns the box coordinates of all detected faces"""
# convert the frame into a blob to be ready for NN input
blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0,
123.0))
# set the image as input to the NN
face_net.setInput(blob)
# perform inference and get predictions
output = np.squeeze(face_net.forward())
# initialize the result list
faces = []
# Loop over the faces detected
for i in range(output.shape[0]):
confidence = output[i, 2]
if confidence > confidence_threshold:
box = output[i, 3:7] * np.array([frame_width, frame_height,
frame_width, frame_height])
# convert to integers
start_x, start_y, end_x, end_y = box.astype(int)
# widen the box a little
start_x, start_y, end_x, end_y = start_x - \
10, start_y - 10, end_x + 10, end_y + 10
start_x = 0 if start_x < 0 else start_x
start_y = 0 if start_y < 0 else start_y
end_x = 0 if end_x < 0 else end_x
end_y = 0 if end_y < 0 else end_y
# append to our list
faces.append((start_x, start_y, end_x, end_y))
return faces

Most of the code was grabbed from the face detection tutorial, check it out
for more information on how it's done.

Let's make a utility function that displays a given image:

def display_img(title, img):


"""Displays an image on screen and maintains the output until the user
presses a key"""
# Display Image on screen
cv2.imshow(title, img)
# Maintain output until user presses a key
cv2.waitKey(0)
# Destroy windows when user presses a key
cv2.destroyAllWindows()

Next, below are two utility functions, one for finding the appropriate font size
when printing text to the image, and another for dynamically resizing an
image:

def get_optimal_font_scale(text, width):


"""Determine the optimal font scale based on the hosting frame width"""
for scale in reversed(range(0, 60, 1)):
textSize = cv2.getTextSize(text,
fontFace=cv2.FONT_HERSHEY_DUPLEX, fontScale=scale/10,
thickness=1)
new_width = textSize[0][0]
if (new_width <= width):
return scale/10
return 1

# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)

Now that we know how to detect faces, the function below is responsible for
predicting the age of every detected face:

def predict_age(input_path: str):


"""Predict the age of the faces showing in the image"""
# Read Input Image
img = cv2.imread(input_path)
# Take a copy of the initial image and resize it
frame = img.copy()
if frame.shape[1] > frame_width:
frame = image_resize(frame, width=frame_width)
faces = get_faces(frame)
for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
face_img = frame[start_y: end_y, start_x: end_x]
# image --> Input image to preprocess before passing it through our dnn for classification.
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False
)
# Predict Age
age_net.setInput(blob)
age_preds = age_net.forward()
print("="*30, f"Face {i+1} Prediction Probabilities", "="*30)
for i in range(age_preds[0].shape[0]):
print(f"{AGE_INTERVALS[i]}: {age_preds[0, i]*100:.2f}%")
i = age_preds[0].argmax()
age = AGE_INTERVALS[i]
age_confidence_score = age_preds[0][i]
# Draw the box
label = f"Age:{age} - {age_confidence_score*100:.2f}%"
print(label)
# get the position where to put the text
yPos = start_y - 15
while yPos < 15:
yPos += 15
# write the text into the frame
cv2.putText(frame, label, (start_x, yPos),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0),
thickness=2)
# draw the rectangle around the face
cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), color=(255, 0,
0), thickness=2)
# Display processed image
display_img('Age Estimator', frame)
# save the image if you want
# cv2.imwrite("predicted_age.jpg", frame)

Here is the complete process of the above function:

We read the image using the cv2.imread() method.


After the image is resized to the appropriate size, we use
our get_faces() function to get all detected faces.
We iterate over each face image and set it as input to the age prediction
model to perform age prediction.
We print the probabilities of each class, as well as the dominant one.
A rectangle and text containing age are drawn on the image.
Finally, we show the final image.

You can always uncomment the cv2.imwrite() line to save the new image.

Now let's write our main code:

if __name__ == '__main__':
# Parsing command line arguments entered by user
import sys
image_path = sys.argv[1]
predict_age(image_path)

We simply use Python's built-in sys module to get user input, as we only
need one argument from the user (the image path); the argparse module
would be overkill here. If you later need more options, the argparse version could look like the sketch below.
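A minimal, illustrative sketch (the option layout is an assumption, not part of the original script):

# Hypothetical alternative entry point using argparse instead of sys.argv
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Predict age from a face image")
    parser.add_argument("image", help="path to the input image")
    args = parser.parse_args()
    predict_age(args.image)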
Let's test the code on this stock photo:

$ python predict_age.py 3-people.jpg

Output:

============================== Face 1 Prediction Probabilities


==============================
(0, 2): 0.00%
(4, 6): 0.00%
(8, 12): 31.59%
(15, 20): 0.52%
(25, 32): 52.80%
(38, 43): 14.89%
(48, 53): 0.18%
(60, 100): 0.01%
Age:(25, 32) - 52.80%
============================== Face 2 Prediction Probabilities
==============================
(0, 2): 0.00%
(4, 6): 0.05%
(8, 12): 17.63%
(15, 20): 0.02%
(25, 32): 82.22%
(38, 43): 0.06%
(48, 53): 0.01%
(60, 100): 0.00%
Age:(25, 32) - 82.22%
============================== Face 3 Prediction Probabilities
==============================
(0, 2): 0.05%
(4, 6): 0.03%
(8, 12): 0.07%
(15, 20): 0.00%
(25, 32): 1.05%
(38, 43): 0.96%
(48, 53): 86.54%
(60, 100): 11.31%
Age:(48, 53) - 86.54%

And this is the resulting image:

Wrap-up
The age detection model is heavily biased toward the age group [25-32] ,
so you may notice this discrepancy while testing the utility.

You can always tweak some parameters to try to make the predictions more accurate.
For instance, in the get_faces() function, I've widened the box by 10 pixels on
all sides; you can change that to any value you are comfortable with (a parameterized
version is sketched below). Changing frame_width and frame_height is another way to
refine the accuracy of the prediction.
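For illustration only (this helper is not part of the original script), the widening step could be factored into a small function with a configurable padding:

# Hypothetical helper: widen a face box by a configurable padding, clamped at 0
def widen_box(start_x, start_y, end_x, end_y, padding=10):
    start_x = max(start_x - padding, 0)
    start_y = max(start_y - padding, 0)
    return start_x, start_y, end_x + padding, end_y + padding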

Source Code:

predict_age.py
# Import Libraries
import cv2
import os
import filetype
import numpy as np

# The model architecture


# download from: https://drive.google.com/open?id=1kiusFljZc9QfcIYdU2s7xrtWHTraHwmW
AGE_MODEL = 'weights/deploy_age.prototxt'
# The model pre-trained weights
# download from: https://drive.google.com/open?id=1kWv0AjxGSN0g31OeJa02eBGM0R_jcjIl
AGE_PROTO = 'weights/age_net.caffemodel'
# Each Caffe model imposes the shape of the input image; image preprocessing is also required, like mean
# subtraction to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the 8 age classes of this CNN probability layer
AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)',
'(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']
# download from: https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# download from: https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
# Initialize frame size
frame_width = 1280
frame_height = 720

# load face Caffe model


face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load age prediction model
age_net = cv2.dnn.readNetFromCaffe(AGE_MODEL, AGE_PROTO)

def get_faces(frame, confidence_threshold=0.5):


"""Returns the box coordinates of all detected faces"""
# convert the frame into a blob to be ready for NN input
blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
# set the image as input to the NN
face_net.setInput(blob)
# perform inference and get predictions
output = np.squeeze(face_net.forward())
# initialize the result list
faces = []
# Loop over the faces detected
for i in range(output.shape[0]):
confidence = output[i, 2]
if confidence > confidence_threshold:
box = output[i, 3:7] * np.array([frame.shape[1], frame.shape[0], frame.shape[1],
frame.shape[0]])
# convert to integers
start_x, start_y, end_x, end_y = box.astype(int)
# widen the box a little
start_x, start_y, end_x, end_y = start_x - \
10, start_y - 10, end_x + 10, end_y + 10
start_x = 0 if start_x < 0 else start_x
start_y = 0 if start_y < 0 else start_y
end_x = 0 if end_x < 0 else end_x
end_y = 0 if end_y < 0 else end_y
# append to our list
faces.append((start_x, start_y, end_x, end_y))
return faces
def display_img(title, img):
"""Displays an image on screen and maintains the output until the user presses a key"""
# Display Image on screen
cv2.imshow(title, img)
# Maintain output until user presses a key
cv2.waitKey(0)
# Destroy windows when user presses a key
cv2.destroyAllWindows()

def get_optimal_font_scale(text, width):


"""Determine the optimal font scale based on the hosting frame width"""
for scale in reversed(range(0, 60, 1)):
textSize = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_DUPLEX, fontScale=scale/10,
thickness=1)
new_width = textSize[0][0]
if (new_width <= width):
return scale/10
return 1

# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)

def predict_age(input_path: str):


"""Predict the age of the faces showing in the image"""
# Read Input Image
img = cv2.imread(input_path)
# Take a copy of the initial image and resize it
frame = img.copy()
if frame.shape[1] > frame_width:
frame = image_resize(frame, width=frame_width)
faces = get_faces(frame)
for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
face_img = frame[start_y: end_y, start_x: end_x]
# image --> Input image to preprocess before passing it through our dnn for classification.
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False
)
# Predict Age
age_net.setInput(blob)
age_preds = age_net.forward()
print("="*30, f"Face {i+1} Prediction Probabilities", "="*30)
for i in range(age_preds[0].shape[0]):
print(f"{AGE_INTERVALS[i]}: {age_preds[0, i]*100:.2f}%")
i = age_preds[0].argmax()
age = AGE_INTERVALS[i]
age_confidence_score = age_preds[0][i]
# Draw the box
label = f"Age:{age} - {age_confidence_score*100:.2f}%"
print(label)
# get the position where to put the text
yPos = start_y - 15
while yPos < 15:
yPos += 15
# write the text into the frame
cv2.putText(frame, label, (start_x, yPos),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), thickness=2)
# draw the rectangle around the face
cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), color=(255, 0, 0), thickness=2)
# Display processed image
display_img('Age Estimator', frame)
# save the image if you want
# cv2.imwrite("predicted_age.jpg", frame)

if __name__ == '__main__':
# Parsing command line arguments entered by user
import sys
image_path = sys.argv[1]
predict_age(image_path)

predict_age_live.py

# Import Libraries
import cv2
import os
import filetype
import numpy as np

# The model architecture


# download from: https://drive.google.com/open?id=1kiusFljZc9QfcIYdU2s7xrtWHTraHwmW
AGE_MODEL = 'weights/deploy_age.prototxt'
# The model pre-trained weights
# download from: https://drive.google.com/open?id=1kWv0AjxGSN0g31OeJa02eBGM0R_jcjIl
AGE_PROTO = 'weights/age_net.caffemodel'
# Each Caffe model imposes the shape of the input image; image preprocessing is also required, like mean
# subtraction to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the 8 age classes of this CNN probability layer
AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)',
'(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']
# download from: https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# download from: https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"

# Initialize frame size


frame_width = 1280
frame_height = 720

# load face Caffe model


face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load age prediction model
age_net = cv2.dnn.readNetFromCaffe(AGE_MODEL, AGE_PROTO)

def get_faces(frame, confidence_threshold=0.5):


"""Returns the box coordinates of all detected faces"""
# convert the frame into a blob to be ready for NN input
blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
# set the image as input to the NN
face_net.setInput(blob)
# perform inference and get predictions
output = np.squeeze(face_net.forward())
# initialize the result list
faces = []
# Loop over the faces detected
for i in range(output.shape[0]):
confidence = output[i, 2]
if confidence > confidence_threshold:
box = output[i, 3:7] * np.array([frame.shape[1], frame.shape[0], frame.shape[1],
frame.shape[0]])
# convert to integers
start_x, start_y, end_x, end_y = box.astype(int)
# widen the box a little
start_x, start_y, end_x, end_y = start_x - \
10, start_y - 10, end_x + 10, end_y + 10
start_x = 0 if start_x < 0 else start_x
start_y = 0 if start_y < 0 else start_y
end_x = 0 if end_x < 0 else end_x
end_y = 0 if end_y < 0 else end_y
# append to our list
faces.append((start_x, start_y, end_x, end_y))
return faces

def get_optimal_font_scale(text, width):


"""Determine the optimal font scale based on the hosting frame width"""
for scale in reversed(range(0, 60, 1)):
textSize = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_DUPLEX, fontScale=scale/10,
thickness=1)
new_width = textSize[0][0]
if (new_width <= width):
return scale/10
return 1

# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)

def predict_age():
"""Predict the age of the faces showing in the image"""

# create a new cam object


cap = cv2.VideoCapture(0)

while True:
_, img = cap.read()
# Take a copy of the initial image and resize it
frame = img.copy()
if frame.shape[1] > frame_width:
frame = image_resize(frame, width=frame_width)
faces = get_faces(frame)
for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
face_img = frame[start_y: end_y, start_x: end_x]
# image --> Input image to preprocess before passing it through our dnn for classification.
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False
)
# Predict Age
age_net.setInput(blob)
age_preds = age_net.forward()
print("="*30, f"Face {i+1} Prediction Probabilities", "="*30)
for i in range(age_preds[0].shape[0]):
print(f"{AGE_INTERVALS[i]}: {age_preds[0, i]*100:.2f}%")
i = age_preds[0].argmax()
age = AGE_INTERVALS[i]
age_confidence_score = age_preds[0][i]
# Draw the box
label = f"Age:{age} - {age_confidence_score*100:.2f}%"
print(label)
# get the position where to put the text
yPos = start_y - 15
while yPos < 15:
yPos += 15
# write the text into the frame
cv2.putText(frame, label, (start_x, yPos),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), thickness=2)
# draw the rectangle around the face
cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), color=(255, 0, 0), thickness=2)
# Display processed image
cv2.imshow('Age Estimator', frame)
if cv2.waitKey(1) == ord("q"):
break
# save the image if you want
# cv2.imwrite("predicted_age.jpg", frame)
cv2.destroyAllWindows()

if __name__ == '__main__':
predict_age()
PART 5: SIFT Feature Extraction
using OpenCV in Python
Learn how to compute and detect SIFT features for feature matching and more using OpenCV library
in Python.

SIFT stands for Scale-Invariant Feature Transform. It is a feature extraction
method (among others, such as HOG feature extraction) where image content
is transformed into local feature coordinates that are invariant to translation,
scale, and other image transformations.

In this tutorial, you will learn the theory behind SIFT as well as how to
implement it in Python using OpenCV library.

Below are the advantages of SIFT:

Locality: Features are local; robust to occlusion and clutter.


Distinctiveness: Individual features extracted can be matched to a large
dataset of objects.
Quantity: Using SIFT, we can extract many features from small
objects.
Efficiency: SIFT is close to real-time performance.

These are the high level details of SIFT:

1. Scale-space Extrema Detection: Identify locations and scales that can


be repeatedly assigned under different views of the same scene or
object.
2. Keypoint Localization: Fit a model to determine the location and scale
of features, selecting key points based on a measure of stability.
3. Orientation Assignment: Compute best orientation(s) for each keypoint
region.
4. Keypoint Description: Use local image gradients at the selected scale
and rotation to describe each keypoint region.
Scale-space Extrema Detection
In the first step, we identify locations and scales that can be repeatedly
assigned under different views of the same object or scene. For this, we
search for stable features across multiple scales using a continuous function
of scale based on the Gaussian function.

The scale-space of an image is a function L(x, y, σ) that is produced from the
convolution of a Gaussian kernel (at different scales σ) with the input image.

In each octave, the initial image is repeatedly convolved with Gaussians to


produce a set of scale-space images. At each level, the image is smoothed and
reduced in size. After each octave, the Gaussian image is down-sampled by a
factor of 2 to produce an image 1/4 the size to start the next level. The
adjacent Gaussians are subtracted to produce the DoG (Difference of
Gaussians).

To create the first octave, a Gaussian filter is applied to the input image
with different values of sigma; for the 2nd and subsequent octaves, the
image is first down-sampled by a factor of 2 and then Gaussian filters
with different sigma values are applied.

The sigma values are as follows.

Octave 1 uses scale of σ.


Octave 2 uses scale of 2σ.
And so on.

The following image shows four octaves and each octave contains six
images:
How many scales should there be per octave? Research
shows that there should be 4 scales per octave:

Then two consecutive images in the octave are subtracted to obtain the
difference of gaussian.
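To make the idea concrete, here is a minimal sketch of building one octave of Gaussian-blurred images and their differences with OpenCV; the sigma schedule and the table.jpg filename are illustrative assumptions, not the exact values SIFT uses internally:

import cv2
import numpy as np

# Build one octave: blur the image with increasing sigma values,
# then subtract consecutive blurred images to get the DoG stack.
img = cv2.imread("table.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
k = 2 ** 0.5                      # illustrative multiplier between scales
sigmas = [1.6 * k ** i for i in range(6)]
gaussians = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]
dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
print(len(dogs), dogs[0].shape)   # 5 DoG images for 6 Gaussian levels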
Keypoint Localization
After taking the difference of Gaussians, we need to detect the maxima and
minima in the scale space by comparing each pixel with its 26 neighbors in the
current and adjacent scales: each point is compared to its 8 neighbors in the
current image and 9 neighbors each in the scales above and below.
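As a naive sketch of that comparison (not the optimized implementation OpenCV uses), a pixel can be checked against its 3x3x3 neighborhood like this, assuming dogs is a list of same-sized DoG images such as the one built in the earlier sketch:

import numpy as np

def is_extremum(dogs, s, y, x):
    # Compare the DoG value at (scale s, row y, column x) with its 26
    # neighbors in the current, previous and next scales.
    value = dogs[s][y, x]
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])
    return value >= cube.max() or value <= cube.min()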

The following are the extrema points found in our example image:
Orientation Assignment
Orientation assignment is done to achieve rotation invariance. The gradient magnitude and orientation are computed on a Gaussian-blurred image at the keypoint's scale.

The magnitude represents how strongly the intensity changes at the pixel, and
the orientation gives the direction of that change.

The formula used for the gradient magnitude is:

m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)

The formula for the direction calculation is:

θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))

Now we need to look at the orientation of each point; each orientation sample is also
given a weight. The arrow in the blue square below has an angle of approximately
90 degrees, and its length shows how much it counts.

A histogram is formed by quantizing the orientations into 36 bins, with each
bin covering 10 degrees. The histogram shows how many pixels have a
certain angle; for example, how many pixels have a 36-degree angle?
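A toy version of this histogram can be written with NumPy as follows; it weights each pixel's vote by its gradient magnitude but skips the Gaussian weighting that the full SIFT algorithm applies, so treat it as an illustration only:

import numpy as np

def orientation_histogram(patch):
    # Gradients of the patch, their magnitudes and angles in [0, 360)
    gy, gx = np.gradient(patch.astype(np.float32))
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    orientation = np.degrees(np.arctan2(gy, gx)) % 360
    # 36 bins of 10 degrees each, votes weighted by magnitude
    hist, _ = np.histogram(orientation, bins=36, range=(0, 360), weights=magnitude)
    return hist   # the dominant orientation lies in bin hist.argmax()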
Keypoint Description
At this point, each keypoint has a location, scale and orientation. Now we
need to compute a descriptor for that we need to use the normalized region
around the key point. We will first take a 16×16 neighborhood around the
key point. This 16×16 block is further divided into 4×4 sub-blocks and for
each of these sub-blocks, we generate the histogram using magnitude and
orientation.

The 16 histograms are concatenated into one long vector of 128 dimensions:
4x4 sub-blocks times 8 directions gives a vector of 128 values.
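The following toy function shows the bookkeeping behind the 128-dimensional vector (4x4 sub-blocks of an assumed 16x16 patch, 8 orientation bins each); it omits the Gaussian weighting and trilinear interpolation used by real SIFT:

import numpy as np

def toy_descriptor(patch16):
    # patch16 is assumed to be a 16x16 grayscale patch around a keypoint
    gy, gx = np.gradient(patch16.astype(np.float32))
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ang = np.degrees(np.arctan2(gy, gx)) % 360
    descriptor = []
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            # 8-bin orientation histogram for each 4x4 sub-block
            h, _ = np.histogram(ang[by:by + 4, bx:bx + 4], bins=8,
                                range=(0, 360), weights=mag[by:by + 4, bx:bx + 4])
            descriptor.extend(h)
    descriptor = np.array(descriptor)          # 16 sub-blocks * 8 bins = 128 values
    return descriptor / (np.linalg.norm(descriptor) + 1e-7)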

Read also: How to Apply HOG Feature Extraction in Python.

Python Implementation
Now that you hopefully understand the theory behind SIFT, let's dive into the
Python code using OpenCV. First, let's install a specific version of OpenCV
which implements SIFT:

pip3 install numpy opencv-python==3.4.2.16 opencv-contrib-


python==3.4.2.16

Open up a new Python file and follow along. I'm going to operate on this image
of a table that contains a specific book (get it here):

import cv2

# reading the image


img = cv2.imread('table.jpg')
# convert to greyscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

The above code loads the image and converts it to grayscale. Let's create the SIFT
feature extractor object:

# create SIFT feature extractor


sift = cv2.xfeatures2d.SIFT_create()

To detect the keypoints and descriptors, we simply pass the image


to detectAndCompute() method:

# detect features from the image


keypoints, descriptors = sift.detectAndCompute(img, None)
Finally, let's draw the keypoints, show and save the image:

# draw the detected key points


sift_image = cv2.drawKeypoints(gray, keypoints, img)
# show the image
cv2.imshow('image', sift_image)
# save the image
cv2.imwrite("table-sift.jpg", sift_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Here is the resulting image:


These SIFT feature points are useful for many use-cases, here are some:

Image alignment (homography, fundamental matrix)


Feature matching
3D reconstruction
Motion tracking
Object recognition
Indexing and database retrieval
Robot navigation

To make real-world use of this demonstration, we're picking feature
matching. Let's use OpenCV to match 2 images of the same object from
different angles (you can get the images in this GitHub repository):
import cv2

# read the images


img1 = cv2.imread('book.jpg')
img2 = cv2.imread('table.jpg')
# convert images to grayscale
img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
# create SIFT object
sift = cv2.xfeatures2d.SIFT_create()
# detect SIFT features in both images
keypoints_1, descriptors_1 = sift.detectAndCompute(img1,None)
keypoints_2, descriptors_2 = sift.detectAndCompute(img2,None)

Now that we have keypoints and descriptors of both images, let's make a
matcher to match the descriptors:

# create feature matcher


bf = cv2.BFMatcher(cv2.NORM_L1, crossCheck=True)
# match descriptors of both images
matches = bf.match(descriptors_1,descriptors_2)

Let's sort the matches by distance and draw the first 50 matches:

# sort matches by distance


matches = sorted(matches, key = lambda x:x.distance)
# draw first 50 matches
matched_img = cv2.drawMatches(img1, keypoints_1, img2, keypoints_2,
matches[:50], img2, flags=2)
Finally, showing and saving the image:

# show the image


cv2.imshow('image', matched_img)
# save the image
cv2.imwrite("matched_images.jpg", matched_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output:
Conclusion
Alright, in this tutorial, we've covered the basics of SIFT; I suggest you
read the original paper for more detailed information.

Also, OpenCV uses the default parameters of SIFT
in the cv2.xfeatures2d.SIFT_create() method; you can change the number of features to
retain ( nfeatures ), nOctaveLayers , sigma , and more.
Type help(cv2.xfeatures2d.SIFT_create) for more information.
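For example, the call below spells out these parameters explicitly so they are easy to tweak; the values shown are, to the best of my knowledge, the library defaults, so treat them as a starting point rather than recommendations:

import cv2

img = cv2.imread('table.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Spell out the SIFT parameters explicitly so they are easy to tweak
sift = cv2.xfeatures2d.SIFT_create(nfeatures=0, nOctaveLayers=3,
                                   contrastThreshold=0.04, edgeThreshold=10,
                                   sigma=1.6)
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(len(keypoints), descriptors.shape)   # descriptors are 128-dimensional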

Source Code:

sift.py
import cv2

# reading the image


img = cv2.imread('table.jpg')
# convert to greyscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# create SIFT feature extractor
sift = cv2.xfeatures2d.SIFT_create()
# detect features from the image
keypoints, descriptors = sift.detectAndCompute(img, None)
# draw the detected key points
sift_image = cv2.drawKeypoints(gray, keypoints, img)
# show the image
cv2.imshow('image', sift_image)
# save the image
cv2.imwrite("table-sift.jpg", sift_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
feature_match.py
import cv2

# read the images


img1 = cv2.imread('book.jpg')
img2 = cv2.imread('table.jpg')

# convert images to grayscale


img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

# create SIFT object


sift = cv2.xfeatures2d.SIFT_create()
# detect SIFT features in both images
keypoints_1, descriptors_1 = sift.detectAndCompute(img1,None)
keypoints_2, descriptors_2 = sift.detectAndCompute(img2,None)
# create feature matcher
bf = cv2.BFMatcher(cv2.NORM_L1, crossCheck=True)
# match descriptors of both images
matches = bf.match(descriptors_1,descriptors_2)
# sort matches by distance
matches = sorted(matches, key = lambda x:x.distance)
# draw first 50 matches
matched_img = cv2.drawMatches(img1, keypoints_1, img2, keypoints_2, matches[:50], img2, flags=2)
# show the image
cv2.imshow('image', matched_img)
# save the image
cv2.imwrite("matched_images.jpg", matched_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
PART 6: How to Apply HOG Feature
Extraction in Python
Learn how to use scikit-image library to extract Histogram of Oriented Gradient (HOG) features from
images in Python.

The Histogram of Oriented Gradients (HOG) is a feature descriptor used
in computer vision and image processing applications for the purpose of
object detection. It is a technique that counts occurrences of gradient
orientation in a specific portion of an image or region of interest.

In 2005, Dalal and Triggs published a research paper named Histograms of


Oriented Gradients for Human Detection. After the release of this paper,
HOG is used in a lot of object detection applications.

Here are the most important aspects of HOG:

HOG focuses on the structure of the object. It extracts the information


of the edges magnitude as well as the orientation of the edges.
It uses a detection window of 64x128 pixels, so the image is first
converted into (64, 128) shape.
The image is then further divided into small parts, and the gradient
and orientation of each part are calculated. The 64x128 window is divided
into 8x16 cells of 8x8 pixels each, which are grouped into blocks of 2x2 cells
with 50% overlap, so there are 7x15 = 105 blocks in total.
We take the 64 gradient vectors of each cell (an 8x8-pixel patch) and put
them into a 9-bin histogram.
Below are the essential steps we take on HOG feature extraction:

Resizing the Image


As mentioned previously, if you have a wide image, then crop the image to
the specific part in which you want to apply HOG feature extraction, and then
resize it to the appropriate shape.

Calculating Gradients
Now, after resizing, we need to calculate the gradient in the x and y directions.
The gradient is simply the small change in the x and y directions; to compute it,
we convolve two simple filters with the image.

The filter for calculating the gradient in the x-direction is the 1-D kernel [-1, 0, 1]. The figure below shows the result of applying this filter to an image:

The filter for calculating the gradient in the y-direction is the same kernel transposed (arranged vertically). The figure below shows the result of applying this filter to an image:
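As a small sketch (using OpenCV's filter2D for the convolution, which is an assumption of convenience rather than what scikit-image does internally), the two gradients can be computed like this:

import cv2
import numpy as np

# Read the image as grayscale and apply the [-1, 0, 1] derivative filters
gray = cv2.imread("cat.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
kernel_x = np.array([[-1, 0, 1]], dtype=np.float32)   # horizontal derivative
kernel_y = kernel_x.T                                 # vertical derivative
gx = cv2.filter2D(gray, -1, kernel_x)
gy = cv2.filter2D(gray, -1, kernel_y)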

Calculating the Magnitude


To calculate the magnitude of the gradient, the following formula is used: magnitude = √(Gx² + Gy²)
Calculating the Orientation
The gradient direction is given by: θ = arctan(Gy / Gx)

Let's take an example. Say we have the matrix below:

The gradient in the x-axis will simply be 94-56 = 38,


and 93-55 = 38 in the y-axis.

The magnitude will be: √(38² + 38²) ≈ 53.74

And the gradient direction will be: arctan(38 / 38) = 45°
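The same arithmetic in NumPy, for the worked example above:

import numpy as np

gx, gy = 38.0, 38.0
magnitude = np.sqrt(gx ** 2 + gy ** 2)       # about 53.74
direction = np.degrees(np.arctan2(gy, gx))   # 45.0 degrees
print(magnitude, direction)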

Python Code

Now that we understand the theory, let's take a look at how we can
use the scikit-image library to extract HOG features from images.

First, let's install the necessary libraries for this tutorial:

pip3 install scikit-image matplotlib

I'm gonna perform HOG on a cute cat image, get it here and put it in the
current working directory (you can use any image you want, of course). Let's
load the image and show it:

#importing required libraries


from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog
from skimage import exposure
import matplotlib.pyplot as plt

# reading the image


img = imread('cat.jpg')
plt.axis("off")
plt.imshow(img)
print(img.shape)
Output:

(1349, 1012, 3)

Resizing the image:

# resizing image
resized_img = resize(img, (128*4, 64*4))
plt.axis("off")
plt.imshow(resized_img)
print(resized_img.shape)

Output:

(512, 256, 3)
Now we simply use hog() function from scikit-image
library:

#creating hog features


fd, hog_image = hog(resized_img, orientations=9, pixels_per_cell=(8, 8),
cells_per_block=(2, 2), visualize=True, multichannel=True)
plt.axis("off")
plt.imshow(hog_image, cmap="gray")

Output:
The hog() function takes the following 6 parameters as input (a quick sanity check of the resulting feature vector's length follows the list):

image : The target image you want to apply HOG feature extraction.
orientations : Number of bins in the histogram we want to create, the
original research paper used 9 bins so we will pass 9 as orientations.
pixels_per_cell : Determines the size of the cell, as we mentioned earlier, it
is 8x8.
cells_per_block : Number of cells per block, will be 2x2 as mentioned
previously.
visualize : A boolean whether to return the image of the HOG, we set it
to True so we can show the image.
multichannel : We set it to True to tell the function that the last dimension is
considered as a color channel, instead of spatial.
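As a quick sanity check of the feature vector length, assuming the resized image is (512, 256) as above and the parameters we just passed, the expected size can be computed by hand; this reuses the fd array returned by hog() in the previous snippet:

# cells of 8x8 pixels, blocks of 2x2 cells with a 1-cell stride, 9 orientations
cells_y, cells_x = 512 // 8, 256 // 8              # 64 x 32 cells
blocks_y, blocks_x = cells_y - 1, cells_x - 1      # 63 x 31 blocks
expected_length = blocks_y * blocks_x * 2 * 2 * 9  # 70308
print(expected_length, fd.shape)                   # fd.shape should be (70308,)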

Finally, if you want to save the images:

# save the images


plt.imsave("resized_img.jpg", resized_img)
plt.imsave("hog_image.jpg", hog_image, cmap="gray")

Conclusion
Alright, now you know how to perform HOG feature extraction in Python
with the help of scikit-image library.

Source Code:

hog.py
#importing required libraries
from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog
import matplotlib.pyplot as plt

#reading the image


img = imread('cat.jpg')
plt.axis("off")
plt.imshow(img)
print(img.shape)

#resizing image
resized_img = resize(img, (128*4, 64*4))
plt.axis("off")
plt.imshow(resized_img)
plt.show()
print(resized_img.shape)

#creating hog features


fd, hog_image = hog(resized_img, orientations=9, pixels_per_cell=(8, 8),
cells_per_block=(2, 2), visualize=True, multichannel=True)
print(fd.shape)
print(hog_image.shape)
plt.axis("off")
plt.imshow(hog_image, cmap="gray")
plt.show()
# save the images
plt.imsave("resized_img.jpg", resized_img)
plt.imsave("hog_image.jpg", hog_image, cmap="gray")
PART 7: Image Transformations
using OpenCV in Python
Learn how to perform perspective image transformation techniques such as image translation,
reflection, rotation, scaling, shearing and cropping using OpenCV library in Python.

Introduction
Image transformation is a coordinate-changing function: it maps points (x, y)
in one coordinate system to points (x', y') in another coordinate system.

For example, if we have (2, 3) points in x-y coordinate, and we plot the same
point in u-v coordinate, the same point is represented in different ways, as
shown in the figure below:

Here is the table of contents:


The Use of Image Transformation
Image Translation
Image Scaling
Image Shearing
Shearing in the x-axis Direction
Shearing in the y-axis Direction
Image Reflection
Image Rotation
Image Cropping
Conclusion
The Use of Image Transformation
In the image below, the geometric relation between the comic book and the
image on the right side is based on the similarity transformation (rotation,
translation, and scaling). If we need to train a machine learning model that
finds this comic book, then we need to input the image in a different shape
and angle.

Image transformation techniques can help us a lot in the preprocessing phase


of images in machine learning.

Matrices can represent images. Each value in a matrix is a pixel value at a


specific coordinate. Image transformation can be performed using matrix
multiplication. Mathematicians have worked out some matrices that can be
used to accomplish certain transformation operations.
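As a tiny illustration of the idea, a 3x3 translation matrix applied to the point (2, 3) in homogeneous coordinates moves it by 50 pixels on each axis:

import numpy as np

# Translation matrix and a point (x=2, y=3) in homogeneous coordinates
M = np.float32([[1, 0, 50],
                [0, 1, 50],
                [0, 0, 1]])
point = np.array([2, 3, 1], dtype=np.float32)
print(M @ point)   # [52. 53.  1.]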

Image Translation
Image translation is the rectilinear shift of an image from one location to
another; this shifting of an object is called translation. The matrix shown
below is used for the translation of the image:

[[1, 0, bx],
 [0, 1, by],
 [0, 0, 1]]

The value of bx defines how much the image will be moved on the x-axis and
the value of by determines the movement of the image on the y-axis:

Now that you understand image translation, let's take a look at the Python
code. In OpenCV, there are two built-in functions for performing
transformations:

cv2.warpPerspective : takes a (3x3) transformation matrix as input.
cv2.warpAffine : takes a (2x3) transformation matrix as input.

Both functions take three input parameters:

The input image.


Transformation matrix.
A tuple of the height and width of the image.

In this tutorial, we'll use cv2.warpPerspective() function.

The below code reads an input image (if you want the exact output, get the
demo image here and put it in the current working directory), translates it,
and shows it:

import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()
# get the image shape
rows, cols, dim = img.shape
# transformation matrix for translation
M = np.float32([[1, 0, 50],
[0, 1, 50],
[0, 0, 1]])
# apply a perspective transformation to the image
translated_img = cv2.warpPerspective(img, M, (cols, rows))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(translated_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_translated.jpg", translated_img)

Note that we use plt.axis('off') as we do not want to output the axis values, and
we show the image using matplotlib's imshow() function.

We also use plt.imsave() function to save the image locally.
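For comparison, the same translation can be done with cv2.warpAffine by passing only the top two rows of the matrix; a short sketch, using the same city.jpg image and an illustrative output filename:

import cv2
import numpy as np

img = cv2.imread("city.jpg")
rows, cols = img.shape[:2]
# 2x3 affine matrix: the top two rows of the 3x3 translation matrix
M_affine = np.float32([[1, 0, 50],
                       [0, 1, 50]])
translated = cv2.warpAffine(img, M_affine, (cols, rows))
cv2.imwrite("city_translated_affine.jpg", translated)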

Original image:
Translated image:

Image Scaling
Image scaling is a process used to resize a digital image. OpenCV has a built-in
function cv2.resize() , but we will perform the transformation using matrix
multiplication as previously. The matrix used for scaling is shown below:

[[Sx, 0, 0],
 [0, Sy, 0],
 [0, 0, 1]]

Sx and Sy are the scaling factors for the x-axis and y-axis, respectively.

The below code is responsible for reading the same image, defining the
transformation matrix for scaling, and showing the resulting image:

import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()
# get the image shape
rows, cols, dim = img.shape
#transformation matrix for Scaling
M = np.float32([[1.5, 0 , 0],
[0, 1.8, 0],
[0, 0, 1]])
# apply a perspective transformation to the image
scaled_img = cv2.warpPerspective(img,M,(cols*2,rows*2))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(scaled_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_scaled.jpg", scaled_img)

Output image:
Note that you can easily remove those black pixels with cropping, we'll cover
that in the end of the tutorial.

Read Also: How to Blur Faces in Images using OpenCV in Python.

Image Shearing
Shear mapping is a linear map that displaces each point in a fixed direction:
it shifts every point horizontally or vertically by an amount proportional to
its y or x coordinate, respectively. There are two types of shearing effects.

Shearing in the x-axis Direction


When shearing is done in the x-axis direction, the boundaries of the image
that are parallel to the x-axis keep their location, and the edges parallel to the
y-axis change their place depending on the shearing factor:
Shearing in the y-axis Direction
When shearing is done in the y-axis direction, the boundaries of the image
that are parallel to the y-axis keep their location, and the edges parallel to the
x-axis change their place depending on the shearing factor.

The matrix for shearing in the x-axis direction is shown below, where shx is the shearing factor (for y-axis shearing, the factor moves to the second row instead):

[[1, shx, 0],
 [0, 1, 0],
 [0, 0, 1]]

Below is the code responsible for shearing:

import numpy as np
import cv2
import matplotlib.pyplot as plt
# read the input image
img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()
# get the image shape
rows, cols, dim = img.shape
# transformation matrix for Shearing
# shearing applied to x-axis
M = np.float32([[1, 0.5, 0],
[0, 1 , 0],
[0, 0 , 1]])
# shearing applied to y-axis
# M = np.float32([[1, 0, 0],
# [0.5, 1, 0],
# [0, 0, 1]])
# apply a perspective transformation to the image
sheared_img = cv2.warpPerspective(img,M,(int(cols*1.5),int(rows*1.5)))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(sheared_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_sheared.jpg", sheared_img)
The first matrix is shearing applied to the x-axis, if you want the y-axis, then
comment the first matrix and uncomment the second one.

X-axis sheared image:

Y-axis sheared image:


Related: Face Detection using OpenCV in Python.

Image Reflection
Image reflection (or mirroring) is useful for flipping an image, it can flip the
image vertically as well as horizontally, which is a particular case of scaling.
For reflection along the x-axis, we set the value of Sy to -1, and Sx to 1, and
vice-versa for the y-axis reflection.

The transformation matrix for reflection is shown below:

Here is the Python code for reflection:
import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()
# get the image shape
rows, cols, dim = img.shape
# transformation matrix for x-axis reflection
M = np.float32([[1, 0, 0 ],
[0, -1, rows],
[0, 0, 1 ]])
# transformation matrix for y-axis reflection
# M = np.float32([[-1, 0, cols],
# [ 0, 1, 0 ],
# [ 0, 0, 1 ]])
# apply a perspective transformation to the image
reflected_img = cv2.warpPerspective(img,M,(int(cols),int(rows)))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(reflected_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_reflected.jpg", reflected_img)

As before, this reflects along the x-axis; if you want y-axis reflection,
uncomment the second matrix and comment out the first one.

X-axis reflected image:

Y-axis reflected image:


Image Rotation
Rotation is a concept in mathematics: a motion of a certain space that
preserves at least one point. Image rotation is a common image processing
routine with applications in matching, alignment, and other image-based
algorithms; it is also used extensively in data augmentation, especially for
image classification.

The transformation matrix of rotation is shown in the figure below, where theta
(θ) is the angle of rotation.

Below is the Python code for image rotation:

import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()
# get the image shape
rows, cols, dim = img.shape
#angle from degree to radian
angle = np.radians(10)
#transformation matrix for Rotation
M = np.float32([[np.cos(angle), -(np.sin(angle)), 0],
[np.sin(angle), np.cos(angle), 0],
[0, 0, 1]])
# apply a perspective transformation to the image
rotated_img = cv2.warpPerspective(img, M, (int(cols),int(rows)))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(rotated_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_rotated.jpg", rotated_img)

Output image:

This was rotated by 10° ( np.radians(10) ); you're free to edit it as you wish!
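Note that this matrix rotates the image around the origin (the top-left corner), which is why part of the result leaves the frame. If you'd rather rotate around the image center, OpenCV can build the 2x3 affine matrix for you with cv2.getRotationMatrix2D(); a minimal sketch (not part of the original code):

import cv2
import matplotlib.pyplot as plt

img = cv2.cvtColor(cv2.imread("city.jpg"), cv2.COLOR_BGR2RGB)
rows, cols, dim = img.shape
# rotate 10 degrees around the image center, without scaling
M = cv2.getRotationMatrix2D((cols / 2, rows / 2), 10, 1.0)
rotated_center = cv2.warpAffine(img, M, (cols, rows))
plt.axis('off')
plt.imshow(rotated_center)
plt.show()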

Related: How to Perform Edge Detection in Python using OpenCV.


Image Cropping
Image cropping is the removal of unwanted outer areas from an image. Many
of the above examples introduced black pixels; you can easily remove them
using cropping. The below code does that:

import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()
# get 200 pixels from 100 to 300 on both x-axis & y-axis
# change that if you will, just make sure you don't exceed cols & rows
cropped_img = img[100:300, 100:300]
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(cropped_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_cropped.jpg", cropped_img)
Since OpenCV loads the image as a NumPy array, we can crop the image
simply by indexing the array. In our case, we chose to get 200 pixels, from 100
to 300, on both axes; here is the output image:
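If you'd rather not hard-code the pixel coordinates, you can compute a centered crop from the image shape instead; a small sketch (not part of the original code) that keeps the central half of the image in both dimensions:

import cv2

img = cv2.cvtColor(cv2.imread("city.jpg"), cv2.COLOR_BGR2RGB)
rows, cols = img.shape[:2]
crop_h, crop_w = rows // 2, cols // 2
y0 = (rows - crop_h) // 2
x0 = (cols - crop_w) // 2
center_cropped = img[y0:y0 + crop_h, x0:x0 + crop_w]
print(center_cropped.shape)  # roughly (rows // 2, cols // 2, 3)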

Conclusion
In this tutorial, we've covered the basics of image processing and
transformation, which are image translation, scaling, shearing, reflection,
rotation, and cropping.

Source Code:

translation.py
import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()

# get the image shape


rows, cols, dim = img.shape

# transformation matrix for translation


M = np.float32([[1, 0, 50],
[0, 1, 50],
[0, 0, 1]])
# apply a perspective transformation to the image
translated_img = cv2.warpPerspective(img, M, (cols, rows))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(translated_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_translated.jpg", translated_img)

scaling.py

import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()

# get the image shape


rows, cols, dim = img.shape

#transformation matrix for Scaling


M = np.float32([[1.5, 0 , 0],
[0, 1.8, 0],
[0, 0, 1]])
# apply a perspective transformation to the image
scaled_img = cv2.warpPerspective(img,M,(cols*2,rows*2))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(scaled_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_scaled.jpg", scaled_img)

shearing.py
import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()

# get the image shape


rows, cols, dim = img.shape

# transformation matrix for Shearing


# shearing applied to x-axis
M = np.float32([[1, 0.5, 0],
[0, 1 , 0],
[0, 0 , 1]])
# shearing applied to y-axis
# M = np.float32([[1, 0, 0],
# [0.5, 1, 0],
# [0, 0, 1]])

# apply a perspective transformation to the image


sheared_img = cv2.warpPerspective(img,M,(int(cols*1.5),int(rows*1.5)))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(sheared_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_sheared.jpg", sheared_img)

reflection.py

import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()

# get the image shape


rows, cols, dim = img.shape

# transformation matrix for x-axis reflection


M = np.float32([[1, 0, 0 ],
[0, -1, rows],
[0, 0, 1 ]])
# transformation matrix for y-axis reflection
# M = np.float32([[-1, 0, cols],
# [ 0, 1, 0 ],
# [ 0, 0, 1 ]])
# apply a perspective transformation to the image
reflected_img = cv2.warpPerspective(img,M,(int(cols),int(rows)))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(reflected_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_reflected.jpg", reflected_img)

rotation.py
import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()

# get the image shape


rows, cols, dim = img.shape

#angle from degree to radian


angle = np.radians(10)
#transformation matrix for Rotation
M = np.float32([[np.cos(angle), -(np.sin(angle)), 0],
[np.sin(angle), np.cos(angle), 0],
[0, 0, 1]])
# apply a perspective transformation to the image
rotated_img = cv2.warpPerspective(img, M, (int(cols),int(rows)))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(rotated_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_rotated.jpg", rotated_img)

cropping.py

import numpy as np
import cv2
import matplotlib.pyplot as plt

# read the input image


img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()

# get 200 pixels from 100 to 300 on both x-axis & y-axis
# change that if you will, just make sure you don't exceed cols & rows
cropped_img = img[100:300, 100:300]
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(cropped_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_cropped.jpg", cropped_img)
PART 8: How to Make a Barcode
Reader in Python
Learn how to make a barcode scanner that decodes barcodes and draws them on the image using the pyzbar
and OpenCV libraries in Python

A barcode is a method of representing data in a visual, machine-readable
form; it consists of bars and spaces. Today, we see barcodes everywhere,
especially on products in supermarkets.

Barcodes can be read by an optical barcode scanner, but in this tutorial, we
will make a script in Python that is able to read and decode barcodes, as well
as draw where they're located in a given image.

Related: How to Extract Frames from Video in Python.

To get started, we need to install a few libraries:

pip3 install pyzbar opencv-python

Once you have these installed, open up a new Python file and import them:

from pyzbar import pyzbar
import cv2

I have a few images to test with; you can use any image you want from the
internet or your own disk, but you can get my test images in this directory.

I have wrapped each piece of functionality into a function. The first function we
are going to discuss is the following:

def decode(image):
    # decodes all barcodes from an image
    decoded_objects = pyzbar.decode(image)
    for obj in decoded_objects:
        # draw the barcode
        print("detected barcode:", obj)
        image = draw_barcode(obj, image)
        # print barcode type & data
        print("Type:", obj.type)
        print("Data:", obj.data)
        print()

    return image

The decode() function takes an image as a NumPy array and uses pyzbar.decode(), which
is responsible for decoding all barcodes from a single image and returns a
bunch of useful information about each barcode detected.

We then iterate over all detected barcodes, draw a rectangle around each
barcode, and print its type and data.

To make things clear, the following is what each obj looks like if we print it:

Decoded(data=b'43770929851162', type='I25', rect=Rect(left=62, top=0, width=694, height=180), polygon=[Point(x=62, y=1), Point(x=62, y=179), Point(x=756, y=180), Point(x=756, y=0)])

So the pyzbar.decode() function returns the data contained in the barcode, the type of
barcode, as well as its location points, both as a rectangle and as a polygon.
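Note that obj.data is a bytes object (hence the b'...' prefix); if you need a plain Python string, decode it first. A small sketch (not part of the original script), assuming image is an image already loaded with cv2.imread():

for obj in pyzbar.decode(image):
    text = obj.data.decode("utf-8")  # e.g. b'43770929851162' -> '43770929851162'
    print(f"{obj.type}: {text}")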

This brings us to the next function that we used, draw_barcode() :


def draw_barcode(decoded, image):
    # n_points = len(decoded.polygon)
    # for i in range(n_points):
    #     image = cv2.line(image, decoded.polygon[i], decoded.polygon[(i+1) % n_points], color=(0, 255, 0), thickness=5)
    # uncomment above and comment below if you want to draw a polygon and not a rectangle
    image = cv2.rectangle(image, (decoded.rect.left, decoded.rect.top),
                          (decoded.rect.left + decoded.rect.width, decoded.rect.top + decoded.rect.height),
                          color=(0, 255, 0),
                          thickness=5)
    return image

This function takes the decoded object we just saw and the image itself. It
draws a rectangle around the barcode using the cv2.rectangle() function, or you can
uncomment the other version of the function to draw the polygon
using cv2.line(); the choice is yours. I preferred the rectangle version.
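If you also want the decoded text written next to the box, cv2.putText() can draw it just above the rectangle; a hedged sketch (a variation, not the tutorial's function):

import cv2

def draw_barcode_with_text(decoded, image):
    left, top = decoded.rect.left, decoded.rect.top
    # draw the bounding rectangle as before
    image = cv2.rectangle(image, (left, top),
                          (left + decoded.rect.width, top + decoded.rect.height),
                          color=(0, 255, 0), thickness=5)
    # write the decoded data slightly above the rectangle
    image = cv2.putText(image, decoded.data.decode("utf-8"),
                        (left, max(top - 10, 20)), cv2.FONT_HERSHEY_SIMPLEX,
                        1, (0, 255, 0), 2)
    return image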

Finally, it returns the image that contains the drawn barcodes. Now let's use
these functions for our example images:

if __name__ == "__main__":
    from glob import glob

    barcodes = glob("barcode*.png")
    for barcode_file in barcodes:
        # load the image to opencv
        img = cv2.imread(barcode_file)
        # decode detected barcodes & get the image
        # that is drawn
        img = decode(img)
        # show the image
        cv2.imshow("img", img)
        cv2.waitKey(0)

In my current directory, I have barcode1.png, barcode2.png, and barcode3.png,
which are all example images of scanned barcodes. I used glob so I can get all
these images as a list and iterate over them.

We load each file using the cv2.imread() function, use the previously
discussed decode() function to decode the barcodes, and then show the
resulting image.

Note that this will also detect QR codes, and that's fine, but for more accurate
results, I suggest you check the dedicated tutorial for detecting and
generating QR codes in Python.

When I run the script, it shows each image and prints its type and data;
press any key and you'll get the next image. Here is my output:

detected barcode: Decoded(data=b'0036000291452', type='EAN13', rect=Rect(left=124, top=58, width=965, height=812), polygon=[Point(x=124, y=59), Point(x=124, y=869), Point(x=621, y=870), Point(x=1089, y=870), Point(x=1089, y=58)])
Type: EAN13
Data: b'0036000291452'

detected barcode: Decoded(data=b'Wikipedia', type='CODE128', rect=Rect(left=593, top=4, width=0, height=294), polygon=[Point(x=593, y=4), Point(x=593, y=298)])
Type: CODE128
Data: b'Wikipedia'

detected barcode: Decoded(data=b'43770929851162', type='I25', rect=Rect(left=62, top=0, width=694, height=180), polygon=[Point(x=62, y=1), Point(x=62, y=179), Point(x=756, y=180), Point(x=756, y=0)])
Type: I25
Data: b'43770929851162'

Here is the last image that is shown:

Conclusion
That is awesome; now you have a great tool to make your own barcode
scanner in Python. I know you all want to read directly from the camera, so I
have prepared code that reads from the camera and detects barcodes in a live
manner; check the live_barcode_reader.py script in the source code below!

You can also add some sort of beep when each barcode is detected, just like
in supermarkets; check the tutorial on playing sounds, which may help you
accomplish that.

Source Code:

barcode_reader.py
from pyzbar import pyzbar
import cv2

def decode(image):
    # decodes all barcodes from an image
    decoded_objects = pyzbar.decode(image)
    for obj in decoded_objects:
        # draw the barcode
        print("detected barcode:", obj)
        image = draw_barcode(obj, image)
        # print barcode type & data
        print("Type:", obj.type)
        print("Data:", obj.data)
        print()

    return image

def draw_barcode(decoded, image):
    # n_points = len(decoded.polygon)
    # for i in range(n_points):
    #     image = cv2.line(image, decoded.polygon[i], decoded.polygon[(i+1) % n_points], color=(0, 255, 0), thickness=5)
    # uncomment above and comment below if you want to draw a polygon and not a rectangle
    image = cv2.rectangle(image, (decoded.rect.left, decoded.rect.top),
                          (decoded.rect.left + decoded.rect.width, decoded.rect.top + decoded.rect.height),
                          color=(0, 255, 0),
                          thickness=5)
    return image

if __name__ == "__main__":
    from glob import glob

    barcodes = glob("barcode*.png")
    for barcode_file in barcodes:
        # load the image to opencv
        img = cv2.imread(barcode_file)
        # decode detected barcodes & get the image
        # that is drawn
        img = decode(img)
        # show the image
        cv2.imshow("img", img)
        cv2.waitKey(0)

live_barcode_reader.py

from pyzbar import pyzbar
import cv2

def draw_barcode(decoded, image):
    # n_points = len(decoded.polygon)
    # for i in range(n_points):
    #     image = cv2.line(image, decoded.polygon[i], decoded.polygon[(i+1) % n_points], color=(0, 255, 0), thickness=5)
    image = cv2.rectangle(image, (decoded.rect.left, decoded.rect.top),
                          (decoded.rect.left + decoded.rect.width, decoded.rect.top + decoded.rect.height),
                          color=(0, 255, 0),
                          thickness=5)
    return image

def decode(image):
    # decodes all barcodes from an image
    decoded_objects = pyzbar.decode(image)
    for obj in decoded_objects:
        # draw the barcode
        image = draw_barcode(obj, image)
        # print barcode type & data
        print("Type:", obj.type)
        print("Data:", obj.data)
        print()

    return image

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)
    while True:
        # read the frame from the camera
        _, frame = cap.read()
        # decode detected barcodes & get the image that is drawn
        frame = decode(frame)
        # show the image in the window
        cv2.imshow("frame", frame)
        if cv2.waitKey(1) == ord("q"):
            break
PART 9: How to Perform Malaria
Classification using TensorFlow 2 and
Keras in Python
Learn how to build a deep learning malaria detection model that classifies cell images as either infected or
not infected with malaria, using TensorFlow 2 and the Keras API in Python.

Deep learning use cases in medicine have taken a big leap in recent years,
from automatic patient diagnosis to computer vision; many cutting-edge
models are being developed in this domain.

In this tutorial, we will implement a deep learning model using TensorFlow
(Keras API) for a binary classification task that consists of labeling cell
images as either infected or not infected with malaria.

Installing required libraries and frameworks:

pip install numpy tensorflow opencv-python scikit-learn matplotlib

Downloading the Dataset

We are going to use the Malaria Cell Images Dataset from Kaggle. After
downloading and unzipping the folder, you'll see cell_images; this folder will
contain two subfolders, Parasitized and Uninfected, and another
duplicated cell_images folder; feel free to delete that one.

I also invite you to move an image from both classes to another
folder, testing-samples, so we can make inferences on them when we finish
training our model.
Image Preprocessing with OpenCV
OpenCV is an optimized open-source library for image processing
and computer vision. We will use it to preprocess our images: turn them to
grayscale as a NumPy array (numerical format) and resize them to
a (70x70) shape:

import cv2
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Activation
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
import glob
import os

# after you extract the dataset,
# put cell_images folder in the working directory
img_dir = "cell_images"
img_size = 70

def load_img_data(path):
    image_files = glob.glob(os.path.join(path, "Parasitized/*.png")) + \
                  glob.glob(os.path.join(path, "Uninfected/*.png"))
    X, y = [], []
    for image_file in image_files:
        # 0 for uninfected and 1 for infected
        label = 0 if "Uninfected" in image_file else 1
        # load the image in gray scale
        img_arr = cv2.imread(image_file, cv2.IMREAD_GRAYSCALE)
        # resize the image to (70x70)
        img_resized = cv2.resize(img_arr, (img_size, img_size))
        X.append(img_resized)
        y.append(label)
    return X, y

We used the built-in glob module to get all images in that format (ending
with .png) in a specific folder.

Then we iterate over these image file names, load each image in
grayscale, resize it, and append it to our arrays; we also do the same for labels
(0 for uninfected and 1 for parasitized).
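Since the two classes come from two separate folders, it's worth checking that they are roughly balanced before training. A quick sketch (not part of the original code), assuming X and y have been returned by load_img_data() as in the next cell:

import numpy as np

counts = np.bincount(y)  # index 0 -> uninfected, index 1 -> parasitized
print("Uninfected:", counts[0], "| Parasitized:", counts[1])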

Preparing and Normalizing the Dataset


Now that we have our function to load the dataset, let's call it and perform
some preparation:

# load the data
X, y = load_img_data(img_dir)
# reshape to (n_samples, 70, 70, 1) (to fit the NN)
X = np.array(X).reshape(-1, img_size, img_size, 1)
# scale pixels from the range [0, 255] to [0, 1]
# to help the neural network learn much faster
X = X / 255

# shuffle & split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, stratify=y)
print("Total training samples:", X_train.shape)
print("Total validation samples:", X_test.shape[0])

After loading the preprocessed dataset, we extend our image array shape
to (n_samples, 70, 70, 1) to fit the neural network input.

In addition, to help the network converge faster, we should perform data
normalization. sklearn offers scaling methods for that, such as:

StandardScaler: x_norm = (x - mean) / std (where std is the standard deviation)
MinMaxScaler: x_norm = (x - x_min) / (x_max - x_min), which results in x_norm ranging between 0 and 1

In our case, we won't be using those. Instead, we will divide by 255, since the
biggest value a pixel can take is 255; this results in pixels ranging
between 0 and 1 after applying the scaling.
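To see that dividing by 255 is just the MinMaxScaler formula with x_min = 0 and x_max = 255, here is a tiny check (a sketch, not part of the tutorial code):

import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.float32)
minmax = (pixels - 0) / (255 - 0)   # min-max formula over the full pixel range
simple = pixels / 255               # what we actually do
print(np.allclose(minmax, simple))  # True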

Then we will use the train_test_split() method from sklearn to divide the dataset
into training and testing sets; we use 10% of the total data to validate on later.
The stratify parameter preserves the proportion of the target classes, as in the
original dataset, in the train and test sets as well.

The train_test_split() method shuffles the data by default ( shuffle is set to True ); we
want that, since the original ordering consists of all 0 labels in the first half
and all 1 labels in the second half, which could result in bad training of the
network later on.

Implementing the CNN Model Architecture

Our neural network architecture will roughly follow the architecture
presented in the figure:

In our case, we will add 3 convolution layers, then a Flatten layer, followed by
fully connected layers composed of Dense layers.

Let us define those layers and properties:

Convolution layers: Their role is to reduce the images into simpler forms
while keeping only the most important features. A matrix filter traverses
the image to apply the convolution operations.
Pooling layers: Their role is to reduce the spatial volume resulting from
the convolution operations. There are two types of pooling layers:
average-pooling and max-pooling (in our case we will use the latter).
Flatten: A layer responsible for transforming the results of the
convolution and pooling into a 1D shape to be fed forward into the fully
connected layers defined below.
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=X_train.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))


model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))


model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Flatten())

model.add(Dense(64))
model.add(Activation("relu"))

model.add(Dense(64))
model.add(Activation("relu"))

model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# train the model with 3 epochs, 64 batch size
model.fit(X_train, np.array(y_train), batch_size=64, epochs=3, validation_split=0.2)
# if you already trained the model, uncomment below and comment above
# so you can only load the previously trained model
# model.load_weights("malaria-cell-cnn.h5")

Since the output is binary (either infected or not infected), we used
sigmoid (1/(1 + exp(-x))) as the activation function of the output layer.
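As a quick reminder of what the sigmoid does (a sketch, not part of the original code), it squashes any real number into the (0, 1) range, which is why the single output unit can be read as a probability of infection:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(-4), sigmoid(0), sigmoid(4))
# ~0.018, 0.5, ~0.982 -> large negative outputs mean "uninfected", large positive mean "infected"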

Here is my training output:

Train on 19840 samples, validate on 4960 samples
Epoch 1/3
19840/19840 [==============================] - 14s 704us/sample
- loss: 0.5067 - accuracy: 0.7135 - val_loss: 0.1949 - val_accuracy: 0.9300
Epoch 2/3
19840/19840 [==============================] - 12s 590us/sample
- loss: 0.1674 - accuracy: 0.9391 - val_loss: 0.1372 - val_accuracy: 0.9482
Epoch 3/3
19840/19840 [==============================] - 12s 592us/sample
- loss: 0.1428 - accuracy: 0.9495 - val_loss: 0.1344 - val_accuracy: 0.9518

As you may have noticed, we achieved an accuracy of 95% on the training
dataset and its validation split.

Model Evaluation
Now let's use the evaluate() method from the Keras API to evaluate the model on the
testing dataset:

loss, accuracy = model.evaluate(X_test, np.array(y_test), verbose=0)
print(f"Testing on {len(X_test)} images, the results are\n Accuracy: {accuracy} | Loss: {loss}")

Testing on 2756 images, the results are
Accuracy: 0.9444847702980042 | Loss: 0.15253388267470028

The model also performed well on the test data, with an accuracy
reaching 94%.

Now let's use this model to make inferences on the two images we put in
the testing-samples folder earlier in this tutorial. First, let's plot them:

# testing some images
uninfected_cell = "cell_images/testing-samples/C1_thinF_IMG_20150604_104919_cell_82.png"
infected_cell = "cell_images/testing-samples/C38P3thinF_original_IMG_20150621_112116_cell_204.png"

_, ax = plt.subplots(1, 2)
ax[0].imshow(plt.imread(uninfected_cell))
ax[0].title.set_text("Uninfected Cell")
ax[1].imshow(plt.imread(infected_cell))
ax[1].title.set_text("Parasitized Cell")
plt.show()

Output:
Great, now let's load these images and perform preprocessing:

img_arr_uninfected = cv2.imread(uninfected_cell, cv2.IMREAD_GRAYSCALE)
img_arr_infected = cv2.imread(infected_cell, cv2.IMREAD_GRAYSCALE)
# resize the images to (70x70)
img_arr_uninfected = cv2.resize(img_arr_uninfected, (img_size, img_size))
img_arr_infected = cv2.resize(img_arr_infected, (img_size, img_size))
# scale to [0, 1]
img_arr_infected = img_arr_infected / 255
img_arr_uninfected = img_arr_uninfected / 255
# reshape to fit the neural network dimensions
# (changing shape from (70, 70) to (1, 70, 70, 1))
img_arr_infected = img_arr_infected.reshape(1, *img_arr_infected.shape)
img_arr_infected = np.expand_dims(img_arr_infected, axis=3)
img_arr_uninfected = img_arr_uninfected.reshape(1, *img_arr_uninfected.shape)
img_arr_uninfected = np.expand_dims(img_arr_uninfected, axis=3)

All we have to do now is use the predict() method to make inferences:

# perform inference
infected_result = model.predict(img_arr_infected)[0][0]
uninfected_result = model.predict(img_arr_uninfected)[0][0]
print(f"Infected: {infected_result}")
print(f"Uninfected: {uninfected_result}")

Output:

Infected: 0.9827326536178589
Uninfected: 0.005085020791739225

Awesome, the model is about 98% sure that the infected cell is in fact infected,
and about 99.5% sure that the uninfected cell is uninfected.

Saving the model


Finally, we will conclude all this process by saving our model.

# save the model & weights


model.save("malaria-cell-cnn.h5")

Conclusion:
In this tutorial you have learned:

How to process raw images and convert them to grayscale NumPy arrays
(numerical format) using OpenCV.
The architecture behind a convolutional neural network and its
various components.
How to implement a CNN in TensorFlow/Keras.
How to evaluate and save a deep learning model, as well as perform
inference with it.

I encourage you to tweak the model parameters, or you may want to use
transfer learning so the model performs much better. You can also train on
colored images instead of grayscale; this may help!

There are other metrics besides accuracy, such as sensitivity and specificity,
which are widely used in the medical field; I invite you to add them here as
well. If you're not sure how, there is a tutorial on skin cancer detection in
which we did all of that!
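If you want a head start on that exercise, here is a hedged sketch (not part of the original code) that derives both metrics from sklearn's confusion matrix, assuming X_test and y_test as defined above:

from sklearn.metrics import confusion_matrix
import numpy as np

# threshold the sigmoid outputs at 0.5 to get hard labels
y_prob = model.predict(X_test)
y_pred = (y_prob >= 0.5).astype(int).ravel()
tn, fp, fn, tp = confusion_matrix(np.array(y_test), y_pred).ravel()
print("Sensitivity (TP / (TP + FN)):", tp / (tp + fn))
print("Specificity (TN / (TN + FP)):", tn / (tn + fp))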

Source Code:

malaria-classification.py
import cv2
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Activation
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

import glob
import os
# after you extract the dataset,
# put cell_images folder in the working directory
img_dir="cell_images"
img_size=70

def load_img_data(path):
    image_files = glob.glob(os.path.join(path, "Parasitized/*.png")) + \
                  glob.glob(os.path.join(path, "Uninfected/*.png"))
    X, y = [], []
    for image_file in image_files:
        # 0 for uninfected and 1 for infected
        label = 0 if "Uninfected" in image_file else 1
        # load the image in gray scale
        img_arr = cv2.imread(image_file, cv2.IMREAD_GRAYSCALE)
        # resize the image to (70x70)
        img_resized = cv2.resize(img_arr, (img_size, img_size))
        X.append(img_resized)
        y.append(label)
    return X, y

# load the data


X, y = load_img_data(img_dir)
# reshape to (n_samples, 70, 70, 1) (to fit the NN)
X = np.array(X).reshape(-1, img_size, img_size, 1)
# scale pixels from the range [0, 255] to [0, 1]
# to help the neural network learn much faster
X = X / 255

# shuffle & split the dataset


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, stratify=y)
print("Total training samples:", X_train.shape)
print("Total validation samples:", X_test.shape[0])

model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=X_train.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))


model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Flatten())

model.add(Dense(64))
model.add(Activation("relu"))

model.add(Dense(64))
model.add(Activation("relu"))

model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# train the model with 3 epochs, 64 batch size


model.fit(X_train, np.array(y_train), batch_size=64, epochs=3, validation_split=0.2)
# if you already trained the model, uncomment below and comment above
# so you can only load the previously trained model
# model.load_weights("malaria-cell-cnn.h5")

loss, accuracy = model.evaluate(X_test, np.array(y_test), verbose=0)


print(f"Testing on {len(X_test)} images, the results are\n Accuracy: {accuracy} | Loss: {loss}")

# save the model & weights


model.save("malaria-cell-cnn.h5")

# testing some images


uninfected_cell = "cell_images/testing-samples/C1_thinF_IMG_20150604_104919_cell_82.png"
infected_cell = "cell_images/testing-samples/C38P3thinF_original_IMG_20150621_112116_cell_204.png"
_, ax = plt.subplots(1, 2)
ax[0].imshow(plt.imread(uninfected_cell))
ax[0].title.set_text("Uninfected Cell")
ax[1].imshow(plt.imread(infected_cell))
ax[1].title.set_text("Parasitized Cell")
plt.show()

img_arr_uninfected = cv2.imread(uninfected_cell, cv2.IMREAD_GRAYSCALE)


img_arr_infected = cv2.imread(infected_cell, cv2.IMREAD_GRAYSCALE)
# resize the images to (70x70)
img_arr_uninfected = cv2.resize(img_arr_uninfected, (img_size, img_size))
img_arr_infected = cv2.resize(img_arr_infected, (img_size, img_size))
# scale to [0, 1]
img_arr_infected = img_arr_infected / 255
img_arr_uninfected = img_arr_uninfected / 255
# reshape to fit the neural network dimensions
# (changing shape from (70, 70) to (1, 70, 70, 1))
img_arr_infected = img_arr_infected.reshape(1, *img_arr_infected.shape)
img_arr_infected = np.expand_dims(img_arr_infected, axis=3)
img_arr_uninfected = img_arr_uninfected.reshape(1, *img_arr_uninfected.shape)
img_arr_uninfected = np.expand_dims(img_arr_uninfected, axis=3)
# perform inference
infected_result = model.predict(img_arr_infected)[0][0]
uninfected_result = model.predict(img_arr_uninfected)[0][0]
print(f"Infected: {infected_result}")
print(f"Uninfected: {uninfected_result}")
PART 10: Skin Cancer Detection
using TensorFlow in Python
Learn how to use transfer learning to build a model that is able to classify benign and malignant
(melanoma) skin diseases in Python using TensorFlow 2.

Skin cancer is an abnormal growth of skin cells; it is one of the most common
cancers, and unfortunately, it can become deadly. The good news, though, is
that when caught early, your dermatologist can treat it and eliminate it entirely.

Using deep learning and neural networks, we'll be able to classify benign and
malignant skin diseases, which may help the doctor diagnose cancer at an
earlier stage. In this tutorial, we will make a skin disease classifier that tries
to distinguish between benign (nevus and seborrheic keratosis) and malignant
(melanoma) skin diseases from only photographic images
using TensorFlow framework in Python.

To get started, let's install the required libraries:

pip3 install tensorflow tensorflow_hub matplotlib seaborn numpy pandas scikit-learn imbalanced-learn

Open up a new notebook (or Google Colab) and import the necessary
modules:

import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from tensorflow.keras.utils import get_file
from sklearn.metrics import roc_curve, auc, confusion_matrix
from imblearn.metrics import sensitivity_score, specificity_score

import os
import glob
import zipfile
import random

# to get consistent results after multiple runs


tf.random.set_seed(7)
np.random.seed(7)
random.seed(7)

# 0 for benign, 1 for malignant


class_names = ["benign", "malignant"]

Preparing the Dataset


For this tutorial, we'll be using only a small part of the ISIC archive dataset; the
below function downloads and extracts the dataset into a new data folder:

def download_and_extract_dataset():
    # dataset from https://github.com/udacity/dermatologist-ai
    # 5.3GB
    train_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/train.zip"
    # 824.5MB
    valid_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/valid.zip"
    # 5.1GB
    test_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/test.zip"
    for i, download_link in enumerate([valid_url, train_url, test_url]):
        temp_file = f"temp{i}.zip"
        data_dir = get_file(origin=download_link, fname=os.path.join(os.getcwd(), temp_file))
        print("Extracting", download_link)
        with zipfile.ZipFile(data_dir, "r") as z:
            z.extractall("data")
        # remove the temp file
        os.remove(temp_file)

# comment the below line if you already downloaded the dataset
download_and_extract_dataset()

This will take several minutes, depending on your connection; after that,
a data folder will appear containing the training, validation, and testing
sets. Each set is a folder that has three categories of skin disease images
(nevus, seborrheic_keratosis, and melanoma).

Note: You may struggle to download the dataset using the above Python
function if you have a slow Internet connection; in that case, you should
download and extract it manually into the folder data in the current directory.

Now that we have the dataset on our machine, let's find a way to label these
images. Remember, we're going to classify only benign and malignant skin
diseases, so we need to label nevus and seborrheic keratosis with the value 0
and melanoma with 1.

The below cell generates a metadata CSV file for each set; each row in the
CSV file corresponds to the path of an image along with its label (0 or 1):

# preparing data
# generate CSV metadata file to read img paths and labels from it
def generate_csv(folder, label2int):
    folder_name = os.path.basename(folder)
    labels = list(label2int)
    # generate CSV file
    df = pd.DataFrame(columns=["filepath", "label"])
    i = 0
    for label in labels:
        print("Reading", os.path.join(folder, label, "*"))
        for filepath in glob.glob(os.path.join(folder, label, "*")):
            df.loc[i] = [filepath, label2int[label]]
            i += 1
    output_file = f"{folder_name}.csv"
    print("Saving", output_file)
    df.to_csv(output_file)

# generate CSV files for all data portions, labeling nevus and seborrheic keratosis
# as 0 (benign), and melanoma as 1 (malignant)
# you should replace "data" path to your extracted dataset path
# don't replace if you used download_and_extract_dataset() function
generate_csv("data/train", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/valid", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/test", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})

The generate_csv() function accepts 2 arguments: the first is the path of the set;
for example, if you have downloaded and extracted the dataset in "E:\datasets\skin-
cancer" , then the training set should be something like "E:\datasets\skin-cancer\train".
The second parameter is a dictionary that maps each skin disease category to
its corresponding label value (again, 0 for benign and 1 for malignant).

The reason I made this a function is so it can be reused for other skin
disease classifications (such as melanocytic classification); you can add
more skin diseases and use it for other problems as well.

Once you run the cell, you'll notice that 3 CSV files appear in your current
directory. Now let's use the from_tensor_slices() method from the tf.data API to load
these metadata files:

# loading data
train_metadata_filename = "train.csv"
valid_metadata_filename = "valid.csv"
# load CSV files as DataFrames
df_train = pd.read_csv(train_metadata_filename)
df_valid = pd.read_csv(valid_metadata_filename)
n_training_samples = len(df_train)
n_validation_samples = len(df_valid)
print("Number of training samples:", n_training_samples)
print("Number of validation samples:", n_validation_samples)
train_ds = tf.data.Dataset.from_tensor_slices((df_train["filepath"],
df_train["label"]))
valid_ds = tf.data.Dataset.from_tensor_slices((df_valid["filepath"],
df_valid["label"]))

Now we have loaded the dataset ( train_ds and valid_ds ); each sample is a tuple
of a filepath (path to the image file) and a label (0 for benign and 1 for malignant).
Here is the output:

Number of training samples: 2000
Number of validation samples: 150

Let's load the images:

# preprocess data
def decode_img(img):
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_jpeg(img, channels=3)
    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size.
    return tf.image.resize(img, [299, 299])

def process_path(filepath, label):
    # load the raw data from the file as a string
    img = tf.io.read_file(filepath)
    img = decode_img(img)
    return img, label

valid_ds = valid_ds.map(process_path)
train_ds = train_ds.map(process_path)
# test_ds = test_ds
for image, label in train_ds.take(1):
    print("Image shape:", image.shape)
    print("Label:", label.numpy())

The above code uses the map() method to execute the process_path() function on each
sample of both sets; it basically loads the images, decodes the image format,
converts the image pixels to the range [0, 1], and resizes them to (299, 299, 3). We
then take one image and print its shape:

Image shape: (299, 299, 3)
Label: 0

Everything is as expected, now let's prepare this dataset for training:

# training parameters
batch_size = 64
optimizer = "rmsprop"

def prepare_for_training(ds, cache=True, batch_size=64, shuffle_buffer_size=1000):
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()
    # shuffle the dataset
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    # Repeat forever
    ds = ds.repeat()
    # split to batches
    ds = ds.batch(batch_size)
    # `prefetch` lets the dataset fetch batches in the background while the model
    # is training.
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds

valid_ds = prepare_for_training(valid_ds, batch_size=batch_size, cache="valid-cached-data")
train_ds = prepare_for_training(train_ds, batch_size=batch_size, cache="train-cached-data")

Here is what we did:

cache(): Since we're making many calculations on each set, we used
the cache() method to save our preprocessed dataset into a local cache
file; this preprocesses it only the very first time (during the first epoch
of training).
shuffle(): Shuffles the dataset, so the samples are in random order.
repeat(): Every time we iterate over the dataset, it keeps generating
samples repeatedly; this helps during training.
batch(): Batches our dataset into 64 or 32 samples per training step.
prefetch(): Enables us to fetch batches in the background while
the model is training.

The below cell gets the first validation batch and plots the images along with
their corresponding label:

batch = next(iter(valid_ds))

def show_batch(batch):
    plt.figure(figsize=(12,12))
    for n in range(25):
        ax = plt.subplot(5,5,n+1)
        plt.imshow(batch[0][n])
        plt.title(class_names[batch[1][n].numpy()].title())
        plt.axis('off')

show_batch(batch)

Output:

As you can see, it's extremely hard to differentiate between malignant and
benign diseases, let's see how our model will deal with it.

Great, now our dataset is ready, let's dive into building our model.

Building the Model


Notice that earlier we resized all images to (299, 299, 3); that's because it's what
the InceptionV3 architecture expects as input, so we'll be using transfer
learning with the TensorFlow Hub library to download and load the InceptionV3
architecture along with its ImageNet pre-trained weights:

# building the model
# InceptionV3 model & pre-trained weights
module_url = "https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4"
m = tf.keras.Sequential([
    hub.KerasLayer(module_url, output_shape=[2048], trainable=False),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

m.build([None, 299, 299, 3])
m.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
m.summary()

We set trainable to False so we won't adjust the pre-trained weights
during training; we also added a final output layer with 1 unit that is
expected to output a value between 0 and 1 (close to 0 means benign,
and close to 1 means malignant).

After that, since this is a binary classification, we built our model
with a binary cross-entropy loss and used accuracy as our metric (not the
most reliable metric here; we'll see why soon). Here is the output of our model
summary:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
================================================================
keras_layer (KerasLayer) multiple 21802784
_________________________________________________________________
dense (Dense) multiple 2049
================================================================
Total params: 21,804,833
Trainable params: 2,049
Non-trainable params: 21,802,784
_________________________________________________________________

Learn also: Satellite Image Classification using TensorFlow in Python

Training the Model


We now have our dataset and the model; let's get them together:

model_name = f"benign-vs-malignant_{batch_size}_{optimizer}"
tensorboard = tf.keras.callbacks.TensorBoard(log_dir=os.path.join("logs", model_name))
# saves model checkpoint whenever we reach better weights
modelcheckpoint = tf.keras.callbacks.ModelCheckpoint(model_name + "_{val_loss:.3f}.h5", save_best_only=True, verbose=1)

history = m.fit(train_ds, validation_data=valid_ds,
                steps_per_epoch=n_training_samples // batch_size,
                validation_steps=n_validation_samples // batch_size, verbose=1,
                epochs=100,
                callbacks=[tensorboard, modelcheckpoint])

We're using the ModelCheckpoint callback to save the best weights so far on
each epoch. That's why I set epochs to 100: the model can converge to
better weights at any time; to save your time, feel free to reduce that to 30 or
so.

I also added tensorboard as a callback, in case you want to experiment with
different hyperparameter values.

Since fit() method doesn't know the number of samples there are in the
dataset, we need to specify steps_per_epoch and validation_steps parameters for the
number of iterations (the number of samples divided by the batch size) of the
training set and validation set respectively.
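As a concrete check (a small sketch using the sample counts printed earlier), 2000 training samples with a batch size of 64 gives 31 steps per epoch, and 150 validation samples gives 2 validation steps, which matches the "Train for 31 steps, validate for 2 steps" line in the log below:

print(2000 // 64)  # 31 training steps per epoch
print(150 // 64)   # 2 validation steps per epoch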

Here is a part of the output during training:

Train for 31 steps, validate for 2 steps


Epoch 1/100
30/31 [============================>.] - ETA: 9s - loss: 0.4609 -
accuracy: 0.7760
Epoch 00001: val_loss improved from inf to 0.49703, saving model to
benign-vs-malignant_64_rmsprop_0.497.h5
31/31 [==============================] - 282s 9s/step - loss:
0.4646 - accuracy: 0.7722 - val_loss: 0.4970 - val_accuracy: 0.8125
<..SNIPED..>
Epoch 27/100
30/31 [============================>.] - ETA: 0s - loss: 0.2982 -
accuracy: 0.8708
Epoch 00027: val_loss improved from 0.40253 to 0.38991, saving model to
benign-vs-malignant_64_rmsprop_0.390.h5
31/31 [==============================] - 21s 691ms/step - loss:
0.3025 - accuracy: 0.8684 - val_loss: 0.3899 - val_accuracy: 0.8359
<..SNIPED..>
Epoch 41/100
30/31 [============================>.] - ETA: 0s - loss: 0.2800 -
accuracy: 0.8802
Epoch 00041: val_loss did not improve from 0.38991
31/31 [==============================] - 21s 690ms/step - loss:
0.2829 - accuracy: 0.8790 - val_loss: 0.3948 - val_accuracy: 0.8281
Epoch 42/100
30/31 [============================>.] - ETA: 0s - loss: 0.2680 -
accuracy: 0.8859
Epoch 00042: val_loss did not improve from 0.38991
31/31 [==============================] - 21s 693ms/step - loss:
0.2722 - accuracy: 0.8831 - val_loss: 0.4572 - val_accuracy: 0.8047

Model Evaluation
First, let's load our test set, just like previously:

# evaluation
# load testing set
test_metadata_filename = "test.csv"
df_test = pd.read_csv(test_metadata_filename)
n_testing_samples = len(df_test)
print("Number of testing samples:", n_testing_samples)
test_ds = tf.data.Dataset.from_tensor_slices((df_test["filepath"], df_test["label"]))

def prepare_for_testing(ds, cache=True, shuffle_buffer_size=1000):
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    return ds

test_ds = test_ds.map(process_path)
test_ds = prepare_for_testing(test_ds, cache="test-cached-data")

The above code loads our test data and prepares it for testing:

Number of testing samples: 600

600 images of shape (299, 299, 3) can fit in memory; let's convert our test
set from tf.data into a NumPy array:

# convert testing set to numpy array to fit in memory (don't do that when testing
# set is too large)
y_test = np.zeros((n_testing_samples,))
X_test = np.zeros((n_testing_samples, 299, 299, 3))
for i, (img, label) in enumerate(test_ds.take(n_testing_samples)):
    # print(img.shape, label.shape)
    X_test[i] = img
    y_test[i] = label.numpy()

print("y_test.shape:", y_test.shape)

The above cell will construct our arrays, it will take some time the first time
it's executed because it's doing all the preprocessing defined
in process_path() and prepare_for_testing() functions.

Now let's load the optimal weights that were saved by ModelCheckpoint during the training:

# load the weights with the least loss
m.load_weights("benign-vs-malignant_64_rmsprop_0.390.h5")

You may not have the exact filename of the optimal weights; you need to
look in the current directory for the saved weights with the least loss. The
below code evaluates the model using the accuracy metric:

print("Evaluating the model...")
loss, accuracy = m.evaluate(X_test, y_test, verbose=0)
print("Loss:", loss, " Accuracy:", accuracy)

Output:

Evaluating the model...
Loss: 0.4476394319534302 Accuracy: 0.8

We've reached about 84% accuracy on the validation set and 80% on the test
set, but that's not all. Since our dataset is largely unbalanced, accuracy doesn't
tell everything. In fact, a model that predicts every image as benign would get
an accuracy of 80% , since malignant samples are about 20% of the total
validation set.
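You can verify that baseline directly from the labels we just loaded; a one-line sketch (not part of the original code), assuming y_test from the cell above:

# accuracy of a dummy model that always predicts "benign" (label 0)
print("Always-benign accuracy:", (y_test == 0).mean())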

As a result, we need a better way to evaluate our model. In the upcoming
cells, we'll use the seaborn and matplotlib libraries to draw the confusion
matrix, which tells us more about how well our model is doing.

But before we do that, I just want to make something clear: we all know that
predicting a malignant disease as benign is a terrible mistake; you can kill
people doing that! So we need a way to predict even more malignant cases,
even though we have very few malignant samples compared to benign ones. A good
method is introducing a threshold.

Remember the output of the neural network is a value between 0 and 1. Normally,
when the neural network produces a value between 0 and 0.5,
we automatically assign it as benign, and from 0.5 to 1.0 as malignant. Since
we want to be aware of the fact that we can predict a malignant disease
as benign (that's only one of the many reasons), we can say, for example, that
from 0 to 0.3 is benign and from 0.3 to 1.0 is malignant; this means we are
using a threshold value of 0.3, which will improve our predictions.

The below function does that:

def get_predictions(threshold=None):
    """
    Returns predictions for binary classification given `threshold`
    For instance, if threshold is 0.3, then it'll output 1 (malignant) for that sample if
    the probability of 1 is 30% or more (instead of 50%)
    """
    y_pred = m.predict(X_test)
    if not threshold:
        threshold = 0.5
    result = np.zeros((n_testing_samples,))
    for i in range(n_testing_samples):
        # test melanoma probability
        if y_pred[i][0] >= threshold:
            result[i] = 1
        # else, it's 0 (benign)
    return result

threshold = 0.23
# get predictions with 23% threshold
# which means if the model is 23% sure or more that is malignant,
# it's assigned as malignant, otherwise it's benign
y_pred = get_predictions(threshold)

Now let's draw our confusion matrix and interpret it:

def plot_confusion_matrix(y_test, y_pred):
    cmn = confusion_matrix(y_test, y_pred)
    # Normalise
    cmn = cmn.astype('float') / cmn.sum(axis=1)[:, np.newaxis]
    # print it
    print(cmn)
    fig, ax = plt.subplots(figsize=(10,10))
    sns.heatmap(cmn, annot=True, fmt='.2f',
                xticklabels=[f"pred_{c}" for c in class_names],
                yticklabels=[f"true_{c}" for c in class_names],
                cmap="Blues")
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    # plot the resulting confusion matrix
    plt.show()

plot_confusion_matrix(y_test, y_pred)

Output:
Sensitivity
So our model gets about a 0.72 probability of a positive test given that the
patient has the disease (bottom right of the confusion matrix); that's often
called sensitivity.

Sensitivity is a statistical measure widely used in medicine, given by the
following formula (from Wikipedia): sensitivity = true positives / (true positives + false negatives).

So in our example, out of all patients that have a malignant skin disease, we
successfully predicted 72% of them as malignant; not bad, but it needs
improvement.

Specificity
The other metric is specificity; you can read it in the top left of the confusion
matrix. We got about 63%. It is basically the probability of a negative test
given that the patient is well: specificity = true negatives / (true negatives + false positives).

In our example, out of all patients that have a benign disease, we predicted 63% of
them as benign.

With high specificity, the test rarely gives positive results in healthy patients,
whereas high sensitivity means the model is reliable when its result is
negative; I invite you to read more about this in the Wikipedia article.
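Both scores can also be read directly off the un-normalized confusion matrix; a short sketch (not part of the original code) that mirrors the two formulas, assuming y_test and y_pred from above:

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("Sensitivity = TP / (TP + FN) =", tp / (tp + fn))
print("Specificity = TN / (TN + FP) =", tn / (tn + fp))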

Alternatively, you can use the imblearn module to get these scores:

sensitivity = sensitivity_score(y_test, y_pred)
specificity = specificity_score(y_test, y_pred)

print("Melanoma Sensitivity:", sensitivity)
print("Melanoma Specificity:", specificity)

Output:

Melanoma Sensitivity: 0.717948717948718
Melanoma Specificity: 0.6252587991718427
Receiver Operating Characteristic

Another good metric is the ROC curve, which is basically a graphical plot that shows
us the diagnostic ability of our binary classifier; it features the true positive rate
on the Y-axis and the false positive rate on the X-axis. The perfect point we
want to reach is the top left corner of the plot. Here is the code for plotting
the ROC curve using matplotlib:

def plot_roc_auc(y_true, y_pred):
    """
    This function plots the ROC curves and provides the scores.
    """
    # prepare for figure
    plt.figure()
    fpr, tpr, _ = roc_curve(y_true, y_pred)
    # obtain ROC AUC
    roc_auc = auc(fpr, tpr)
    # print score
    print(f"ROC AUC: {roc_auc:.3f}")
    # plot ROC curve
    plt.plot(fpr, tpr, color="blue", lw=2,
             label='ROC curve (area = {f:.2f})'.format(d=1, f=roc_auc))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curves')
    plt.legend(loc="lower right")
    plt.show()

plot_roc_auc(y_test, y_pred)

Output:

ROC AUC: 0.671

Awesome, since we want to maximize the true positive rate and minimize
the false positive rate, calculating the area under the ROC curve proves
useful; we got 0.671 as the area under the ROC curve (ROC AUC). An
area of 1 means the model is ideal for all cases.

Conclusion
We're done! There you have it; see how you can improve the model. We only
used 2000 training samples; go to the ISIC archive, download more, and add
them to the data folder, and the scores will improve significantly depending on the
number of samples you add. You can use the ISIC archive downloader, which
may help you download the dataset the way you want.

I also encourage you to tweak the hyperparameters, such as the threshold we
set earlier, and see if you can get better sensitivity and specificity scores.

I used the InceptionV3 model architecture, but you're free to use any CNN
architecture you want; I invite you to browse TensorFlow Hub and choose the
newest model. For example, in satellite image classification, we chose
EfficientNet V2; try it out and you may increase the performance
significantly!

Source Code:

skin-cancer-detection.py
# coding: utf-8

# In[1]:

import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from tensorflow.keras.utils import get_file
from sklearn.metrics import roc_curve, auc, confusion_matrix
from imblearn.metrics import sensitivity_score, specificity_score
import os
import glob
import zipfile
import random

# to get consistent results after multiple runs


tf.random.set_seed(7)
np.random.seed(7)
random.seed(7)

# 0 for benign, 1 for malignant


class_names = ["benign", "malignant"]

def download_and_extract_dataset():
    # dataset from https://github.com/udacity/dermatologist-ai
    # 5.3GB
    train_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/train.zip"
    # 824.5MB
    valid_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/valid.zip"
    # 5.1GB
    test_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/test.zip"
    for i, download_link in enumerate([valid_url, train_url, test_url]):
        temp_file = f"temp{i}.zip"
        data_dir = get_file(origin=download_link, fname=os.path.join(os.getcwd(), temp_file))
        print("Extracting", download_link)
        with zipfile.ZipFile(data_dir, "r") as z:
            z.extractall("data")
        # remove the temp file
        os.remove(temp_file)

# comment the below line if you already downloaded the dataset


download_and_extract_dataset()

# In[2]:
# preparing data
# generate CSV metadata file to read img paths and labels from it
def generate_csv(folder, label2int):
    folder_name = os.path.basename(folder)
    labels = list(label2int)
    # generate CSV file
    df = pd.DataFrame(columns=["filepath", "label"])
    i = 0
    for label in labels:
        print("Reading", os.path.join(folder, label, "*"))
        for filepath in glob.glob(os.path.join(folder, label, "*")):
            df.loc[i] = [filepath, label2int[label]]
            i += 1
    output_file = f"{folder_name}.csv"
    print("Saving", output_file)
    df.to_csv(output_file)

# generate CSV files for all data portions, labeling nevus and seborrheic keratosis
# as 0 (benign), and melanoma as 1 (malignant)
# you should replace "data" path to your extracted dataset path
# don't replace if you used download_and_extract_dataset() function
generate_csv("data/train", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/valid", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/test", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})

# In[3]:

# loading data
train_metadata_filename = "train.csv"
valid_metadata_filename = "valid.csv"
# load CSV files as DataFrames
df_train = pd.read_csv(train_metadata_filename)
df_valid = pd.read_csv(valid_metadata_filename)
n_training_samples = len(df_train)
n_validation_samples = len(df_valid)
print("Number of training samples:", n_training_samples)
print("Number of validation samples:", n_validation_samples)
train_ds = tf.data.Dataset.from_tensor_slices((df_train["filepath"], df_train["label"]))
valid_ds = tf.data.Dataset.from_tensor_slices((df_valid["filepath"], df_valid["label"]))

# In[4]:

# preprocess data
def decode_img(img):
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_jpeg(img, channels=3)
    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size.
    return tf.image.resize(img, [299, 299])

def process_path(filepath, label):
    # load the raw data from the file as a string
    img = tf.io.read_file(filepath)
    img = decode_img(img)
    return img, label

valid_ds = valid_ds.map(process_path)
train_ds = train_ds.map(process_path)
# test_ds = test_ds
# for image, label in train_ds.take(1):
# print("Image shape:", image.shape)
# print("Label:", label.numpy())

# In[5]:

# training parameters
batch_size = 64
optimizer = "rmsprop"

# In[6]:

def prepare_for_training(ds, cache=True, batch_size=64, shuffle_buffer_size=1000):
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()
    # shuffle the dataset
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    # Repeat forever
    ds = ds.repeat()
    # split to batches
    ds = ds.batch(batch_size)
    # `prefetch` lets the dataset fetch batches in the background while the model
    # is training.
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds

valid_ds = prepare_for_training(valid_ds, batch_size=batch_size, cache="valid-cached-data")


train_ds = prepare_for_training(train_ds, batch_size=batch_size, cache="train-cached-data")

# In[9]:

batch = next(iter(valid_ds))

def show_batch(batch):
    plt.figure(figsize=(12, 12))
    for n in range(25):
        ax = plt.subplot(5, 5, n + 1)
        plt.imshow(batch[0][n])
        plt.title(class_names[batch[1][n].numpy()].title())
        plt.axis('off')

show_batch(batch)

# In[7]:

# building the model


# InceptionV3 model & pre-trained weights
module_url = "https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4"
m = tf.keras.Sequential([
hub.KerasLayer(module_url, output_shape=[2048], trainable=False),
tf.keras.layers.Dense(1, activation="sigmoid")
])

m.build([None, 299, 299, 3])


m.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
m.summary()

# In[9]:

model_name = f"benign-vs-malignant_{batch_size}_{optimizer}"
tensorboard = tf.keras.callbacks.TensorBoard(log_dir=os.path.join("logs", model_name))
# saves model checkpoint whenever we reach better weights
modelcheckpoint = tf.keras.callbacks.ModelCheckpoint(model_name + "_{val_loss:.3f}.h5",
save_best_only=True, verbose=1)

history = m.fit(train_ds, validation_data=valid_ds,


steps_per_epoch=n_training_samples // batch_size,
validation_steps=n_validation_samples // batch_size, verbose=1, epochs=100,
callbacks=[tensorboard, modelcheckpoint])
# In[8]:

# evaluation

# load testing set


test_metadata_filename = "test.csv"
df_test = pd.read_csv(test_metadata_filename)
n_testing_samples = len(df_test)
print("Number of testing samples:", n_testing_samples)
test_ds = tf.data.Dataset.from_tensor_slices((df_test["filepath"], df_test["label"]))

def prepare_for_testing(ds, cache=True, shuffle_buffer_size=1000):
    # this is a small dataset; only load it once, and keep it in memory
    # use `.cache(filename)` to cache preprocessing work for datasets that don't fit in memory
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    return ds

test_ds = test_ds.map(process_path)
test_ds = prepare_for_testing(test_ds, cache="test-cached-data")

# In[9]:

# convert testing set to numpy array to fit in memory (don't do that when testing
# set is too large)
y_test = np.zeros((n_testing_samples,))
X_test = np.zeros((n_testing_samples, 299, 299, 3))
for i, (img, label) in enumerate(test_ds.take(n_testing_samples)):
    # print(img.shape, label.shape)
    X_test[i] = img
    y_test[i] = label.numpy()

print("y_test.shape:", y_test.shape)

# In[10]:

# load the weights with the least loss


m.load_weights("benign-vs-malignant_64_rmsprop_0.390.h5")

# In[11]:

print("Evaluating the model...")


loss, accuracy = m.evaluate(X_test, y_test, verbose=0)
print("Loss:", loss, " Accuracy:", accuracy)

# In[14]:

from sklearn.metrics import accuracy_score

def get_predictions(threshold=None):
    """
    Returns predictions for binary classification given `threshold`.
    For instance, if threshold is 0.3, it outputs 1 (malignant) for a sample if
    the probability of 1 is 30% or more (instead of 50%).
    """
    y_pred = m.predict(X_test)
    if not threshold:
        threshold = 0.5
    result = np.zeros((n_testing_samples,))
    for i in range(n_testing_samples):
        # test melanoma probability
        if y_pred[i][0] >= threshold:
            result[i] = 1
        # else, it stays 0 (benign)
    return result

threshold = 0.23
# get predictions with 23% threshold
# which means if the model is 23% sure or more that is malignant,
# it's assigned as malignant, otherwise it's benign
y_pred = get_predictions(threshold)
accuracy_after = accuracy_score(y_test, y_pred)
print("Accuracy after setting the threshold:", accuracy_after)

# In[16]:

import seaborn as sns


from sklearn.metrics import roc_curve, auc, confusion_matrix

def plot_confusion_matrix(y_test, y_pred):
    cmn = confusion_matrix(y_test, y_pred)
    # normalize
    cmn = cmn.astype('float') / cmn.sum(axis=1)[:, np.newaxis]
    # print it
    print(cmn)
    fig, ax = plt.subplots(figsize=(10, 10))
    sns.heatmap(cmn, annot=True, fmt='.2f',
                xticklabels=[f"pred_{c}" for c in class_names],
                yticklabels=[f"true_{c}" for c in class_names],
                cmap="Blues")
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    # plot the resulting confusion matrix
    plt.show()

def plot_roc_auc(y_true, y_pred):
    """
    This function plots the ROC curve and prints the ROC AUC score.
    """
    # prepare the figure
    plt.figure()
    fpr, tpr, _ = roc_curve(y_true, y_pred)
    # obtain ROC AUC
    roc_auc = auc(fpr, tpr)
    # print score
    print(f"ROC AUC: {roc_auc:.3f}")
    # plot ROC curve
    plt.plot(fpr, tpr, color="blue", lw=2,
             label=f'ROC curve (area = {roc_auc:.2f})')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve')
    plt.legend(loc="lower right")
    plt.show()

plot_confusion_matrix(y_test, y_pred)
plot_roc_auc(y_test, y_pred)
# sensitivity_score and specificity_score come from the imbalanced-learn package
# (pip3 install imbalanced-learn); import them if you haven't already
from imblearn.metrics import sensitivity_score, specificity_score
sensitivity = sensitivity_score(y_test, y_pred)
specificity = specificity_score(y_test, y_pred)

print("Melanoma Sensitivity:", sensitivity)
print("Melanoma Specificity:", specificity)

# In[24]:
def plot_images(X_test, y_pred, y_test):
    predicted_class_names = np.array([class_names[int(round(id))] for id in y_pred])
    # some nice plotting
    plt.figure(figsize=(10, 9))
    for n in range(30, 60):
        plt.subplot(6, 5, n - 30 + 1)
        plt.subplots_adjust(hspace=0.3)
        plt.imshow(X_test[n])
        # get the predicted label
        predicted_label = predicted_class_names[n]
        # get the actual true label
        true_label = class_names[int(round(y_test[n]))]
        if predicted_label == true_label:
            color = "blue"
            title = predicted_label.title()
        else:
            color = "red"
            title = f"{predicted_label.title()}, true: {true_label.title()}"
        plt.title(title, color=color)
        plt.axis('off')
    _ = plt.suptitle("Model predictions (blue: correct, red: incorrect)")
    plt.show()

plot_images(X_test, y_pred, y_test)


PART 11: Use K-Means Clustering
for Image Segmentation using OpenCV
in Python
Using the K-Means clustering unsupervised machine learning algorithm to segment different parts of an image with OpenCV in Python.

Image segmentation is the process of partitioning an image into multiple regions (or segments). The goal is to change the representation of the image into something that is simpler and more meaningful to analyze.

It is an important step in image processing, as real-world images rarely contain just the single object we want to classify. For instance, for self-driving cars, a frame may contain the road, cars, pedestrians, and so on, so we may need segmentation to separate the objects and then analyze each one individually (e.g., with image classification) to determine what it is.

In this tutorial, we will look at one image segmentation method: K-Means clustering.

K-Means clustering is an unsupervised machine learning algorithm that aims to partition N observations into K clusters, in which each observation belongs to the cluster with the nearest mean. A cluster refers to a collection of data points aggregated together because of certain similarities. For image segmentation, the clusters here are the different image colors.
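To make the idea concrete, here is a tiny NumPy sketch of a single K-Means iteration on made-up 2D points (purely illustrative; it is not part of the OpenCV pipeline used below):

import numpy as np

# toy data: 6 two-dimensional points and 2 hypothetical initial centers
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
centers = np.array([[0.0, 0.0], [6.0, 6.0]])

# assignment step: label each point with its nearest center
distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
labels = np.argmin(distances, axis=1)

# update step: move each center to the mean of the points assigned to it
centers = np.array([points[labels == k].mean(axis=0) for k in range(len(centers))])
print(labels, centers)

K-Means simply repeats these two steps until the assignments stop changing or a stopping criterion is met, which is exactly what we configure with OpenCV below.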


Before we dive into the code, we need to install the required libraries:

pip3 install opencv-python numpy matplotlib


Let's import them:

import cv2
import numpy as np
import matplotlib.pyplot as plt

I'm going to use this image for demonstration purposes. Feel free to use any:

Loading the image:

# read the image


image = cv2.imread("image.jpg")

Before we do anything, let's convert the image into RGB format:

# convert to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
We're going to use the cv2.kmeans() function, which expects a 2D array as input. Since our original image is 3D (width, height, and a depth of 3 RGB values), we need to flatten the height and width into a single vector of pixels, each with 3 RGB values:

# reshape the image to a 2D array of pixels and 3 color values (RGB)


pixel_values = image.reshape((-1, 3))
# convert to float
pixel_values = np.float32(pixel_values)

Let's try to print the shape of the resulting pixel values:

print(pixel_values.shape)

Output:

(2073600, 3)

As expected, this is the result of flattening a high-resolution (1920, 1080) image: 1920 × 1080 = 2,073,600 pixels.

The K-Means algorithm normally stops when none of the cluster assignments change between iterations. Since we have a large number of data points here, though, that would take a long time to process, so we are going to cheat a little and stop either when some number of iterations is exceeded (say 100) or when the cluster centers move by less than some epsilon value (let's pick 0.2 here). The code below defines these stopping criteria in OpenCV:

# define stopping criteria


criteria = (cv2.TERM_CRITERIA_EPS +
cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)

If you look at the image, there are three primary colors (green for the trees, blue for the sea/lake, and white to orange for the sky). As a result, we're going to use three clusters for this image:

# number of clusters (K)


k = 3
_, labels, (centers) = cv2.kmeans(pixel_values, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

The labels array holds the cluster label for each pixel, which is either 0, 1, or 2 (since k = 3), while centers holds the center points (each centroid's value).

cv2.KMEANS_RANDOM_CENTERS simply tells OpenCV to choose random initial centers for the clusters.

Remember that we converted the flattened pixel values to floats earlier; we did that because cv2.kmeans() expects float input. Let's now convert the centers back to 8-bit pixel values:

# convert back to 8 bit values


centers = np.uint8(centers)

# flatten the labels array


labels = labels.flatten()

Now let's construct the segmented image:

# convert all pixels to the color of the centroids
segmented_image = centers[labels]

Converting back to the original image shape and showing it:

# reshape back to the original image dimension


segmented_image = segmented_image.reshape(image.shape)
# show the image
plt.imshow(segmented_image)
plt.show()

Here is the resulting image:

Awesome! We can also disable some clusters in the image. For instance, let's disable (blacken) cluster number 2 and show the resulting image:

# disable only the cluster number 2 (turn its pixels into black)
masked_image = np.copy(image)
# convert to the shape of a vector of pixel values
masked_image = masked_image.reshape((-1, 3))
# color (i.e. cluster) to disable
cluster = 2
masked_image[labels == cluster] = [0, 0, 0]
# convert back to original shape
masked_image = masked_image.reshape(image.shape)
# show the image
plt.imshow(masked_image)
plt.show()

Here is the resulting image:

Wow, it turns out that cluster 2 is the trees. Feel free to:

Disable other clusters and see which is segmented accurately.
Tweak the parameters for better results (such as k).
Use other images that clearly contain different objects with different colors.

Note that there are other segmentation techniques, such as the Hough transform, contour detection, and the current state-of-the-art semantic segmentation.

Want to Learn More?


Here are some useful resources you can read:

K-Means Clustering in OpenCV.


Introduction to Image Segmentation with K-Means clustering.

Source Code:

kmeans_segmentation.py
import cv2
import numpy as np
import matplotlib.pyplot as plt
import sys

# read the image


image = cv2.imread(sys.argv[1])

# convert to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# reshape the image to a 2D array of pixels and 3 color values (RGB)


pixel_values = image.reshape((-1, 3))
# convert to float
pixel_values = np.float32(pixel_values)

# define stopping criteria


criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)

# number of clusters (K)


k=3
compactness, labels, (centers) = cv2.kmeans(pixel_values, k, None, criteria, 10,
cv2.KMEANS_RANDOM_CENTERS)

# convert back to 8 bit values


centers = np.uint8(centers)

# flatten the labels array


labels = labels.flatten()

# convert all pixels to the color of the centroids


segmented_image = centers[labels]

# reshape back to the original image dimension


segmented_image = segmented_image.reshape(image.shape)

# show the image


plt.imshow(segmented_image)
plt.show()

# disable only the cluster number 2 (turn its pixels into black)
masked_image = np.copy(image)
# convert to the shape of a vector of pixel values
masked_image = masked_image.reshape((-1, 3))
# color (i.e cluster) to disable
cluster = 2
masked_image[labels == cluster] = [0, 0, 0]

# convert back to original shape


masked_image = masked_image.reshape(image.shape)
# show the image
plt.imshow(masked_image)
plt.show()

live_kmeans_segmentation.py (using live cam)


import cv2
import numpy as np
cap = cv2.VideoCapture(0)
k=5

# define stopping criteria


criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)

while True:
    # read the image
    _, image = cap.read()
    # reshape the image to a 2D array of pixels and 3 color values (RGB)
    pixel_values = image.reshape((-1, 3))
    # convert to float
    pixel_values = np.float32(pixel_values)
    # cluster the pixel values into k clusters
    _, labels, (centers) = cv2.kmeans(pixel_values, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    # convert back to 8 bit values
    centers = np.uint8(centers)
    # convert all pixels to the color of the centroids
    segmented_image = centers[labels.flatten()]
    # reshape back to the original image dimension
    segmented_image = segmented_image.reshape(image.shape)
    # reshape labels too
    labels = labels.reshape(image.shape[0], image.shape[1])
    cv2.imshow("segmented_image", segmented_image)
    # visualize each segment
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
PART 12: Detect Contours in
Images using OpenCV in Python
Learning how to detect contours in images for image segmentation, shape analysis and object detection
and recognition using OpenCV in Python.

A contour is a closed curve joining all the continuous points that share some color or intensity; contours represent the shapes of the objects found in an image. Contour detection is a useful technique for shape analysis and for object detection and recognition.

In a previous tutorial, we discussed edge detection using the Canny algorithm and saw how to implement it in OpenCV. So you may ask: what is the difference between edge detection and contour detection?

Well, when we perform edge detection, we find the points where the intensity of colors changes significantly and simply turn those pixels on. Contours, on the other hand, are abstract collections of points and segments corresponding to the shapes of the objects in the image. As a result, we can manipulate contours in our programs, for example by counting them, using them to categorize the shapes of objects, cropping objects from an image (image segmentation), and much more.
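For example, once the contours are available, you can count them or crop each detected object with its bounding rectangle. Here is a minimal sketch of that idea (it assumes the binary and image arrays built later in this tutorial, and the output file names are just placeholders):

# count the detected contours and crop each object using its bounding rectangle
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print("Number of contours:", len(contours))
for i, contour in enumerate(contours):
    # bounding rectangle of the contour: top-left corner (x, y), width and height
    x, y, w, h = cv2.boundingRect(contour)
    crop = image[y:y + h, x:x + w]
    # note: image is in RGB here, so colors saved with cv2.imwrite may look swapped
    cv2.imwrite(f"object_{i}.png", crop)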

Contour detection is not the only algorithm for image segmentation though,
there are a lot of others, such as the current state-of-the-art semantic
segmentation, hough transform, and K-Means segmentation.

For better accuracy, here is the whole pipeline we are going to follow to successfully detect contours in an image:

Convert the image to a binary image; it is common practice for the input image to be binary (the result of thresholding or edge detection).
Find the contours using the findContours() OpenCV function.
Draw these contours and show the image.

Alright, let's get started. First, let's install the dependencies for this tutorial:

pip3 install matplotlib opencv-python

Importing the necessary modules:

import cv2
import matplotlib.pyplot as plt

We're going to use this image for this tutorial:

Let's load it:

# read the image


image = cv2.imread("thumbs_up_down.jpg")

Converting it to RGB and then grayscale:


# convert to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

As mentioned earlier, we need to create a binary image, which means each pixel of the image is either black or white. This is a necessity in OpenCV: finding contours is like finding a white object on a black background, so the objects to be found should be white and the background should be black.

# create a binary thresholded image


_, binary = cv2.threshold(gray, 225, 255, cv2.THRESH_BINARY_INV)
# show it
plt.imshow(binary, cmap="gray")
plt.show()

The above code creates the binary image with an inverted threshold: pixels with a value above 225 are disabled (set to 0), and pixels with a value of 225 or below are turned on (set to 255). Here is the output image:
Now it is easy for OpenCV to detect the contours:

# find the contours from the thresholded image


contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE,
cv2.CHAIN_APPROX_SIMPLE)
# draw all contours
image = cv2.drawContours(image, contours, -1, (0, 255, 0), 2)

The above code finds the contours within the binary image and draws them on the original image with a thick green line. Let's show it:

# show the image with the drawn contours


plt.imshow(image)
plt.show()

Output image:
To achieve good results on different real-world images, you need to tune your threshold value or perform edge detection first. For instance, for a pancakes image, I decreased the threshold to 127; here is the result:
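Instead of hand-tuning the threshold for every image, you can also let OpenCV choose it automatically with Otsu's method. Here is a small sketch of that alternative (not part of the original code):

# let Otsu's method pick the threshold automatically (the first return value is the chosen threshold)
ret, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
print("Otsu threshold:", ret)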
Alright, this is it for this tutorial. If you want to test this on your live camera, check the live-contour-detector.py script in the source code below.

Source Code:

contour_detector.py
import cv2
import matplotlib.pyplot as plt

# read the image


image = cv2.imread("thumbs_up_down.jpg")
# convert to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
# create a binary thresholded image
_, binary = cv2.threshold(gray, 225, 255, cv2.THRESH_BINARY_INV)
# show it
plt.imshow(binary, cmap="gray")
plt.show()
# find the contours from the thresholded image
contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# draw all contours
image = cv2.drawContours(image, contours, -1, (0, 255, 0), 2)
# show the image with the drawn contours
plt.imshow(image)
plt.show()

live-contour-detector.py
import cv2

cap = cv2.VideoCapture(0)

while True:
    _, frame = cap.read()
    # convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # create a binary thresholded image
    _, binary = cv2.threshold(gray, 255 // 2, 255, cv2.THRESH_BINARY_INV)
    # find the contours from the thresholded image
    contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # draw all contours
    image = cv2.drawContours(frame, contours, -1, (0, 255, 0), 2)
    # show the images
    cv2.imshow("gray", gray)
    cv2.imshow("image", image)
    cv2.imshow("binary", binary)
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
PART 13: Optical Character
Recognition (OCR) in Python
Learn how to use the Tesseract OCR library and the pytesseract wrapper for optical character recognition (OCR) to convert text in images into digital text in Python.

Humans can easily understand the text content of an image simply by looking at it. However, that is not the case for computers; they need some sort of structured method or algorithm to be able to understand it. This is where Optical Character Recognition (OCR) comes into play.

Optical Character Recognition is the process of detecting text content in images and converting it to machine-encoded text that we can access and manipulate in Python (or any programming language) as a string variable. In this tutorial, we are going to use the Tesseract library to do that.

The Tesseract library contains an OCR engine and a command-line program; it has nothing to do with Python itself, so please follow the official guide to install it, as it is a required tool for this tutorial.

We are going to use the pytesseract module, which is a Python wrapper for the Tesseract-OCR engine, so we can access the engine from Python.

The most recent stable version of Tesseract is 4, which uses a recurrent neural network (LSTM) based OCR engine focused on line recognition.

Let's get started, you need to install:

Tesseract-OCR Engine (follow their guide for your operating system).


pytesseract wrapper module using:
pip3 install pytesseract

Other utility modules for this tutorial:


pip3 install numpy matplotlib opencv-python pillow

After you have everything installed in your machine, open up a new Python
file and follow along:

import pytesseract
import cv2
import matplotlib.pyplot as plt
from PIL import Image

For demonstration purposes, I'm going to use this image for recognition:

I've named it "test.png" and put it in the current directory; let's load this image:

# read the image using OpenCV


image = cv2.imread("test.png")
# or you can use Pillow
# image = Image.open("test.png")

As you may notice, you can load the image either with OpenCV or Pillow; I prefer OpenCV as it enables us to use the live camera later on.

Let's recognize that text:

# get the string


string = pytesseract.image_to_string(image)
# print it
print(string)

Note: If the above code raises an error, consider adding the Tesseract-OCR binaries to your PATH variable, and read the official installation guide more carefully.
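On Windows, you can also point pytesseract directly at the Tesseract executable instead of editing PATH. The path below is only an example; adjust it to wherever Tesseract is installed on your machine:

import pytesseract
# example path only; change it to your actual Tesseract installation
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"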

The image_to_string() function does exactly what you would expect: it converts the text contained in the image into a string. Let's see the result:

This is a lot of 12 point text to test the


ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the


lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

Excellent! There is another function, image_to_data(), which outputs more information than that, including each word with its corresponding width, height, and x, y coordinates; this enables us to build a lot of useful things. For instance, let's search the document for a specific word of our choice and draw a bounding box around every occurrence. The code below handles that:

# make a copy of this image to draw in
image_ = image.copy()
# the target word to search for
target_word = "dog"
# get all data from the image
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

So we're going to search for the word "dog" in the text document. We want the output data to be structured rather than a raw string, which is why output_type is set to a dictionary, so we can easily get each word's data (you can print the data dictionary to see how the output is organized).

Let's get all the occurrences of that word:

# get all occurrences of that word
word_occurences = [i for i, word in enumerate(data["text"]) if word.lower() == target_word]

Now let's draw a surrounding box on each word:

for occ in word_occurences:
    # extract the width, height, top and left position for that detected word
    w = data["width"][occ]
    h = data["height"][occ]
    l = data["left"][occ]
    t = data["top"][occ]
    # define all the surrounding box points
    p1 = (l, t)
    p2 = (l + w, t)
    p3 = (l + w, t + h)
    p4 = (l, t + h)
    # draw the 4 lines (rectangular)
    image_ = cv2.line(image_, p1, p2, color=(255, 0, 0), thickness=2)
    image_ = cv2.line(image_, p2, p3, color=(255, 0, 0), thickness=2)
    image_ = cv2.line(image_, p3, p4, color=(255, 0, 0), thickness=2)
    image_ = cv2.line(image_, p4, p1, color=(255, 0, 0), thickness=2)

Saving and showing the resulting image:

plt.imsave("all_dog_words.png", image_)
plt.imshow(image_)
plt.show()

Take a look at the result:

Amazing, isn't it? That's not all: you can pass the lang parameter to the image_to_string() or image_to_data() functions to recognize text in different languages. You can also use the image_to_boxes() function, which recognizes characters and their box boundaries; refer to the official documentation and the list of available languages for more information.
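As a rough illustration of image_to_boxes() (an extra sketch, not part of the original tutorial code), the snippet below draws a rectangle and the recognized character around every detected character. Tesseract reports box coordinates with the origin at the bottom-left corner, so the y values are flipped against the image height:

# draw a box around every recognized character
h_img = image.shape[0]
boxes = pytesseract.image_to_boxes(image)
for box in boxes.splitlines():
    ch, x1, y1, x2, y2 = box.split()[:5]
    x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
    # flip the y coordinates because Tesseract's origin is the bottom-left corner
    cv2.rectangle(image, (x1, h_img - y1), (x2, h_img - y2), (0, 255, 0), 1)
    cv2.putText(image, ch, (x1, h_img - y2 - 2), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 0, 0), 1)
plt.imshow(image)
plt.show()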

A note, though: this method is ideal for recognizing text in scanned documents and papers. Other uses of OCR include automating passport recognition and information extraction, data entry processes, detection and recognition of car number plates, and much more!

Also, it won't work very well on handwritten text, complex real-world scenes, unclear images, or images that contain an excessive amount of text.

Alright, that's it for this tutorial, let us see what you can build with this
utility!

We have made a tutorial where you can use OCR to extract text from images
inside PDF files, check it out!

Source Code:

extracting_text.py

import pytesseract
import cv2
import matplotlib.pyplot as plt
import sys
from PIL import Image

# read the image using OpenCV


# from the command line first argument
image = cv2.imread(sys.argv[1])
# or you can use Pillow
# image = Image.open(sys.argv[1])

# get the string


string = pytesseract.image_to_string(image)
# print it
print(string)

# get all data


data = pytesseract.image_to_data(image)

print(data)

draw_boxes.py
import pytesseract
import cv2
import matplotlib.pyplot as plt
from PIL import Image

# read the image using OpenCV


image = cv2.imread("test.png")
# make a copy of this image to draw in
image_ = image.copy()
# the target word to search for
target_word = "dog"
# get all data from the image
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
# get all occurrences of that word
word_occurences = [i for i, word in enumerate(data["text"]) if word == target_word]

for occ in word_occurences:
    # extract the width, height, top and left position for that detected word
    w = data["width"][occ]
    h = data["height"][occ]
    l = data["left"][occ]
    t = data["top"][occ]
    # define all the surrounding box points
    p1 = (l, t)
    p2 = (l + w, t)
    p3 = (l + w, t + h)
    p4 = (l, t + h)
    # draw the 4 lines (rectangular)
    image_ = cv2.line(image_, p1, p2, color=(255, 0, 0), thickness=2)
    image_ = cv2.line(image_, p2, p3, color=(255, 0, 0), thickness=2)
    image_ = cv2.line(image_, p3, p4, color=(255, 0, 0), thickness=2)
    image_ = cv2.line(image_, p4, p1, color=(255, 0, 0), thickness=2)

plt.imsave("all_dog_words.png", image_)
plt.imshow(image_)
plt.show()
live_recognizer.py (using cam)
import pytesseract
import cv2
import matplotlib.pyplot as plt
from PIL import Image

# the target word to search for


target_word = "your"

cap = cv2.VideoCapture(0)

while True:
    # read the image from the cam
    _, image = cap.read()
    # make a copy of this image to draw in
    image_ = image.copy()
    # get all data from the image
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    # print the data
    print(data["text"])
    # get all occurrences of that word
    word_occurences = [i for i, word in enumerate(data["text"]) if word.lower() == target_word]
    for occ in word_occurences:
        # extract the width, height, top and left position for that detected word
        w = data["width"][occ]
        h = data["height"][occ]
        l = data["left"][occ]
        t = data["top"][occ]
        # define all the surrounding box points
        p1 = (l, t)
        p2 = (l + w, t)
        p3 = (l + w, t + h)
        p4 = (l, t + h)
        # draw the 4 lines (rectangular)
        image_ = cv2.line(image_, p1, p2, color=(255, 0, 0), thickness=2)
        image_ = cv2.line(image_, p2, p3, color=(255, 0, 0), thickness=2)
        image_ = cv2.line(image_, p3, p4, color=(255, 0, 0), thickness=2)
        image_ = cv2.line(image_, p4, p1, color=(255, 0, 0), thickness=2)
    # show the annotated frame
    cv2.imshow("image_", image_)
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
PART 14: Detect Shapes in Images
in Python using OpenCV
Detecting shapes, lines and circles in images using Hough Transform technique with OpenCV in
Python. Hough transform is a popular feature extraction technique to detect any shape within an image.

In the previous tutorial, we saw how you can detect edges in an image. However, that is not usually enough in the image processing phase. In this tutorial, you will learn how to detect shapes (mainly lines and circles) in images using the Hough Transform technique in Python with the OpenCV library.

The Hough Transform is a popular feature extraction technique to detect any


shape within an image. It is mainly used in image analysis, computer vision
and image recognition.

Let's get started, installing the requirements:

pip3 install opencv-python numpy matplotlib

Importing the modules:

import numpy as np
import matplotlib.pyplot as plt
import cv2

Detecting Lines
I'm going to use a photo of a computer monitor; make sure you have the file monitor.jpg in your current directory (you're free to use any image):

# read the image


image = cv2.imread("monitor.jpg")

We need to convert this image to gray scale for edge detection:

# convert to grayscale
grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Let's detect the edges of the image:

# perform edge detection


edges = cv2.Canny(grayscale, 30, 100)

If you're not sure what cv2.Canny is doing, refer to the edge detection tutorial later in this book.

Now that we have detected the edges in the image, we can use the Hough transform to detect the lines:

# detect lines in the image using hough lines technique


lines = cv2.HoughLinesP(edges, 1, np.pi/180, 60, np.array([]), 50, 5)

The cv2.HoughLinesP() function finds line segments in a binary image using the probabilistic Hough transform. Its most important parameters are annotated in the sketch below.
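For reference, here is the same call written with keyword arguments and comments, so it is clearer what each value means (this is just an annotated restatement of the call above; np.array([]) in the original is simply a placeholder for the optional output array):

# rho = 1 pixel and theta = 1 degree define the resolution of the accumulator,
# threshold = 60 is the minimum number of votes a line needs,
# minLineLength = 50 discards shorter segments and maxLineGap = 5 merges close ones
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi/180, threshold=60,
                        minLineLength=50, maxLineGap=5)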

Let's draw the lines:

# iterate over the output lines and draw them


for line in lines:
    for x1, y1, x2, y2 in line:
        cv2.line(image, (x1, y1), (x2, y2), (20, 220, 20), 3)

Showing the image:

# show the image


plt.imshow(image)
plt.show()

Here is my output:

The green lines are the lines we just drew. As you can see, most of the monitor is surrounded by green lines; feel free to tweak the parameters to get better results.

Here is the full code for detecting lines in your live camera:
import numpy as np
import matplotlib.pyplot as plt
import cv2

cap = cv2.VideoCapture(0)

while True:
    _, image = cap.read()
    # convert to grayscale
    grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # perform edge detection
    edges = cv2.Canny(grayscale, 30, 100)
    # detect lines in the image using hough lines technique
    lines = cv2.HoughLinesP(edges, 1, np.pi/180, 60, np.array([]), 50, 5)
    # iterate over the output lines and draw them
    for line in lines:
        for x1, y1, x2, y2 in line:
            cv2.line(image, (x1, y1), (x2, y2), (255, 0, 0), 3)
            cv2.line(edges, (x1, y1), (x2, y2), (255, 0, 0), 3)
    # show images
    cv2.imshow("image", image)
    cv2.imshow("edges", edges)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()

Detecting Circles
In order to detect circles, we need to use the cv2.HoughCircles() method instead. I have a coins.jpg image (which contains several coins) in the current directory; let's load it:

# load the image


img = cv2.imread("coins.jpg")

Next, we create a copy of this image, on which we're going to draw the detected circles:

# convert BGR to RGB to be suitable for showing using matplotlib library
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# make a copy of the original image
cimg = img.copy()

In order to pass the image to that method, we need to convert it to grayscale and blur it; cv2.medianBlur() does the job:

# convert image to grayscale


img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# apply a blur using the median filter
img = cv2.medianBlur(img, 5)

After that, let's detect the circles:

# finds the circles in the grayscale image using the Hough transform
circles = cv2.HoughCircles(image=img, method=cv2.HOUGH_GRADIENT,
dp=0.9,
minDist=80, param1=110, param2=39, maxRadius=70)
In case you're wondering what these parameters refer to, type help(cv2.HoughCircles) and you'll find a good explanation.
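In short, here is the same call with each parameter annotated (a brief paraphrase of that help text, using the values from this tutorial):

circles = cv2.HoughCircles(
    image=img,                  # 8-bit, single-channel (grayscale, blurred) input image
    method=cv2.HOUGH_GRADIENT,  # detection method
    dp=0.9,                     # inverse ratio of accumulator resolution to image resolution
    minDist=80,                 # minimum distance between the centers of detected circles
    param1=110,                 # upper threshold for the internal Canny edge detector
    param2=39,                  # accumulator threshold; smaller values return more (possibly false) circles
    maxRadius=70)               # maximum circle radius in pixels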

Finally, let's draw and show the circles we just detected:

for co, i in enumerate(circles[0, :], start=1):
    # draw the outer circle in green
    cv2.circle(cimg, (i[0], i[1]), i[2], (0, 255, 0), 2)
    # draw the center of the circle in red
    cv2.circle(cimg, (i[0], i[1]), 2, (0, 0, 255), 3)

# print the number of circles detected


print("Number of circles detected:", co)
# save the image, convert to BGR to save with proper colors
# cv2.imwrite("coins_circles_detected.png", cimg)
# show the image
plt.imshow(cimg)
plt.show()

Here is my result:
As you can see, it isn't perfect, as it doesn't detect all the circles in the image. Try tuning the parameters passed to the cv2.HoughCircles() method and see if you can achieve better results.

Alright, that's it for now. Here are the references for this tutorial:

Hough Line Transform.
Hough Circle Transform.
Official OpenCV documentation.
Source Code:

shape_detector.py
import numpy as np
import matplotlib.pyplot as plt
import cv2
import sys

# read the image from arguments


image = cv2.imread(sys.argv[1])

# convert to grayscale
grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# perform edge detection


edges = cv2.Canny(grayscale, 30, 100)

# detect lines in the image using hough lines technique


lines = cv2.HoughLinesP(edges, 1, np.pi/180, 60, np.array([]), 50, 5)
# iterate over the output lines and draw them
for line in lines:
    for x1, y1, x2, y2 in line:
        cv2.line(image, (x1, y1), (x2, y2), color=(20, 220, 20), thickness=3)

# show the image


plt.imshow(image)
plt.show()

live_shape_detector.py

import numpy as np
import matplotlib.pyplot as plt
import cv2

cap = cv2.VideoCapture(0)

while True:
    _, image = cap.read()
    # convert to grayscale
    grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # perform edge detection
    edges = cv2.Canny(grayscale, 30, 100)
    # detect lines in the image using hough lines technique
    lines = cv2.HoughLinesP(edges, 1, np.pi/180, 60, np.array([]), 50, 5)
    # iterate over the output lines and draw them
    for line in lines:
        for x1, y1, x2, y2 in line:
            cv2.line(image, (x1, y1), (x2, y2), (255, 0, 0), 3)
            cv2.line(edges, (x1, y1), (x2, y2), (255, 0, 0), 3)
    # show images
    cv2.imshow("image", image)
    cv2.imshow("edges", edges)
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

circle_detector.py
import cv2
import numpy as np
import matplotlib.pyplot as plt
import sys

# load the image


img = cv2.imread(sys.argv[1])
# convert BGR to RGB to be suitable for showing using matplotlib library
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# make a copy of the original image
cimg = img.copy()
# convert image to grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# apply a blur using the median filter
img = cv2.medianBlur(img, 5)
# finds the circles in the grayscale image using the Hough transform
circles = cv2.HoughCircles(image=img, method=cv2.HOUGH_GRADIENT, dp=0.9,
                           minDist=80, param1=110, param2=39, maxRadius=70)
for co, i in enumerate(circles[0, :], start=1):
    # draw the outer circle
    cv2.circle(cimg, (i[0], i[1]), i[2], (0, 255, 0), 2)
    # draw the center of the circle
    cv2.circle(cimg, (i[0], i[1]), 2, (0, 0, 255), 3)

# print the number of circles detected


print("Number of circles detected:", co)
# save the image, convert to BGR to save with proper colors
# cv2.imwrite("coins_circles_detected.png", cimg)
# show the image
plt.imshow(cimg)
plt.show()
PART 15: Perform Edge Detection
in Python using OpenCV
Learning how to apply edge detection in computer vision applications using canny edge detector
algorithm with OpenCV in Python.

Edge detection is an image processing technique for finding the boundaries of objects within images. It mainly works by detecting discontinuities in brightness. One of the most popular and widely used algorithms is the Canny edge detector.

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images.

The main stages are:

1. Filtering out noise using the Gaussian blur algorithm.
2. Finding the strength and direction of edges using Sobel filters.
3. Isolating the strongest edges and thinning them to one-pixel-wide lines by applying non-maximum suppression.
4. Using hysteresis to isolate the best edges.

Learn more about the theory behind the Canny edge detector here.
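To make the first two stages a bit more concrete, here is a small illustrative sketch (not how cv2.Canny is implemented internally, and it assumes OpenCV is already installed as shown in the next step) that blurs an image and computes the Sobel gradient magnitude and direction:

import cv2
import numpy as np

gray = cv2.imread("little_flower.jpg", cv2.IMREAD_GRAYSCALE)
# stage 1: reduce noise with a Gaussian blur
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)
# stage 2: gradient strength and direction with Sobel filters
gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
magnitude = np.sqrt(gx ** 2 + gy ** 2)
direction = np.arctan2(gy, gx)
print(magnitude.shape, direction.shape)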

Alright, let's implement it in Python using OpenCV, installing it:

pip3 install opencv-python matplotlib numpy

Open up a new Python file and follow along:

import cv2
import numpy as np
import matplotlib.pyplot as plt

Now let's read the image whose edges we want to detect:

# read the image


image = cv2.imread("little_flower.jpg")

I have an example image in my current directory, make sure you do too.

Before we pass the image to the Canny edge detector, we need to convert the
image to gray scale:

# convert it to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Let's see it:

# show the grayscale image


plt.imshow(gray, cmap="gray")
plt.show()
All we need to do now is pass this image to the cv2.Canny() function, which finds edges in the input image and marks them in the output edge map using the Canny algorithm:

# perform the canny edge detector to detect image edges


edges = cv2.Canny(gray, threshold1=30, threshold2=100)

The smallest value between threshold1 and threshold2 is used for edge
linking. The largest value is used to find initial segments of strong edges.

Let's see the resulting image:

Interesting, try to fine tune the threshold values and see if you can make it
better.

If you want to use the live camera, here is the full code for that:

import numpy as np
import cv2
cap = cv2.VideoCapture(0)

while True:
    _, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 30, 100)
    cv2.imshow("edges", edges)
    cv2.imshow("gray", gray)
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

Alright, we are done!

The purpose of detecting edges is to capture important events and changes in


the properties of the world. It is one of the fundamental steps in image
processing, image pattern recognition, and computer vision techniques.

Source Code:

edge_detector.py
import cv2
import numpy as np
import matplotlib.pyplot as plt
import sys
# read the image
image = cv2.imread(sys.argv[1])

# convert it to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# show the grayscale image, if you want to show, uncomment 2 below lines
# plt.imshow(gray, cmap="gray")
# plt.show()

# perform the canny edge detector to detect image edges


edges = cv2.Canny(gray, threshold1=30, threshold2=100)

# show the detected edges


plt.imshow(edges, cmap="gray")
plt.show()

live_edge_detector.py
import numpy as np
import cv2

cap = cv2.VideoCapture(0)

while True:
    _, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 30, 100)
    cv2.imshow("edges", edges)
    cv2.imshow("gray", gray)
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
PART 16: Use Transfer Learning
for Image Classification using
TensorFlow in Python
Learn what transfer learning is and how to use a pre-trained MobileNet model to classify flowers with better performance using TensorFlow in Python.

In the real world, it is rare to train a Convolutional Neural Network (CNN) from scratch, as it is hard to collect a massive dataset to get good performance. Instead, it is common to take a network pre-trained on a very large dataset and tune it for your own classification problem; this process is called Transfer Learning.

What is Transfer Learning


It is a machine learning method in which a model developed for one task is reused (or tuned) for another task. It is very popular nowadays, especially in computer vision and natural language processing problems. Transfer learning is very handy given the enormous resources required to train deep learning models. Here are its most important benefits:

It speeds up training time.
It requires less data.
It lets you use state-of-the-art models developed by deep learning experts.

For these reasons, it is better to use transfer learning for image classification problems instead of creating your model and training it from scratch. Models such as ResNet, InceptionV3, Xception, and MobileNet are trained on a massive dataset called ImageNet, which contains more than 14 million images covering 1000 different object classes.
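As a quick illustration of what "pre-trained on ImageNet" looks like in Keras, the sketch below loads such a model in one call; include_top=False drops the 1000-class head so you can attach your own. This is an alternative sketch of the idea, not the exact create_model() function used later in this tutorial:

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# load ImageNet weights without the 1000-class classification head
base = MobileNetV2(input_shape=(224, 224, 3), include_top=False,
                   weights="imagenet", pooling="avg")
base.trainable = False  # freeze the pre-trained convolutional weights
# attach a 5-unit softmax head for the five flower classes
outputs = Dense(5, activation="softmax")(base.output)
model = Model(inputs=base.input, outputs=outputs)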

Loading & Preparing the Dataset


We are going to use the flower photos dataset, which consists of 5 types of flowers (daisy, dandelion, roses, sunflowers and tulips).

After you have everything installed by the following command:

pip3 install tensorflow numpy matplotlib

Open up a new Python file and import the necessary modules:

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.applications import MobileNetV2, ResNet50, InceptionV3  # try them and see which is better
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.utils import get_file
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
import pathlib
import numpy as np

The dataset comes with inconsistent image sizes; as a result, we need to resize all the images to a shape that is accepted by MobileNet (the model we are going to use):

batch_size = 32
# 5 types of flowers
num_classes = 5
# training for 10 epochs
epochs = 10
# size of each image
IMAGE_SHAPE = (224, 224, 3)

Let's load the dataset:

def load_data():
    """This function downloads, extracts, loads, normalizes and one-hot encodes the Flower Photos dataset"""
    # download the dataset and extract it
    data_dir = get_file(
        origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
        fname='flower_photos', untar=True)
    data_dir = pathlib.Path(data_dir)
    # count how many images are there
    image_count = len(list(data_dir.glob('*/*.jpg')))
    print("Number of images:", image_count)
    # get all classes for this dataset (types of flowers) excluding the LICENSE file
    CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
    # roses = list(data_dir.glob('roses/*'))
    # 20% validation set, 80% training set
    image_generator = ImageDataGenerator(rescale=1/255, validation_split=0.2)
    # make the training dataset generator
    train_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                          batch_size=batch_size,
                                                          classes=list(CLASS_NAMES),
                                                          target_size=(IMAGE_SHAPE[0], IMAGE_SHAPE[1]),
                                                          shuffle=True, subset="training")
    # make the validation dataset generator
    test_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                         batch_size=batch_size,
                                                         classes=list(CLASS_NAMES),
                                                         target_size=(IMAGE_SHAPE[0], IMAGE_SHAPE[1]),
                                                         shuffle=True, subset="validation")
    return train_data_gen, test_data_gen, CLASS_NAMES

The above function downloads and extracts the dataset and then uses the ImageDataGenerator Keras utility class to wrap the dataset in a Python generator (so the images are loaded into memory in batches, not all at once).

After that, we scale and resize the images to a fixed shape and then split the
dataset by 80% for training and 20% for validation.

I also encourage you to change this function to use the tf.data API instead; the dataset is already available in TensorFlow Datasets, and you can load it the same way we did in an earlier tutorial.

Constructing the Model


We are going to use the MobileNetV2 model; it is not a very heavy model, but it does a good job in the training and testing process.

As mentioned earlier, this model is trained to classify 1000 different objects, so we need a way to tune it so it is suitable for just our flower classification task. As a result, we are going to remove the last fully connected layer and add our own final layer that consists of 5 units with a softmax activation function:

def create_model(input_shape):
    # load MobileNetV2
    model = MobileNetV2(input_shape=input_shape)
    # remove the last fully connected layer
    model.layers.pop()
    # freeze all the weights of the model except the last 4 layers
    for layer in model.layers[:-4]:
        layer.trainable = False
    # construct our own fully connected layer for classification
    output = Dense(num_classes, activation="softmax")
    # connect that dense layer to the model
    output = output(model.layers[-1].output)
    model = Model(inputs=model.inputs, outputs=output)
    # print the summary of the model architecture
    model.summary()
    # train the model using the adam optimizer
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

The above function will first download the model weights (if not available)
and then remove the last layer.

After that, we freeze all the layers except the last four; the earlier layers are pre-trained, and we don't want to modify their weights. However, it is good practice to retrain the last convolutional layers, as this dataset is quite similar to the original ImageNet dataset, so we won't ruin the weights (that much).

Finally, we construct our own dense layer that consists of five neurons and
connect it to the last layer of the MobileNetV2 model. The following figure
demonstrates the architecture:
Note that you can use the TensorFlow hub to load this model very easily,
check this link to use their code snippet for creating the model.

Training the Model


Let's use the above two functions to start training:

if __name__ == "__main__":
# load the data generators
train_generator, validation_generator, class_names = load_data()
# constructs the model
model = create_model(input_shape=IMAGE_SHAPE)
# model name
model_name = "MobileNetV2_finetune_last5"
# some nice callbacks
tensorboard = TensorBoard(log_dir=os.path.join("logs", model_name))
checkpoint = ModelCheckpoint(os.path.join("results", f"{model_name}" +
"-loss-{val_loss:.2f}.h5"),
save_best_only=True,
verbose=1)
# make sure results folder exist
if not os.path.isdir("results"):
os.mkdir("results")
# count number of steps per epoch
training_steps_per_epoch = np.ceil(train_generator.samples / batch_size)
validation_steps_per_epoch = np.ceil(validation_generator.samples /
batch_size)
# train using the generators
model.fit_generator(train_generator,
steps_per_epoch=training_steps_per_epoch,
validation_data=validation_generator,
validation_steps=validation_steps_per_epoch,
epochs=epochs, verbose=1, callbacks=[tensorboard,
checkpoint])

Nothing fancy here: we load the data, construct the model, and then use some callbacks for tracking and saving the best models.

As soon as you execute the script, the training process begins, and you'll notice that not all the weights are being trained:

Total params: 2,264,389


Trainable params: 418,565
Non-trainable params: 1,845,824

It'll take several minutes depending on your hardware.

I used TensorBoard to experiment a little bit; for example, I tried freezing all the weights except for the last classification layer, decreasing the optimizer learning rate, and using some image flipping, zooming, and general augmentation. Here is a screenshot:

MobileNetV2 is the model with all weights frozen (except for the final 5-unit dense layer, of course).
MobileNetV2_augmentation uses some image augmentation.
MobileNetV2_finetune_last5 is the model we're using right now, which does not freeze the last 4 layers of the MobileNetV2 model.
MobileNetV2_finetune_last5_less_lr was dominant with almost 86% accuracy; that's because once you unfreeze some of the trained weights, you need to decrease the learning rate to slowly adjust the weights to your dataset. This run used an Adam optimizer with a 0.0005 learning rate.

Note: to modify the learning rate, you can import Adam optimizer
from keras.optimizers package, and then compile the model
with optimizer=Adam(lr=0.0005) parameter.
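For example, here is a minimal sketch of that note (depending on your TensorFlow/Keras version, the argument is named learning_rate or lr):

from tensorflow.keras.optimizers import Adam
model.compile(loss="categorical_crossentropy",
              optimizer=Adam(learning_rate=0.0005),
              metrics=["accuracy"])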

Testing the Model


Now, to evaluate our model, we need to load the optimal weights via the model.load_weights() method. Choose the weights file with the lowest loss value; in my case, it's a 0.63 loss:
# load the data generators
train_generator, validation_generator, class_names = load_data()
# constructs the model
model = create_model(input_shape=IMAGE_SHAPE)
# load the optimal weights
model.load_weights("results/MobileNetV2_finetune_last5-loss-0.63.h5")
validation_steps_per_epoch = np.ceil(validation_generator.samples /
batch_size)
# print the validation loss & accuracy
evaluation = model.evaluate_generator(validation_generator,
steps=validation_steps_per_epoch, verbose=1)
print("Val loss:", evaluation[0])
print("Val Accuracy:", evaluation[1])

Make sure to use the optimal weights, the one which has the lower loss and
higher accuracy.

Output:

23/23 [==============================] - 4s 178ms/step - loss:


0.6338 - accuracy: 0.8140
Val loss: 0.6337507224601248
Val Accuracy: 0.81395346

Okay, let's visualize a little bit. We are going to plot a complete batch of images with their corresponding predicted and correct labels:

# get a random batch of images


image_batch, label_batch = next(iter(validation_generator))
# turn the original labels into human-readable text
label_batch = [class_names[np.argmax(label_batch[i])] for i in
range(batch_size)]
# predict the images on the model
predicted_class_names = model.predict(image_batch)
predicted_ids = [np.argmax(predicted_class_names[i]) for i in
range(batch_size)]
# turn the predicted vectors to human readable labels
predicted_class_names = np.array([class_names[id] for id in predicted_ids])
# some nice plotting
plt.figure(figsize=(10, 9))
for n in range(30):
    plt.subplot(6, 5, n + 1)
    plt.subplots_adjust(hspace=0.3)
    plt.imshow(image_batch[n])
    if predicted_class_names[n] == label_batch[n]:
        color = "blue"
        title = predicted_class_names[n].title()
    else:
        color = "red"
        title = f"{predicted_class_names[n].title()}, correct: {label_batch[n]}"
    plt.title(title, color=color)
    plt.axis('off')
_ = plt.suptitle("Model predictions (blue: correct, red: incorrect)")
plt.show()

Once you run it, you'll get something like this:


Awesome! As you can see, 25 out of 30 images were correctly predicted; that's a good result, as some flower images are a little ambiguous.

Conclusion
Alright, that's it. In this tutorial, you discovered how you can use transfer
learning to quickly develop and use state-of-the-art models using Tensorflow
and Keras in Python.

I highly encourage you to use other models that were mentioned above, try to
fine-tune them as well, good luck!

Source Code:

train.py
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.applications import MobileNetV2, ResNet50, InceptionV3  # try them and see which is better
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.utils import get_file
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
import pathlib
import numpy as np

batch_size = 32
num_classes = 5
epochs = 10

IMAGE_SHAPE = (224, 224, 3)

def load_data():
    """This function downloads, extracts, loads, normalizes and one-hot encodes the Flower Photos dataset"""
    # download the dataset and extract it
    data_dir = get_file(
        origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
        fname='flower_photos', untar=True)
    data_dir = pathlib.Path(data_dir)
    # count how many images are there
    image_count = len(list(data_dir.glob('*/*.jpg')))
    print("Number of images:", image_count)
    # get all classes for this dataset (types of flowers) excluding the LICENSE file
    CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
    # roses = list(data_dir.glob('roses/*'))
    # 20% validation set, 80% training set
    image_generator = ImageDataGenerator(rescale=1/255, validation_split=0.2)
    # make the training dataset generator
    train_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                          batch_size=batch_size,
                                                          classes=list(CLASS_NAMES),
                                                          target_size=(IMAGE_SHAPE[0], IMAGE_SHAPE[1]),
                                                          shuffle=True, subset="training")
    # make the validation dataset generator
    test_data_gen = image_generator.flow_from_directory(directory=str(data_dir),
                                                         batch_size=batch_size,
                                                         classes=list(CLASS_NAMES),
                                                         target_size=(IMAGE_SHAPE[0], IMAGE_SHAPE[1]),
                                                         shuffle=True, subset="validation")
    return train_data_gen, test_data_gen, CLASS_NAMES

def create_model(input_shape):
    # load MobileNetV2
    model = MobileNetV2(input_shape=input_shape)
    # remove the last fully connected layer
    model.layers.pop()
    # freeze all the weights of the model except the last 4 layers
    for layer in model.layers[:-4]:
        layer.trainable = False
    # construct our own fully connected layer for classification
    output = Dense(num_classes, activation="softmax")
    # connect that dense layer to the model
    output = output(model.layers[-1].output)
    model = Model(inputs=model.inputs, outputs=output)
    # print the summary of the model architecture
    model.summary()
    # train the model using the adam optimizer
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

if __name__ == "__main__":
# load the data generators
train_generator, validation_generator, class_names = load_data()

# constructs the model


model = create_model(input_shape=IMAGE_SHAPE)
# model name
model_name = "MobileNetV2_finetune_last5"

# some nice callbacks


tensorboard = TensorBoard(log_dir=os.path.join("logs", model_name))
checkpoint = ModelCheckpoint(os.path.join("results", f"{model_name}" + "-loss-
{val_loss:.2f}.h5"),
save_best_only=True,
verbose=1)

# make sure results folder exist


if not os.path.isdir("results"):
os.mkdir("results")

# count number of steps per epoch


training_steps_per_epoch = np.ceil(train_generator.samples / batch_size)
validation_steps_per_epoch = np.ceil(validation_generator.samples / batch_size)

# train using the generators


model.fit_generator(train_generator, steps_per_epoch=training_steps_per_epoch,
validation_data=validation_generator, validation_steps=validation_steps_per_epoch,
epochs=epochs, verbose=1, callbacks=[tensorboard, checkpoint])

test.py
from train import load_data, create_model, IMAGE_SHAPE, batch_size, np
import matplotlib.pyplot as plt
# load the data generators
train_generator, validation_generator, class_names = load_data()
# constructs the model
model = create_model(input_shape=IMAGE_SHAPE)
# load the optimal weights
model.load_weights("results/MobileNetV2_finetune_last5_less_lr-loss-0.45-acc-0.86.h5")

validation_steps_per_epoch = np.ceil(validation_generator.samples / batch_size)


# print the validation loss & accuracy
evaluation = model.evaluate_generator(validation_generator, steps=validation_steps_per_epoch,
verbose=1)
print("Val loss:", evaluation[0])
print("Val Accuracy:", evaluation[1])

# get a random batch of images


image_batch, label_batch = next(iter(validation_generator))
# turn the original labels into human-readable text
label_batch = [class_names[np.argmax(label_batch[i])] for i in range(batch_size)]
# predict the images on the model
predicted_class_names = model.predict(image_batch)
predicted_ids = [np.argmax(predicted_class_names[i]) for i in range(batch_size)]
# turn the predicted vectors to human readable labels
predicted_class_names = np.array([class_names[id] for id in predicted_ids])

# some nice plotting


plt.figure(figsize=(10, 9))
for n in range(30):
    plt.subplot(6, 5, n + 1)
    plt.subplots_adjust(hspace=0.3)
    plt.imshow(image_batch[n])
    if predicted_class_names[n] == label_batch[n]:
        color = "blue"
        title = predicted_class_names[n].title()
    else:
        color = "red"
        title = f"{predicted_class_names[n].title()}, correct: {label_batch[n]}"
    plt.title(title, color=color)
    plt.axis('off')
_ = plt.suptitle("Model predictions (blue: correct, red: incorrect)")
plt.show()
PART 17: Generate and Read QR
Code in Python
Learning how you can generate and read QR Code in Python using qrcode and OpenCV libraries.

A QR code is a type of matrix barcode: a machine-readable optical label that contains information about the item to which it is attached. In practice, QR codes often contain data for a locator, identifier, or tracker that points to a website, an application, etc.

In this tutorial, you will learn how to generate and read QR codes in Python
using qrcode and OpenCV libraries.

Installing required dependencies:

pip3 install opencv-python qrcode numpy

Generate QR Code
First, let's start by generating QR codes; it is basically straightforward using the qrcode library:

import qrcode
# example data
data = "https://www.bbc.com"
# output file name
filename = "site.png"
# generate qr code
img = qrcode.make(data)
# save img to a file
img.save(filename)
This will generate a new image file in the current directory with the name "site.png", which contains a QR code image of the data specified (in this case, the BBC website URL); it will look something like this:

You can also use this library to take full control of QR code generation via the qrcode.QRCode() class, which you can instantiate with a specific size, fill color, back color, and error correction level, like so:

import qrcode
import numpy as np
# data to encode
data = "https://www.bbc.com"
# instantiate QRCode object
qr = qrcode.QRCode(version=1, box_size=10, border=4)
# add data to the QR code
qr.add_data(data)
# compile the data into a QR code array
qr.make()
# print the image shape
print("The shape of the QR image:", np.array(qr.get_matrix()).shape)
# transfer the array into an actual image
img = qr.make_image(fill_color="white", back_color="black")
# save it to a file
img.save("site_inversed.png")

When creating the QRCode object, we specify the version parameter, which is
an integer from 1 to 40 that controls the size of the QR code matrix (version 1 is
a small 21x21 matrix, version 40 is 177x177), but this will be overridden when
the data doesn't fit the size you specify. In our case, it scales up to version
3 automatically.
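If you want to verify this behavior yourself, here is a minimal sketch (assuming the qrcode library's fit argument, version attribute, and error_correction constants, which are not used in the snippet above):

import qrcode

qr = qrcode.QRCode(version=1, box_size=10, border=4,
                   error_correction=qrcode.constants.ERROR_CORRECT_M)
qr.add_data("https://www.bbc.com")
# fit=True (the default) lets the library bump the version up until the data fits
qr.make(fit=True)
print("Chosen version:", qr.version)  # should print 3 for this URL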

The box_size parameter controls how many pixels each box of the QR code occupies,
whereas border controls how many boxes thick the border should be.
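Putting the two parameters together, and using the standard QR sizing rule that version v has 21 + 4*(v - 1) modules per side, you can estimate the final image size in pixels (a small back-of-the-envelope sketch, consistent with the (37, 37) shape printed later in this section):

version = 3
box_size = 10
border = 4
modules = 21 + 4 * (version - 1)        # 29 modules per side for version 3
matrix_side = modules + 2 * border      # 37, matching the printed matrix shape
image_side_px = matrix_side * box_size  # 370x370 pixel image
print(modules, matrix_side, image_side_px)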

We then add the data using the qr.add_data() method, compile it into an array
using the qr.make() method, and then create the actual image
using the qr.make_image() method. We specified white as the fill_color and black as
the back_color, which is the exact opposite of the default QR code colors; check it out:

And the shape of the image was indeed scaled up and wasn't 21x21:

The shape of the QR image: (37, 37)


Read QR Code
There are many tools that read QR codes. However, we will be
using OpenCV for that, as it is popular and easy to integrate with the webcam
or any video.

Alright, open up a new Python file and follow along with me. Let's read the
image that we just generated:

import cv2
# read the QRCODE image
img = cv2.imread("site.png")

Luckily for us, OpenCV already has a QR code detector built in:

# initialize the cv2 QRCode detector
detector = cv2.QRCodeDetector()

We have the image and the detector, let's detect and decode that data:

# detect and decode
data, bbox, straight_qrcode = detector.detectAndDecode(img)

The detectAndDecode() function takes an image as input and returns a tuple of 3 values:
the data decoded from the QR code, the output array of vertices of the found QR code
quadrangle, and the output image containing the rectified and binarized QR code.
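If you want to see exactly what comes back, here is a minimal inspection sketch; note that the shape of bbox varies between OpenCV versions (sometimes (4, 1, 2), sometimes (1, 4, 2)), so reshaping it into a flat list of corner points is a safe way to consume it:

import cv2

img = cv2.imread("site.png")
detector = cv2.QRCodeDetector()
data, bbox, straight_qrcode = detector.detectAndDecode(img)
print("Decoded data:", repr(data))  # empty string if nothing was decoded
if bbox is not None:
    corners = bbox.reshape(-1, 2).astype(int)  # 4 corner points, regardless of bbox layout
    print("Corner points:", corners.tolist())
    print("Rectified QR shape:", None if straight_qrcode is None else straight_qrcode.shape)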

We just need data and bbox here; bbox will help us draw the quadrangle on
the image, and data will be printed to the console.
Let's do it:

# if there is a QR code
if bbox is not None:
    print(f"QRCode data:\n{data}")
    # display the image with lines
    # length of bounding box
    n_lines = len(bbox)
    for i in range(n_lines):
        # draw all lines
        point1 = tuple(bbox[i][0])
        point2 = tuple(bbox[(i+1) % n_lines][0])
        cv2.line(img, point1, point2, color=(255, 0, 0), thickness=2)

The cv2.line() function draws a line segment connecting two points; we retrieve
these points from the bbox array that was returned by detectAndDecode() previously.
We specified a blue color ((255, 0, 0) is blue, as OpenCV uses BGR colors) and a
thickness of 2.
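As an alternative to drawing each edge manually (a small sketch, not part of the original script), cv2.polylines() can draw the whole closed quadrangle in a single call once the corner points are converted to integers:

import numpy as np

if bbox is not None:
    pts = bbox.reshape(-1, 1, 2).astype(np.int32)  # polylines expects int32 point arrays
    cv2.polylines(img, [pts], isClosed=True, color=(255, 0, 0), thickness=2)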

Finally, let's show the image and quit when a key is pressed:

# display the result
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Once you run this, the decoded data is printed:

QRCode data:
https://www.bbc.com
And the following image is shown:

As you can see, the blue lines are drawn exactly along the QR code
borders. Awesome, we are done with this script; try running it with different
data and see your own results!

Note that this is ideal for QR codes and not for barcodes. If you want to read
barcodes, check PART 8, which is dedicated to that!

If you want to detect and decode QR codes live using your webcam (and I'm
sure you do), here is a code for that:

import cv2

# initialize the cam
cap = cv2.VideoCapture(0)
# initialize the cv2 QRCode detector
detector = cv2.QRCodeDetector()
while True:
    _, img = cap.read()
    # detect and decode
    data, bbox, _ = detector.detectAndDecode(img)
    # check if there is a QRCode in the image
    if bbox is not None:
        # display the image with lines
        for i in range(len(bbox)):
            # draw all lines
            cv2.line(img, tuple(bbox[i][0]), tuple(bbox[(i+1) % len(bbox)][0]),
                     color=(255, 0, 0), thickness=2)
        if data:
            print("[+] QR Code detected, data:", data)
    # display the result
    cv2.imshow("img", img)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()

Awesome, we are done with this tutorial, you can now integrate this into your
own applications!

Source Code:

generate_qrcode.py
import qrcode
import sys

# from command line arguments
data = sys.argv[1]
filename = sys.argv[2]
# generate qr code
img = qrcode.make(data)
# save img to a file
img.save(filename)

generate_qrcode_with_control.py
import qrcode
import numpy as np
# data to encode
data = "https://www.thepythoncode.com"

# instantiate QRCode object
qr = qrcode.QRCode(version=1, box_size=10, border=4)
# add data to the QR code
qr.add_data(data)
# compile the data into a QR code array
qr.make()
# print the image shape
print("The shape of the QR image:", np.array(qr.get_matrix()).shape)
# transfer the array into an actual image
img = qr.make_image(fill_color="white", back_color="black")
# save it to a file
img.save("site_inversed.png")

read_qrcode.py
import cv2
import sys

filename = sys.argv[1]

# read the QRCODE image
img = cv2.imread(filename)
# initialize the cv2 QRCode detector
detector = cv2.QRCodeDetector()

# detect and decode
data, bbox, straight_qrcode = detector.detectAndDecode(img)

# if there is a QR code
if bbox is not None:
    print(f"QRCode data:\n{data}")
    # display the image with lines
    # length of bounding box
    n_lines = len(bbox)
    for i in range(n_lines):
        # draw all lines
        point1 = tuple(bbox[i][0])
        point2 = tuple(bbox[(i+1) % n_lines][0])
        cv2.line(img, point1, point2, color=(255, 0, 0), thickness=2)

# display the result
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

read_qrcode_live.py
import cv2

# initialize the cam
cap = cv2.VideoCapture(0)

# initialize the cv2 QRCode detector
detector = cv2.QRCodeDetector()

while True:
    _, img = cap.read()
    # detect and decode
    data, bbox, _ = detector.detectAndDecode(img)
    # check if there is a QRCode in the image
    if bbox is not None:
        # display the image with lines
        for i in range(len(bbox)):
            # draw all lines
            cv2.line(img, tuple(bbox[i][0]), tuple(bbox[(i+1) % len(bbox)][0]), color=(255, 0, 0),
                     thickness=2)
        if data:
            print("[+] QR Code detected, data:", data)
    # display the result
    cv2.imshow("img", img)
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
PART 18: Make an Image Classifier
in Python using Tensorflow 2 and
Keras
Building and training a model that classifies CIFAR-10 dataset images (airplanes, dogs, cats, and 7 other
classes), loaded using Tensorflow Datasets, using the Tensorflow 2 and Keras libraries in Python.

Image classification refers to a process in computer vision that classifies an
image according to its visual content. For example, an image classification
algorithm can be designed to tell if an image contains a cat or a dog. While
detecting an object is trivial for humans, robust image classification is still a
challenge in computer vision applications.

In this tutorial, you will learn how to successfully classify images in
the CIFAR-10 dataset (which consists of airplanes, dogs, cats, and 7 other
object classes) using Tensorflow in Python.

Note that there is a difference between image classification and object
detection: image classification is about classifying an image into some
category (as in this example, the input is an image and the output is a single
class label out of 10 classes), while object detection is about detecting, classifying, and
localizing objects in real-world images; one of the main algorithms is YOLO
object detection.

We will preprocess the images and labels, then train a convolutional neural
network on all the training samples. The images will need to be normalized to
the [0, 1] range; since we use the sparse categorical cross-entropy loss, the
integer labels can be used as-is (no one-hot encoding needed).

First, let's install the requirements for this project:

pip3 install numpy matplotlib tensorflow==2.0.0 tensorflow_datasets


To get started, open up an empty Python file, call it train.py, and follow
along. Importing Tensorflow:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard
import tensorflow as tf
import tensorflow_datasets as tfds
import os

As you may expect, we'll be using the tf.data API to load the CIFAR-10 dataset.

Hyper Parameters
I have experimented with various parameters and found these to work well:

# hyper-parameters
batch_size = 64
# 10 categories of images (CIFAR-10)
num_classes = 10
# number of training epochs
epochs = 30

num_classes just refers to the number of categories to classify; in this
case, CIFAR-10 has 10 categories of images.

Understanding and Loading CIFAR-10 Dataset

The dataset consists of 10 classes of images, whose labels range from 0 to 9:
0: airplane.
1: automobile.
2: bird.
3: cat.
4: deer.
5: dog.
6: frog.
7: horse.
8: ship.
9: truck.
There are 50000 samples for training and 10000 samples for testing.
Each sample is an image of 32x32x3 pixels (width and height of 32, and a depth
of 3 for the RGB channels).

Let's load this:

def load_data():
    """
    This function loads CIFAR-10 dataset, and preprocess it
    """
    def preprocess_image(image, label):
        # convert [0, 255] range integers to [0, 1] range floats
        image = tf.image.convert_image_dtype(image, tf.float32)
        return image, label
    # loading the CIFAR-10 dataset, splitted between train and test sets
    ds_train, info = tfds.load("cifar10", with_info=True, split="train",
                               as_supervised=True)
    ds_test = tfds.load("cifar10", split="test", as_supervised=True)
    # repeat dataset forever, shuffle, preprocess, split by batch
    ds_train = ds_train.repeat().shuffle(1024).map(preprocess_image).batch(batch_size)
    ds_test = ds_test.repeat().shuffle(1024).map(preprocess_image).batch(batch_size)
    return ds_train, ds_test, info

This function loads the dataset using the Tensorflow Datasets module. We
set with_info to True in order to get some information about this dataset; you
can print it out and see what the different fields and their values are. We'll be
using info to get the number of samples in the training and testing sets.
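For example, here is a minimal sketch (using the standard tfds DatasetInfo fields) of the pieces of info that the rest of the tutorial relies on:

import tensorflow_datasets as tfds

_, info = tfds.load("cifar10", with_info=True, split="train", as_supervised=True)
print(info.features["image"].shape)        # (32, 32, 3)
print(info.features["label"].names)        # ['airplane', 'automobile', ...]
print(info.splits["train"].num_examples)   # 50000
print(info.splits["test"].num_examples)    # 10000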

After that, we:

Repeat the dataset forever using the repeat() method; this will enable us
to generate data samples repeatedly (we'll specify stopping conditions
in the training phase).
Shuffle it.
Normalize the images to be between 0 and 1; this will help the neural
network train much faster. We used the map() method, which accepts a
callback function that takes the image and label as arguments, and we
simply used Tensorflow's built-in convert_image_dtype() method, which
does exactly that.
Finally, we batch our dataset into batches of 64 samples using the batch() function,
so each time we generate new data points, it'll return 64 images and
their 64 labels (a quick shape check is sketched below).
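As a quick sanity check (a minimal sketch, not part of the original script, assuming load_data() from this tutorial and eager execution), you can pull one batch from the pipeline and confirm the shapes and value range:

ds_train, ds_test, info = load_data()
images, labels = next(iter(ds_train))
print(images.shape)   # (64, 32, 32, 3)
print(labels.shape)   # (64,)
print(float(tf.reduce_min(images)), float(tf.reduce_max(images)))  # values within [0.0, 1.0]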
Constructing the Model

The following model will be used:

def create_model(input_shape):
    # building the model
    model = Sequential()
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same",
                     input_shape=input_shape))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # flattening the convolutions
    model.add(Flatten())
    # fully-connected layer
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation="softmax"))
    # print the summary of the model architecture
    model.summary()
    # training the model using adam optimizer
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="adam", metrics=["accuracy"])
    return model

That's 3 blocks of 2 convolutional layers each, with max-pooling and the ReLU activation
function, followed by a fully connected layer with 1024 units. This is a relatively small
model compared to state-of-the-art models such as ResNet50 or Xception. If you
want to use models made by deep learning experts, you need to use transfer
learning.

Training the Model


Now, let's train the model:

if __name__ == "__main__":
    # load the data
    ds_train, ds_test, info = load_data()
    # constructs the model
    model = create_model(input_shape=info.features["image"].shape)
    # some nice callbacks
    logdir = os.path.join("logs", "cifar10-model-v1")
    tensorboard = TensorBoard(log_dir=logdir)
    # make sure results folder exist
    if not os.path.isdir("results"):
        os.mkdir("results")
    # train
    model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=1,
              steps_per_epoch=info.splits["train"].num_examples // batch_size,
              validation_steps=info.splits["test"].num_examples // batch_size,
              callbacks=[tensorboard])
    # save the model to disk
    model.save("results/cifar10-model-v1.h5")

After loading the data and creating the model, I used TensorBoard, which will
track the accuracy and loss in each epoch and provide us with nice
visualizations.
We will be using the "results" folder to save our models; if you're not sure
how to handle files and directories in Python, check this tutorial.

Since ds_train and ds_test generate data samples in batches repeatedly, we
need to specify the number of steps per epoch, which is the number of
samples divided by the batch size; the same goes for validation_steps.
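With the CIFAR-10 split sizes, the arithmetic works out to the 781 and 156 steps you'll see in the training and evaluation logs:

steps_per_epoch = 50000 // 64      # = 781 training steps per epoch
validation_steps = 10000 // 64     # = 156 validation steps per epoch
print(steps_per_epoch, validation_steps)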

Run this; it will take several minutes to complete training, depending on your
CPU/GPU.

You'll get a similar result to this:

Epoch 1/30
781/781 [==============================] - 20s 26ms/step - loss:
1.6503 - accuracy: 0.3905 - val_loss: 1.2835 - val_accuracy: 0.5238
Epoch 2/30
781/781 [==============================] - 16s 21ms/step - loss:
1.1847 - accuracy: 0.5750 - val_loss: 0.9773 - val_accuracy: 0.6542

All the way to the final epoch:

Epoch 29/30
781/781 [==============================] - 16s 21ms/step - loss:
0.4094 - accuracy: 0.8570 - val_loss: 0.5954 - val_accuracy: 0.8089
Epoch 30/30
781/781 [==============================] - 16s 21ms/step - loss:
0.4130 - accuracy: 0.8563 - val_loss: 0.6128 - val_accuracy: 0.8060

Now to open tensorboard, all you need to do is to type this command in the
terminal or the command prompt in the current directory:

tensorboard --logdir="logs"
Open up a browser tab and go to localhost:6006; you'll land on
TensorBoard. Here is my result:

Clearly, we are on the right track: the validation loss is decreasing, and the
accuracy increases all the way to about 81%. That's great!

Testing the Model


Once training is completed, the final model and weights are saved in
the results folder; that way, we can train only once and make predictions
whenever we want.

Open up a new python file called test.py and follow along.

Importing necessary utilities:

from train import load_data, batch_size
from tensorflow.keras.models import load_model
import matplotlib.pyplot as plt
import numpy as np

Let's make a Python dictionary that maps each integer value to its
corresponding label in the dataset:

# CIFAR-10 classes
categories = {
0: "airplane",
1: "automobile",
2: "bird",
3: "cat",
4: "deer",
5: "dog",
6: "frog",
7: "horse",
8: "ship",
9: "truck"
}
Loading the test data and the model:

# load the testing set
ds_train, ds_test, info = load_data()
# load the model with final model weights
model = load_model("results/cifar10-model-v1.h5")

Evaluation:

# evaluation
loss, accuracy = model.evaluate(ds_test, steps=info.splits["test"].num_examples // batch_size)
print("Test accuracy:", accuracy*100, "%")

Let's take a random image and make a prediction:

# get prediction for this image
data_sample = next(iter(ds_test))
sample_image = data_sample[0].numpy()[0]
sample_label = categories[data_sample[1].numpy()[0]]
prediction = np.argmax(model.predict(sample_image.reshape(-1, *sample_image.shape))[0])
print("Predicted label:", categories[prediction])
print("True label:", sample_label)

We've used next(iter(ds_test)) to get the next testing batch, extracted the
first image and label in that batch, and made a prediction with the model. Here is
the result:

156/156 [==============================] -
3s 20ms/step - loss: 0.6119 - accuracy: 0.8063
Test accuracy: 80.62900900840759 %
Predicted label: frog
True label: frog

The model says it's a frog, let's check it:

# show the image
plt.axis('off')
plt.imshow(sample_image)
plt.show()

Result: Tiny little frog! The model was right!

Conclusion

Alright, we are done with this tutorial. 81% isn't bad for this little CNN; I
highly encourage you to tweak the model, or check out ResNet50, Xception, or
other state-of-the-art models to get higher performance!

If you're not sure how to use these models, I have a tutorial on this: How to
Use Transfer Learning for Image Classification using Keras in Python.

You may point out that these images are quite simple; a 32x32 grid isn't how the
real world is. Real images aren't that simple; they often contain many objects,
complex patterns, and so on. As a result, it is common practice to use
image segmentation methods, such as contour detection or K-Means
clustering segmentation, before passing the result to any classification technique.
Source Code:

train.py
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard
import tensorflow as tf
import tensorflow_datasets as tfds
import os

# hyper-parameters
batch_size = 64
# 10 categories of images (CIFAR-10)
num_classes = 10
# number of training epochs
epochs = 30

def create_model(input_shape):
    """
    Constructs the model:
    - 32 Convolutional (3x3)
    - Relu
    - 32 Convolutional (3x3)
    - Relu
    - Max pooling (2x2)
    - Dropout

    - 64 Convolutional (3x3)
    - Relu
    - 64 Convolutional (3x3)
    - Relu
    - Max pooling (2x2)
    - Dropout

    - 128 Convolutional (3x3)
    - Relu
    - 128 Convolutional (3x3)
    - Relu
    - Max pooling (2x2)
    - Dropout

    - Flatten (To make a 1D vector out of convolutional layers)

    - 1024 Fully connected units
    - Relu
    - Dropout
    - 10 Fully connected units (each corresponds to a label category (cat, dog, etc.))
    """
    # building the model
    model = Sequential()

    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same", input_shape=input_shape))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # flattening the convolutions
    model.add(Flatten())
    # fully-connected layers
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation="softmax"))

    # print the summary of the model architecture
    model.summary()

    # training the model using adam optimizer
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

def load_data():
    """
    This function loads CIFAR-10 dataset, and preprocess it
    """
    # Loading data using Keras
    # loading the CIFAR-10 dataset, splitted between train and test sets
    # (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    # print("Training samples:", X_train.shape[0])
    # print("Testing samples:", X_test.shape[0])
    # print(f"Images shape: {X_train.shape[1:]}")

    # # converting image labels to binary class matrices
    # y_train = to_categorical(y_train, num_classes)
    # y_test = to_categorical(y_test, num_classes)

    # # convert to floats instead of int, so we can divide by 255
    # X_train = X_train.astype("float32")
    # X_test = X_test.astype("float32")
    # X_train /= 255
    # X_test /= 255
    # return (X_train, y_train), (X_test, y_test)

    # Loading data using Tensorflow Datasets
    def preprocess_image(image, label):
        # convert [0, 255] range integers to [0, 1] range floats
        image = tf.image.convert_image_dtype(image, tf.float32)
        return image, label
    # loading the CIFAR-10 dataset, splitted between train and test sets
    ds_train, info = tfds.load("cifar10", with_info=True, split="train", as_supervised=True)
    ds_test = tfds.load("cifar10", split="test", as_supervised=True)
    # repeat dataset forever, shuffle, preprocess, split by batch
    ds_train = ds_train.repeat().shuffle(1024).map(preprocess_image).batch(batch_size)
    ds_test = ds_test.repeat().shuffle(1024).map(preprocess_image).batch(batch_size)
    return ds_train, ds_test, info

if __name__ == "__main__":
    # load the data
    ds_train, ds_test, info = load_data()
    # (X_train, y_train), (X_test, y_test) = load_data()

    # constructs the model
    # model = create_model(input_shape=X_train.shape[1:])
    model = create_model(input_shape=info.features["image"].shape)

    # some nice callbacks
    logdir = os.path.join("logs", "cifar10-model-v1")
    tensorboard = TensorBoard(log_dir=logdir)

    # make sure results folder exist
    if not os.path.isdir("results"):
        os.mkdir("results")

    # train
    # model.fit(X_train, y_train,
    #           batch_size=batch_size,
    #           epochs=epochs,
    #           validation_data=(X_test, y_test),
    #           callbacks=[tensorboard, checkpoint],
    #           shuffle=True)
    model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=1,
              steps_per_epoch=info.splits["train"].num_examples // batch_size,
              validation_steps=info.splits["test"].num_examples // batch_size,
              callbacks=[tensorboard])

    # save the model to disk
    model.save("results/cifar10-model-v1.h5")

test.py
from train import load_data, batch_size
from tensorflow.keras.models import load_model
import matplotlib.pyplot as plt
import numpy as np

# CIFAR-10 classes
categories = {
0: "airplane",
1: "automobile",
2: "bird",
3: "cat",
4: "deer",
5: "dog",
6: "frog",
7: "horse",
8: "ship",
9: "truck"
}

# load the testing set
# (_, _), (X_test, y_test) = load_data()
ds_train, ds_test, info = load_data()
# load the model with final model weights
model = load_model("results/cifar10-model-v1.h5")
# evaluation
loss, accuracy = model.evaluate(ds_test, steps=info.splits["test"].num_examples // batch_size)
print("Test accuracy:", accuracy*100, "%")

# get prediction for this image
data_sample = next(iter(ds_test))
sample_image = data_sample[0].numpy()[0]
sample_label = categories[data_sample[1].numpy()[0]]
prediction = np.argmax(model.predict(sample_image.reshape(-1, *sample_image.shape))[0])
print("Predicted label:", categories[prediction])
print("True label:", sample_label)

# show the first image
plt.axis('off')
plt.imshow(sample_image)
plt.show()
PART 19: Face Detection using
OpenCV in Python
Performing face detection using both Haar cascades and the Single Shot MultiBox Detector
(via OpenCV's dnn module) in Python.

Object detection is a computer technology related to computer vision and
image processing that deals with detecting instances of semantic objects of a
certain class (such as human faces, cars, fruits, etc.) in digital images and
videos.

In this tutorial, we will be building a simple Python script that deals with
detecting human faces in an image, using two methods from the OpenCV
library. First, we are going to use haar cascade classifiers, which is an easy
(though not very accurate) and convenient way for beginners.

After that, we'll dive into using Single Shot Multibox Detectors (or SSDs for
short), which is a method for detecting objects in images using a single deep
neural network.

Note: It is worth mentioning that you need to distinguish between object
detection and object classification: object detection is about detecting an
object and where it is located in an image, while object classification is about
recognizing which class the object belongs to. If you are interested in image
classification, head to the image classification tutorial (PART 18).

Face Detection using Haar Cascades


The haar feature-based cascade classifier is a machine learning based
approach where a cascade function is trained from a lot of positive and
negative images. It is then used to detect objects in other images.

The nice thing about haar feature-based cascade classifiers is that you can
train a classifier for any object you want; OpenCV already provides some
pretrained classifier parameters, so you don't have to collect any data or train
anything yourself.

To get started, install the requirements:

pip3 install opencv-python numpy

Alright, create a new Python file and follow along, let's first
import OpenCV:

import cv2

You're going to need a sample image to test with; make sure it has clear frontal
faces in it. I will use this stock image that contains two lovely kids:

# loading the test image
image = cv2.imread("kids.jpg")

The function imread() loads an image from the specified file and returns it as a
numpy N-dimensional array.

Before we detect faces in the image, we will first need to convert the image to
grayscale; that is because the function we're going to use to detect faces expects a
grayscale image:

# converting to grayscale
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

The function cvtColor() converts an input image from one color space to
another; we specified the cv2.COLOR_BGR2GRAY code, which means
converting from BGR (Blue Green Red) to grayscale.

Since this tutorial is about detecting human faces, go ahead and download the
haar cascade for human face detection in this list. More
precisely, "haarcascade_frontalface_default.xml". Let's put it in a folder
called "cascades" and then load it:

# initialize the face recognizer (default face haar cascade)
face_cascade = cv2.CascadeClassifier("cascades/haarcascade_frontalface_default.xml")

Let's now detect all the faces in the image:

# detect all the faces in the image
faces = face_cascade.detectMultiScale(image_gray)
# print the number of faces detected
print(f"{len(faces)} faces detected in the image.")

The detectMultiScale() function takes an image as a parameter and detects objects of
different sizes, returned as a list of rectangles. Let's draw these rectangles on the image:

# for every face, draw a blue rectangle
for x, y, width, height in faces:
    cv2.rectangle(image, (x, y), (x + width, y + height), color=(255, 0, 0),
                  thickness=2)

Finally, let's save the new image:

# save the image with rectangles
cv2.imwrite("kids_detected.jpg", image)
Here is my resulting image:

Pretty cool, right? Feel free to use other object classifiers or other images, or
even more interesting, use your webcam! Here is the code for that:

import cv2

# create a new cam object
cap = cv2.VideoCapture(0)
# initialize the face recognizer (default face haar cascade)
face_cascade = cv2.CascadeClassifier("cascades/haarcascade_frontalface_default.xml")

while True:
    # read the image from the cam
    _, image = cap.read()
    # converting to grayscale
    image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # detect all the faces in the image
    faces = face_cascade.detectMultiScale(image_gray, 1.3, 5)
    # for every face, draw a blue rectangle
    for x, y, width, height in faces:
        cv2.rectangle(image, (x, y), (x + width, y + height), color=(255, 0, 0),
                      thickness=2)
    cv2.imshow("image", image)
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

Once you execute that (if you have a webcam, of course), it will open up your
webcam and start drawing blue rectangles around all frontal faces in the image.
The code isn't that challenging; all I changed is that, instead of reading the image
from a file, I created a VideoCapture object and read frames from it inside
a while loop. Once you press the q key, the main loop ends.
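Note that the webcam version calls detectMultiScale(image_gray, 1.3, 5), which passes scaleFactor=1.3 and minNeighbors=5 as positional arguments. Here is a minimal sketch of the equivalent call with keyword arguments (the minSize value is an illustrative assumption, not part of the original script):

faces = face_cascade.detectMultiScale(
    image_gray,
    scaleFactor=1.3,   # how much the image is shrunk at each step of the search pyramid
    minNeighbors=5,    # how many overlapping detections are needed to keep a candidate
    minSize=(30, 30),  # ignore detections smaller than 30x30 pixels
)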

Face Detection using SSDs


As you can see, the previous method isn't that challenging. Unfortunately, it
is dated and rarely used in the real world today. However, neural
networks come to the rescue, and luckily for us, OpenCV provides
the dnn module within the cv2 package, which enables us to run
inference on pre-trained deep learning models.

To get started predicting faces using SSDs in OpenCV, you need to download
the ResNet face detection model architecture along with its pre-trained
weights, and then save them into a weights folder in the current working
directory:

import cv2
import numpy as np

#
https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deplo
prototxt_path = "weights/deploy.prototxt.txt"
#
https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_2
model_path = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"

Now to load the actual model, we need to use the readNetFromCaffe() method, which
takes the model architecture and weights paths as arguments:

# load Caffe model
model = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

We're going to use the same image used above:

# read the desired image
image = cv2.imread("kids.jpg")
# get width and height of the image
h, w = image.shape[:2]

Now to pass this image into the neural network, we need to prepare it. More
specifically, we need to resize the image to the shape of (300, 300) and
perform mean subtraction, as the network was trained that way:

# preprocess the image: resize and performs mean subtraction
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0))

Let's use this blob object as the input of the network and perform feed forward
to get detected faces:

# set the image into the input of the neural network
model.setInput(blob)
# perform inference and get the result
output = np.squeeze(model.forward())

Now the output object holds all detected objects (faces in this case); let's iterate
over this array and draw all faces in the image that have a confidence of more
than 50%:

font_scale = 1.0
for i in range(0, output.shape[0]):
    # get the confidence
    confidence = output[i, 2]
    # if confidence is above 50%, then draw the surrounding box
    if confidence > 0.5:
        # get the surrounding box coordinates and upscale them to the original image
        box = output[i, 3:7] * np.array([w, h, w, h])
        # convert to integers (np.int is deprecated in recent NumPy, so use the built-in int)
        start_x, start_y, end_x, end_y = box.astype(int)
        # draw the rectangle surrounding the face
        cv2.rectangle(image, (start_x, start_y), (end_x, end_y), color=(255, 0, 0),
                      thickness=2)
        # draw text as well
        cv2.putText(image, f"{confidence*100:.2f}%", (start_x, start_y-5),
                    cv2.FONT_HERSHEY_SIMPLEX, font_scale, (255, 0, 0), 2)

After we extract the confidence of each detected object, we take the
surrounding box and multiply it by the width and height of the original image to
get the right box coordinates; the network outputs relative coordinates in the
[0, 1] range (the image itself was resized to (300, 300) before being fed in), so
scaling by the original dimensions maps them back to pixel positions.
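One practical caveat (a small sketch, an addition rather than part of the original script): the scaled boxes can occasionally fall slightly outside the frame, so it is safer to clamp them to the image bounds before drawing:

# clamp the box to the image bounds (w and h are the original image width and height)
start_x, start_y = max(0, start_x), max(0, start_y)
end_x, end_y = min(w - 1, end_x), min(h - 1, end_y)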

In this case, we didn't only draw the surrounding boxes, we also wrote some
text indicating the confidence as a percentage. Let's show and save the new
image:

# show the image
cv2.imshow("image", image)
cv2.waitKey(0)
# save the image with rectangles
cv2.imwrite("kids_detected_dnn.jpg", image)

Here is the resulting image:


Awesome, this method is way more accurate, but it may be slower in
terms of FPS if you're detecting faces in real time, as it's not as fast as the
haar cascade method.

By the way, if you want to detect faces using this method in real-time using
your camera, you can check the full code page.

There are many real-world applications for face detection, for instance, we've
used face detection to blur faces in images and videos in real-time using
OpenCV as well!

Alright, this is it for this tutorial, you can get all tutorial materials (including
the testing image, the haar cascade parameters, SSDs model weights, and the
full code) here.

Source Code:

face_detection.py
import cv2

# loading the test image
image = cv2.imread("kids.jpg")

# converting to grayscale
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# initialize the face recognizer (default face haar cascade)
face_cascade = cv2.CascadeClassifier("cascades/haarcascade_frontalface_default.xml")

# detect all the faces in the image
faces = face_cascade.detectMultiScale(image_gray, 1.3, 5)
# print the number of faces detected
print(f"{len(faces)} faces detected in the image.")
# for every face, draw a blue rectangle
for x, y, width, height in faces:
    cv2.rectangle(image, (x, y), (x + width, y + height), color=(255, 0, 0), thickness=2)

# save the image with rectangles
cv2.imwrite("kids_detected.jpg", image)

live_face_detection.py

import cv2

# create a new cam object
cap = cv2.VideoCapture(0)

# initialize the face recognizer (default face haar cascade)
face_cascade = cv2.CascadeClassifier("cascades/haarcascade_frontalface_default.xml")

while True:
    # read the image from the cam
    _, image = cap.read()
    # converting to grayscale
    image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # detect all the faces in the image
    faces = face_cascade.detectMultiScale(image_gray, 1.3, 5)
    # for every face, draw a blue rectangle
    for x, y, width, height in faces:
        cv2.rectangle(image, (x, y), (x + width, y + height), color=(255, 0, 0), thickness=2)
    cv2.imshow("image", image)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()

face_detection_dnn.py
import cv2
import numpy as np

# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
prototxt_path = "weights/deploy.prototxt.txt"
#
https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
model_path = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"

# load Caffe model
model = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

# read the desired image
image = cv2.imread("kids.jpg")
# get width and height of the image
h, w = image.shape[:2]

# preprocess the image: resize and performs mean subtraction
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0))
# set the image into the input of the neural network
model.setInput(blob)
# perform inference and get the result
output = np.squeeze(model.forward())
font_scale = 1.0
for i in range(0, output.shape[0]):
    # get the confidence
    confidence = output[i, 2]
    # if confidence is above 50%, then draw the surrounding box
    if confidence > 0.5:
        # get the surrounding box coordinates and upscale them to the original image
        box = output[i, 3:7] * np.array([w, h, w, h])
        # convert to integers (np.int is deprecated in recent NumPy, so use the built-in int)
        start_x, start_y, end_x, end_y = box.astype(int)
        # draw the rectangle surrounding the face
        cv2.rectangle(image, (start_x, start_y), (end_x, end_y), color=(255, 0, 0), thickness=2)
        # draw text as well
        cv2.putText(image, f"{confidence*100:.2f}%", (start_x, start_y-5),
                    cv2.FONT_HERSHEY_SIMPLEX, font_scale, (255, 0, 0), 2)
# show the image
cv2.imshow("image", image)
cv2.waitKey(0)
# save the image with rectangles
cv2.imwrite("kids_detected_dnn.jpg", image)

live_face_detection_dnn.py
import cv2
import numpy as np

# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
prototxt_path = "weights/deploy.prototxt.txt"
#
https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
model_path = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"

# load Caffe model
model = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

cap = cv2.VideoCapture(0)

while True:
    # read the desired image
    _, image = cap.read()
    # get width and height of the image
    h, w = image.shape[:2]
    # preprocess the image: resize and performs mean subtraction
    blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0))
    # set the image into the input of the neural network
    model.setInput(blob)
    # perform inference and get the result
    output = np.squeeze(model.forward())
    for i in range(0, output.shape[0]):
        # get the confidence
        confidence = output[i, 2]
        # if confidence is above 45%, then draw the surrounding box
        if confidence > 0.45:
            # get the surrounding box coordinates and upscale them to the original image
            box = output[i, 3:7] * np.array([w, h, w, h])
            # convert to integers (np.int is deprecated in recent NumPy, so use the built-in int)
            start_x, start_y, end_x, end_y = box.astype(int)
            # draw the rectangle surrounding the face
            cv2.rectangle(image, (start_x, start_y), (end_x, end_y), color=(255, 0, 0), thickness=2)
            # draw text as well
            cv2.putText(image, f"{confidence*100:.2f}%", (start_x, start_y-5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
    # show the image
    cv2.imshow("image", image)
    if cv2.waitKey(1) == ord("q"):
        break

cv2.destroyAllWindows()
cap.release()

Summary
This book is dedicated to the readers who take time to write me each day.
Every morning I’m greeted by various emails — some with requests, a few
with complaints, and then there are the very few that just say thank you. All
these emails encourage and challenge me as an author — to better both my
books and myself.
Thank you!
