Deep Learning Project for Computer Vision with Python 2022
Tony Snake received a Bachelor of Computer Science and a Bachelor of Business Administration from the American University, USA.
He is a Ph.D. candidate in the Department of Data Informatics, (National) Korea Maritime and Ocean University, Busan 49112, Republic of Korea (South Korea).
His research interests are social network analysis, big data, AI, and robotics.
He received the Best Paper Award at the 15th International Conference on Multimedia Information Technology and Applications (MITA 2019).
Table of Contents
About the Authors
Table of Contents
Deep Learning Project for Computer Vision with Python
PART 1: Satellite Image Classification using TensorFlow in Python
Getting Started
Preparing the Dataset
Building the Model
Fine-tuning the Model
Model Evaluation
Final Thoughts
Source Code:
PART 2: Age and Gender Detection using OpenCV in Python
Source Code:
PART 3: Gender Detection using OpenCV in Python
Pre-requisites
Conclusion
Source Code:
PART 4: Age Detection using OpenCV in Python
Wrap-up
Source Code:
PART 5: SIFT Feature Extraction using OpenCV in Python
Scale-space Extrema Detection
Keypoint Localization
Orientation Assignment
Keypoint Description
Python Implementation
Conclusion
Source Code:
PART 6: How to Apply HOG Feature Extraction in Python
Resizing the Image
Calculating Gradients
Calculating the Magnitude
Calculating the Orientation
Python Code
Conclusion
Source Code:
PART 7: Image Transformations using OpenCV in Python
Introduction
The Use of Image Transformation
Image Translation
Image Scaling
Image Shearing
Shearing in the x-axis Direction
Shearing in the y-axis Direction
Image Reflection
Image Rotation
Image Cropping
Conclusion
Source Code:
PART 8: How to Make a Barcode Reader in Python
Conclusion
Source Code:
PART 9: How to Perform Malaria Classification using TensorFlow 2 and Keras in Python
Downloading the Dataset
Image Preprocessing with OpenCV
Preparing and Normalizing the Dataset
Implementing the CNN Model Architecture
Model Evaluation
Saving the model
Source Code:
PART 10: Skin Cancer Detection using TensorFlow in Python
Preparing the Dataset
Building the Model
Training the Model
Model Evaluation
Sensitivity
Specificity
Receiver Operating Characteristic
Conclusion
Source Code:
PART 11: Use K-Means Clustering for Image Segmentation using OpenCV in Python
Want to Learn More?
Source Code:
PART 12: Detect Contours in Images using OpenCV in Python
Source Code:
PART 13: Optical Character Recognition (OCR) in Python
Source Code:
PART 14: Detect Shapes in Images in Python using OpenCV
Detecting Lines
Detecting Circles
Source Code:
PART 15: Perform Edge Detection in Python using OpenCV
Source Code:
PART 16: Use Transfer Learning for Image Classification using TensorFlow in Python
What is Transfer Learning
Loading & Preparing the Dataset
Constructing the Model
Training the Model
Testing the Model
Conclusion
Source Code:
PART 17: Generate and Read QR Code in Python
Generate QR Code
Read QR Code
Source Code:
PART 18: Make an Image Classifier in Python using Tensorflow 2 and Keras
Hyper Parameters
Understanding and Loading CIFAR-10 Dataset
Constructing the Model
Training the Model
Testing the Model
Conclusion
Source Code:
PART 19: Face Detection using OpenCV in Python
Face Detection using Haar Cascades
Face Detection using SSDs
Source Code:
Summary
PART 1: Satellite Image Classification using TensorFlow in Python
In this tutorial, you will learn how to build a satellite image classifier using the TensorFlow framework in Python.
The dataset comes in two versions:
rgb (default): contains only the optical R, G, B frequency bands, encoded as JPEG images.
all: contains all 13 bands in the original value range.
Getting Started
To get started, let's install TensorFlow and some other helper tools:
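The install command itself isn't reproduced in this excerpt; a typical one (the exact package list is my assumption, based on the imports below) is:
$ pip install tensorflow tensorflow_hub tensorflow_datasets tensorflow_addons numpy matplotlib seaborn scikit-learn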
We will use the EfficientNetV2 model which is the current state of the art on
most image classification tasks. We use tensorflow_hub to load this pre-trained
CNN model for fine-tuning.
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub
import tensorflow_addons as tfa
We split our dataset into 60% training, 20% validation during training, and
20% for testing. The below code is responsible for setting some variables we
use for later:
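The loading snippet itself isn't reproduced in this excerpt; a sketch consistent with the description (the EuroSAT dataset name and the exact TensorFlow Datasets split syntax are my assumptions) looks like this:
# load the whole dataset once with metadata, plus the three split views (60/20/20)
all_ds = tfds.load("eurosat", split="train", with_info=True)
train_ds = tfds.load("eurosat", split="train[:60%]", as_supervised=True)
valid_ds = tfds.load("eurosat", split="train[60%:80%]", as_supervised=True)
test_ds = tfds.load("eurosat", split="train[80%:]", as_supervised=True)
# grab class names & the total number of samples from the dataset metadata
class_names = all_ds[1].features["label"].names
num_classes = len(class_names)
num_examples = all_ds[1].splits["train"].num_examples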
We grab the list of classes from the all_ds dataset as it was loaded
with with_info set to True , we also get the number of samples from it.
Next, I'm going to make a bar plot to see the number of samples in each class:
# count how many samples there are for each class
# (the way labels/counts are gathered here is an assumption; any per-class count works)
labels, counts = np.unique([ex["label"].numpy() for ex in all_ds[0]], return_counts=True)
fig, ax = plt.subplots(1, 1, figsize=(14, 10))
plt.ylabel('Counts')
plt.xlabel('Labels')
sns.barplot(x=[class_names[l] for l in labels], y=counts, ax=ax)
for i, x_ in enumerate(labels):
    ax.text(x_ - 0.2, counts[i] + 5, counts[i])
# set the title
ax.set_title("Bar Plot showing Number of Samples on Each Class")
# save the image
# plt.savefig("class_samples.png")
plt.show()
Output:
There are 3,000 samples in half of the classes, others have 2,500 samples, while pasture has only 2,000 samples.
Now let's take our training and validation sets and prepare them before
training:
cache() : This method saves the preprocessed dataset into a local cache
file. This will only preprocess it the very first time (in the first epoch
during training).
map() : We map our dataset so each sample will be a tuple of an image
and its corresponding label one-hot encoded with tf.one_hot() .
shuffle() : To shuffle the dataset so the samples are in random order.
repeat() Every time we iterate over the dataset, it'll repeatedly generate
samples for us; this will help us during the training.
batch() : We batch our dataset into 64 or 32 samples per training step.
prefetch() : This will enable us to fetch batches in the background while
the model is training.
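The preparation function itself isn't shown in this excerpt; a minimal sketch that applies the operations above in order (the function name and buffer sizes are my assumptions) could look like this:
def prepare_for_training(ds, batch_size=64, shuffle_buffer_size=1000, cache=True):
    # cache the preprocessed dataset so it is only computed in the first epoch
    if cache:
        ds = ds.cache()
    # one-hot encode the labels
    ds = ds.map(lambda image, label: (image, tf.one_hot(label, num_classes)))
    # shuffle, repeat indefinitely, batch and prefetch
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
    return ds

# apply it to the training & validation sets
train_ds = prepare_for_training(train_ds)
valid_ds = prepare_for_training(valid_ds)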
batch_size = 64
# validating shapes
for el in valid_ds.take(1):
    print(el[0].shape, el[1].shape)
for el in train_ds.take(1):
    print(el[0].shape, el[1].shape)
Output:
Fantastic, both the training and validation have the same shape; where the
batch size is 64, and the image shape is (64, 64, 3) . The targets have the shape
of (64, 10) as it's 64 samples with 10 classes one-hot encoded.
def show_batch(batch):
    plt.figure(figsize=(16, 16))
    for n in range(min(32, batch_size)):
        ax = plt.subplot(batch_size//8, 8, n + 1)
        # show the image
        plt.imshow(batch[0][n])
        # and put the corresponding label as title upper to the image
        plt.title(class_names[tf.argmax(batch[1][n].numpy())])
        plt.axis('off')
    plt.savefig("sample-images.png")

# grab one training batch and show it (this call is an assumption; the notebook does something equivalent)
batch = next(iter(train_ds))
show_batch(batch)
Output:
model_url = "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_l/feature_vector/2"
# download & wrap the pre-trained EfficientNetV2 feature extractor as a Keras layer
# (the output_shape/trainable arguments here are assumptions)
keras_layer = hub.KerasLayer(model_url, output_shape=[1280], trainable=True)
m = tf.keras.Sequential([
    keras_layer,
    tf.keras.layers.Dense(num_classes, activation="softmax")
])
# build the model with input image shape as (64, 64, 3)
m.build([None, 64, 64, 3])
m.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy", tfa.metrics.F1Score(num_classes)]
)
m.summary()
We use Sequential() , the first layer is the pre-trained CNN model, and we add a
fully connected layer with the size of the number of classes as an output
layer.
Finally, the model is built and compiled with categorical cross-entropy, adam
optimizer, and accuracy and F1 score as metrics. Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
================================================================
keras_layer (KerasLayer) (None, 1280) 117746848
================================================================
Total params: 117,759,658
Trainable params: 117,247,082
Non-trainable params: 512,576
_________________________________________________________________
model_name = "satellite-classification"
model_path = os.path.join("results", model_name + ".h5")
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(model_path,
save_best_only=True, verbose=1)
# set the training & validation steps since we're using .repeat() on our dataset
# number of training steps
n_training_steps = int(num_examples * 0.6) // batch_size
# number of validation steps
n_validation_steps = int(num_examples * 0.2) // batch_size
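The actual training call, reproduced from the full listing in the Source Code section below:
history = m.fit(
    train_ds, validation_data=valid_ds,
    steps_per_epoch=n_training_steps,
    validation_steps=n_validation_steps,
    verbose=1, epochs=5,
    callbacks=[model_checkpoint]
)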
The training will take several minutes, depending on your GPU. Here is the
output:
Epoch 1/5
253/253 [==============================] - ETA: 0s - loss: 0.3780
- accuracy: 0.8859 - f1_score: 0.8832
Epoch 00001: val_loss improved from inf to 0.16415, saving model to
results/satellite-classification.h5
253/253 [==============================] - 158s 438ms/step -
loss: 0.3780 - accuracy: 0.8859 - f1_score: 0.8832 - val_loss: 0.1641 -
val_accuracy: 0.9513 - val_f1_score: 0.9501
Epoch 2/5
253/253 [==============================] - ETA: 0s - loss: 0.1531
- accuracy: 0.9536 - f1_score: 0.9525
Epoch 00002: val_loss improved from 0.16415 to 0.12853, saving model to
results/satellite-classification.h5
253/253 [==============================] - 106s 421ms/step -
loss: 0.1531 - accuracy: 0.9536 - f1_score: 0.9525 - val_loss: 0.1285 -
val_accuracy: 0.9568 - val_f1_score: 0.9559
Epoch 3/5
253/253 [==============================] - ETA: 0s - loss: 0.1092
- accuracy: 0.9660 - f1_score: 0.9654
Epoch 00003: val_loss improved from 0.12853 to 0.12095, saving model to
results/satellite-classification.h5
253/253 [==============================] - 107s 424ms/step -
loss: 0.1092 - accuracy: 0.9660 - f1_score: 0.9654 - val_loss: 0.1210 -
val_accuracy: 0.9619 - val_f1_score: 0.9605
Epoch 4/5
253/253 [==============================] - ETA: 0s - loss: 0.1042
- accuracy: 0.9692 - f1_score: 0.9687
Epoch 00004: val_loss did not improve from 0.12095
253/253 [==============================] - 100s 394ms/step -
loss: 0.1042 - accuracy: 0.9692 - f1_score: 0.9687 - val_loss: 0.1435 -
val_accuracy: 0.9565 - val_f1_score: 0.9572
Epoch 5/5
253/253 [==============================] - ETA: 0s - loss: 0.1003
- accuracy: 0.9700 - f1_score: 0.9695
Epoch 00005: val_loss improved from 0.12095 to 0.09841, saving model to
results/satellite-classification.h5
253/253 [==============================] - 107s 423ms/step -
loss: 0.1003 - accuracy: 0.9700 - f1_score: 0.9695 - val_loss: 0.0984 -
val_accuracy: 0.9702 - val_f1_score: 0.9687
As you can see, the model improved to about 97% accuracy on the validation
set on epoch 5. You can increase the number of epochs to see whether it can
improve further.
Model Evaluation
Up until now, we're only validating on the validation set during training. This
section uses our model to predict satellite images that the model has never
seen before. Loading the best weights:
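A sketch of that step (the exact test-set handling in the original notebook may differ slightly):
# load the best weights saved by the ModelCheckpoint callback
m.load_weights(model_path)
# 20% of the dataset was held out for testing
n_testing_steps = int(num_examples * 0.2)
# collect all test images & labels into NumPy arrays
images = np.array([d[0] for d in test_ds.take(n_testing_steps)])
labels = np.array([d[1] for d in test_ds.take(n_testing_steps)])
print("images.shape:", images.shape, "labels.shape:", labels.shape)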
Output:
As expected, 5,400 images and labels , let's use the model to predict these
images and then compare the predictions with the true labels :
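A minimal prediction sketch (class probabilities are converted to label indices with argmax):
# feed the test images to the model and take the most probable class for each
predictions = m.predict(images)
predictions = np.argmax(predictions, axis=-1)
print("predictions.shape:", predictions.shape)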
Output:
predictions.shape: (5400,)
from sklearn.metrics import f1_score
accuracy = tf.keras.metrics.Accuracy()
accuracy.update_state(labels, predictions)
print("Accuracy:", accuracy.result().numpy())
print("F1 Score:", f1_score(labels, predictions, average="macro"))
Output:
Accuracy: 0.9677778
F1 Score: 0.9655686619720163
That's good accuracy! Let's draw the confusion matrix for all the classes:
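The plotting code isn't reproduced here; a sketch using scikit-learn and seaborn (normalizing each row so every true class sums to 1) would be:
from sklearn.metrics import confusion_matrix

# compute the confusion matrix and normalize it per true class
cmn = confusion_matrix(labels, predictions)
cmn = cmn.astype('float') / cmn.sum(axis=1)[:, np.newaxis]
# plot it as a heatmap with the class names on both axes
fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(cmn, annot=True, fmt='.2f',
            xticklabels=class_names, yticklabels=class_names, ax=ax)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()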
Output:
As you can see, the model is accurate in most of the classes, especially on forest images, where it achieved 100%. However, it's down to 91% for pasture, and the model sometimes predicts pasture as permanent crop or as herbaceous vegetation. Most of the confusion is between crop, pasture, and herbaceous vegetation, as they all look similar and, most of the time, green from the satellite.
def show_predicted_samples():
    plt.figure(figsize=(14, 14))
    for n in range(64):
        ax = plt.subplot(8, 8, n + 1)
        # show the image
        plt.imshow(images[n])
        # and put the corresponding label as title upper to the image
        if predictions[n] == labels[n]:
            # correct prediction
            ax.set_title(class_names[predictions[n]], color="green")
        else:
            # wrong prediction
            ax.set_title(f"{class_names[predictions[n]]}/T:{class_names[labels[n]]}", color="red")
        plt.axis('off')
    plt.savefig("predicted-sample-images.png")

# show an 8x8 grid of predicted test samples
show_predicted_samples()
Output:
Of all 64 images, only one (the red label in the image above) was predicted incorrectly: it was classified as pasture when it should be permanent crop.
Final Thoughts
Alright! That's it for the tutorial. If you want further improvement, I highly advise you to explore TensorFlow Hub, where you can find state-of-the-art pre-trained CNN models and feature extractors.
I also suggest you try out different optimizers and increase the number of epochs to see whether you can improve accuracy. You can use TensorBoard to track the accuracy of each change you make. Make sure you include the variables in the model name.
Source Code:
satellite_image_classification.py
# -*- coding: utf-8 -*-
"""Satellite-Image-Classification-with-TensorFlow_PythonCode.ipynb
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub
import tensorflow_addons as tfa
plt.ylabel('Counts')
plt.xlabel('Labels')
sns.barplot(x = [class_names[l] for l in labels], y = counts, ax=ax)
for i, x_ in enumerate(labels):
    ax.text(x_-0.2, counts[i]+5, counts[i])
# set the title
ax.set_title("Bar Plot showing Number of Samples on Each Class")
# save the image
# plt.savefig("class_samples.png")
batch_size = 64
# validating shapes
for el in valid_ds.take(1):
    print(el[0].shape, el[1].shape)
for el in train_ds.take(1):
    print(el[0].shape, el[1].shape)
def show_batch(batch):
    plt.figure(figsize=(16, 16))
    for n in range(min(32, batch_size)):
        ax = plt.subplot(batch_size//8, 8, n + 1)
        # show the image
        plt.imshow(batch[0][n])
        # and put the corresponding label as title upper to the image
        plt.title(class_names[tf.argmax(batch[1][n].numpy())])
        plt.axis('off')
    plt.savefig("sample-images.png")
m = tf.keras.Sequential([
keras_layer,
tf.keras.layers.Dense(num_classes, activation="softmax")
])
# build the model with input image shape as (64, 64, 3)
m.build([None, 64, 64, 3])
m.compile(
loss="categorical_crossentropy",
optimizer="adam",
metrics=["accuracy", tfa.metrics.F1Score(num_classes)]
)
m.summary()
model_name = "satellite-classification"
model_path = os.path.join("results", model_name + ".h5")
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(model_path, save_best_only=True,
verbose=1)
history = m.fit(
    train_ds, validation_data=valid_ds,
    steps_per_epoch=n_training_steps,
    validation_steps=n_validation_steps,
    verbose=1, epochs=5,
    callbacks=[model_checkpoint]
)
accuracy = tf.keras.metrics.Accuracy()
accuracy.update_state(labels, predictions)
print("Accuracy:", accuracy.result().numpy())
print("F1 Score:", f1_score(labels, predictions, average="macro"))
def show_predicted_samples():
    plt.figure(figsize=(14, 14))
    for n in range(64):
        ax = plt.subplot(8, 8, n + 1)
        # show the image
        plt.imshow(images[n])
        # and put the corresponding label as title upper to the image
        if predictions[n] == labels[n]:
            # correct prediction
            ax.set_title(class_names[predictions[n]], color="green")
        else:
            # wrong prediction
            ax.set_title(f"{class_names[predictions[n]]}/T:{class_names[labels[n]]}", color="red")
        plt.axis('off')
    plt.savefig("predicted-sample-images.png")
PART 2: Age and Gender Detection using OpenCV in Python
In this tutorial, we will combine the gender detection and age detection tutorials to come up with a single code that detects both.
Let's get started. If you don't have OpenCV installed already, make sure to do so:
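A typical install command (the package names are assumed from the imports that follow):
$ pip install opencv-python numpy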
# Import Libraries
import cv2
import numpy as np
Next, defining the variables of weights and architectures for face, age, and
gender detection models:
# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_2
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
# The gender model architecture
# https://drive.google.com/open?id=1W_moLzMlGiELyPxWiYQJ9KFaXroQ_NFQ
GENDER_MODEL = 'weights/deploy_gender.prototxt'
# The gender model pre-trained weights
# https://drive.google.com/open?id=1AW3WduLk1haTVAxHOkVS_BEzel1WXQHP
GENDER_PROTO = 'weights/gender_net.caffemodel'
# Each Caffe model imposes the shape of the input image; image preprocessing such as mean
# subtraction is also required to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the gender classes
GENDER_LIST = ['Male', 'Female']
# The model architecture
# download from: https://drive.google.com/open?id=1kiusFljZc9QfcIYdU2s7xrtWHTraHwmW
AGE_MODEL = 'weights/deploy_age.prototxt'
# The model pre-trained weights
# download from: https://drive.google.com/open?id=1kWv0AjxGSN0g31OeJa02eBGM0R_jcjIl
AGE_PROTO = 'weights/age_net.caffemodel'
# Represent the 8 age classes of this CNN probability layer
AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)',
                 '(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']
gender_net.caffemodel: the pre-trained model weights for gender detection. You can download it here.
deploy_gender.prototxt: the model architecture for the gender detection model (a plain text file containing all the neural network layers' definitions). Get it here.
age_net.caffemodel: the pre-trained model weights for age detection. You can download it here.
deploy_age.prototxt: the model architecture for the age detection model (a plain text file containing all the neural network layers' definitions). Get it here.
res10_300x300_ssd_iter_140000_fp16.caffemodel: the pre-trained model weights for face detection, download here.
deploy.prototxt.txt: the model architecture for the face detection model, download here.
Before trying to detect age and gender, we need a function to detect faces
first:
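The helper itself isn't reproduced at this point; the sketch below follows the usual SSD face-detector pattern and assumes face_net was loaded with cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL), as in the full listing further down:
def get_faces(frame, confidence_threshold=0.5):
    """Return a list of (start_x, start_y, end_x, end_y) boxes for detected faces."""
    # convert the frame into a blob the SSD face detector expects
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177.0, 123.0))
    face_net.setInput(blob)
    # output shape is (1, 1, N, 7); squeeze it down to (N, 7)
    output = np.squeeze(face_net.forward())
    faces = []
    for i in range(output.shape[0]):
        confidence = output[i, 2]
        if confidence > confidence_threshold:
            # scale the relative box coordinates back to pixel coordinates
            box = output[i, 3:7] * np.array([frame.shape[1], frame.shape[0],
                                             frame.shape[1], frame.shape[0]])
            start_x, start_y, end_x, end_y = box.astype(int)
            # widen the box a little (clamping to the image bounds is omitted in this sketch)
            faces.append((start_x - 10, start_y - 10, end_x + 10, end_y + 10))
    return faces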
The get_faces() function was grabbed from the face detection tutorial, so check
it out if you want more information.
Below is a function for simply displaying an image:
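A minimal version (the name display_img is my choice):
def display_img(title, img):
    """Display an image in an OpenCV window until any key is pressed."""
    cv2.imshow(title, img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
We also need a helper for resizing large images without distortion, reproduced next.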
# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
    # initialize the dimensions of the image to be resized and
    # grab the image size
    dim = None
    (h, w) = image.shape[:2]
    # if both the width and height are None, then return the
    # original image
    if width is None and height is None:
        return image
    # check to see if the width is None
    if width is None:
        # calculate the ratio of the height and construct the
        # dimensions
        r = height / float(h)
        dim = (int(w * r), height)
    # otherwise, the height is None
    else:
        # calculate the ratio of the width and construct the
        # dimensions
        r = width / float(w)
        dim = (width, int(h * r))
    # resize the image
    return cv2.resize(image, dim, interpolation = inter)
Now that everything is ready, let's define our two functions for age and
gender detection:
def get_gender_predictions(face_img):
    blob = cv2.dnn.blobFromImage(
        image=face_img, scalefactor=1.0, size=(227, 227),
        mean=MODEL_MEAN_VALUES, swapRB=False, crop=False
    )
    gender_net.setInput(blob)
    return gender_net.forward()

def get_age_predictions(face_img):
    blob = cv2.dnn.blobFromImage(
        image=face_img, scalefactor=1.0, size=(227, 227),
        mean=MODEL_MEAN_VALUES, swapRB=False
    )
    age_net.setInput(blob)
    return age_net.forward()
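The main routine that ties everything together isn't reproduced above. The sketch below mirrors the live-camera version shipped in the Source Code section, but reads a single image from disk instead (frame_width is the constant defined in that listing):
def predict_age_and_gender(input_path: str):
    """Predict the age and gender of the faces showing in an image file."""
    # read the image and keep a resized working copy
    img = cv2.imread(input_path)
    frame = img.copy()
    if frame.shape[1] > frame_width:
        frame = image_resize(frame, width=frame_width)
    # detect the faces first
    faces = get_faces(frame)
    for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
        face_img = frame[start_y: end_y, start_x: end_x]
        # predict age & gender for this face
        age_preds = get_age_predictions(face_img)
        gender_preds = get_gender_predictions(face_img)
        i = gender_preds[0].argmax()
        gender = GENDER_LIST[i]
        gender_confidence_score = gender_preds[0][i]
        i = age_preds[0].argmax()
        age = AGE_INTERVALS[i]
        age_confidence_score = age_preds[0][i]
        # draw the box and the label
        label = f"{gender}-{gender_confidence_score*100:.1f}%, {age}-{age_confidence_score*100:.1f}%"
        print(label)
        yPos = max(start_y - 15, 15)
        box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
        cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
        cv2.putText(frame, label, (start_x, yPos),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.54, box_color, 2)
    # show the annotated image
    display_img("Age & Gender Estimator", frame)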
if __name__ == "__main__":
import sys
input_path = sys.argv[1]
predict_age_and_gender(input_path)
For more detail on how the gender and age prediction works, I suggest you
check the individual tutorials:
If you want to use your camera, I made a Python script to read images from
your webcam and perform inference in real-time.
Finally, I've collected some useful resources and courses for you for further
learning, I highly recommend the following courses:
age_and_gender_detection.py
# Import Libraries
import cv2
import numpy as np
# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
FACE_PROTO = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x30
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
# The gender model architecture
# https://drive.google.com/open?id=1W_moLzMlGiELyPxWiYQJ9KFaXroQ_NFQ
GENDER_MODEL = 'weights/deploy_gender.prototxt'
# The gender model pre-trained weights
# https://drive.google.com/open?id=1AW3WduLk1haTVAxHOkVS_BEzel1WXQHP
GENDER_PROTO = 'weights/gender_net.caffemodel'
# Each Caffe model imposes the shape of the input image; image preprocessing such as mean
# subtraction is also required to eliminate the effect of illumination changes
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# Represent the gender classes
GENDER_LIST = ['Male', 'Female']
# The model architecture
# download from: https://drive.google.com/open?id=1kiusFljZc9QfcIYdU2s7xrtWHTraHwmW
AGE_MODEL = 'weights/deploy_age.prototxt'
# The model pre-trained weights
# download from: https://drive.google.com/open?id=1kWv0AjxGSN0g31OeJa02eBGM0R_jcjIl
AGE_PROTO = 'weights/age_net.caffemodel'
# Represent the 8 age classes of this CNN probability layer
AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)',
'(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']
# Initialize frame size
frame_width = 1280
frame_height = 720
# load face Caffe model
face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
# Load age prediction model
age_net = cv2.dnn.readNetFromCaffe(AGE_MODEL, AGE_PROTO)
# Load gender prediction model
gender_net = cv2.dnn.readNetFromCaffe(GENDER_MODEL, GENDER_PROTO)
# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)
def get_gender_predictions(face_img):
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False, crop=False
)
gender_net.setInput(blob)
return gender_net.forward()
def get_age_predictions(face_img):
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False
)
age_net.setInput(blob)
return age_net.forward()
if __name__ == "__main__":
import sys
input_path = sys.argv[1]
predict_age_and_gender(input_path)
age_and_gender_detection_live.py
# Import Libraries
import cv2
import numpy as np
# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)
def get_gender_predictions(face_img):
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False, crop=False
)
gender_net.setInput(blob)
return gender_net.forward()
def get_age_predictions(face_img):
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False
)
age_net.setInput(blob)
return age_net.forward()
def predict_age_and_gender():
    """Predict the age and gender of the faces showing in the camera feed"""
    # create a new cam object
    cap = cv2.VideoCapture(0)
    while True:
        _, img = cap.read()
        # Take a copy of the initial image and resize it
        frame = img.copy()
        # resize if higher than frame_width
        if frame.shape[1] > frame_width:
            frame = image_resize(frame, width=frame_width)
        # predict the faces
        faces = get_faces(frame)
        # Loop over the faces detected
        # for idx, face in enumerate(faces):
        for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
            face_img = frame[start_y: end_y, start_x: end_x]
            # predict age
            age_preds = get_age_predictions(face_img)
            # predict gender
            gender_preds = get_gender_predictions(face_img)
            i = gender_preds[0].argmax()
            gender = GENDER_LIST[i]
            gender_confidence_score = gender_preds[0][i]
            i = age_preds[0].argmax()
            age = AGE_INTERVALS[i]
            age_confidence_score = age_preds[0][i]
            # Draw the box
            label = f"{gender}-{gender_confidence_score*100:.1f}%, {age}-{age_confidence_score*100:.1f}%"
            # label = "{}-{:.2f}%".format(gender, gender_confidence_score*100)
            print(label)
            yPos = start_y - 15
            while yPos < 15:
                yPos += 15
            box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
            cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
            # Label processed image
            cv2.putText(frame, label, (start_x, yPos),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.54, box_color, 2)
        # display the processed frame and exit on "q"
        # (these closing lines are an assumption; the extracted listing omits them)
        cv2.imshow("Age & Gender Estimator", frame)
        if cv2.waitKey(1) == ord("q"):
            break
    cv2.destroyAllWindows()

if __name__ == "__main__":
    predict_age_and_gender()
PART 3: Gender Detection using
OpenCV in Python
Learn how to perform gender detection on detected faces in images using OpenCV library in Python.
Automatic prediction of gender from face images has drawn a lot of attention
recently, due to its wide application in various facial analysis problems.
However, due to the large variations of face images (such as variation in
lighting, scale, and occlusion) the existing models are still behind the desired
accuracy level which is necessary for exploiting these models in real-world
applications.
Please note that if you want to detect both gender and age in the same code at
the same time, check this tutorial for it.
Pre-requisites
For the purpose of this article, we will use pre-trained Caffe models, one for face detection taken from the face detection tutorial, and another model for gender detection. Below is the list of necessary files to include in our project directory:
After downloading the 4 necessary files, put them in the weights folder:
Open up a new Python file and follow along. First, let's import the necessary
modules and initialize the needed variables:
# Import Libraries
import cv2
import numpy as np
Like the age detection tutorial, before going into detecting gender, we need a
way to detect faces, below function is mostly taken from the face detection
tutorial:
Next, let's make two utility functions, one for finding the appropriate font
size to write in the image, and another for correctly resizing the image:
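Only the resizing helper is reproduced below; the font-scale helper could be sketched like this (it searches for the largest scale whose rendered text still fits the given width):
def get_optimal_font_scale(text, width):
    """Determine the optimal font scale based on the hosting frame width."""
    for scale in reversed(range(0, 60, 1)):
        # measure the rendered text width at this candidate scale
        textSize = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_DUPLEX,
                                   fontScale=scale/10, thickness=1)
        new_width = textSize[0][0]
        if new_width <= width:
            return scale/10
    return 1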
# from: https://stackoverflow.com/questions/44650888/resize-an-image-
without-distortion-opencv
def image_resize(image, width = None, height = None, inter =
cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)
Now we know how to detect faces, let's make our core function to predict the
gender of each face detected:
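The function itself isn't reproduced at this point; the sketch below mirrors predict_gender_live.py from the Source Code section, but reads an image from disk instead of the webcam:
def predict_gender(input_path: str):
    """Predict the gender of the faces showing in an image file."""
    img = cv2.imread(input_path)
    frame = img.copy()
    if frame.shape[1] > frame_width:
        frame = image_resize(frame, width=frame_width)
    faces = get_faces(frame)
    for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
        face_img = frame[start_y: end_y, start_x: end_x]
        # prepare the face crop for the gender network
        blob = cv2.dnn.blobFromImage(image=face_img, scalefactor=1.0, size=(227, 227),
                                     mean=MODEL_MEAN_VALUES, swapRB=False, crop=False)
        gender_net.setInput(blob)
        gender_preds = gender_net.forward()
        i = gender_preds[0].argmax()
        gender = GENDER_LIST[i]
        gender_confidence_score = gender_preds[0][i]
        # draw the box and label
        label = "{}-{:.2f}%".format(gender, gender_confidence_score*100)
        print(label)
        yPos = max(start_y - 15, 15)
        optimal_font_scale = get_optimal_font_scale(label, ((end_x - start_x) + 25))
        box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
        cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
        cv2.putText(frame, label, (start_x, yPos),
                    cv2.FONT_HERSHEY_SIMPLEX, optimal_font_scale, box_color, 2)
    # display the annotated image
    cv2.imshow("Gender Estimator", frame)
    cv2.waitKey(0)
    cv2.destroyAllWindows()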
if __name__ == '__main__':
    # Parsing command line arguments entered by user
    import sys
    predict_gender(sys.argv[1])
We simply use the sys module to get the image path from the command line.
Let's test this out, I'm testing on this stock image:
Female-97.36%
Female-98.34%
And the resulting image:
Conclusion
And there you go, now you have a Python code for detecting gender on any
image using the OpenCV library. The gender model seems to be accurate.
Source Code:
predict_gender.py
# Import Libraries
import cv2
import numpy as np
# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)
if __name__ == '__main__':
# Parsing command line arguments entered by user
import sys
predict_gender(sys.argv[1])
predict_gender_live.py
# Import Libraries
import cv2
import numpy as np
# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)
def predict_gender():
"""Predict the gender of the faces showing in the image"""
# create a new cam object
cap = cv2.VideoCapture(0)
while True:
_, img = cap.read()
# resize the image, uncomment if you want to resize the image
# img = cv2.resize(img, (frame_width, frame_height))
# Take a copy of the initial image and resize it
frame = img.copy()
if frame.shape[1] > frame_width:
frame = image_resize(frame, width=frame_width)
# predict the faces
faces = get_faces(frame)
# Loop over the faces detected
# for idx, face in enumerate(faces):
for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
face_img = frame[start_y: end_y, start_x: end_x]
# image --> Input image to preprocess before passing it through our dnn for classification.
# scale factor = After performing mean subtraction we can optionally scale the image by some factor (if 1 -> no scaling)
# size = The spatial size that the CNN expects. Options are (224*224, 227*227 or 299*299)
# mean = mean subtraction values to be subtracted from every channel of the image.
# swapRB = OpenCV assumes images are in BGR whereas the mean is supplied in RGB. To resolve this we set swapRB to True.
blob = cv2.dnn.blobFromImage(image=face_img, scalefactor=1.0, size=(227, 227), mean=MODEL_MEAN_VALUES, swapRB=False, crop=False)
# Predict Gender
gender_net.setInput(blob)
gender_preds = gender_net.forward()
i = gender_preds[0].argmax()
gender = GENDER_LIST[i]
gender_confidence_score = gender_preds[0][i]
# Draw the box
label = "{}-{:.2f}%".format(gender, gender_confidence_score*100)
print(label)
yPos = start_y - 15
while yPos < 15:
yPos += 15
# get the font scale for this image size
optimal_font_scale = get_optimal_font_scale(label,((end_x-start_x)+25))
box_color = (255, 0, 0) if gender == "Male" else (147, 20, 255)
cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), box_color, 2)
# Label processed image
cv2.putText(frame, label, (start_x, yPos),
cv2.FONT_HERSHEY_SIMPLEX, optimal_font_scale, box_color, 2)
if __name__ == '__main__':
predict_gender()
PART 4: Age Detection using
OpenCV in Python
Learn how to predict someone's age from his front face picture using OpenCV library in Python
Recently, wide attention has grown in the field of computer vision, especially
in face recognition, detection, and facial landmarks localization. Many
significant features can be directly derived from the human face, such as age,
gender, and emotions.
Please note that if you want to detect both age and gender at the same time,
check this tutorial for it.
For the purpose of this article, we will use pre-trained Caffe models, one for
face detection taken from the face detection tutorial, and another model for
age detection. Below is the list of necessary files to include in our project
directory:
After downloading the 4 necessary files, put them in a folder and call
it "weights":
To get started, let's install OpenCV and NumPy:
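A typical install command (the package names are assumed from the imports that follow):
$ pip install opencv-python numpy filetype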
# Import Libraries
import cv2
import os
import filetype
import numpy as np
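The initialization block described next isn't reproduced in this excerpt; a sketch consistent with the constants and model-loading calls used elsewhere in this book:
# model architectures & pre-trained weights (see the download links in Part 2)
AGE_MODEL = 'weights/deploy_age.prototxt'
AGE_PROTO = 'weights/age_net.caffemodel'
FACE_PROTO = "weights/deploy.prototxt.txt"
FACE_MODEL = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
# mean values subtracted from every channel to reduce illumination effects
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)
# the 8 age classes the CNN can output
AGE_INTERVALS = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)',
                 '(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']
# frame size to resize large images down to
frame_width = 1280
frame_height = 720
# load the face detection & age prediction Caffe models
face_net = cv2.dnn.readNetFromCaffe(FACE_PROTO, FACE_MODEL)
age_net = cv2.dnn.readNetFromCaffe(AGE_MODEL, AGE_PROTO)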
Here we initialize the paths of our models' weights and architectures, the frame size we are going to resize to, and finally load the models.
The variable AGE_INTERVALS is a list of the age classes of the age detection model.
Next, let's make a function that takes an image as input, and returns a list of
detected faces:
Most of the code was grabbed from the face detection tutorial, check it out
for more information on how it's done.
Next, below are two utility functions, one for finding the appropriate font size
when printing text to the image, and another for dynamically resizing an
image:
# from: https://stackoverflow.com/questions/44650888/resize-an-image-
without-distortion-opencv
def image_resize(image, width = None, height = None, inter =
cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)
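The core prediction routine isn't reproduced at this point; the sketch below mirrors predict_age_live.py from the Source Code section, but reads a single image from disk instead of the webcam (it relies on the get_faces() helper from the face detection tutorial):
def predict_age(input_path: str):
    """Predict the age of the faces showing in an image file."""
    img = cv2.imread(input_path)
    frame = img.copy()
    if frame.shape[1] > frame_width:
        frame = image_resize(frame, width=frame_width)
    faces = get_faces(frame)
    for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
        face_img = frame[start_y: end_y, start_x: end_x]
        blob = cv2.dnn.blobFromImage(
            image=face_img, scalefactor=1.0, size=(227, 227),
            mean=MODEL_MEAN_VALUES, swapRB=False
        )
        # predict the age probabilities and pick the most likely interval
        age_net.setInput(blob)
        age_preds = age_net.forward()
        i = age_preds[0].argmax()
        age = AGE_INTERVALS[i]
        age_confidence_score = age_preds[0][i]
        # draw the label and the bounding box
        label = f"Age:{age} - {age_confidence_score*100:.2f}%"
        print(label)
        yPos = max(start_y - 15, 15)
        cv2.putText(frame, label, (start_x, yPos),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), thickness=2)
        cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), color=(255, 0, 0), thickness=2)
    # display (and optionally save) the processed image
    cv2.imshow('Age Estimator', frame)
    cv2.waitKey(0)
    # cv2.imwrite("predicted_age.jpg", frame)
    cv2.destroyAllWindows()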
You can always uncomment the cv2.imwrite() line to save the new image.
if __name__ == '__main__':
    # Parsing command line arguments entered by user
    import sys
    image_path = sys.argv[1]
    predict_age(image_path)
We simply use Python's built-in sys module for getting user input, as we only
need one argument from the user and that's the image path, the argparse module
would be overkill.
Let's test the code on this stock photo:
Output:
Wrap-up
The age detection model is heavily biased toward the age group [25-32]. Therefore, you may notice this discrepancy while testing this utility.
You can always tweak some parameters to make the model more accurate.
For instance, in the get_faces() function, I've widened the box by 10 pixels on
all sides, you can always change that to any value you feel good about.
Changing frame_width and frame_height is also a way to refine the accuracy of the
prediction.
Source Code:
predict_age.py
# Import Libraries
import cv2
import os
import filetype
import numpy as np
# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)
if __name__ == '__main__':
# Parsing command line arguments entered by user
import sys
image_path = sys.argv[1]
predict_age(image_path)
predict_age_live.py
# Import Libraries
import cv2
import os
import filetype
import numpy as np
# from: https://stackoverflow.com/questions/44650888/resize-an-image-without-distortion-opencv
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
return cv2.resize(image, dim, interpolation = inter)
def predict_age():
"""Predict the age of the faces showing in the image"""
while True:
_, img = cap.read()
# Take a copy of the initial image and resize it
frame = img.copy()
if frame.shape[1] > frame_width:
frame = image_resize(frame, width=frame_width)
faces = get_faces(frame)
for i, (start_x, start_y, end_x, end_y) in enumerate(faces):
face_img = frame[start_y: end_y, start_x: end_x]
# image --> Input image to preprocess before passing it through our dnn for classification.
blob = cv2.dnn.blobFromImage(
image=face_img, scalefactor=1.0, size=(227, 227),
mean=MODEL_MEAN_VALUES, swapRB=False
)
# Predict Age
age_net.setInput(blob)
age_preds = age_net.forward()
print("="*30, f"Face {i+1} Prediction Probabilities", "="*30)
for i in range(age_preds[0].shape[0]):
print(f"{AGE_INTERVALS[i]}: {age_preds[0, i]*100:.2f}%")
i = age_preds[0].argmax()
age = AGE_INTERVALS[i]
age_confidence_score = age_preds[0][i]
# Draw the box
label = f"Age:{age} - {age_confidence_score*100:.2f}%"
print(label)
# get the position where to put the text
yPos = start_y - 15
while yPos < 15:
yPos += 15
# write the text into the frame
cv2.putText(frame, label, (start_x, yPos),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), thickness=2)
# draw the rectangle around the face
cv2.rectangle(frame, (start_x, start_y), (end_x, end_y), color=(255, 0, 0), thickness=2)
# Display processed image
cv2.imshow('Age Estimator', frame)
if cv2.waitKey(1) == ord("q"):
break
# save the image if you want
# cv2.imwrite("predicted_age.jpg", frame)
cv2.destroyAllWindows()
if __name__ == '__main__':
predict_age()
PART 5: SIFT Feature Extraction
using OpenCV in Python
Learn how to compute and detect SIFT features for feature matching and more using OpenCV library
in Python.
In this tutorial, you will learn the theory behind SIFT as well as how to
implement it in Python using OpenCV library.
For creating the first octave, a Gaussian filter is applied to the input image with different values of sigma; for the second and subsequent octaves, the image is first down-sampled by a factor of 2 and then Gaussian filters with different sigma values are applied.
The following image shows four octaves and each octave contains six
images:
A question comes around about how many scales per octave? Research
shows that there should be 4 scales per octave:
Then two consecutive images in the octave are subtracted to obtain the
difference of gaussian.
Keypoint Localization
After taking the difference of gaussian, we need to detect the maxima and
minima in the scale space by comparing a pixel (x) with 26 pixels in the
current and adjacent scale. Each point is compared to its 8 neighbors in the
current image and 9 neighbors each in the scales above and below.
The following are the extrema points found in our example image:
Orientation Assignment
Orientation assignments are done to achieve rotation invariance. The gradient magnitude and direction are computed for the pixels around each keypoint in the Gaussian-blurred image.
The magnitude represents the intensity of the pixel and the orientation gives
the direction for the same.
Now we need to look at the orientation of each point; weights are also assigned along with the direction. The arrow in the blue square below has an angle of approximately 90 degrees, and its length shows how much it counts.
Python Implementation
Now you hopefully understand the theory behind SIFT, let's dive into the
Python code using OpenCV. First, let's install a specific version of OpenCV
which implements SIFT:
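One commonly pinned combination is shown below; the exact versions are my assumption, and on recent OpenCV releases SIFT is available in the default build (as cv2.SIFT_create()) since the patent expired, so pinning may not be needed at all:
$ pip install numpy opencv-python==3.4.2.16 opencv-contrib-python==3.4.2.16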
Open up a new Python file and follow along. I'm going to operate on this image of a table that contains a specific book (get it here):
import cv2
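Only the import is shown above; a sketch of the loading step follows (the file names are placeholders for the book image and the table scene):
# read the two images: the query object (the book) and the scene (the table)
img1 = cv2.imread("book.jpg")    # query image
img2 = cv2.imread("table.jpg")   # train (scene) image
# convert both to grayscale, since SIFT works on single-channel images
gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)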
The above code loads the images and converts them to grayscale. Let's create the SIFT feature extractor object:
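A minimal sketch of that step:
# create the SIFT feature extractor
# (on older OpenCV builds this is cv2.xfeatures2d.SIFT_create() instead)
sift = cv2.SIFT_create()
# detect keypoints and compute their descriptors for both images
kp1, des1 = sift.detectAndCompute(gray1, None)
kp2, des2 = sift.detectAndCompute(gray2, None)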
Now that we have keypoints and descriptors of both images, let's make a
matcher to match the descriptors:
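A brute-force matcher with the default L2 norm (suitable for SIFT descriptors) is one way to do it:
bf = cv2.BFMatcher()
# match the descriptors of the query image against the scene image
matches = bf.match(des1, des2)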
Let's sort the matches by distance and draw the first 50 matches:
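A sketch of that step:
# sort the matches by descriptor distance (best matches first)
matches = sorted(matches, key=lambda m: m.distance)
# draw the first 50 matches side by side and show the result
matched_img = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None, flags=2)
cv2.imshow("SIFT matches", matched_img)
cv2.waitKey(0)
cv2.destroyAllWindows()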
Output:
Conclusion
Alright, in this tutorial, we've covered the basics of SIFT, I suggest you
read the original paper for more detailed information.
Source Code:
sift.py
import cv2
PART 6: How to Apply HOG Feature Extraction in Python
Calculating Gradients
Now after resizing, we need to calculate the gradient in the x and y direction.
The gradient is simply the small changes in the x and y directions, we need to
convolve two simple filters on the image.
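Purely to illustrate the idea (scikit-image's hog() does this internally), the two 1-D derivative kernels and the resulting magnitude/orientation can be computed like this:
import numpy as np
from scipy import ndimage

# the two simple 1-D derivative kernels used by HOG
kernel_x = np.array([[-1, 0, 1]])   # horizontal gradient filter
kernel_y = kernel_x.T               # vertical gradient filter

def gradients(gray):
    """Return gradient magnitude and orientation (degrees) of a grayscale image."""
    gx = ndimage.convolve(gray.astype(float), kernel_x)
    gy = ndimage.convolve(gray.astype(float), kernel_y)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned gradients
    return magnitude, orientation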
Python Code
Now that we understand the theory, let's take a look at how we can use the scikit-image library to extract HOG features from images.
I'm gonna perform HOG on a cute cat image, get it here and put it in the
current working directory (you can use any image you want, of course). Let's
load the image and show it:
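The loading snippet isn't reproduced here; a sketch using the same imports as the hog.py listing below (the file name cat.jpg is a placeholder):
# importing the required libraries
from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog
import matplotlib.pyplot as plt

# read the image and show it
img = imread('cat.jpg')
plt.axis("off")
plt.imshow(img)
plt.show()
print(img.shape)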
(1349, 1012, 3)
# resizing image
resized_img = resize(img, (128*4, 64*4))
plt.axis("off")
plt.imshow(resized_img)
print(resized_img.shape)
Output:
(512, 256, 3)
Now we simply use hog() function from scikit-image
library:
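A sketch of the call, matching the six parameters described below (on recent scikit-image versions, channel_axis=-1 replaces multichannel=True):
# creating the HOG features and the visualization image
fd, hog_image = hog(resized_img, orientations=9, pixels_per_cell=(8, 8),
                    cells_per_block=(2, 2), visualize=True, multichannel=True)
plt.axis("off")
plt.imshow(hog_image, cmap="gray")
plt.show()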
Output:
The hog() function takes 6 parameters as input:
image : The target image you want to apply HOG feature extraction.
orientations : Number of bins in the histogram we want to create, the
original research paper used 9 bins so we will pass 9 as orientations.
pixels_per_cell : Determines the size of the cell, as we mentioned earlier, it
is 8x8.
cells_per_block : Number of cells per block, will be 2x2 as mentioned
previously.
visualize : A boolean whether to return the image of the HOG, we set it
to True so we can show the image.
multichannel : We set it to True to tell the function that the last dimension is
considered as a color channel, instead of spatial.
Conclusion
Alright, now you know how to perform HOG feature extraction in Python
with the help of scikit-image library.
Source Code:
hog.py
#importing required libraries
from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog
import matplotlib.pyplot as plt
#resizing image
resized_img = resize(img, (128*4, 64*4))
plt.axis("off")
plt.imshow(resized_img)
plt.show()
print(resized_img.shape)
PART 7: Image Transformations using OpenCV in Python
Introduction
Image transformation is a coordinate changing function, it maps some (x,
y) points in one coordinate system to points (x', y') in another coordinate
system.
For example, if we have (2, 3) points in x-y coordinate, and we plot the same
point in u-v coordinate, the same point is represented in different ways, as
shown in the figure below:
Image Translation
Image translation is the rectilinear shift of an image from one location to
another, so the shifting of an object is called translation. The matrix shown
below is used for the translation of the image:
Now that you understand image translation, let's take a look at the Python
code. In OpenCV, there are two built-in functions for performing
transformations:
The below code reads an input image (if you want the exact output, get the
demo image here and put it in the current working directory), translates it,
and shows it:
import numpy as np
import cv2
import matplotlib.pyplot as plt
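The translation snippet isn't reproduced here; a sketch that follows the same structure as the shearing example further down (the 50-pixel shifts are my choice):
# read the input image
img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# show the original image
plt.axis('off')
plt.imshow(img)
plt.show()
# get the image shape
rows, cols, dim = img.shape
# transformation matrix for translation: shift 50 pixels right and 50 pixels down
M = np.float32([[1, 0, 50],
                [0, 1, 50],
                [0, 0, 1]])
# apply a perspective transformation to the image
translated_img = cv2.warpPerspective(img, M, (cols, rows))
# show the translated image
plt.axis('off')
plt.imshow(translated_img)
plt.show()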
Note that we use plt.axis('off') as we do not want to output the axis values, and
we show the image using matplotlib's imshow() function.
Original image:
Translated image:
Image Scaling
Image scaling is a process used to resize a digital image. OpenCV has a built-
in function cv2.resize() , but we will perform transformation using matrix
multiplication as previously. The matrix used for scaling is shown below:
Sx and Sy are the scaling
factors for the x-axis and y-axis, respectively.
The below code is responsible for reading the same image, defining the
transformation matrix for scaling, and showing the resulting image:
import numpy as np
import cv2
import matplotlib.pyplot as plt
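A sketch of the scaling step (the 0.5 and 0.7 scaling factors are my choice):
# read the input image and convert to RGB
img = cv2.imread("city.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
rows, cols, dim = img.shape
# transformation matrix for scaling: Sx = 0.5, Sy = 0.7
M = np.float32([[0.5, 0,   0],
                [0,   0.7, 0],
                [0,   0,   1]])
scaled_img = cv2.warpPerspective(img, M, (cols, rows))
plt.axis('off')
plt.imshow(scaled_img)
plt.show()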
Output image:
Note that you can easily remove those black pixels with cropping, we'll cover
that in the end of the tutorial.
Image Shearing
Shear mapping is a linear map that displaces each point in a fixed direction, it
substitutes every point horizontally or vertically by a specific value in
proportion to its x or y coordinates, there are two types of shearing effects.
import numpy as np
import cv2
import matplotlib.pyplot as plt
# read the input image
img = cv2.imread("city.jpg")
# convert from BGR to RGB so we can plot using matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# disable x & y axis
plt.axis('off')
# show the image
plt.imshow(img)
plt.show()
# get the image shape
rows, cols, dim = img.shape
# transformation matrix for Shearing
# shearing applied to x-axis
M = np.float32([[1, 0.5, 0],
[0, 1 , 0],
[0, 0 , 1]])
# shearing applied to y-axis
# M = np.float32([[1, 0, 0],
# [0.5, 1, 0],
# [0, 0, 1]])
# apply a perspective transformation to the image
sheared_img = cv2.warpPerspective(img,M,(int(cols*1.5),int(rows*1.5)))
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(sheared_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_sheared.jpg", sheared_img)
The first matrix is shearing applied to the x-axis, if you want the y-axis, then
comment the first matrix and uncomment the second one.
Image Reflection
Image reflection (or mirroring) is useful for flipping an image, it can flip the
image vertically as well as horizontally, which is a particular case of scaling.
For reflection along the x-axis, we set the value of Sy to -1, and Sx to 1, and
vice-versa for the y-axis reflection.
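A sketch of the reflection step (the second, commented-out matrix is the y-axis version referred to below):
# read the input image and convert to RGB
img = cv2.imread("city.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
rows, cols, dim = img.shape
# transformation matrix for x-axis reflection (Sx = 1, Sy = -1)
M = np.float32([[1,  0, 0],
                [0, -1, rows],
                [0,  0, 1]])
# transformation matrix for y-axis reflection (Sx = -1, Sy = 1)
# M = np.float32([[-1, 0, cols],
#                 [0,  1, 0],
#                 [0,  0, 1]])
reflected_img = cv2.warpPerspective(img, M, (cols, rows))
plt.axis('off')
plt.imshow(reflected_img)
plt.show()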
As previously, this will reflect its x-axis, if you want y-axis reflection,
uncomment the second matrix and comment on the first one.
The
transformation matrix of rotation is shown in the below figure, where theta
(θ) is the angle of rotation:
Below is the Python
code for image rotation:
import numpy as np
import cv2
import matplotlib.pyplot as plt
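A sketch of the rotation step, using the 10-degree angle mentioned just below:
# read the input image and convert to RGB
img = cv2.imread("city.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
rows, cols, dim = img.shape
# angle of rotation in radians
angle = np.radians(10)
# transformation matrix for rotation around the origin
M = np.float32([[np.cos(angle), -np.sin(angle), 0],
                [np.sin(angle),  np.cos(angle), 0],
                [0,              0,             1]])
rotated_img = cv2.warpPerspective(img, M, (cols, rows))
plt.axis('off')
plt.imshow(rotated_img)
plt.show()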
Output image:
This was
rotated by 10° ( np.radians(10) ), you're free to edit it as you wish!
import numpy as np
import cv2
import matplotlib.pyplot as plt
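The final transformation is cropping, which is just NumPy slicing. The snippet below mirrors the cropping.py listing in the Source Code section (reading the same city.jpg image is my assumption):
# read the input image and convert to RGB
img = cv2.imread("city.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# get 200 pixels from 100 to 300 on both x-axis & y-axis
cropped_img = img[100:300, 100:300]
plt.axis('off')
plt.imshow(cropped_img)
plt.show()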
Conclusion
In this tutorial, we've covered the basics of image processing and
transformation, which are image translation, scaling, shearing, reflection,
rotation, and cropping.
Source Code:
translation.py
import numpy as np
import cv2
import matplotlib.pyplot as plt
scaling.py
import numpy as np
import cv2
import matplotlib.pyplot as plt
shearing.py
import numpy as np
import cv2
import matplotlib.pyplot as plt
reflection.py
import numpy as np
import cv2
import matplotlib.pyplot as plt
rotation.py
import numpy as np
import cv2
import matplotlib.pyplot as plt
cropping.py
import numpy as np
import cv2
import matplotlib.pyplot as plt
# get 200 pixels from 100 to 300 on both x-axis & y-axis
# change that if you will, just make sure you don't exceed cols & rows
cropped_img = img[100:300, 100:300]
# disable x & y axis
plt.axis('off')
# show the resulting image
plt.imshow(cropped_img)
plt.show()
# save the resulting image to disk
plt.imsave("city_cropped.jpg", cropped_img)
PART 8: How to Make a Barcode
Reader in Python
Learn how to make a barcode scanner that decodes barcodes and draw them in the image using pyzbar
and OpenCV libraries in Python
Once you have these installed, open up a new Python file and import them:
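These are the two imports used throughout this part (they also appear at the top of barcode_reader.py below):
from pyzbar import pyzbar
import cv2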
I have a few images to test with; you can use any image you want from the internet or your own disk, but you can get my test images in this directory.
def decode(image):
    # decodes all barcodes from an image
    decoded_objects = pyzbar.decode(image)
    for obj in decoded_objects:
        # draw the barcode
        print("detected barcode:", obj)
        image = draw_barcode(obj, image)
        # print barcode type & data
        print("Type:", obj.type)
        print("Data:", obj.data)
        print()
    return image
We then iterate over all detected barcodes, draw a rectangle around each one, and print the type and the data of the barcode.
To make things clear, the following is how each obj looked like if we print it:
So pyzbar.decode() function returns the data containing the barcode, the type of
barcode, as well as the location points as a rectangle and a polygon.
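The drawing helper itself isn't reproduced at this point; a sketch of the rectangle version, using the rect attribute returned by pyzbar (with the polygon alternative left commented out), looks like this:
def draw_barcode(decoded, image):
    """Draw a rectangle around a decoded barcode and return the image."""
    # decoded.rect is a (left, top, width, height) tuple
    image = cv2.rectangle(image,
                          (decoded.rect.left, decoded.rect.top),
                          (decoded.rect.left + decoded.rect.width,
                           decoded.rect.top + decoded.rect.height),
                          color=(0, 255, 0), thickness=5)
    # polygon version: uncomment to draw the polygon with cv2.line() instead
    # points = decoded.polygon
    # for i in range(len(points)):
    #     image = cv2.line(image, tuple(points[i]), tuple(points[(i + 1) % len(points)]),
    #                      color=(0, 255, 0), thickness=5)
    return image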
This function takes the decoded object we just saw, and the image itself, it
draws a rectangle around the barcode using cv2.rectangle() function, or you can
uncomment the other version of the function; drawing the polygon
using cv2.line() function, the choice is yours. I preferred the rectangle version.
Finally, it returns the image that contains the drawn barcodes. Now let's use
these functions for our example images:
if __name__ == "__main__":
from glob import glob
barcodes = glob("barcode*.png")
for barcode_file in barcodes:
# load the image to opencv
img = cv2.imread(barcode_file)
# decode detected barcodes & get the image
# that is drawn
img = decode(img)
# show the image
cv2.imshow("img", img)
cv2.waitKey(0)
On each file, we load it using cv2.imread() function, and use the previously
discussed decode() function to decode the barcodes and then we show the
actual image.
Note that this will also detect QR codes, and that's fine, but for more accurate
results, I suggest you check the dedicated tutorial for detecting and
generating qr codes in Python.
When I run the script, it shows each image and prints the type and data of it,
press any key and you'll get the next image, here is my output:
Conclusion
That is awesome, now you have a great tool to make your own barcode
scanner in Python. I know you all want to read directly from the camera, as a
result, I have prepared the code that reads from the camera and detects
barcodes in a live manner, check it here!
You can also add some sort of a beep when each barcode is detected, just like
in supermarkets, check the tutorial for playing sounds that may help you
accomplish that.
Source Code:
barcode_reader.py
from pyzbar import pyzbar
import cv2
def decode(image):
# decodes all barcodes from an image
decoded_objects = pyzbar.decode(image)
for obj in decoded_objects:
# draw the barcode
print("detected barcode:", obj)
image = draw_barcode(obj, image)
# print barcode type & data
print("Type:", obj.type)
print("Data:", obj.data)
print()
return image
if __name__ == "__main__":
from glob import glob
barcodes = glob("barcode*.png")
for barcode_file in barcodes:
# load the image to opencv
img = cv2.imread(barcode_file)
# decode detected barcodes & get the image
# that is drawn
img = decode(img)
# show the image
cv2.imshow("img", img)
cv2.waitKey(0)
live_barcode_reader.py
from pyzbar import pyzbar
import cv2

def decode(image):
# decodes all barcodes from an image
decoded_objects = pyzbar.decode(image)
for obj in decoded_objects:
# draw the barcode
image = draw_barcode(obj, image)
# print barcode type & data
print("Type:", obj.type)
print("Data:", obj.data)
print()
return image
if __name__ == "__main__":
cap = cv2.VideoCapture(0)
while True:
# read the frame from the camera
_, frame = cap.read()
# decode detected barcodes & get the image
# that is drawn
frame = decode(frame)
# show the image in the window
cv2.imshow("frame", frame)
if cv2.waitKey(1) == ord("q"):
break
PART 9: How to Perform Malaria
Classification using TensorFlow 2 and
Keras in Python
Learn how to build a deep learning malaria detection model to classify cell images to either infected or
not infected with Malaria Tensorflow 2 and Keras API in Python.
Deep Learning use cases in medicine have known a big leap those past years,
from patient automatic diagnosis to computer vision, many cutting-edge
models are being developed in this domain.
import cv2
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Activation
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
import glob
import os
# after you extract the dataset,
# put cell_images folder in the working directory
img_dir="cell_images"
img_size=70
def load_img_data(path):
    image_files = glob.glob(os.path.join(path, "Parasitized/*.png")) + \
                  glob.glob(os.path.join(path, "Uninfected/*.png"))
    X, y = [], []
    for image_file in image_files:
        # 0 for uninfected and 1 for infected
        label = 0 if "Uninfected" in image_file else 1
        # load the image in gray scale
        img_arr = cv2.imread(image_file, cv2.IMREAD_GRAYSCALE)
        # resize the image to (70x70)
        img_resized = cv2.resize(img_arr, (img_size, img_size))
        X.append(img_resized)
        y.append(label)
    return X, y
We used glob built-in module to get all images in that format (ending
with .png in a specific folder).
Then we iterate over these image file names and load each image in
grayscale, resize it and append it to our array, we also do the same for labels
(0 for uninfected and 1 for parasitized).
After we load our dataset preprocessed, we extend our images array shape
into (n_samples, 70, 70, 1) to fit the neural network input.
In our case, we won't be using those. Instead, we will divide by 255, since the biggest value a pixel can achieve is 255; this results in pixels ranging between 0 and 1 after applying the scaling.
Then we will use the train_test_split() method from sklearn to divide the dataset
into training and testing sets, we used 10% of the total data for validation it
later on. The stratify parameter will preserve the proportion of target as in the
original dataset, in the train and test datasets as well.
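A sketch of those steps (the random_state value is my assumption):
# load the dataset
X, y = load_img_data(img_dir)
# reshape to (n_samples, 70, 70, 1) to fit the neural network input
X = np.array(X).reshape(-1, img_size, img_size, 1)
# scale pixel values from [0, 255] down to [0, 1]
X = X / 255
y = np.array(y)
# 90% training, 10% testing, preserving the class proportions (stratify)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1,
                                                    stratify=y, random_state=0)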
Our neural network architecture will follow somehow the same architecture
presented in the figure:
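Only the fully connected part is reproduced below; the convolutional front end (copied from the full listing in the Source Code section) comes first:
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=X_train.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))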
model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dense(1))
model.add(Activation("sigmoid"))
Model Evaluation
Now let us use the evaluate() from Keras API to evaluate the model on the
testing dataset:
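A minimal sketch of that call:
# evaluate on the held-out test set
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Testing on {len(X_test)} images - Accuracy: {accuracy} | Loss: {loss}")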
The model performed well also in the test data with an accuracy
reaching 94%.
Now let's use this model to make inferences on the two images we put in
the testing-samples folder earlier in this tutorial. First, let's plot them:
_, ax = plt.subplots(1, 2)
ax[0].imshow(plt.imread(uninfected_cell))
ax[0].title.set_text("Uninfected Cell")
ax[1].imshow(plt.imread(infected_cell))
ax[1].title.set_text("Parasitized Cell")
plt.show()
Output:
Great, now let's load these images and perform preprocessing:
img_arr_uninfected = cv2.imread(uninfected_cell, cv2.IMREAD_GRAYSCALE)
img_arr_infected = cv2.imread(infected_cell, cv2.IMREAD_GRAYSCALE)
# resize the images to (70x70)
img_arr_uninfected = cv2.resize(img_arr_uninfected, (img_size, img_size))
img_arr_infected = cv2.resize(img_arr_infected, (img_size, img_size))
# scale to [0, 1]
img_arr_infected = img_arr_infected / 255
img_arr_uninfected = img_arr_uninfected / 255
# reshape to fit the neural network dimensions
# (changing shape from (70, 70) to (1, 70, 70, 1))
img_arr_infected = img_arr_infected.reshape(1, *img_arr_infected.shape)
img_arr_infected = np.expand_dims(img_arr_infected, axis=3)
img_arr_uninfected = img_arr_uninfected.reshape(1, *img_arr_uninfected.shape)
img_arr_uninfected = np.expand_dims(img_arr_uninfected, axis=3)
# perform inference
infected_result = model.predict(img_arr_infected)[0][0]
uninfected_result = model.predict(img_arr_uninfected)[0][0]
print(f"Infected: {infected_result}")
print(f"Uninfected: {uninfected_result}")
Output:
Infected: 0.9827326536178589
Uninfected: 0.005085020791739225
Awesome, the model is 98% sure that the infected cell is in fact infected, and it is 99.5% sure that the uninfected cell is uninfected.
Conclusion:
In this tutorial, you have learned how to load and preprocess the malaria cell images dataset, build and train a convolutional neural network with Keras, and use it to make predictions on new cell images.
I encourage you to tweak the model parameters, or to use transfer learning so you can achieve much better performance. You can also train on colored images instead of grayscale; this may help!
There are other metrics besides accuracy, such as sensitivity and specificity, which are widely used in the medical field; I invite you to add them here as well. If you're not sure how, there is a tutorial on skin cancer detection in which we did all of that!
Source Code:
malaria-classification.py
import cv2
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Activation
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
import glob
import os
# after you extract the dataset,
# put cell_images folder in the working directory
img_dir="cell_images"
img_size=70
def load_img_data(path):
image_files = glob.glob(os.path.join(path, "Parasitized/*.png")) + \
glob.glob(os.path.join(path, "Uninfected/*.png"))
X, y = [], []
for image_file in image_files:
# 0 for uninfected and 1 for infected
label = 0 if "Uninfected" in image_file else 1
# load the image in gray scale
img_arr = cv2.imread(image_file, cv2.IMREAD_GRAYSCALE)
# resize the image to (70x70)
img_resized = cv2.resize(img_arr, (img_size, img_size))
X.append(img_resized)
y.append(label)
return X, y
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=X_train.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dense(1))
model.add(Activation("sigmoid"))
Skin cancer is an abnormal growth of skin cells. It is one of the most common cancers and, unfortunately, it can become deadly. The good news is that when caught early, your dermatologist can treat it and eliminate it entirely.
Using deep learning and neural networks, we'll be able to classify benign and malignant skin diseases, which may help the doctor diagnose cancer at an earlier stage. In this tutorial, we will make a skin disease classifier that tries to distinguish between benign (nevus and seborrheic keratosis) and malignant (melanoma) skin diseases from photographic images only, using the TensorFlow framework in Python.
Open up a new notebook (or Google Colab) and import the necessary
modules:
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from tensorflow.keras.utils import get_file
from sklearn.metrics import roc_curve, auc, confusion_matrix
from imblearn.metrics import sensitivity_score, specificity_score
import os
import glob
import zipfile
import random
def download_and_extract_dataset():
    # dataset from https://github.com/udacity/dermatologist-ai
    # 5.3GB
    train_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/train.zip"
    # 824.5MB
    valid_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/valid.zip"
    # 5.1GB
    test_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/test.zip"
    for i, download_link in enumerate([valid_url, train_url, test_url]):
        temp_file = f"temp{i}.zip"
        data_dir = get_file(origin=download_link, fname=os.path.join(os.getcwd(), temp_file))
        print("Extracting", download_link)
        with zipfile.ZipFile(data_dir, "r") as z:
            z.extractall("data")
        # remove the temp file
        os.remove(temp_file)
This will take several minutes, depending on your connection. After that, a data folder will appear containing the training, validation and testing sets. Each set is a folder that has three categories of skin disease images (nevus, seborrheic_keratosis and melanoma).
Note: You may struggle to download the dataset using the above Python function if you have a slow Internet connection; in that case, you should download it and extract it manually into the folder data in the current directory.
Now that we have the dataset on our machine, let's find a way to label these images. Remember, we're only going to classify benign versus malignant skin diseases, so we need to label nevus and seborrheic keratosis as the value 0 and melanoma as 1.
The below cell generates a metadata CSV file for each set, each row in the
CSV file corresponds to a path to an image along with its label (0 or 1):
# preparing data
# generate CSV metadata file to read img paths and labels from it
def generate_csv(folder, label2int):
    folder_name = os.path.basename(folder)
    labels = list(label2int)
    # generate CSV file
    df = pd.DataFrame(columns=["filepath", "label"])
    i = 0
    for label in labels:
        print("Reading", os.path.join(folder, label, "*"))
        for filepath in glob.glob(os.path.join(folder, label, "*")):
            df.loc[i] = [filepath, label2int[label]]
            i += 1
    output_file = f"{folder_name}.csv"
    print("Saving", output_file)
    df.to_csv(output_file)

# generate CSV files for all data portions, labeling nevus and seborrheic keratosis
# as 0 (benign), and melanoma as 1 (malignant)
# you should replace "data" path to your extracted dataset path
# don't replace if you used download_and_extract_dataset() function
generate_csv("data/train", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/valid", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/test", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
The generate_csv() function accepts two arguments. The first is the path of the set; for example, if you have downloaded and extracted the dataset in "E:\datasets\skin-cancer", then the training set should be something like "E:\datasets\skin-cancer\train". The second parameter is a dictionary that maps each skin disease category to its corresponding label value (again, 0 for benign and 1 for malignant).
The reason I wrote this as a function is so that it can be reused for other skin disease classification problems (such as melanocytic classification); you can add more skin diseases and use it for other problems as well.
Once you run the cell, you notice that 3 CSV files will appear in your current
directory. Now let's use the from_tensor_slices() method from tf.data API to load
these metadata files:
# loading data
train_metadata_filename = "train.csv"
valid_metadata_filename = "valid.csv"
# load CSV files as DataFrames
df_train = pd.read_csv(train_metadata_filename)
df_valid = pd.read_csv(valid_metadata_filename)
n_training_samples = len(df_train)
n_validation_samples = len(df_valid)
print("Number of training samples:", n_training_samples)
print("Number of validation samples:", n_validation_samples)
train_ds = tf.data.Dataset.from_tensor_slices((df_train["filepath"], df_train["label"]))
valid_ds = tf.data.Dataset.from_tensor_slices((df_valid["filepath"], df_valid["label"]))
Now we have loaded the dataset ( train_ds and valid_ds ), each sample is a tuple
of filepath (path to the image file) and label (0 for benign and 1 for malignant),
here is the output:
# preprocess data
def decode_img(img):
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_jpeg(img, channels=3)
    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size.
    return tf.image.resize(img, [299, 299])

valid_ds = valid_ds.map(process_path)
train_ds = train_ds.map(process_path)
# test_ds = test_ds
for image, label in train_ds.take(1):
    print("Image shape:", image.shape)
    print("Label:", label.numpy())
The above code uses the map() method to execute the process_path() function on each sample in both sets; it basically loads the images, decodes the image format, converts the image pixels to the [0, 1] range, and resizes each image to (299, 299, 3). We then take one image and print its shape:
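The process_path() function itself isn't included in this excerpt; based on the description above and on decode_img(), it presumably looks something like this sketch:
def process_path(filepath, label):
    # load the raw bytes from the image file
    img = tf.io.read_file(filepath)
    # decode, scale to [0, 1] and resize to (299, 299, 3)
    img = decode_img(img)
    return img, label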
# training parameters
batch_size = 64
optimizer = "rmsprop"
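The datasets also need to be cached, shuffled, repeated, batched, and prefetched before they can be iterated batch by batch. The preparation function isn't shown in full here; judging from the fragments in the source code below, it looks roughly like the following sketch (the class_names list used for plotting is an assumption as well):
# assumed labels for plotting (0 = benign, 1 = malignant)
class_names = ["benign", "malignant"]

def prepare_for_training(ds, cache=True, batch_size=64, shuffle_buffer_size=1000):
    if cache:
        # cache the preprocessed images, either in memory or in a file
        ds = ds.cache(cache) if isinstance(cache, str) else ds.cache()
    # shuffle the dataset
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    # Repeat forever
    ds = ds.repeat()
    # split to batches
    ds = ds.batch(batch_size)
    # `prefetch` lets the dataset fetch batches in the background while the model is training
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds

train_ds = prepare_for_training(train_ds, batch_size=batch_size, cache="train-cached-data")
valid_ds = prepare_for_training(valid_ds, batch_size=batch_size, cache="valid-cached-data")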
The below cell gets the first validation batch and plots the images along with
their corresponding label:
batch = next(iter(valid_ds))

def show_batch(batch):
    plt.figure(figsize=(12,12))
    for n in range(25):
        ax = plt.subplot(5,5,n+1)
        plt.imshow(batch[0][n])
        plt.title(class_names[batch[1][n].numpy()].title())
        plt.axis('off')

show_batch(batch)
Output:
As you can see, it's extremely hard to differentiate between malignant and
benign diseases, let's see how our model will deal with it.
Great, now our dataset is ready, let's dive into building our model.
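The model-building code isn't included in this excerpt, but the summary below (a frozen KerasLayer followed by a single-unit Dense layer) points to a pre-trained feature extractor from TensorFlow Hub with a sigmoid output on top. A minimal sketch, assuming an InceptionV3 feature-vector module (the exact Hub URL may differ from the one used originally), would be:
# a pre-trained feature extractor from TensorFlow Hub + a sigmoid output layer
module_url = "https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4"
m = tf.keras.Sequential([
    hub.KerasLayer(module_url, output_shape=[2048], trainable=False),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
m.build([None, 299, 299, 3])
m.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
m.summary()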
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
================================================================
keras_layer (KerasLayer) multiple 21802784
_________________________________________________________________
dense (Dense) multiple 2049
================================================================
Total params: 21,804,833
Trainable params: 2,049
Non-trainable params: 21,802,784
_________________________________________________________________
model_name = f"benign-vs-malignant_{batch_size}_{optimizer}"
tensorboard = tf.keras.callbacks.TensorBoard(log_dir=os.path.join("logs",
model_name))
# saves model checkpoint whenever we reach better weights
modelcheckpoint = tf.keras.callbacks.ModelCheckpoint(model_name +
"_{val_loss:.3f}.h5", save_best_only=True, verbose=1)
Since the fit() method doesn't know how many samples there are in the dataset, we need to specify the steps_per_epoch and validation_steps parameters, i.e. the number of iterations (the number of samples divided by the batch size) for the training set and validation set respectively.
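The training call itself isn't reproduced here; a sketch of it, assuming the model variable m and the sample counts loaded earlier (the number of epochs is an arbitrary choice), looks like:
history = m.fit(train_ds, validation_data=valid_ds,
                steps_per_epoch=n_training_samples // batch_size,
                validation_steps=n_validation_samples // batch_size,
                epochs=100, verbose=1,
                callbacks=[tensorboard, modelcheckpoint])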
Model Evaluation
First, let's load our test set, just like previously:
# evaluation
# load testing set
test_metadata_filename = "test.csv"
df_test = pd.read_csv(test_metadata_filename)
n_testing_samples = len(df_test)
print("Number of testing samples:", n_testing_samples)
test_ds = tf.data.Dataset.from_tensor_slices((df_test["filepath"], df_test["label"]))
The above code loads our test data and prepares it for testing:
600 images of the shape (299, 299, 3) can fit our memory, let's convert our test
set from tf.data into a NumPy array:
# convert testing set to numpy array to fit in memory
# (don't do that when the testing set is too large)
y_test = np.zeros((n_testing_samples,))
X_test = np.zeros((n_testing_samples, 299, 299, 3))
for i, (img, label) in enumerate(test_ds.take(n_testing_samples)):
    # print(img.shape, label.shape)
    X_test[i] = img
    y_test[i] = label.numpy()
print("y_test.shape:", y_test.shape)
The above cell constructs our arrays; it will take some time the first time it's executed, because it performs all the preprocessing defined in the process_path() and prepare_for_testing() functions.
You may not have the exact filename of the optimal weights; search the current directory for the saved weights with the lowest loss. The below code loads those weights and evaluates the model using the accuracy metric:
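The loading and evaluation code isn't reproduced here; a sketch of it, with a placeholder for whatever checkpoint filename your own training run produced, is:
# load the saved weights with the lowest validation loss
# (replace the filename with the checkpoint produced by your training run)
m.load_weights("benign-vs-malignant_64_rmsprop_0.400.h5")
print("Evaluating the model...")
loss, accuracy = m.evaluate(X_test, y_test, verbose=0)
print("Loss:", loss, " Accuracy:", accuracy)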
Output:
We've reached about 84% accuracy on the validation set and 80% on the test
set, but that's not all. Since our dataset is largely unbalanced, accuracy doesn't
tell everything. In fact, a model that predicts every image as benign would get
an accuracy of 80% , since malignant samples are about 20% of the total
validation set.
But before we do that, I just want to make something clear: we all know that predicting a malignant disease as benign is a terrible mistake; you can kill people doing that! So we need a way to catch even more malignant cases, even though we have very few malignant samples compared to benign ones. A good method is introducing a threshold.
Remember that the output of the neural network is a value between 0 and 1. Normally, when the neural network produces a value between 0 and 0.5 we assign the sample as benign, and from 0.5 to 1.0 as malignant. Since we want to reduce the chance of predicting a malignant disease as benign (that's only one of the reasons), we can instead say, for example, that 0 to 0.3 is benign and 0.3 to 1.0 is malignant. This means we are using a threshold value of 0.3, which will catch more malignant cases.
def get_predictions(threshold=None):
    """
    Returns predictions for binary classification given `threshold`
    For instance, if threshold is 0.3, then it'll output 1 (malignant) for that sample if
    the probability of 1 is 30% or more (instead of 50%)
    """
    y_pred = m.predict(X_test)
    if not threshold:
        threshold = 0.5
    result = np.zeros((n_testing_samples,))
    for i in range(n_testing_samples):
        # test melanoma probability
        if y_pred[i][0] >= threshold:
            result[i] = 1
        # else, it's 0 (benign)
    return result
threshold = 0.23
# get predictions with 23% threshold
# which means if the model is 23% sure or more that is malignant,
# it's assigned as malignant, otherwise it's benign
y_pred = get_predictions(threshold)
plot_confusion_matrix(y_test, y_pred)
Output:
Sensitivity
So our model gets about 0.72 probability of a positive test given that the
patient has the disease (bottom right of the confusion matrix), that's often
called sensitivity.
So in our example, out of all patients that have a malignant skin disease, we
successfully predicted 72% of them as malignant, not bad but needs
improvements.
Specificity
The other metric is specificity; you can read it in the top left of the confusion matrix, and we got about 63%. It is basically the probability of a negative test given that the patient is well.
In our example, out of all patients that have a benign disease, we correctly predicted 63% of them as benign.
With high specificity, the test rarely gives positive results for healthy patients, whereas high sensitivity means the test rarely misses positive cases, so the model is reliable when its result is negative. I invite you to read more about it in this Wikipedia article.
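Both metrics can also be computed directly from the predictions with the imblearn functions imported at the top of this part:
sensitivity = sensitivity_score(y_test, y_pred)
specificity = specificity_score(y_test, y_pred)
print("Melanoma Sensitivity:", sensitivity)
print("Melanoma Specificity:", specificity)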
Output:
Another good metric is the ROC curve, which is basically a graphical plot that shows the diagnostic ability of our binary classifier; it plots the true positive rate on the Y-axis against the false positive rate on the X-axis. The perfect point we want to reach is the top left corner of the plot. Here is the code for plotting the ROC curve using matplotlib:
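The plotting helper isn't reproduced in this excerpt; a sketch of a plot_roc_auc() implementation consistent with the imports above (roc_curve and auc from sklearn) is:
def plot_roc_auc(y_true, y_pred):
    """Plots the ROC curve and computes the area under it"""
    # compute false positive rate and true positive rate for each threshold
    fpr, tpr, _ = roc_curve(y_true, y_pred)
    roc_auc = auc(fpr, tpr)
    plt.figure()
    plt.plot(fpr, tpr, label=f"ROC curve (area = {roc_auc:.3f})")
    plt.plot([0, 1], [0, 1], linestyle="--")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("ROC curve")
    plt.legend()
    plt.show()

plot_roc_auc(y_test, y_pred)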
Output:
Awesome! Since we want to maximize the true positive rate and minimize the false positive rate, calculating the area under the ROC curve proves useful; we got 0.671 as the ROC Area Under Curve (ROC AUC), and an area of 1 means the model is ideal for all cases.
Conclusion
We're done! There you have it. See how you can improve the model: we only used 2000 training samples, so go to the ISIC archive, download more, and add them to the data folder; the scores will improve significantly depending on the number of samples you add. You can use the ISIC archive downloader, which may help you download the dataset in the way you want.
Source Code:
skin-cancer-detection.py
# coding: utf-8
# In[1]:
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from tensorflow.keras.utils import get_file
from sklearn.metrics import roc_curve, auc, confusion_matrix
from imblearn.metrics import sensitivity_score, specificity_score
import os
import glob
import zipfile
import random
def download_and_extract_dataset():
# dataset from https://github.com/udacity/dermatologist-ai
# 5.3GB
train_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/train.zip"
# 824.5MB
valid_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/valid.zip"
# 5.1GB
test_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/test.zip"
for i, download_link in enumerate([valid_url, train_url, test_url]):
temp_file = f"temp{i}.zip"
data_dir = get_file(origin=download_link, fname=os.path.join(os.getcwd(), temp_file))
print("Extracting", download_link)
with zipfile.ZipFile(data_dir, "r") as z:
z.extractall("data")
# remove the temp file
os.remove(temp_file)
# In[2]:
# preparing data
# generate CSV metadata file to read img paths and labels from it
def generate_csv(folder, label2int):
folder_name = os.path.basename(folder)
labels = list(label2int)
# generate CSV file
df = pd.DataFrame(columns=["filepath", "label"])
i=0
for label in labels:
print("Reading", os.path.join(folder, label, "*"))
for filepath in glob.glob(os.path.join(folder, label, "*")):
df.loc[i] = [filepath, label2int[label]]
i += 1
output_file = f"{folder_name}.csv"
print("Saving", output_file)
df.to_csv(output_file)
# generate CSV files for all data portions, labeling nevus and seborrheic keratosis
# as 0 (benign), and melanoma as 1 (malignant)
# you should replace "data" path to your extracted dataset path
# don't replace if you used download_and_extract_dataset() function
generate_csv("data/train", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/valid", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/test", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
# In[3]:
# loading data
train_metadata_filename = "train.csv"
valid_metadata_filename = "valid.csv"
# load CSV files as DataFrames
df_train = pd.read_csv(train_metadata_filename)
df_valid = pd.read_csv(valid_metadata_filename)
n_training_samples = len(df_train)
n_validation_samples = len(df_valid)
print("Number of training samples:", n_training_samples)
print("Number of validation samples:", n_validation_samples)
train_ds = tf.data.Dataset.from_tensor_slices((df_train["filepath"], df_train["label"]))
valid_ds = tf.data.Dataset.from_tensor_slices((df_valid["filepath"], df_valid["label"]))
# In[4]:
# preprocess data
def decode_img(img):
# convert the compressed string to a 3D uint8 tensor
img = tf.image.decode_jpeg(img, channels=3)
# Use `convert_image_dtype` to convert to floats in the [0,1] range.
img = tf.image.convert_image_dtype(img, tf.float32)
# resize the image to the desired size.
return tf.image.resize(img, [299, 299])
valid_ds = valid_ds.map(process_path)
train_ds = train_ds.map(process_path)
# test_ds = test_ds
# for image, label in train_ds.take(1):
# print("Image shape:", image.shape)
# print("Label:", label.numpy())
# In[5]:
# training parameters
batch_size = 64
optimizer = "rmsprop"
# In[6]:
# Repeat forever
ds = ds.repeat()
# split to batches
ds = ds.batch(batch_size)
# `prefetch` lets the dataset fetch batches in the background while the model
# is training.
ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
return ds
# In[9]:
batch = next(iter(valid_ds))
def show_batch(batch):
plt.figure(figsize=(12,12))
for n in range(25):
ax = plt.subplot(5,5,n+1)
plt.imshow(batch[0][n])
plt.title(class_names[batch[1][n].numpy()].title())
plt.axis('off')
show_batch(batch)
# In[7]:
# In[9]:
model_name = f"benign-vs-malignant_{batch_size}_{optimizer}"
tensorboard = tf.keras.callbacks.TensorBoard(log_dir=os.path.join("logs", model_name))
# saves model checkpoint whenever we reach better weights
modelcheckpoint = tf.keras.callbacks.ModelCheckpoint(model_name + "_{val_loss:.3f}.h5",
save_best_only=True, verbose=1)
# evaluation
ds = ds.shuffle(buffer_size=shuffle_buffer_size)
return ds
test_ds = test_ds.map(process_path)
test_ds = prepare_for_testing(test_ds, cache="test-cached-data")
# In[9]:
# convert testing set to numpy array to fit in memory (don't do that when testing
# set is too large)
y_test = np.zeros((n_testing_samples,))
X_test = np.zeros((n_testing_samples, 299, 299, 3))
for i, (img, label) in enumerate(test_ds.take(n_testing_samples)):
# print(img.shape, label.shape)
X_test[i] = img
y_test[i] = label.numpy()
print("y_test.shape:", y_test.shape)
# In[10]:
# In[11]:
# In[14]:
def get_predictions(threshold=None):
"""
Returns predictions for binary classification given `threshold`
For instance, if threshold is 0.3, then it'll output 1 (malignant) for that sample if
the probability of 1 is 30% or more (instead of 50%)
"""
y_pred = m.predict(X_test)
if not threshold:
threshold = 0.5
result = np.zeros((n_testing_samples,))
for i in range(n_testing_samples):
# test melanoma probability
if y_pred[i][0] >= threshold:
result[i] = 1
# else, it's 0 (benign)
return result
threshold = 0.23
# get predictions with 23% threshold
# which means if the model is 23% sure or more that is malignant,
# it's assigned as malignant, otherwise it's benign
y_pred = get_predictions(threshold)
accuracy_after = accuracy_score(y_test, y_pred)
print("Accuracy after setting the threshold:", accuracy_after)
# In[16]:
plot_confusion_matrix(y_test, y_pred)
plot_roc_auc(y_test, y_pred)
sensitivity = sensitivity_score(y_test, y_pred)
specificity = specificity_score(y_test, y_pred)
# In[24]:
def plot_images(X_test, y_pred, y_test):
predicted_class_names = np.array([class_names[int(round(id))] for id in y_pred])
# some nice plotting
plt.figure(figsize=(10,9))
for n in range(30, 60):
plt.subplot(6,5,n-30+1)
plt.subplots_adjust(hspace = 0.3)
plt.imshow(X_test[n])
# get the predicted label
predicted_label = predicted_class_names[n]
# get the actual true label
true_label = class_names[int(round(y_test[n]))]
if predicted_label == true_label:
color = "blue"
title = predicted_label.title()
else:
color = "red"
title = f"{predicted_label.title()}, true:{true_label.title()}"
plt.title(title, color=color)
plt.axis('off')
_ = plt.suptitle("Model predictions (blue: correct, red: incorrect)")
plt.show()
The following video should make you familiar with the K-Means clustering
algorithm:
Before we dive into the code, we need to install the required libraries:
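Assuming the usual PyPI package names, the required libraries can be installed with pip:
pip3 install opencv-python numpy matplotlib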
import cv2
import numpy as np
import matplotlib.pyplot as plt
I'm going to use this image for demonstration purposes. Feel free to use any:
# read the image (replace with your own image path)
image = cv2.imread("image.jpg")
# convert to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
We're going to use the cv2.kmeans() function, which takes a 2D array as input, and since our original image is 3D (width, height, and a depth of 3 RGB values), we need to flatten the height and width into a single vector of pixels (each with 3 RGB values):
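The flattening step isn't shown above; it's simply a reshape (cv2.kmeans() also requires float32 input, as mentioned a bit later in this part):
# reshape the image to a 2D array of pixels (each row is one pixel with 3 RGB values)
pixel_values = image.reshape((-1, 3))
# convert to float, as required by cv2.kmeans()
pixel_values = np.float32(pixel_values)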
print(pixel_values.shape)
Output:
(2073600, 3)
If you watched the video that explains the algorithm, you'd see that it says around minute 3 that the algorithm stops when none of the cluster assignments change. Well, we're going to cheat a little bit here: since this is a large number of data points, it would take a lot of time to process, so we are going to stop either when some number of iterations is exceeded (say 100), or when the clusters move less than some epsilon value (let's pick 0.2 here). The below code defines the stopping criteria in OpenCV:
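In OpenCV the stopping criteria are expressed as a tuple; with the numbers above it looks like:
# stop either after 100 iterations or when the clusters move by less than epsilon = 0.2
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)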
If you look at the image, there are three primary colors (green for trees, blue for the sea/lake, and white to orange for the sky). As a result, we're going to use three clusters for this image:
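The clustering call itself isn't reproduced here; with the criteria defined above, it is a single cv2.kmeans() call (the number of attempts, 10 here, is an arbitrary choice):
# number of clusters (K)
k = 3
_, labels, centers = cv2.kmeans(pixel_values, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)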
If you look back at the code, notice that we converted the flattened image pixel values to floats; we did that because cv2.kmeans() expects that. Let's convert the cluster centers back to 8-bit pixel values:
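Converting back and rebuilding the segmented image looks like this sketch:
# convert the cluster centers back to 8-bit values
centers = np.uint8(centers)
# flatten the labels array
labels = labels.flatten()
# map each pixel to the color of its cluster center
segmented_image = centers[labels]
# reshape back to the original image dimensions
segmented_image = segmented_image.reshape(image.shape)
# show the segmented image
plt.imshow(segmented_image)
plt.show()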
Awesome, we can also disable some clusters in the image. For instance, let's
disable cluster number 2 and show the original image:
# disable only the cluster number 2 (turn the pixel into black)
masked_image = np.copy(image)
# convert to the shape of a vector of pixel values
masked_image = masked_image.reshape((-1, 3))
# color (i.e cluster) to disable
cluster = 2
masked_image[labels == cluster] = [0, 0, 0]
# convert back to original shape
masked_image = masked_image.reshape(image.shape)
# show the image
plt.imshow(masked_image)
plt.show()
Wow, it turns out that cluster 2 is the trees. Feel free to disable other clusters, or try different values of k, and see what segments you get.
Source Code:
kmeans_segmentation.py
import cv2
import numpy as np
import matplotlib.pyplot as plt
import sys
# convert to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# disable only the cluster number 2 (turn the pixel into black)
masked_image = np.copy(image)
# convert to the shape of a vector of pixel values
masked_image = masked_image.reshape((-1, 3))
# color (i.e cluster) to disable
cluster = 2
masked_image[labels == cluster] = [0, 0, 0]
while True:
# read the image
_, image = cap.read()
cv2.imshow("segmented_image", segmented_image)
# visualize each segment
if cv2.waitKey(1) == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
PART 12: Detect Contours in
Images using OpenCV in Python
Learning how to detect contours in images for image segmentation, shape analysis and object detection
and recognition using OpenCV in Python.
A contour is a closed curve joining all the continuous points that share the same color or intensity; contours represent the shapes of the objects found in an image. Contour detection is a useful technique for shape analysis and object detection and recognition.
Well, when we perform edge detection, we find the points where the intensity
of colors changes significantly, and then we simply turn those pixels on.
However, contours are abstract collections of points and segments
corresponding to the shapes of the objects in the image. As a result, we can
manipulate contours in our programs such as counting the number of
contours, using them to categorize the shapes of objects, cropping objects
from an image (image segmentation), and much more.
Contour detection is not the only algorithm for image segmentation though,
there are a lot of others, such as the current state-of-the-art semantic
segmentation, hough transform, and K-Means segmentation.
For better accuracy, here is the whole pipeline that we gonna follow to
successfully detect contours in an image:
import cv2
import matplotlib.pyplot as plt
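The image loading and thresholding code isn't included in this excerpt; a sketch that matches the description below (thresholding at 225) would be the following. Depending on your image you may want cv2.THRESH_BINARY_INV instead, as in the live-camera version at the end of this part.
# read the image (replace with your own image path)
image = cv2.imread("image.jpg")
# convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# create a binary thresholded image: pixels above 225 become 255, the rest become 0
_, binary = cv2.threshold(gray, 225, 255, cv2.THRESH_BINARY)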
The above code creates the binary image by disabling (setting to 0) the pixels that have a value of less than 225 and turning on (setting to 255) the pixels that have a value of more than 225. Here is the output image:
Now, this is easy for OpenCV to detect contours:
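The contour-finding call isn't reproduced at this point; based on the live-camera version at the end of this part, it looks like:
# find the contours from the thresholded image
contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# draw all contours on the original image with a thick green line
image = cv2.drawContours(image, contours, -1, (0, 255, 0), 2)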
The above code finds the contours in the binary image and draws them on the original image with a thick green line. Let's show it:
Output image:
To achieve good results on different and real-world images, you need to tune
your threshold value or perform edge detection. For instance, for a pancakes
image, I've decreased the threshold to 127, here is the result:
Alright, this is it for this tutorial, if you want to test this on your live camera,
head to this link.
Source Code:
contour_detector.py
import cv2
import matplotlib.pyplot as plt
live-contour-detector.py
import cv2
cap = cv2.VideoCapture(0)
while True:
_, frame = cap.read()
# convert to grayscale
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# create a binary thresholded image
_, binary = cv2.threshold(gray, 255 // 2, 255, cv2.THRESH_BINARY_INV)
# find the contours from the thresholded image
contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# draw all contours
image = cv2.drawContours(frame, contours, -1, (0, 255, 0), 2)
# show the images
cv2.imshow("gray", gray)
cv2.imshow("image", image)
cv2.imshow("binary", binary)
if cv2.waitKey(1) == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
PART 13: Optical Character
Recognition (OCR) in Python
Learn how to Use Tesseract OCR library and pytesseract wrapper for optical character recognition
(OCR) to convert text in images into digital text in Python.
Humans can easily understand the text content of an image simply by looking
at it. However, it is not the case for computers. They need some sort of a
structured method or algorithm to be able to understand it. This is
where Optical Character Recognition (OCR) comes into play.
We gonna use pytesseract module for Python which is a wrapper for the
Tesseract-OCR engine, so we can access it via Python.
The most recent stable version of tesseract is 4 which uses a new recurrent
neural network (LSTM) based OCR engine which is focused on line
recognition.
After you have everything installed in your machine, open up a new Python
file and follow along:
import pytesseract
import cv2
import matplotlib.pyplot as plt
from PIL import Image
For demonstration purposes, I'm gonna use this image for recognition:
I've named it "test.png" and put it in the current directory, let's load this
image:
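The loading code isn't shown in this excerpt; either of the following works (the OpenCV route is what the rest of the code assumes):
# read the image with OpenCV
image = cv2.imread("test.png")
# or, alternatively, with Pillow
# image = Image.open("test.png")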
As you may notice, you can load the image either with OpenCV or Pillow, I
prefer using OpenCV as it enables us to use the live camera.
Note: If the above code raises an error, please consider adding Tesseract-
OCR binaries to PATH variables. Read their official installation guide more
carefully.
So we're going to search for the word "dog" in the text document. We want the output data to be structured and not a raw string; that's why I passed output_type as a dictionary, so we can easily get each word's data (you can print the data dictionary to see how the output is organized).
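The corresponding calls aren't included here; based on the live-camera version at the end of this part, the search-and-draw step looks roughly like this sketch (the box coordinates come from the left, top, width and height fields of the dictionary):
target_word = "dog"
# make a copy of this image to draw in
image_ = image.copy()
# get all data from the image as a dictionary
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
# get all occurences of that word
word_occurences = [i for i, word in enumerate(data["text"]) if word.lower() == target_word]
# draw a red rectangle around each occurrence
for occ in word_occurences:
    x, y = data["left"][occ], data["top"][occ]
    w, h = data["width"][occ], data["height"][occ]
    cv2.rectangle(image_, (x, y), (x + w, y + h), (0, 0, 255), 2)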
plt.imsave("all_dog_words.png", image_)
plt.imshow(image_)
plt.show()
Also, this won't work very well on handwritten text, complex real-world images, unclear images, or images that contain an excessive amount of text.
Alright, that's it for this tutorial, let us see what you can build with this
utility!
We have made a tutorial where you can use OCR to extract text from images
inside PDF files, check it out!
Source Code:
extracting_text.py
import pytesseract
import cv2
import matplotlib.pyplot as plt
import sys
from PIL import Image
print(data)
draw_boxes.py
import pytesseract
import cv2
import matplotlib.pyplot as plt
from PIL import Image
plt.imsave("all_dog_words.png", image_)
plt.imshow(image_)
plt.show()
live_recognizer.py (using cam)
import pytesseract
import cv2
import matplotlib.pyplot as plt
from PIL import Image
cap = cv2.VideoCapture(0)
while True:
# read the image from the cam
_, image = cap.read()
# make a copy of this image to draw in
image_ = image.copy()
# get all data from the image
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
# print the data
print(data["text"])
# get all occurences of the that word
word_occurences = [ i for i, word in enumerate(data["text"]) if word.lower() == target_word ]
if cv2.waitKey(1) == ord("q"):
break
cv2.imshow("image_", image_)
cap.release()
cv2.destroyAllWindows()
PART 14: Detect Shapes in Images
in Python using OpenCV
Detecting shapes, lines and circles in images using Hough Transform technique with OpenCV in
Python. Hough transform is a popular feature extraction technique to detect any shape within an image.
In the previous tutorial, we have seen how you can detect edges in an image.
However, that's not usually enough in the image processing phase. In this
tutorial, you will learn how you can detect shapes (mainly lines and circles)
in images using Hough Transform technique in Python
using OpenCV library.
import numpy as np
import matplotlib.pyplot as plt
import cv2
Detecting Lines
I'm gonna use a photo of a computer monitor, make sure you have the
photo monitor.jpg in your current directory (you're free to use any):
# read the image
image = cv2.imread("monitor.jpg")
# convert to grayscale
grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
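The edge-detection step referenced in the next sentence isn't shown; based on the live-camera version below, it's a Canny call with these thresholds:
# perform edge detection
edges = cv2.Canny(grayscale, 30, 100)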
Now that we have detected the edges in the image, we can use the Hough transform to detect the lines:
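The Hough call and the drawing loop aren't reproduced here; they're essentially the same as in the live-camera version below (the lines are drawn in green here to match the description of the output):
# detect lines in the image using the probabilistic Hough transform
lines = cv2.HoughLinesP(edges, 1, np.pi/180, 60, np.array([]), 50, 5)
# iterate over the output lines and draw them on the original image
for line in lines:
    for x1, y1, x2, y2 in line:
        cv2.line(image, (x1, y1), (x2, y2), (0, 255, 0), 3)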
Here is my output:
The green lines are the lines we just drew, as you can see, most of the
monitor is surrounded by green lines, feel free to tweak the parameters to get
better results.
Here is the full code for detecting lines in your live camera:
import numpy as np
import matplotlib.pyplot as plt
import cv2
cap = cv2.VideoCapture(0)
while True:
    _, image = cap.read()
    # convert to grayscale
    grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # perform edge detection
    edges = cv2.Canny(grayscale, 30, 100)
    # detect lines in the image using hough lines technique
    lines = cv2.HoughLinesP(edges, 1, np.pi/180, 60, np.array([]), 50, 5)
    # iterate over the output lines and draw them
    for line in lines:
        for x1, y1, x2, y2 in line:
            cv2.line(image, (x1, y1), (x2, y2), (255, 0, 0), 3)
            cv2.line(edges, (x1, y1), (x2, y2), (255, 0, 0), 3)
    # show images
    cv2.imshow("image", image)
    cv2.imshow("edges", edges)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
Detecting Circles
In order to detect circles, we gonna need to use cv2.HoughCircles() method
instead, I have coins.jpg image (which contains several coins) in the current
directory, let's load it:
Next, we're going to create a new copy of this image, in which we'll draw the detected circles:
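Neither the loading nor the copying step is reproduced in this excerpt; a sketch consistent with the call below (which expects a single-channel image named img) is:
# load the coins image
image = cv2.imread("coins.jpg")
# make a copy to draw the detected circles on
cimg = image.copy()
# cv2.HoughCircles() expects a single-channel (grayscale) image
img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)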
# finds the circles in the grayscale image using the Hough transform
circles = cv2.HoughCircles(image=img, method=cv2.HOUGH_GRADIENT, dp=0.9,
                           minDist=80, param1=110, param2=39, maxRadius=70)
In case you're wondering what these parameters refer to, type help(cv2.HoughCircles) and you'll find a good explanation.
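The drawing loop isn't shown either; a sketch that draws each detected circle and its center on the copy made earlier is:
# round the coordinates and convert to integers
circles = np.uint16(np.around(circles))
for (x, y, r) in circles[0, :]:
    # draw the outer circle
    cv2.circle(cimg, (int(x), int(y)), int(r), (0, 255, 0), 2)
    # draw the center of the circle
    cv2.circle(cimg, (int(x), int(y)), 2, (0, 0, 255), 3)
# show the result (convert BGR to RGB for matplotlib)
plt.imshow(cv2.cvtColor(cimg, cv2.COLOR_BGR2RGB))
plt.show()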
Here is my result:
As you can see, it isn't perfect, since it doesn't detect all the circles in the image. Try to tune the parameters passed to the cv2.HoughCircles() method and see if you achieve better results.
Alright, that's it for now, here are the references of this tutorial:
shape_detector.py
import numpy as np
import matplotlib.pyplot as plt
import cv2
import sys
# convert to grayscale
grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
live_shape_detector.py
import numpy as np
import matplotlib.pyplot as plt
import cv2
cap = cv2.VideoCapture(0)
while True:
_, image = cap.read()
# convert to grayscale
grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# perform edge detection
edges = cv2.Canny(grayscale, 30, 100)
# detect lines in the image using hough lines technique
lines = cv2.HoughLinesP(edges, 1, np.pi/180, 60, np.array([]), 50, 5)
# iterate over the output lines and draw them
for line in lines:
for x1, y1, x2, y2 in line:
cv2.line(image, (x1, y1), (x2, y2), (255, 0, 0), 3)
cv2.line(edges, (x1, y1), (x2, y2), (255, 0, 0), 3)
# show images
cv2.imshow("image", image)
cv2.imshow("edges", edges)
if cv2.waitKey(1) == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
circle_detector.py
import cv2
import numpy as np
import matplotlib.pyplot as plt
import sys
Learn more here about the theory behind Canny edge detector.
import cv2
import numpy as np
import matplotlib.pyplot as plt
Now let's read the image we want to detect the edges of:
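The read call appears in the full source below; it takes the image path from the command line:
import sys
# read the image (the path is passed as the first command-line argument)
image = cv2.imread(sys.argv[1])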
Before we pass the image to the Canny edge detector, we need to convert it to grayscale:
# convert it to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
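The Canny call itself isn't reproduced at this point; it's the same call used in the live-camera version and the full source below:
# perform the canny edge detector to detect image edges
edges = cv2.Canny(gray, threshold1=30, threshold2=100)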
The smallest value between threshold1 and threshold2 is used for edge
linking. The largest value is used to find initial segments of strong edges.
Interesting, try to fine tune the threshold values and see if you can make it
better.
If you want to use the live camera, here is the full code for that:
import numpy as np
import cv2
cap = cv2.VideoCapture(0)
while True:
    _, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 30, 100)
    cv2.imshow("edges", edges)
    cv2.imshow("gray", gray)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
Source Code:
edge_detector.py
import cv2
import numpy as np
import matplotlib.pyplot as plt
import sys
# read the image
image = cv2.imread(sys.argv[1])
# convert it to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# show the grayscale image, if you want to show, uncomment 2 below lines
# plt.imshow(gray, cmap="gray")
# plt.show()
live_edge_detector.py
import numpy as np
import cv2
cap = cv2.VideoCapture(0)
while True:
_, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 30, 100)
cv2.imshow("edges", edges)
cv2.imshow("gray", gray)
if cv2.waitKey(1) == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
PART 16: Use Transfer Learning
for Image Classification using
TensorFlow in Python
Learn what is transfer learning and how to use pre trained MobileNet model for better performance to
classify flowers using TensorFlow in Python.
For these reasons, it is better to use transfer learning for image classification
problems instead of creating your model and training from scratch, models
such as ResNet, InceptionV3, Xception, and MobileNet are trained on a
massive dataset called ImageNet which contains more than 14 million images
that classify 1000 different objects.
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.applications import MobileNetV2, ResNet50, InceptionV3  # try to use them and see which is better
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.utils import get_file
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
import pathlib
import numpy as np
The dataset comes with inconsistent image sizes; as a result, we need to resize all the images to a shape that is accepted by MobileNet (the model we're going to use):
batch_size = 32
# 5 types of flowers
num_classes = 5
# training for 10 epochs
epochs = 10
# size of each image
IMAGE_SHAPE = (224, 224, 3)
def load_data():
    """This function downloads, extracts, loads, normalizes and one-hot encodes Flower Photos dataset"""
    # download the dataset and extract it
    data_dir = get_file(origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
                        fname='flower_photos', untar=True)
    data_dir = pathlib.Path(data_dir)
    # count how many images are there
    image_count = len(list(data_dir.glob('*/*.jpg')))
    print("Number of images:", image_count)
    # get all classes for this dataset (types of flowers) excluding LICENSE file
    CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
    # roses = list(data_dir.glob('roses/*'))
    # 20% validation set 80% training set
    image_generator = ImageDataGenerator(rescale=1/255, validation_split=0.2)
    # make the training dataset generator
    train_data_gen = image_generator.flow_from_directory(directory=str(data_dir), batch_size=batch_size,
                                                          classes=list(CLASS_NAMES),
                                                          target_size=(IMAGE_SHAPE[0], IMAGE_SHAPE[1]),
                                                          shuffle=True, subset="training")
    # make the validation dataset generator
    test_data_gen = image_generator.flow_from_directory(directory=str(data_dir), batch_size=batch_size,
                                                         classes=list(CLASS_NAMES),
                                                         target_size=(IMAGE_SHAPE[0], IMAGE_SHAPE[1]),
                                                         shuffle=True, subset="validation")
    return train_data_gen, test_data_gen, CLASS_NAMES
The above function downloads and extracts the dataset, and then uses the ImageDataGenerator Keras utility class to wrap the dataset in a Python generator (so the images are only loaded into memory batch by batch, not all at once). After that, we scale and resize the images to a fixed shape and split the dataset, 80% for training and 20% for validation.
I also encourage you to change this function to use tf.data API instead, the
dataset is already in Tensorflow datasets and you can load it as we did in this
tutorial.
def create_model(input_shape):
    # load MobileNetV2
    model = MobileNetV2(input_shape=input_shape)
    # remove the last fully connected layer
    model.layers.pop()
    # freeze all the weights of the model except the last 4 layers
    for layer in model.layers[:-4]:
        layer.trainable = False
    # construct our own fully connected layer for classification
    output = Dense(num_classes, activation="softmax")
    # connect that dense layer to the model
    output = output(model.layers[-1].output)
    model = Model(inputs=model.inputs, outputs=output)
    # print the summary of the model architecture
    model.summary()
    # training the model using adam optimizer
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
The above function will first download the model weights (if not already available) and then remove the last layer.
After that, we freeze all the layers except the last few; that's because the model is pre-trained and we don't want to modify those weights. However, it is good practice to retrain the last convolutional layers, since this dataset is quite similar to the original ImageNet dataset, so we won't ruin the weights (that much).
Finally, we construct our own dense layer that consists of five neurons and
connect it to the last layer of the MobileNetV2 model. The following figure
demonstrates the architecture:
Note that you can use the TensorFlow hub to load this model very easily,
check this link to use their code snippet for creating the model.
if __name__ == "__main__":
# load the data generators
train_generator, validation_generator, class_names = load_data()
# constructs the model
model = create_model(input_shape=IMAGE_SHAPE)
# model name
model_name = "MobileNetV2_finetune_last5"
# some nice callbacks
tensorboard = TensorBoard(log_dir=os.path.join("logs", model_name))
checkpoint = ModelCheckpoint(os.path.join("results", f"{model_name}" +
"-loss-{val_loss:.2f}.h5"),
save_best_only=True,
verbose=1)
# make sure results folder exist
if not os.path.isdir("results"):
os.mkdir("results")
# count number of steps per epoch
training_steps_per_epoch = np.ceil(train_generator.samples / batch_size)
validation_steps_per_epoch = np.ceil(validation_generator.samples /
batch_size)
# train using the generators
model.fit_generator(train_generator,
steps_per_epoch=training_steps_per_epoch,
validation_data=validation_generator,
validation_steps=validation_steps_per_epoch,
epochs=epochs, verbose=1, callbacks=[tensorboard,
checkpoint])
Nothing fancy here, loading the data, constructing the model, and then using
some callbacks for tracking and saving the best models.
As soon as you execute the script, the training process begins, you'll notice
that not all weights are being trained:
I used tensorboard to experiment a little bit, for example, I tried freezing all
the weights except for the last classification layer, decreasing the optimizer
learning rate, used some image flipping, zooming, and general augmentation,
here is a screenshot:
MobileNetV2 is the model in which I froze all the weights (except for the final 5-unit dense layer, of course).
MobileNetV2_augmentation uses some image augmentation.
MobileNetV2_finetune_last5 is the model we're using right now, which does not freeze the last 4 layers of the MobileNetV2 model.
MobileNetV2_finetune_last5_less_lr was dominant, with almost 86% accuracy; that's because once you unfreeze some of the trained weights, you need to decrease the learning rate so you can slowly adjust the weights to your dataset. This was an Adam optimizer with a 0.0005 learning rate.
Note: to modify the learning rate, you can import Adam optimizer
from keras.optimizers package, and then compile the model
with optimizer=Adam(lr=0.0005) parameter.
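As a quick sketch of that change (using the tf.keras import path used elsewhere in this part):
from tensorflow.keras.optimizers import Adam
# compile with a smaller learning rate so the unfrozen layers are adjusted slowly
model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=0.0005), metrics=["accuracy"])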
Make sure to use the optimal weights, the one which has the lower loss and
higher accuracy.
Output:
Okay, let's visualize a little bit, we are going to plot a complete batch of
images with its corresponding predicted and correct labels:
Conclusion
Alright, that's it. In this tutorial, you discovered how you can use transfer
learning to quickly develop and use state-of-the-art models using Tensorflow
and Keras in Python.
I highly encourage you to use other models that were mentioned above, try to
fine-tune them as well, good luck!
Source Code:
train.py
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.applications import MobileNetV2, ResNet50, InceptionV3 # try to use them and
see which is better
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.utils import get_file
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
import pathlib
import numpy as np
batch_size = 32
num_classes = 5
epochs = 10
def load_data():
"""This function downloads, extracts, loads, normalizes and one-hot encodes Flower Photos
dataset"""
# download the dataset and extract it
data_dir =
get_file(origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz'
fname='flower_photos', untar=True)
data_dir = pathlib.Path(data_dir)
# get all classes for this dataset (types of flowers) excluding LICENSE file
CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name !=
"LICENSE.txt"])
# roses = list(data_dir.glob('roses/*'))
# 20% validation set 80% training set
image_generator = ImageDataGenerator(rescale=1/255, validation_split=0.2)
def create_model(input_shape):
# load MobileNetV2
model = MobileNetV2(input_shape=input_shape)
# remove the last fully connected layer
model.layers.pop()
# freeze all the weights of the model except the last 4 layers
for layer in model.layers[:-4]:
layer.trainable = False
# construct our own fully connected layer for classification
output = Dense(num_classes, activation="softmax")
# connect that dense layer to the model
output = output(model.layers[-1].output)
if __name__ == "__main__":
# load the data generators
train_generator, validation_generator, class_names = load_data()
test.py
from train import load_data, create_model, IMAGE_SHAPE, batch_size, np
import matplotlib.pyplot as plt
# load the data generators
train_generator, validation_generator, class_names = load_data()
# constructs the model
model = create_model(input_shape=IMAGE_SHAPE)
# load the optimal weights
model.load_weights("results/MobileNetV2_finetune_last5_less_lr-loss-0.45-acc-0.86.h5")
In this tutorial, you will learn how to generate and read QR codes in Python
using qrcode and OpenCV libraries.
Generate QR Code
First, let's start by generating QR codes, it is basically straightforward
using qrcode library:
import qrcode
# example data
data = "https://www.bbc.com"
# output file name
filename = "site.png"
# generate qr code
img = qrcode.make(data)
# save img to a file
img.save(filename)
This will generate a new image file in the current directory with the name "site.png", which contains a QR code image of the data specified (in this case, this website URL), and will look something like this:
You can also use this library to have full control with QR code generation
using the qrcode.QRCode() class, in which you can instantiate and specify the
size, fill color, back color, and error correction, like so:
import qrcode
import numpy as np
# data to encode
data = "https://www.bbc.com"
# instantiate QRCode object
qr = qrcode.QRCode(version=1, box_size=10, border=4)
# add data to the QR code
qr.add_data(data)
# compile the data into a QR code array
qr.make()
# print the image shape
print("The shape of the QR image:", np.array(qr.get_matrix()).shape)
# transfer the array into an actual image
img = qr.make_image(fill_color="white", back_color="black")
# save it to a file
img.save("site_inversed.png")
The box_size parameter controls how many pixels each "box" of the QR code is, whereas the border controls how many boxes thick the border should be.
We then add the data using the qr.add_data() method, compile it to an array using the qr.make() method, and then make the actual image using the qr.make_image() method. We specified white as the fill_color and black as the back_color, which is the exact opposite of the default QR code; check it out:
Alright, open up a new Python file and follow along with me, let's read the
image that we just generated:
import cv2
# read the QRCODE image
img = cv2.imread("site.png")
We have the image and the detector, let's detect and decode that data:
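The detector initialization and the decoding call aren't reproduced at this point; they're the same as in the live-camera version at the end of this part:
# initialize the cv2 QRCode detector
detector = cv2.QRCodeDetector()
# detect and decode
data, bbox, straight_qrcode = detector.detectAndDecode(img)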
We just need data and bbox here, bbox will help us draw the quadrangle in
the image and data will be printed to the console!
Let's do it:
# if there is a QR code
if bbox is not None:
    print(f"QRCode data:\n{data}")
    # display the image with lines
    # length of bounding box
    n_lines = len(bbox)
    for i in range(n_lines):
        # draw all lines
        point1 = tuple(bbox[i][0])
        point2 = tuple(bbox[(i+1) % n_lines][0])
        cv2.line(img, point1, point2, color=(255, 0, 0), thickness=2)
Finally, let's show the image and quit when a key is pressed:
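That step isn't reproduced here; it's just the usual OpenCV display-and-wait pattern:
# display the result until a key is pressed
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()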
QRCode data:
https://www.bbc.com
And the following image is shown:
As you can see, the blue lines are drawn in the exact QR code
borders. Awesome, we are done with this script, try to run it with different
data and see your own results!
Note that this is ideal for QR codes and not for barcodes. If you want to read barcodes, check this tutorial that is dedicated to that!
If you want to detect and decode QR codes live using your webcam (and I'm
sure you do), here is a code for that:
import cv2
# initalize the cam
cap = cv2.VideoCapture(0)
# initialize the cv2 QRCode detector
detector = cv2.QRCodeDetector()
while True:
    _, img = cap.read()
    # detect and decode
    data, bbox, _ = detector.detectAndDecode(img)
    # check if there is a QRCode in the image
    if bbox is not None:
        # display the image with lines
        for i in range(len(bbox)):
            # draw all lines
            cv2.line(img, tuple(bbox[i][0]), tuple(bbox[(i+1) % len(bbox)][0]), color=(255, 0, 0), thickness=2)
        if data:
            print("[+] QR Code detected, data:", data)
    # display the result
    cv2.imshow("img", img)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
Awesome, we are done with this tutorial, you can now integrate this into your
own applications!
Source Code:
generate_qrcode.py
import qrcode
import sys
generate_qrcode_with_control.py
import qrcode
import numpy as np
# data to encode
data = "https://www.thepythoncode.com"
read_qrcode.py
import cv2
import sys
filename = sys.argv[1]
# if there is a QR code
if bbox is not None:
print(f"QRCode data:\n{data}")
# display the image with lines
# length of bounding box
n_lines = len(bbox)
for i in range(n_lines):
# draw all lines
point1 = tuple(bbox[i][0])
point2 = tuple(bbox[(i+1) % n_lines][0])
cv2.line(img, point1, point2, color=(255, 0, 0), thickness=2)
read_qrcode_live.py
import cv2
while True:
_, img = cap.read()
if data:
print("[+] QR Code detected, data:", data)
if cv2.waitKey(1) == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
PART 18: Make an Image Classifier
in Python using Tensorflow 2 and
Keras
Building and training a model that classifies CIFAR-10 dataset images that were loaded using
Tensorflow Datasets which consists of airplanes, dogs, cats and other 7 objects using Tensorflow 2 and
Keras libraries in Python.
We will preprocess the images and labels, then train a convolutional neural
network on all the training samples. The images will need to
be normalized and the labels need to be one-hot encoded.
As you may expect, we'll be using tf.data API to load CIFAR-10 dataset.
Hyper Parameters
I have experimented with various hyperparameters and found these to be optimal:
# hyper-parameters
batch_size = 64
# 10 categories of images (CIFAR-10)
num_classes = 10
# number of training epochs
epochs = 30
def load_data():
    """
    This function loads CIFAR-10 dataset, and preprocess it
    """
    def preprocess_image(image, label):
        # convert [0, 255] range integers to [0, 1] range floats
        image = tf.image.convert_image_dtype(image, tf.float32)
        return image, label
    # loading the CIFAR-10 dataset, splitted between train and test sets
    ds_train, info = tfds.load("cifar10", with_info=True, split="train", as_supervised=True)
    ds_test = tfds.load("cifar10", split="test", as_supervised=True)
    # repeat dataset forever, shuffle, preprocess, split by batch
    ds_train = ds_train.repeat().shuffle(1024).map(preprocess_image).batch(batch_size)
    ds_test = ds_test.repeat().shuffle(1024).map(preprocess_image).batch(batch_size)
    return ds_train, ds_test, info
Repeat the dataset forever using the repeat() method; this enables us to generate data samples repeatedly (we'll specify the stopping conditions in the training phase).
Shuffle it.
Normalize images to be between 0 and 1; this will help the neural network train much faster. We used the map() method, which accepts a callback function that takes the image and label as arguments; inside it we simply used TensorFlow's built-in convert_image_dtype() method, which does exactly that.
Finally, we batch our dataset into batches of 64 samples using the batch() method, so each time we generate new data points it returns 64 images and their 64 labels.
Constructing the Model
def create_model(input_shape):
    # building the model
    model = Sequential()
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same", input_shape=input_shape))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # flattening the convolutions
    model.add(Flatten())
    # fully-connected layer
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation="softmax"))
    # print the summary of the model architecture
    model.summary()
    # training the model using adam optimizer
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
if __name__ == "__main__":
# load the data
ds_train, ds_test, info = load_data()
# constructs the model
model = create_model(input_shape=info.features["image"].shape)
# some nice callbacks
logdir = os.path.join("logs", "cifar10-model-v1")
tensorboard = TensorBoard(log_dir=logdir)
# make sure results folder exist
if not os.path.isdir("results"):
os.mkdir("results")
# train
model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=1,
steps_per_epoch=info.splits["train"].num_examples // batch_size,
validation_steps=info.splits["test"].num_examples // batch_size,
callbacks=[tensorboard])
# save the model to disk
model.save("results/cifar10-model-v1.h5")
After loading the data and creating the model, I used Tensorboard that will be
tracking the accuracy and loss in each epoch and providing us with nice
visualization.
We will be using the "results" folder to save our models, if you're not sure
how you can handle files and directories in Python, check this tutorial.
Since ds_train and ds_test will generate data samples in batches repeatedly, we
need to specify the number of steps per epoch, and that's the number of
samples divided by the batch size, and it is the same for validation_steps as well.
Run this, it will take several minutes to complete training, depending on your
CPU/GPU.
Epoch 1/30
781/781 [==============================] - 20s 26ms/step - loss: 1.6503 - accuracy: 0.3905 - val_loss: 1.2835 - val_accuracy: 0.5238
Epoch 2/30
781/781 [==============================] - 16s 21ms/step - loss: 1.1847 - accuracy: 0.5750 - val_loss: 0.9773 - val_accuracy: 0.6542
Epoch 29/30
781/781 [==============================] - 16s 21ms/step - loss: 0.4094 - accuracy: 0.8570 - val_loss: 0.5954 - val_accuracy: 0.8089
Epoch 30/30
781/781 [==============================] - 16s 21ms/step - loss: 0.4130 - accuracy: 0.8563 - val_loss: 0.6128 - val_accuracy: 0.8060
Now to open tensorboard, all you need to do is to type this command in the
terminal or the command prompt in the current directory:
tensorboard --logdir="logs"
Open up a browser tab and type localhost:6006, you'll be redirected to
tensorboard, here is my result:
Clearly, we are on the right track, validation loss is decreasing, and the
accuracy is increasing all the way to about 81%. That's great!
Let's make a Python dictionary that maps each integer value to its
corresponding label in the dataset:
# CIFAR-10 classes
categories = {
0: "airplane",
1: "automobile",
2: "bird",
3: "cat",
4: "deer",
5: "dog",
6: "frog",
7: "horse",
8: "ship",
9: "truck"
}
Loading the test data and the model:
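The loading code isn't reproduced here; based on test.py in the source code below, it's simply:
# load the testing set (and the training set, which we don't need here)
ds_train, ds_test, info = load_data()
# load the model that was saved after training
model = load_model("results/cifar10-model-v1.h5")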
Evaluation:
# evaluation
loss, accuracy = model.evaluate(X_test, y_test)
print("Test accuracy:", accuracy*100, "%")
We've used next(iter(ds_test)) to get the next testing batch and then extracted the
first image and label in that batch and made predictions on the model, here is
the result:
156/156 [==============================] -
3s 20ms/step - loss: 0.6119 - accuracy: 0.8063
Test accuracy: 80.62900900840759 %
Predicted label: frog
True label: frog
Conclusion
Alright, we are done with this tutorial, 81% isn't bad for this little CNN, I
highly encourage you to tweak the model or check ResNet50, Xception, or
other state-of-the-art models to get higher performance!
If you're not sure how to use these models, I have a tutorial on this: How to
Use Transfer Learning for Image Classification using Keras in Python.
You may wonder about these images: a 32x32 grid isn't how the real world looks, and real images aren't that simple; they often contain many objects, complex patterns, and so on. As a result, it is a common practice to use image segmentation methods such as contour detection or K-Means clustering segmentation before passing the result to any classification technique.
Source Code:
train.py
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard
import tensorflow as tf
import tensorflow_datasets as tfds
import os
# hyper-parameters
batch_size = 64
# 10 categories of images (CIFAR-10)
num_classes = 10
# number of training epochs
epochs = 30
def create_model(input_shape):
    """
    Constructs the model:
    - 32 Convolutional (3x3)
    - Relu
    - 32 Convolutional (3x3)
    - Relu
    - Max pooling (2x2)
    - Dropout
    - 64 Convolutional (3x3)
    - Relu
    - 64 Convolutional (3x3)
    - Relu
    - Max pooling (2x2)
    - Dropout
    """
    model = Sequential()
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same", activation="relu", input_shape=input_shape))
    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # classification head (layer sizes and dropout rates here are assumptions)
    model.add(Flatten())
    model.add(Dense(1024, activation="relu"))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation="softmax"))
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
def load_data():
    """
    This function loads the CIFAR-10 dataset and preprocesses it
    """
    def preprocess_image(image, label):
        # convert [0, 255] range integers to [0, 1] range floats
        image = tf.image.convert_image_dtype(image, tf.float32)
        return image, label
    # Loading data using Keras (kept for reference):
    # loading the CIFAR-10 dataset, split between train and test sets
    # (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    # print("Training samples:", X_train.shape[0])
    # print("Testing samples:", X_test.shape[0])
    # print(f"Images shape: {X_train.shape[1:]}")
    # Loading data using tensorflow_datasets instead (shuffle buffer size is an assumption)
    ds_train, info = tfds.load("cifar10", with_info=True, split="train", as_supervised=True)
    ds_test = tfds.load("cifar10", split="test", as_supervised=True)
    # repeat the datasets forever, preprocess and split into batches
    ds_train = ds_train.repeat().shuffle(1024).map(preprocess_image).batch(batch_size)
    ds_test = ds_test.repeat().map(preprocess_image).batch(batch_size)
    return ds_train, ds_test, info
if __name__ == "__main__":
    # load the data and construct the model
    ds_train, ds_test, info = load_data()
    model = create_model(input_shape=info.features["image"].shape)
    # TensorBoard callback for tracking accuracy/loss per epoch
    logdir = os.path.join("logs", "cifar10-model-v1")
    tensorboard = TensorBoard(log_dir=logdir)
    # make sure results folder exists
    if not os.path.isdir("results"):
        os.mkdir("results")
    # train
    # model.fit(X_train, y_train,
    #           batch_size=batch_size,
    #           epochs=epochs,
    #           validation_data=(X_test, y_test),
    #           callbacks=[tensorboard, checkpoint],
    #           shuffle=True)
    model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=1,
              steps_per_epoch=info.splits["train"].num_examples // batch_size,
              validation_steps=info.splits["test"].num_examples // batch_size,
              callbacks=[tensorboard])
    # save the model to disk
    model.save("results/cifar10-model-v1.h5")
test.py
from train import load_data, batch_size
from tensorflow.keras.models import load_model
import matplotlib.pyplot as plt
import numpy as np
# CIFAR-10 classes
categories = {
0: "airplane",
1: "automobile",
2: "bird",
3: "cat",
4: "deer",
5: "dog",
6: "frog",
7: "horse",
8: "ship",
9: "truck"
}
# load the testing set
ds_train, ds_test, info = load_data()
# load the final model weights saved by train.py
model = load_model("results/cifar10-model-v1.h5")
# evaluation (ds_test repeats forever, so we pass the number of steps explicitly)
loss, accuracy = model.evaluate(ds_test, steps=info.splits["test"].num_examples // batch_size)
print("Test accuracy:", accuracy*100, "%")
# get a prediction for the first image of the next test batch
data_sample = next(iter(ds_test))
sample_image = data_sample[0].numpy()[0]
sample_label = categories[data_sample[1].numpy()[0]]
prediction = np.argmax(model.predict(sample_image.reshape(-1, *sample_image.shape))[0])
print("Predicted label:", categories[prediction])
print("True label:", sample_label)
# display the sample image (plotting details are an assumption)
plt.axis("off")
plt.imshow(sample_image)
plt.show()
PART 19: Face Detection using OpenCV in Python
In this tutorial, we will build a simple Python script that detects human faces in an image, using two methods from the OpenCV library. First, we will use Haar cascade classifiers, which are an easy (though not especially accurate) and convenient way for beginners. After that, we'll dive into Single Shot Multibox Detectors (SSDs for short), a method for detecting objects in images using a single deep neural network.
Face Detection using Haar Cascades
The nice thing about haar feature-based cascade classifiers is that you can build a classifier for any object you want, and OpenCV already ships with pre-trained classifier parameters, so you don't have to collect any data to train one yourself.
Alright, create a new Python file and follow along. Let's first import OpenCV:
import cv2
You're going to need a sample image to test with; make sure it has clear frontal faces in it. I will use this stock image that contains two lovely kids:
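Reading the image with OpenCV looks like this (a minimal sketch; the filename kids.jpg is an assumption, so use whatever name you saved the image under):
# read the test image from disk (filename is an assumption)
image = cv2.imread("kids.jpg")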
The function imread() loads an image from the specified file and returns it as a
numpy N-dimensional array.
Before we detect faces in the image, we first need to convert it to grayscale, because the function we are going to use for face detection expects a grayscale image:
# converting to grayscale
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
The function cvtColor() converts an input image from one color space to another. We specified the cv2.COLOR_BGR2GRAY code, which means converting from BGR (Blue Green Red) to grayscale.
Since this tutorial is about detecting human faces, go ahead and download the haar cascade for human face detection from this list. More precisely, "haarcascade_frontalface_default.xml". Let's put it in a folder called "cascades" and then load it:
Pretty cool, right? Feel free to use other object classifiers, other images, and, even more interesting, your webcam! Here is the code for that:
import cv2
# load the haar cascade and open the default webcam
face_cascade = cv2.CascadeClassifier("cascades/haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)
while True:
    # read the image from the cam
    _, image = cap.read()
    # converting to grayscale
    image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # detect all the faces in the image
    faces = face_cascade.detectMultiScale(image_gray, 1.3, 5)
    # for every face, draw a blue rectangle
    for x, y, width, height in faces:
        cv2.rectangle(image, (x, y), (x + width, y + height), color=(255, 0, 0), thickness=2)
    cv2.imshow("image", image)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
Once you execute that (if you have a webcam, of course), it will open up your webcam and start drawing blue rectangles around all frontal faces in the frame. The code isn't that challenging: instead of reading the image from a file, I created a VideoCapture object and read from it on every iteration of a while loop; once you press the q key, the main loop ends.
Face Detection using SSDs
To get started detecting faces using SSDs in OpenCV, you need to download the ResNet face detection model architecture along with its pre-trained weights, and then save them into the weights folder in the current working directory:
import cv2
import numpy as np
# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
prototxt_path = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x300_ssd_iter_140000_fp16.caffemodel
model_path = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
Now to load the actual model, we need to use readNetFromCaffe() method that
takes the model architecture and weights as arguments:
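A sketch of that step (the test image filename is an assumption; we also grab the original width and height, which we'll need later to rescale the detected boxes):
# load the Caffe model architecture and weights
model = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)
# read the image (filename is an assumption) and keep its original size
image = cv2.imread("kids.jpg")
h, w = image.shape[:2]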
Now, to pass this image into the neural network, we need to prepare it. More specifically, we need to resize the image to the shape of (300, 300) and perform mean subtraction, since that is how the model was trained:
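A sketch of the preprocessing (the mean values (104.0, 177.0, 123.0) are the ones commonly used with this Caffe face detector, not something stated above, so treat them as an assumption):
# resize to 300x300 and subtract the per-channel means the detector was trained with
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))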
Let's use this blob object as the input of the network and perform feed forward
to get detected faces:
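Something along these lines (squeezing the raw output so it can be indexed as output[i, ...] in the loop below):
# set the blob as the network input and run a forward pass
model.setInput(blob)
output = np.squeeze(model.forward())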
Now the output object holds all detected objects (faces in this case). Let's iterate over this array and draw every face in the image that has a confidence of more than 50%:
font_scale = 1.0
for i in range(0, output.shape[0]):
    # get the confidence
    confidence = output[i, 2]
    # if confidence is above 50%, then draw the surrounding box
    if confidence > 0.5:
        # get the surrounding box coordinates and upscale them to the original image
        box = output[i, 3:7] * np.array([w, h, w, h])
        # convert to integers
        start_x, start_y, end_x, end_y = box.astype(int)
        # draw the rectangle surrounding the face
        cv2.rectangle(image, (start_x, start_y), (end_x, end_y), color=(255, 0, 0), thickness=2)
        # draw text as well
        cv2.putText(image, f"{confidence*100:.2f}%", (start_x, start_y-5),
                    cv2.FONT_HERSHEY_SIMPLEX, font_scale, (255, 0, 0), 2)
After we extract the model's confidence for the detected object, we take the surrounding box and multiply it by the width and height of the original image to get the right box coordinates: the box coordinates the network returns are normalized to the range 0 to 1 (relative to the 300x300 input we fed it), so scaling by the original dimensions maps them back onto the original image.
In this case, we don't only draw the surrounding boxes; we also write some text indicating the confidence as a percentage. Let's show and save the new image:
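A sketch of that last step (the output filename is an assumption):
# display the result and write it to disk
cv2.imshow("image", image)
cv2.waitKey(0)
cv2.imwrite("kids_detected_dnn.jpg", image)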
By the way, if you want to detect faces using this method in real-time using
your camera, you can check the full code page.
There are many real-world applications for face detection; for instance, we've used face detection to blur faces in images and videos in real time using OpenCV as well!
Alright, this is it for this tutorial. You can get all the tutorial materials (including the testing image, the haar cascade parameters, the SSD model weights, and the full code) here.
Source Code:
face_detection.py
import cv2
# read the test image (filename is an assumption) and load the haar cascade
image = cv2.imread("kids.jpg")
face_cascade = cv2.CascadeClassifier("cascades/haarcascade_frontalface_default.xml")
# converting to grayscale
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# detect the faces and draw a blue rectangle around each one
for x, y, width, height in face_cascade.detectMultiScale(image_gray, 1.3, 5):
    cv2.rectangle(image, (x, y), (x + width, y + height), color=(255, 0, 0), thickness=2)
cv2.imshow("image", image)
cv2.waitKey(0)
live_face_detection.py
import cv2
# load the haar cascade and open the default webcam
face_cascade = cv2.CascadeClassifier("cascades/haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)
while True:
    # read the image from the cam
    _, image = cap.read()
    # converting to grayscale
    image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # detect the faces and draw a blue rectangle around each one
    for x, y, width, height in face_cascade.detectMultiScale(image_gray, 1.3, 5):
        cv2.rectangle(image, (x, y), (x + width, y + height), color=(255, 0, 0), thickness=2)
    cv2.imshow("image", image)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
face_detection_dnn.py
import cv2
import numpy as np
# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
prototxt_path = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x300_ssd_iter_140000_fp16.caffemodel
model_path = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
live_face_detection_dnn.py
import cv2
import numpy as np
# https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt
prototxt_path = "weights/deploy.prototxt.txt"
# https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x300_ssd_iter_140000_fp16.caffemodel
model_path = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
cap = cv2.VideoCapture(0)
while True:
    # read a frame, then run the same SSD detection and drawing as in face_detection_dnn.py on it
    _, image = cap.read()
    cv2.imshow("image", image)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
Summary
This book is dedicated to the readers who take time to write me each day.
Every morning I’m greeted by various emails — some with requests, a few
with complaints, and then there are the very few that just say thank you. All
these emails encourage and challenge me as an author — to better both my
books and myself.
Thank you!