Chapter 8 - Image Processing Theory and Application
Contents
1. Computer Vision Overview
5. Image Classification
6. Object Detection
7. AdaBoost
8. Face Detection
1. Computer Vision Overview
- The world perceived by the human visual system is three-dimensional, while images on computers are two-dimensional.
- Computer vision is the science of studying how to make computers "see" like humans.
Computer vision has been widely used in many fields, such as:
- Word processing: character recognition, document repair, office automation, and spam classification.
- National defense: resource detection, military reconnaissance, and missile path planning.
- Smart transportation: road traffic management, electronic traffic-enforcement (e-police) image capturing systems, and assisted driving.
- Entertainment: movie special effects, video editing, facial enhancement, motion-sensing games, and virtual reality (VR).
3. Computer Vision-related Disciplines
Computer vision is an interdisciplinary subject that studies image theories, technologies, and applications.
Computer vision is related not only to traditional mathematics, physics, physiology, psychology,
computer science, and electronic engineering but also to professional technologies such as computer
graphics, image pattern recognition, and image engineering.
These technical terms are associated with each other and are often used together. In many cases, they are
used by people with different professional backgrounds.
4. Computer Vision and AI
Most computer vision theories use Artificial Intelligence (AI) technologies. The development of AI
is closely related to computer vision. Many application problems in computer vision provide
research directions for AI technologies.
The most mature AI technology direction in computer vision is image recognition, which enables machines to understand the content of images.
5. Image Classification
- Image classification is a basic research topic in the AI field and a core issue in the computer vision field.
- Compared with image classification, object detection not only recognizes objects but also locates them in an image.
- Objects are recognized in the same way as in image classification, and bounding boxes are used to mark the locations of the detected objects in the image.
6. Object Detection
- As one of the basic technologies of image processing and computer vision in the AI field,
object detection has a wide range of applications, such as traffic monitoring, image
search, facial recognition, and Human–Computer Interaction (HCI).
- Objects in an image can be detected using object detection technology and then passed to intelligent algorithms for further processing.
7. AdaBoost
- Adaptive Boosting (AdaBoost) is a boosting algorithm that can perform efficient binary classification. It combines multiple weak classifiers into one strong classifier. A weak classifier is usually a single-layer decision tree (decision stump).
- AdaBoost trains one weak classifier per iteration. Its "adaptive" behavior lies in the sample weighting: samples misclassified in iteration N-1 receive larger weights in iteration N, while correctly classified samples receive smaller weights, and the reweighted samples are used to train the next weak classifier.
- Each weak classifier also has its own weight: a weak classifier with a small classification error rate gets a large weight and therefore plays a greater role in the final classification function, while a weak classifier with a large error rate gets a small weight.
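A minimal sketch of this idea in Python, using scikit-learn (not part of the slides) on a hypothetical toy dataset; by default AdaBoostClassifier boosts depth-1 decision trees, i.e. decision stumps:

# AdaBoost with decision stumps on synthetic binary-classification data (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each boosting round fits one weak classifier on re-weighted samples;
# the final prediction is a weighted vote of all weak classifiers.
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))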
8. Face Detection
- Used with the AdaBoost algorithm, Haar-like features perform well in face detection.
- Haar-like features capture local changes in image intensity.
- In face images, some facial features can be described using rectangular features. For example, the eyes are darker than the cheeks, the sides of the nose are darker than the nose bridge, and the mouth is darker than the skin surrounding it.
- Haar-like features perform well for detecting upright frontal faces and objects whose intensities change symmetrically.
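A minimal sketch of Haar-cascade face detection with OpenCV (the library and the file name 'photo.jpg' are illustrative assumptions, not part of the slides); the pretrained frontal-face cascade ships with OpenCV:

import cv2

# Load the pretrained Haar cascade for upright frontal faces.
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread('photo.jpg')                 # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Haar-like features work on intensity values
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                    # draw a bounding box around each detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite('faces_detected.jpg', img)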
9. Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a feedforward neural network. Its artificial neurons respond to parts of the surrounding units within their receptive field. CNNs perform excellently in image processing.
A CNN consists of convolutional layers, pooling layers, and fully connected layers.
In the 1960s, while studying neurons responsible for local sensitivity and direction selection in the cat visual cortex, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks; the CNN was later proposed based on this finding. Today, the CNN is a research focus in many fields of science and technology, especially pattern classification.
CNNs are widely used because they take raw images as input without requiring complex image preprocessing.
Architecture of Convolutional Neural Network
• Convolutional layer: applies a set of learnable convolution kernels (filters) to the input to extract local features.
• Pooling layer: partitions the features obtained from the convolutional layer into regions and outputs the maximum or average value of each region, generating new features with a smaller spatial size.
• Fully connected layer: integrates all local features into global features and calculates the final score for each class.
• Output layer: outputs the final result.
CNN Architecture
➜ The input image enters the convolution layer, whose job is to apply filters of different sizes to the image (more than one layer can be used, depending on the application).
➜ It can then pass through a max pooling layer, which also applies a filter but keeps the maximum value.
➜ It can then pass through an average pooling layer, which also applies a filter but keeps the average value.
➜ (The previous three filtering layers can be repeated, depending on the nature of the images.)
➜ The output then enters a Flatten layer, which converts the image from a matrix into a vector.
➜ Next comes the classification stage: one or more Dense layers, in which the activation function is chosen (ReLU or sigmoid).
➜ The last layer is also a Dense layer, but its activation function is Softmax.
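A minimal Keras sketch of the pipeline described above (the input size, filter counts, and number of classes are illustrative assumptions):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),  # convolution layer (filters)
    tf.keras.layers.MaxPooling2D((2, 2)),            # keeps the maximum value in each window
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.AveragePooling2D((2, 2)),        # keeps the average value in each window
    tf.keras.layers.Flatten(),                       # matrix -> vector
    tf.keras.layers.Dense(128, activation='relu'),   # classification (Dense) layer
    tf.keras.layers.Dense(10, activation='softmax'), # last Dense layer uses Softmax
])
model.summary()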
Architecture of Convolutional Neural Network
[Figure: a CNN for multi-category classification. Convolution layers + pooling layers (convolution + nonlinearity, max pooling) are followed by vectorization and a fully connected layer that outputs class probabilities such as P(bird), P(sunset), P(dog), and P(cat).]
10. CNN types
10.1. ILSVRC
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), held by Stanford University, is closely related to the development of deep learning and convolutional neural networks.
The dataset used by the annual ILSVRC contains about 1.2 million images with labels in roughly 1,000 categories, a subset of the full ImageNet data.
Generally, the top-5 and top-1 error rates are used as the evaluation indicators of model performance.
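As a rough illustration of how these indicators are computed (the tensors below are random placeholders rather than real model outputs), a top-1/top-5 error sketch in TensorFlow could look like:

import tensorflow as tf

logits = tf.random.normal((8, 1000))                           # placeholder predictions for 8 images
labels = tf.random.uniform((8,), maxval=1000, dtype=tf.int32)  # placeholder ground-truth classes

# top-k accuracy = fraction of examples whose true class is among the k highest-scoring predictions
top1 = tf.keras.metrics.sparse_top_k_categorical_accuracy(labels, logits, k=1)
top5 = tf.keras.metrics.sparse_top_k_categorical_accuracy(labels, logits, k=5)
print("top-1 error:", 1 - tf.reduce_mean(top1).numpy())
print("top-5 error:", 1 - tf.reduce_mean(top5).numpy())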
10.1. ILSVRC Historical Achievements
Since 2010, the ILSVRC has evaluated algorithms for image classification, single-object localization, and object detection.
https://twitter.com/hashtag/ilsvrc
10.2.ImageNet
The ImageNet project was founded in 2007 by Li Feifei, a Chinese professor at Stanford University. The
project aims to collect a large amount of image data with label information for model training in computer
vision.
The ImageNet dataset contains 15 million labeled high-resolution images of objects in roughly 22,000
categories. In about one million of the images, bounding boxes are also provided for objects of interest.
www.image-net.org
10.3. AlexNet
- AlexNet, 2012
- Key techniques: ReLU activation, overlapping pooling, data augmentation, and dropout
10.4. VGGNet
- VGGNet investigates the effect of network depth on convolutional networks. VGGNet uses only very small kernels with a spatial size of 3x3. After several convolutional, max-pooling, and fully connected layers, the category prediction is generated using the softmax function.
Visual Geometry Group, Very Deep Convolutional Networks for Large-Scale Image Recognition.
Six Configurations of the VGG
10.5. GoogLeNet
GoogLeNet, 2014 (built from stacked Inception modules)
• Convolutional Neural Networks (CNNs) are deep neural networks that have the capability
to classify and segment images.
• CNN architectures for classification and segmentation include a variety of different layers
with specific purposes, such as a convolutional layer, pooling layer, fully connected
layers, dropout layers, etc.
12. Description of basic CNN architecture for Classification
• The CNN architecture for classification includes convolutional layers, max-pooling layers,
and fully connected layers.
• Max-pooling layers are employed when the image does not require all of its high-resolution detail, or when an output with smaller spatial regions is needed; they perform a downsampling operation on the feature maps extracted by the CNN.
13. Description of basic CNN architecture for Segmentation
• Computer vision deals with images, and image segmentation is one of the most
important steps.
• It involves dividing a visual input into segments to make image analysis easier.
• Image segmentation groups pixels into larger components, eliminating the need to consider each pixel as a separate unit.
13. Description of basic CNN architecture for Segmentation
• Image segmentation is the process of dividing an image into manageable sections or “tiles”.
• The process of image segmentation starts with defining small regions on an image that
should not be divided.
• These regions are called seeds, and the position of these seeds defines the tiles.
13. Description of basic CNN architecture for Segmentation
• The picture below can be used to understand image classification, object detection and
image segmentation. Notice how image segmentation can be used for image
classification or object detection.
14. What is data augmentation?
• Data augmentation is a set of techniques that artificially increase the amount of training data by creating modified copies of existing samples. This includes making small changes to the data or using deep learning models to generate new data points.
15. Why is data augmentation important?
• For machine learning models, collecting and labeling data can be an exhausting and costly process. Transforming existing datasets with data augmentation techniques allows companies to reduce these operational costs.
• Cleaning data is a necessary step for building high-accuracy models. However, if cleaning reduces the representativeness of the data, the model cannot provide good predictions for real-world inputs. Data augmentation techniques can make machine learning models more robust by creating variations that the model may encounter in the real world.
16. How does data augmentation work?
17. Traditional Data Augmentation Types
• Simple alterations of visual data are a popular form of data augmentation. In addition, generative adversarial networks (GANs) are used to create new synthetic data. Classic image processing operations for data augmentation include the following (a short code sketch follows the list):
• 1. padding
• 2. random rotation
• 3. re-scaling
• 4. vertical and horizontal flipping
• 5. translation (the image is moved along the X or Y direction)
• 6. cropping
• 7. zooming
• 8. darkening and brightening / color modification
• 9. grayscale conversion
• 10. changing contrast
• 11. adding noise
• 12. random erasing
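A sketch of a few of these classic augmentations using tf.keras preprocessing layers (the specific factors and the random placeholder batch are illustrative assumptions):

import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal_and_vertical'),  # vertical and horizontal flipping
    tf.keras.layers.RandomRotation(0.1),                    # random rotation
    tf.keras.layers.RandomZoom(0.2),                        # zooming
    tf.keras.layers.RandomTranslation(0.1, 0.1),            # translation along X/Y
    tf.keras.layers.RandomContrast(0.2),                    # changing contrast
])

images = tf.random.uniform((8, 128, 128, 3))   # placeholder batch of images
augmented = augment(images, training=True)     # training=True enables the random transformations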
18. Advanced Models for Data Augmentation
• Generative adversarial networks (GANs): GAN algorithms can learn patterns from input datasets and automatically create new examples that resemble the training data.
• Neural style transfer: neural style transfer models can blend a content image with a style image, separating style from content.
• Popular open-source Python packages for data augmentation in computer vision are Keras ImageDataGenerator, Skimage, and OpenCV.
19. Where is data augmentation used?
• Image recognition and NLP models commonly use data augmentation methods.
• The medical imaging domain also utilizes data augmentation, applying transformations to images to introduce diversity into the datasets.
20. What are use cases/examples in data augmentation?
• In an image classification task, the network assigns a label (or class) to each input
image. However, suppose you want to know the shape of that object, which pixel
belongs to which object, etc. In this case, you need to assign a class to each pixel of the
image—this task is known as segmentation.
• A segmentation model returns much more detailed information about the image. Image
segmentation has many applications in medical imaging, self-driving cars and satellite
imaging, just to name a few.
21. Example on Image Segmentation
• This example uses the Oxford-IIIT Pet Dataset (Parkhi et al., 2012). The dataset consists of images of 37 pet breeds, with 200 images per breed (~100 each in the training and test splits). Each image includes the corresponding labels and pixel-wise masks. The masks are class labels for each pixel, and each pixel is given one of three categories:
- Class 1: pixel belonging to the pet.
- Class 2: pixel bordering the pet.
- Class 3: none of the above / a surrounding pixel.
import tensorflow as tf
import tensorflow_datasets as tfds
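The dataset itself can be downloaded through TensorFlow Datasets; a minimal loading call (following the tfds API used by the standard TensorFlow segmentation tutorial) is:

dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)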
In addition, the image color values are normalized to the [0, 1] range. Finally, as mentioned above, the pixels in the segmentation mask are labeled {1, 2, 3}. For convenience, subtract 1 from the segmentation mask, resulting in labels of {0, 1, 2}.
def load_image(datapoint):
  input_image = tf.image.resize(datapoint['image'], (128, 128))
  input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))
  # normalize the image to [0, 1] and shift mask labels from {1, 2, 3} to {0, 1, 2}
  input_image, input_mask = tf.cast(input_image, tf.float32) / 255.0, input_mask - 1
  return input_image, input_mask
The dataset already contains the required training and test splits, so continue to use the same splits:
TRAIN_LENGTH = info.splits['train'].num_examples
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE
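The train_images and test_images pipelines used below are assumed to be built by mapping load_image over the two splits, for example:

train_images = dataset['train'].map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
test_images = dataset['test'].map(load_image, num_parallel_calls=tf.data.AUTOTUNE)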
class Augment(tf.keras.layers.Layer):
  def __init__(self, seed=42):
    super().__init__()
    # both use the same seed, so they'll make the same random changes
    self.augment_inputs = tf.keras.layers.RandomFlip(mode="horizontal", seed=seed)
    self.augment_labels = tf.keras.layers.RandomFlip(mode="horizontal", seed=seed)

  def call(self, inputs, labels):
    # apply the same random flip to the image and to its segmentation mask
    return self.augment_inputs(inputs), self.augment_labels(labels)
Build the input pipeline, applying the augmentation after batching the inputs:
train_batches = (
    train_images
    .cache()
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE)
    .repeat()
    .map(Augment())
    .prefetch(buffer_size=tf.data.AUTOTUNE))

test_batches = test_images.batch(BATCH_SIZE)
21. Example on Image Segmentation
Visualize an image example and its corresponding mask from the dataset:
import matplotlib.pyplot as plt

def display(display_list):
  plt.figure(figsize=(15, 15))
  title = ['Input Image', 'True Mask', 'Predicted Mask']
  for i in range(len(display_list)):
    plt.subplot(1, len(display_list), i + 1)
    plt.title(title[i])
    plt.imshow(tf.keras.utils.array_to_img(display_list[i]))
    plt.axis('off')
  plt.show()
The model being used here is a modified U-Net. A U-Net consists of an encoder (downsampler) and
decoder (upsampler). To learn robust features and reduce the number of trainable parameters, use a
pretrained model—MobileNetV2—as the encoder. For the decoder, you will use the upsample block.
As mentioned, the encoder is a pretrained MobileNetV2 model. You will use the model from
tf.keras.applications. The encoder consists of specific outputs from intermediate layers in the model.
Note that the encoder will not be trained during the training process.
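The slides do not show how down_stack (the encoder) is constructed; following the intermediate-layer names used in the standard TensorFlow segmentation tutorial, it would be built roughly like this:

base_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3], include_top=False)

# Use the activations of these intermediate layers as the encoder outputs.
layer_names = [
    'block_1_expand_relu',   # 64x64
    'block_3_expand_relu',   # 32x32
    'block_6_expand_relu',   # 16x16
    'block_13_expand_relu',  # 8x8
    'block_16_project',      # 4x4
]
base_model_outputs = [base_model.get_layer(name).output for name in layer_names]

# The feature-extraction model used as the encoder (down_stack).
down_stack = tf.keras.Model(inputs=base_model.input, outputs=base_model_outputs)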
21. Example on Image Segmentation
down_stack.trainable = False
21. Example on Image Segmentation
from tensorflow_examples.models.pix2pix import pix2pix  # upsample helper used by the tutorial

up_stack = [
    pix2pix.upsample(512, 3),  # 4x4 -> 8x8
    pix2pix.upsample(256, 3),  # 8x8 -> 16x16
    pix2pix.upsample(128, 3),  # 16x16 -> 32x32
    pix2pix.upsample(64, 3),   # 32x32 -> 64x64
]
21. Example on Image Segmentation
def unet_model(output_channels: int):
  inputs = tf.keras.layers.Input(shape=[128, 128, 3])

  # Downsampling through the model
  skips = down_stack(inputs)
  x = skips[-1]
  skips = reversed(skips[:-1])

  # Upsampling and establishing the skip connections
  for up, skip in zip(up_stack, skips):
    x = tf.keras.layers.Concatenate()([up(x), skip])

  # Last layer: one output channel per class (64x64 -> 128x128)
  last = tf.keras.layers.Conv2DTranspose(output_channels, 3, strides=2, padding='same')
  x = last(x)
  return tf.keras.Model(inputs=inputs, outputs=x)
21. Example on Image Segmentation
Note that the number of filters on the last layer is set to the number of output_channels. This will be one output
channel per class.
Train the model
Since this is a multiclass classification problem, use the tf.keras.losses.SparseCategoricalCrossentropy loss function with the from_logits argument set to True, since the labels are scalar integers instead of vectors of scores for each pixel of every class.
When running inference, the label assigned to the pixel is the channel with the highest value. This is what the
create_mask function is doing.
21. Example on Image Segmentation
OUTPUT_CLASSES = 3

model = unet_model(output_channels=OUTPUT_CLASSES)
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

tf.keras.utils.plot_model(model, show_shapes=True)

def create_mask(pred_mask):
  pred_mask = tf.math.argmax(pred_mask, axis=-1)
  pred_mask = pred_mask[..., tf.newaxis]
  return pred_mask[0]
21. Example on Image Segmentation
def show_predictions(dataset=None, num=1):
  if dataset:
    for image, mask in dataset.take(num):
      pred_mask = model.predict(image)
      display([image[0], mask[0], create_mask(pred_mask)])
  else:
    # sample_image and sample_mask are one example previously taken from the training batches
    display([sample_image, sample_mask,
             create_mask(model.predict(sample_image[tf.newaxis, ...]))])

show_predictions()
21. Example on Image Segmentation
The callback defined below is used to observe how the model improves while it is training:
from IPython.display import clear_output  # refreshes the notebook output after each epoch

class DisplayCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    clear_output(wait=True)
    show_predictions()
    print('\nSample Prediction after epoch {}\n'.format(epoch + 1))

EPOCHS = 20
VAL_SUBSPLITS = 5
VALIDATION_STEPS = info.splits['test'].num_examples // BATCH_SIZE // VAL_SUBSPLITS
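The training call that produces model_history (whose loss and val_loss values are plotted below) is not shown on the slides; following the tutorial's API it would look roughly like:

model_history = model.fit(train_batches, epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=test_batches,
                          callbacks=[DisplayCallback()])

loss = model_history.history['loss']
val_loss = model_history.history['val_loss']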
plt.figure()
plt.plot(model_history.epoch, loss, 'r', label='Training loss')
plt.plot(model_history.epoch, val_loss, 'bo', label='Validation loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss Value')
plt.ylim([0, 1])
plt.legend()
plt.show()
21. Example on Image Segmentation
Make predictions
Now, make some predictions. In the interest of saving time, the number of epochs was kept small, but you
may set this higher to achieve more accurate results.
show_predictions(test_batches, 3)
21. Example on Image Segmentation
22. TensorFlow Programming Basics
2. Load and prepare the MNIST dataset. Convert the samples from integers to floating-point
numbers:
mnist = tf.keras.datasets.mnist
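This assumes TensorFlow has already been imported (step 1: import tensorflow as tf). The loading and conversion themselves, as in the standard TensorFlow quickstart, are:

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # convert integer pixel values to floats in [0, 1]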
3. Build the tf.keras.Sequential model by stacking layers. Choose an optimizer and loss function for
training:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
TensorFlow Programming Basics
4. For each example, the model returns a vector of "logits" or "log-odds" scores, one for each class.
predictions = model(x_train[:1]).numpy()
predictions
5. The tf.nn.softmax function converts these logits to "probabilities" for each class:
tf.nn.softmax(predictions).numpy()
Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this
can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an
exact and numerically stable loss calculation for all models when using a softmax output.
6. The losses.SparseCategoricalCrossentropy loss takes a vector of logits and a True index and returns a scalar loss for each example. (Calculate loss)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
This loss is equal to the negative log probability of the true class: it is zero if the model is sure of the correct class.
TensorFlow Programming Basics
This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to -tf.math.log(1/10) ~= 2.3.
loss_fn(y_train[:1], predictions).numpy()
model.compile(optimizer='adam',
loss=loss_fn,
metrics=['accuracy'])
The Model.fit method adjusts the model parameters to minimize the loss:
model.fit(x_train, y_train, epochs=5)
The Model.evaluate method checks the model's performance, usually on a validation set or test set.
model.evaluate(x_test, y_test, verbose=2)
https://www.kaggle.com/datasets/andrewmvd/leukemia-classification
Thanks!
Any questions?