
Advanced Machine

Learning

Chapter 8: Image
Processing Theory and
Application

Dr. Fatma M. Talaat


Chapter 8: Image Processing
Theory and Application

Contents
1. Computer Vision Overview

2. Applications of Computer Vision

3. Computer Vision - related Disciplines

4. Computer Vision and AI

5. Image Classification

6. Object Detection

7. AdaBoost

8. Face Detection

9. Convolutional Neural Network (CNN)

10. CNN types

11. Data Augmentation

12. TensorFlow Programming Basics


1. Computer Vision Overview

- The world perceived by the human visual system is three-dimensional, while images on computers are two-dimensional.

- Computer vision is a science of studying how to make computers "see" like humans.

- Computer vision is a generic term for image-related technologies.


It includes image collection and acquisition, image compression coding, image storage and
transmission, image synthesis, image reconstruction, image recognition, image reorganization,
and image display.
2. Applications of Computer Vision

Computer vision has been widely used in many fields such as:

- Public security protection: facial recognition, fingerprint recognition, scenario-based monitoring, and environment modeling.

- Biomedicine: chromosome analysis, X-ray, CT image analysis, and micromanipulation.

- Word processing: character recognition, document repair, office automation, and spam
classification.

- National defense: resource detection, military reconnaissance, and missile path planning.

- Smart transportation: road traffic management, e-police image capturing system, and driving.

- Entertainment: movie special effect, video editing, facial enhancement, motion sensing game, and
virtual reality (VR).
3. Computer Vision - related Disciplines

Computer vision is an interdisciplinary subject that studies image theories, technologies, and applications.
Computer vision is related not only to traditional mathematics, physics, physiology, psychology,
computer science, and electronic engineering but also to professional technologies such as computer
graphics, image pattern recognition, and image engineering.

These technical terms are associated with each other and are often used together. In many cases, they are
used by people with different professional backgrounds.
4. Computer Vision and AI

Most computer vision theories use Artificial Intelligence (AI) technologies. The development of AI
is closely related to computer vision. Many application problems in computer vision provide
research directions for AI technologies.

The most mature technology direction of AI in computer vision is image recognition, which
makes machines understand the content of images.

Image Content Recognition Face Detection


5. Image Classification

- Image classification is a basic research topic in the AI field and a core issue in
the computer vision field.

- The image classification operation is performed based on image processing technologies to determine the category of an input image.

- Classify dogs by breeds:


5. Image Classification

Confidence: used to measure the reliability of the image classification result.
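As a minimal illustration (not part of the original slides), a classifier's softmax output can be read as per-class confidences: the predicted class is the one with the highest probability, and that probability is reported as the confidence. The breed names and logits below are placeholders.

import numpy as np

# Hypothetical raw scores (logits) produced by a classifier for one image.
logits = np.array([2.0, 0.5, -1.0])
breeds = ["husky", "poodle", "bulldog"]

# Softmax converts logits into probabilities that sum to 1.
probs = np.exp(logits) / np.sum(np.exp(logits))

pred = int(np.argmax(probs))
print(f"Predicted breed: {breeds[pred]}, confidence: {probs[pred]:.2f}")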


6. Object Detection

- Compared with image classification, object detection is not only a process of recognizing objects but also a process of locating them in an image.

- The method for recognizing objects is the same as that of image classification, and bounding
boxes are used to locate and mark the locations of objects in an image.
6. Object Detection

- As one of the basic technologies of image processing and computer vision in the AI field,
object detection has a wide range of applications, such as traffic monitoring, image
search, facial recognition, and Human–Computer Interaction (HCI).

- Objects in an image can be detected using the object detection technology for further
processing using intelligent algorithms.
7. AdaBoost
- Adaptive Boosting (AdaBoost) is an adaptive ensemble algorithm that can implement efficient binary classification. AdaBoost combines multiple weak classifiers into a strong classifier. A weak classifier is generally a single-layer decision tree (a decision stump).

- AdaBoost trains a single weak classifier in each iteration. The adaptation lies in the sample weights: samples misclassified in the (N-1)th iteration receive a larger weight in the Nth iteration, while correctly classified samples receive a smaller weight, and the re-weighted samples are used to train the next weak classifier.

- Each weak classifier also has its own weight: a weak classifier with a small classification error rate receives a large weight and plays a greater role in the final classification function, while a weak classifier with a large classification error rate receives a small weight.
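A minimal sketch (using scikit-learn, which is not referenced in the slides; the dataset is synthetic and purely illustrative) of AdaBoost combining decision stumps into a strong binary classifier:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (illustrative only).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each weak classifier is a single-layer decision tree (a stump);
# AdaBoost trains one stump per iteration on re-weighted samples.
# Note: the keyword is `estimator` in scikit-learn >= 1.2 (`base_estimator` in older versions).
stump = DecisionTreeClassifier(max_depth=1)
clf = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))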
8. Face Detection

- Used with the AdaBoost algorithm, the Haar-like feature has a good performance in face detection.
- The Haar-like feature can reflect image intensity changes.

- In face images, some facial features can be described using rectangular features. For example, the color of
eyes is darker than that of cheeks, the color of nose wings is darker than that of the nose bridge, and the color of
the mouth is darker than that of the skin surrounding the mouth.

- The Haar-like feature has a good performance in the detection of upright frontal faces and objects whose
intensities change symmetrically.
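A minimal sketch (assuming OpenCV is installed; "photo.jpg" is a placeholder filename) of face detection with OpenCV's pretrained frontal-face Haar cascade, which internally uses Haar-like features and an AdaBoost-trained cascade of classifiers:

import cv2

# Load OpenCV's pretrained frontal-face Haar cascade.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# "photo.jpg" is a placeholder path to an input image.
img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces; scaleFactor and minNeighbors control the detection sensitivity.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a bounding box around each detected face and save the result.
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_detected.jpg", img)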
9. Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a feedforward neural network. Its artificial neurons respond only to surrounding units within a limited coverage region (the receptive field). CNNs perform excellently in image processing.

A CNN consists of convolutional layers, pooling layers, and fully connected layers.

In the 1960s, while studying neurons responsible for local sensitivity and direction selection in the cat visual cortex, Hubel and Wiesel found that a unique network structure could effectively reduce the complexity of feedback neural networks; this finding inspired the later proposal of the CNN. Today, the CNN is a research focus in many fields of science and technology, especially in pattern classification.

The CNN is widely used because it can take raw images as input and requires little image preprocessing.
Architecture of Convolutional Neural Network

• Input layer: inputs data.


• Convolutional layer: composed of several convolutional units. The parameters of each
convolutional unit are obtained by optimizing the backpropagation algorithm. The purpose of
convolution calculation is to extract different input features. The first convolutional layer may extract
only some low-level features such as edges, lines, and angles. A multi-layer network can extract
more complex features based on the low-level features.
• Rectified linear units layer (ReLU layer): uses ReLU f(x) = max(0, x) as the activation function.

• Pooling layer: partitions the features obtained from the convolutional layer into regions and outputs the maximum or average value of each region, generating new features with a smaller spatial size.
• Fully connected layer: integrates all local features into global features to calculate the final score for each class.
• Output layer: outputs the final result.
CNN Architecture
➜ The input image enters a convolution layer, whose job is to apply filters of different sizes to the image. (More than one convolution layer can be used, depending on the application.)
➜ It may then pass to a max pooling layer, which also applies a filter but keeps the maximum value.
➜ It may then pass to an average pooling layer, which also applies a filter but keeps the average value.
➜ (The previous three filtering layers can be repeated, depending on the nature of the images.)
➜ It then enters a Flatten layer to convert the image from a matrix into a vector.
➜ Then comes the classification stage: one or more Dense layers, in which the activation function is chosen (e.g., relu or sigmoid).
➜ The last layer is also a Dense layer, but its activation function is Softmax.
Architecture of Convolutional Neural Network

[Figure: an input image passes through a convolutional layer and a pooling layer (convolution + nonlinearity, max pooling), then a second convolutional layer and pooling layer, producing three-feature and then five-feature maps; the feature maps are vectorized and fed to a fully connected layer, whose output layer gives multi-category probabilities such as Pbird, Psunset, Pdog, and Pcat.]
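A minimal Keras sketch (not from the original slides; layer sizes, input shape, and the number of classes are arbitrary placeholders) of the stack described above — convolution, max pooling, average pooling, flatten, dense layers, and a final softmax dense layer:

import tensorflow as tf

num_classes = 10  # placeholder number of output classes

model = tf.keras.Sequential([
    # Convolution layer: applies learned filters to the input image.
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(64, 64, 3)),
    # Max pooling: keeps the largest value in each window.
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    # Average pooling: keeps the mean value in each window.
    tf.keras.layers.AveragePooling2D(2),
    # Flatten: converts the feature maps from a matrix into a vector.
    tf.keras.layers.Flatten(),
    # Classification (Dense) layer with a chosen activation (relu here).
    tf.keras.layers.Dense(128, activation='relu'),
    # Final Dense layer with a Softmax activation.
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])
model.summary()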
10. CNN types

10.1. ILSVRC
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), held by Stanford University, is closely related to the development of deep learning and convolutional neural networks.

The dataset used by the annual ILSVRC contains about 1.2 million images and
labels in roughly 1000 categories, which is a subset of all the data of ImageNet.
Generally, the top-5 and top-1 error rates are used as the evaluation indicators of
model performance.
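As an illustration (not from the slides; the labels and predictions below are made up), the top-1 and top-5 error rates can be computed from predicted class probabilities with tf.keras metrics:

import tensorflow as tf

# Made-up labels and predicted class probabilities for 4 samples over 6 classes.
y_true = [0, 2, 5, 1]
y_pred = tf.random.uniform((4, 6))
y_pred = y_pred / tf.reduce_sum(y_pred, axis=-1, keepdims=True)

top1 = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=1)
top5 = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5)
top1.update_state(y_true, y_pred)
top5.update_state(y_true, y_pred)

# The error rate is 1 minus the corresponding accuracy.
print("top-1 error:", 1 - top1.result().numpy())
print("top-5 error:", 1 - top5.result().numpy())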
10.1.ILSVRC Historical Achievements

Since 2010, the ILSVRC has evaluated algorithms for image classification, single-object locating, and object detection.

https://twitter.com/hashtag/ilsvrc
10.2.ImageNet
The ImageNet project was founded in 2007 by Li Feifei, a Chinese professor at Stanford University. The
project aims to collect a large amount of image data with label information for model training in computer
vision.

The ImageNet dataset contains 15 million labeled high-resolution images of objects in roughly 22,000
categories. In about one million of the images, bounding boxes are also provided for objects of interest.

www.image-net.org
10.3. AlexNet
- AlexNet, 2012
- ReLU, overlapping pooling, data augmentation, dropout

Alex Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks.


10.4. VGGNet
- VGGNet (Visual Geometry Group), 2014

- VGGNet investigates the effect of network depth on convolutional networks. VGGNet uses only very small kernels with a spatial size of 3x3. After several convolutional, max pooling, and fully connected layers, the category prediction result is generated using the softmax function.

Visual Geometry Group. Very Deep Convolutional Networks for Large-Scale Image Recognition.
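A minimal sketch (not the actual VGG configuration; the filter counts and input shape are illustrative) of the VGG idea of stacking small 3x3 convolutions followed by max pooling:

import tensorflow as tf

# Two stacked 3x3 convolutions cover the same receptive field as one 5x5
# convolution, but with fewer parameters and more nonlinearities.
vgg_style_block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu',
                           input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(2, strides=2),
])
vgg_style_block.summary()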
Six Configurations of the VGG

Visual Geometry Group. Very Deep Convolutional Networks for Large-Scale Image Recognition.
10.5. GoogLeNet
GoogLeNet, 2014

Christian Szegedy. Going Deeper with Convolutions.


Inception Architecture
- GoogLeNet uses the Inception architecture with substructures
connected in parallel.

Christian Szegedy. Going Deeper with Convolutions.
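A minimal sketch (simplified; the filter counts are arbitrary and do not reproduce GoogLeNet's actual configuration) of an Inception-style module, with parallel branches concatenated along the channel dimension:

import tensorflow as tf
from tensorflow.keras import layers

def simple_inception_module(x, filters=32):
    """Parallel branches with different kernel sizes, concatenated channel-wise."""
    b1 = layers.Conv2D(filters, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(filters, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(filters, 3, padding='same', activation='relu')(b2)
    b3 = layers.Conv2D(filters, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(filters, 5, padding='same', activation='relu')(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(filters, 1, padding='same', activation='relu')(b4)
    return layers.Concatenate()([b1, b2, b3, b4])

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = simple_inception_module(inputs)
tf.keras.Model(inputs, outputs).summary()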


10.6. ResNet
ResNet, 2015

Kaiming He. Deep Residual Learning for Image Recognition.


Residual Architecture
Residual architecture proposed by ResNet

Kaiming He. Deep Residual Learning for Image Recognition.
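A minimal sketch (identity-shortcut only; not ResNet's exact block) of the residual idea, where the block learns F(x) and outputs F(x) + x:

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Two convolutions F(x) added to the identity shortcut x."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([y, shortcut])  # y = F(x) + x
    return layers.Activation('relu')(y)

# The number of filters matches the input channels so the addition is valid.
inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs)
tf.keras.Model(inputs, outputs).summary()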


10.7. SENet
Squeeze-and-Excitation Networks (SENet), 2017

Jie Hu. Squeeze-and-Excitation Networks.


11. CNN Basic Architecture for Classification & Segmentation

• Convolutional Neural Networks (CNNs) are deep neural networks that have the capability
to classify and segment images.

• CNNs can be trained using supervised or unsupervised machine learning methods, depending on what you want them to do.

• CNN architectures for classification and segmentation include a variety of different layers
with specific purposes, such as a convolutional layer, pooling layer, fully connected
layers, dropout layers, etc.
12. Description of basic CNN architecture for Classification

• The CNN architecture for classification includes convolutional layers, max-pooling layers,
and fully connected layers.

• Convolution and max-pooling layers are used for feature extraction.


• While convolution layers are meant for feature detection, max-pooling layers are meant
for feature selection.

• Max-pooling layers are employed when the picture does not require all of the high-resolution detail, or when a downsampled output with smaller regions extracted by the CNN is needed.
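A small sketch (illustrative shapes only, not from the slides) showing how a max-pooling layer downsamples a feature map:

import tensorflow as tf

# A batch of one 8x8 feature map with 4 channels (random values).
feature_map = tf.random.normal((1, 8, 8, 4))

# 2x2 max pooling halves the spatial size, keeping the largest value per window.
pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(feature_map)

print(feature_map.shape)  # (1, 8, 8, 4)
print(pooled.shape)       # (1, 4, 4, 4)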
13. Description of basic CNN architecture for Segmentation

• Computer vision deals with images, and image segmentation is one of the most
important steps.

• It involves dividing a visual input into segments to make image analysis easier.

• Segments are made up of sets of one or more pixels.

• Image segmentation sorts pixels into larger components while also eliminating the need
to consider each pixel as a unit.
13. Description of basic CNN architecture for Segmentation

• Image segmentation is the process of dividing an image into manageable sections or “tiles”.

• The process of image segmentation starts with defining small regions on an image that
should not be divided.

• These regions are called seeds, and the position of these seeds defines the tiles.
13. Description of basic CNN architecture for Segmentation

• Image classification, object detection, and image segmentation can be illustrated side by side on the same picture. Notice that image segmentation can also be used for image classification or object detection.
14. What is data augmentation?

• Data augmentation is a set of techniques to artificially increase the amount of data by generating new data points from existing data.

• This includes making small changes to data or using deep learning models to generate new data points.
15. Why is data augmentation important?

• Machine learning applications, especially in the deep learning domain, continue to diversify and increase rapidly.

• Data-centric approaches to model development, such as data augmentation techniques, can be a good tool against the challenges the artificial intelligence world faces.

• Data augmentation improves the performance and outcomes of machine learning models by forming new and different examples for the training datasets. If the dataset of a machine learning model is rich and sufficient, the model performs better and more accurately.
15. Why is data augmentation important?

• For machine learning models, collecting and labeling data can be exhausting and costly. Transforming existing datasets with data augmentation techniques allows companies to reduce these operational costs.

• One step in building a data model is cleaning the data, which is necessary for high-accuracy models. However, if cleaning reduces the representability of the data, the model cannot provide good predictions for real-world inputs. Data augmentation techniques can make machine learning models more robust by creating variations that the model may encounter in the real world.
16. How does data augmentation work?

[Figure: data augmentation workflow]
17. Traditional Data augmentation Types

• For data augmentation, making simple alterations to visual data is popular. In addition, generative adversarial networks (GANs) are used to create new synthetic data. Classic image processing operations for data augmentation are:
• 1. padding
• 2. random rotating
• 3. re-scaling
• 4. vertical and horizontal flipping
• 5. translation (the image is moved along the X or Y direction)
• 6. cropping
• 7. zooming
• 8. darkening & brightening/color modification
• 9. grayscaling
• 10. changing contrast
• 11. adding noise
• 12. random erasing
17. Traditional Data augmentation Types
18. Advanced models for data augmentation are

• Adversarial training/adversarial machine learning: generates adversarial examples that disrupt a machine learning model and injects them into the dataset used for training.

• Generative adversarial networks (GANs): GAN algorithms can learn patterns from input datasets and automatically create new examples that resemble the training data.

• Neural style transfer: neural style transfer models can blend a content image and a style image and separate style from content.

• Reinforcement learning: reinforcement learning models train software agents to attain their goals and make decisions in a virtual environment.
18. Advanced models for data augmentation are

• Popular open source python packages for data augmentation in computer vision are
Keras ImageDataGenerator, Skimage and OpenCV.
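A minimal sketch (the images below are dummy random arrays standing in for real data; all parameter values are placeholders) using the Keras ImageDataGenerator mentioned above to apply classic augmentations such as rotation, shifting, zooming, and horizontal flipping:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Classic augmentations: rescaling, rotation, translation, zoom, horizontal flip.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
)

# A dummy batch of 8 random "images" stands in for real data.
images = np.random.rand(8, 64, 64, 3).astype("float32")
labels = np.arange(8)

# flow() yields augmented batches indefinitely; take one batch here.
augmented_images, augmented_labels = next(datagen.flow(images, labels, batch_size=8))
print(augmented_images.shape)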
19. Why is data augmentation important?

• Improving model prediction accuracy


- adding more training data into the models.
- preventing data scarcity for better models.
- reducing overfitting (a modeling error in which a function corresponds too closely to a limited set of data points) and creating variability in the data.
- increasing generalization ability of the models.
- helping resolve class imbalance issues in classification.

• Reducing costs of collecting and labeling data


• Enables rare event prediction
• Prevents data privacy problems
19. What are use cases/examples in data augmentation?

• Image recognition and NLP models generally use data augmentation methods.

• Also, the medical imaging domain utilizes data augmentation to apply transformations
on images and create diversity into the datasets.
20. What are use cases/examples in data augmentation?

The reasons for the interest in data augmentation in healthcare are:

• Datasets of medical images are small.
• Sharing data is not easy due to patient data privacy regulations.
• Only a few patients' data can be used as training data in the diagnosis of rare diseases.
20. What are use cases/examples in data augmentation?

Example studies in this field include:

• Brain tumor segmentation


• Differential data augmentation for medical imaging
• An automated data augmentation method for synthesizing labeled medical images
• Semi-supervised task-driven data augmentation for medical image segmentation
21. Example on Image Segmentation

• In an image classification task, the network assigns a label (or class) to each input
image. However, suppose you want to know the shape of that object, which pixel
belongs to which object, etc. In this case, you need to assign a class to each pixel of the
image—this task is known as segmentation.

• A segmentation model returns much more detailed information about the image. Image
segmentation has many applications in medical imaging, self-driving cars and satellite
imaging, just to name a few.
21. Example on Image Segmentation

• This example uses the Oxford-IIIT Pet Dataset (Parkhi et al, 2012). The dataset consists
of images of 37 pet breeds, with 200 images per breed (~100 each in the training and
test splits). Each image includes the corresponding labels, and pixel-wise masks. The
masks are class-labels for each pixel. Each pixel is given one of three categories:

• Class 1: Pixel belonging to the pet.


• Class 2: Pixel bordering the pet.
• Class 3: None of the above/a surrounding pixel.
21. Example on Image Segmentation

import tensorflow as tf
import tensorflow_datasets as tfds

from tensorflow_examples.models.pix2pix import pix2pix


from IPython.display import clear_output
import matplotlib.pyplot as plt

Download the Oxford-IIIT Pets dataset


dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)

In addition, the image color values are normalized to the [0, 1] range. Finally, as mentioned above, the pixels in the segmentation mask are labeled with one of {1, 2, 3}. For the sake of convenience, subtract 1 from the segmentation mask, resulting in labels in {0, 1, 2}.

def normalize(input_image, input_mask):
  input_image = tf.cast(input_image, tf.float32) / 255.0
  input_mask -= 1
  return input_image, input_mask
21. Example on Image Segmentation

def load_image(datapoint):
  input_image = tf.image.resize(datapoint['image'], (128, 128))
  input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))

  input_image, input_mask = normalize(input_image, input_mask)

  return input_image, input_mask

The dataset already contains the required training and test splits, so continue to use the same splits:
TRAIN_LENGTH = info.splits['train'].num_examples
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE

train_images = dataset['train'].map(load_image, num_parallel_calls=tf.data.AUTOTUNE)


test_images = dataset['test'].map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
21. Example on Image Segmentation

The following class performs a simple augmentation by randomly flipping an image.

class Augment(tf.keras.layers.Layer):
  def __init__(self, seed=42):
    super().__init__()
    # Both use the same seed, so they'll make the same random changes.
    self.augment_inputs = tf.keras.layers.RandomFlip(mode="horizontal", seed=seed)
    self.augment_labels = tf.keras.layers.RandomFlip(mode="horizontal", seed=seed)

  def call(self, inputs, labels):
    inputs = self.augment_inputs(inputs)
    labels = self.augment_labels(labels)
    return inputs, labels
21. Example on Image Segmentation

Build the input pipeline, applying the augmentation after batching the inputs:

train_batches = (
train_images
.cache()
.shuffle(BUFFER_SIZE)
.batch(BATCH_SIZE)
.repeat()
.map(Augment())
.prefetch(buffer_size=tf.data.AUTOTUNE))

test_batches = test_images.batch(BATCH_SIZE)
21. Example on Image Segmentation

Visualize an image example and its corresponding mask from the dataset:

def display(display_list):
  plt.figure(figsize=(15, 15))

  title = ['Input Image', 'True Mask', 'Predicted Mask']

  for i in range(len(display_list)):
    plt.subplot(1, len(display_list), i+1)
    plt.title(title[i])
    plt.imshow(tf.keras.utils.array_to_img(display_list[i]))
    plt.axis('off')
  plt.show()

for images, masks in train_batches.take(2):
  sample_image, sample_mask = images[0], masks[0]
  display([sample_image, sample_mask])
21. Example on Image Segmentation

Define the model

The model being used here is a modified U-Net. A U-Net consists of an encoder (downsampler) and
decoder (upsampler). To learn robust features and reduce the number of trainable parameters, use a
pretrained model—MobileNetV2—as the encoder. For the decoder, you will use the upsample block.

As mentioned, the encoder is a pretrained MobileNetV2 model. You will use the model from
tf.keras.applications. The encoder consists of specific outputs from intermediate layers in the model.
Note that the encoder will not be trained during the training process.
21. Example on Image Segmentation

base_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3], include_top=False)

# Use the activations of these layers


layer_names = [
'block_1_expand_relu', # 64x64
'block_3_expand_relu', # 32x32
'block_6_expand_relu', # 16x16
'block_13_expand_relu', # 8x8
'block_16_project', # 4x4
]
base_model_outputs = [base_model.get_layer(name).output for name in layer_names]

# Create the feature extraction model


down_stack = tf.keras.Model(inputs=base_model.input, outputs=base_model_outputs)

down_stack.trainable = False
21. Example on Image Segmentation

The decoder/upsampler is simply a series of upsample blocks implemented in TensorFlow examples:

up_stack = [
pix2pix.upsample(512, 3), # 4x4 -> 8x8
pix2pix.upsample(256, 3), # 8x8 -> 16x16
pix2pix.upsample(128, 3), # 16x16 -> 32x32
pix2pix.upsample(64, 3), # 32x32 -> 64x64
]
21. Example on Image Segmentation
def unet_model(output_channels:int):
  inputs = tf.keras.layers.Input(shape=[128, 128, 3])

  # Downsampling through the model
  skips = down_stack(inputs)
  x = skips[-1]
  skips = reversed(skips[:-1])

  # Upsampling and establishing the skip connections
  for up, skip in zip(up_stack, skips):
    x = up(x)
    concat = tf.keras.layers.Concatenate()
    x = concat([x, skip])

  # This is the last layer of the model
  last = tf.keras.layers.Conv2DTranspose(
      filters=output_channels, kernel_size=3, strides=2,
      padding='same')  # 64x64 -> 128x128

  x = last(x)

  return tf.keras.Model(inputs=inputs, outputs=x)
21. Example on Image Segmentation
Note that the number of filters on the last layer is set to the number of output_channels. This will be one output
channel per class.
Train the model

Now, all that is left to do is to compile and train the model.

Since this is a multiclass classification problem, use the tf.keras.losses.SparseCategoricalCrossentropy loss function with the from_logits argument set to True, since the labels are scalar integers instead of vectors of scores for each pixel of every class.

When running inference, the label assigned to the pixel is the channel with the highest value. This is what the
create_mask function is doing.
21. Example on Image Segmentation
OUTPUT_CLASSES = 3

model = unet_model(output_channels=OUTPUT_CLASSES)
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])

Plot the resulting model architecture:

tf.keras.utils.plot_model(model, show_shapes=True)

def create_mask(pred_mask):
  pred_mask = tf.math.argmax(pred_mask, axis=-1)
  pred_mask = pred_mask[..., tf.newaxis]
  return pred_mask[0]
21. Example on Image Segmentation
def show_predictions(dataset=None, num=1):
  if dataset:
    for image, mask in dataset.take(num):
      pred_mask = model.predict(image)
      display([image[0], mask[0], create_mask(pred_mask)])
  else:
    display([sample_image, sample_mask,
             create_mask(model.predict(sample_image[tf.newaxis, ...]))])

show_predictions()
21. Example on Image Segmentation
The callback defined below is used to observe how the model improves while it is training:

class DisplayCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    clear_output(wait=True)
    show_predictions()
    print('\nSample Prediction after epoch {}\n'.format(epoch+1))

EPOCHS = 20
VAL_SUBSPLITS = 5
VALIDATION_STEPS = info.splits['test'].num_examples//BATCH_SIZE//VAL_SUBSPLITS

model_history = model.fit(train_batches, epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=test_batches,
                          callbacks=[DisplayCallback()])
21. Example on Image Segmentation
loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

plt.figure()
plt.plot(model_history.epoch, loss, 'r', label='Training loss')
plt.plot(model_history.epoch, val_loss, 'bo', label='Validation loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss Value')
plt.ylim([0, 1])
plt.legend()
plt.show()
21. Example on Image Segmentation
Make predictions

Now, make some predictions. In the interest of saving time, the number of epochs was kept small, but you
may set this higher to achieve more accurate results.

show_predictions(test_batches, 3)
21. Example on Image Segmentation
22. TensorFlow Programming Basics

• Build a neural network that classifies images.


• Train this neural network.
• And, finally, evaluate the accuracy of the model.
TensorFlow Programming Basics
1. Download and install TensorFlow 2, then import TensorFlow into your program:
import tensorflow as tf

2. Load and prepare the MNIST dataset. Convert the samples from integers to floating-point
numbers:

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()


x_train, x_test = x_train / 255.0, x_test / 255.0

3. Build the tf.keras.Sequential model by stacking layers. Choose an optimizer and loss function for
training:

model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10)
])
TensorFlow Programming Basics
4. For each example the model returns a vector of "logits" or "log-odds" scores, one for each class.
predictions = model(x_train[:1]).numpy()
predictions

5.The tf.nn.softmax function converts these logits to "probabilities" for each class:
tf.nn.softmax(predictions).numpy()

Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this
can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an
exact and numerically stable loss calculation for all models when using a softmax output.

6.The losses.SparseCategoricalCrossentropy loss takes a vector of logits and a True index and
returns a scalar loss for each example. (Calculate loss)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

This loss is equal to the negative log probability of the true class: It is zero if the model is sure of the
correct class.
TensorFlow Programming Basics
This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to -tf.math.log(1/10) ~= 2.3.

loss_fn(y_train[:1], predictions).numpy()

model.compile(optimizer='adam',
loss=loss_fn,
metrics=['accuracy'])
The Model.fit method adjusts the model parameters to minimize the loss:
model.fit(x_train, y_train, epochs=5)

The Model.evaluate method checks the model's performance, usually on a "Validation-set" or "Test-set".
model.evaluate(x_test, y_test, verbose=2)

The image classifier is now trained to ~98% accuracy on this dataset.


Project 3: Try working with this dataset:

https://www.kaggle.com/datasets/andrewmvd/leukemia-classification
Thanks! Any questions?

