Image Processing Toolbox™
User's Guide
R2021a
Getting Started
1
Image Processing Toolbox Product Description . . . . . . . . . . . . . . . . . . . . . 1-2
Key Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Compilability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-17
Introduction
2
Images in MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Convert Image Data Between Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
Overview of Image Class Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
Losing Information in Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
Converting Indexed Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
Remove Confidential Information from DICOM File . . . . . . . . . . . . . . . . 3-14
Measure Distance Between Pixels in Image Viewer App . . . . . . . . . . . . . 4-33
Determine Distance Between Pixels Using Distance Tool . . . . . . . . . . . . . 4-33
Export Endpoint and Distance Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-34
Customize the Appearance of the Distance Tool . . . . . . . . . . . . . . . . . . . 4-35
Explore 3-D Labeled Volumetric Data with Volume Viewer App . . . . . . . 4-57
Load Labeled Volume and Intensity Volume into Volume Viewer . . . . . . . 4-57
View Labeled Volume in Volume Viewer . . . . . . . . . . . . . . . . . . . . . . . . . 4-59
Embed Labeled Volume with Intensity Volume . . . . . . . . . . . . . . . . . . . . 4-60
Interactive Tool Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Display Target Image in Figure Window . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Create the Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Position Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
Add Navigation Aids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
Customize Tool Interactivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
Geometric Transformations
6
Resize an Image with imresize Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
Exploring a Conformal Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-52
Image Registration
7
Approaches to Registering Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
Registration Estimator App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
Intensity-Based Automatic Image Registration . . . . . . . . . . . . . . . . . . . . . 7-3
Control Point Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
Automated Feature Detection and Matching . . . . . . . . . . . . . . . . . . . . . . . 7-5
Control Point Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-56
Filter Grayscale and Truecolor (RGB) Images using imfilter Function . . . 8-7
Segment Thermographic Image after Edge-Preserving Filtering . . . . . . 8-29
Transforms
9
Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
Definition of Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Applications of the Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
Morphological Operations
10
Types of Morphological Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Morphological Dilation and Erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Operations Based on Dilation and Erosion . . . . . . . . . . . . . . . . . . . . . . . 10-4
Analyzing and Enhancing Images
11
Pixel Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
Determine Values of Individual Pixels in Images . . . . . . . . . . . . . . . . . . . 11-3
Gamma Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-73
Specify Gamma when Adjusting Contrast . . . . . . . . . . . . . . . . . . . . . . . 11-73
Registration Markers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-151
ROI-Based Processing
12
Create a Binary Mask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
Create a Binary Mask from a Grayscale Image . . . . . . . . . . . . . . . . . . . . 12-2
Create Binary Mask Using an ROI Function . . . . . . . . . . . . . . . . . . . . . . 12-2
Create Binary Mask Based on Color Values . . . . . . . . . . . . . . . . . . . . . . . 12-4
Create Binary Mask Without an Associated Image . . . . . . . . . . . . . . . . . 12-4
Create Freehand ROI Editing Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-76
Image Segmentation
13
Texture Segmentation Using Gabor Filters . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Segment Image and Create Mask Using Color Thresholder App . . . . . 13-42
Segment Lungs from 3-D Chest Scan . . . . . . . . . . . . . . . . . . . . . . . . . . 13-126
Image Deblurring
14
Image Deblurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
Deblurring Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-3
Color
15
Display Colors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
Profile-Based Color Space Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . 15-10
Read ICC Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-10
Write ICC Profile Information to a File . . . . . . . . . . . . . . . . . . . . . . . . . 15-10
Convert RGB to CMYK Using ICC Profiles . . . . . . . . . . . . . . . . . . . . . . . 15-11
What is Rendering Intent in Profile-Based Conversions? . . . . . . . . . . . . 15-12
Sliding Neighborhood Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-3
Determine the Center Pixel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-3
General Algorithm of Sliding Neighborhood Operations . . . . . . . . . . . . . 17-4
Border Padding Behavior in Sliding Neighborhood Operations . . . . . . . . 17-4
Implementing Linear and Nonlinear Filtering as Sliding Neighborhood Operations . . . . . . . . . . . . . 17-4
Deep Learning
18
Train and Apply Denoising Neural Networks . . . . . . . . . . . . . . . . . . . . . . 18-2
Remove Gaussian Noise Using Pretrained Network . . . . . . . . . . . . . . . . 18-2
Train a Denoising Network Using Built-In Layers . . . . . . . . . . . . . . . . . . 18-2
Train Fully Customized Denoising Neural Network . . . . . . . . . . . . . . . . . 18-3
Remove Noise from Color Image Using Pretrained Neural Network . . 18-12
Classify Hyperspectral Images Using Deep Learning . . . . . . . . . . . . . . . 19-54
1
Getting Started
This topic presents two examples to get you started doing image processing using MATLAB® and the
Image Processing Toolbox software. The examples contain cross-references to other sections in the
documentation that have in-depth discussions on the concepts presented in the examples.
Image Processing Toolbox apps let you automate common image processing workflows. You can
interactively segment image data, compare image registration techniques, and batch-process large
datasets. Visualization functions and apps let you explore images, 3-D volumes, and videos; adjust
contrast; create histograms; and manipulate regions of interest (ROIs).
You can accelerate your algorithms by running them on multicore processors and GPUs. Many
toolbox functions support C/C++ code generation for desktop prototyping and embedded vision
system deployment.
Key Features
• Image analysis, including segmentation, morphology, statistics, and measurement
• Apps for image region analysis, image batch processing, and image registration
• 3-D image processing workflows, including visualization and segmentation
• Image enhancement, filtering, geometric transformations, and deblurring algorithms
• Intensity-based and non-rigid image registration methods
• Support for CUDA-enabled NVIDIA GPUs (with Parallel Computing Toolbox™)
• C-code generation support for desktop prototyping and embedded vision system deployment
Compilability
The Image Processing Toolbox software is compilable with the MATLAB Compiler™ except for the
following functions that launch GUIs:
• cpselect
• implay
• imtool
Basic Image Import, Processing, and Export
Read an image into the workspace, using the imread command. The example reads one of the
sample images included with the toolbox, an image of a young girl in a file named pout.tif, and
stores it in an array named I. imread infers from the file that the graphics file format is Tagged
Image File Format (TIFF).
I = imread('pout.tif');
Display the image, using the imshow function. You can also view an image in the Image Viewer app.
The imtool function opens the Image Viewer app, which presents an integrated environment for
displaying images and performing some common image processing tasks. The Image Viewer app
provides all the image display capabilities of imshow but also provides access to several other tools
for navigating and exploring images, such as scroll bars, the Pixel Region tool, the Image Information
tool, and the Contrast Adjustment tool.
imshow(I)
Check how the imread function stores the image data in the workspace, using the whos command.
You can also check the variable in the Workspace Browser. The imread function returns the image
data in the variable I, which is a 291-by-240 element array of uint8 data.
whos I
View the distribution of image pixel intensities. The image pout.tif is a somewhat low contrast
image. To see the distribution of intensities in the image, create a histogram by calling the imhist
function. (Precede the call to imhist with the figure command so that the histogram does not
overwrite the display of the image I in the current figure window.) Notice how the histogram
indicates that the intensity range of the image is rather narrow. The range does not cover the
potential range of [0, 255], and is missing the high and low values that would result in good contrast.
figure
imhist(I)
Improve the contrast in an image, using the histeq function. Histogram equalization spreads the
intensity values over the full range of the image. Display the image. (The toolbox includes several
other functions that perform contrast adjustment, including imadjust and adapthisteq, and
interactive tools such as the Adjust Contrast tool, available in the Image Viewer.)
I2 = histeq(I);
figure
imshow(I2)
Call the imhist function again to create a histogram of the equalized image I2. If you compare the
two histograms, you can see that the histogram of I2 is more spread out over the entire range than
the histogram of I.
figure
imhist(I2)
Write the newly adjusted image I2 to a disk file, using the imwrite function. This example includes
the filename extension '.png' in the file name, so the imwrite function writes the image to a file in
Portable Network Graphics (PNG) format, but you can specify other formats.
imwrite(I2,'pout2.png');
View what imwrite wrote to the disk file, using the imfinfo function. The imfinfo function
returns information about the image in the file, such as its format, size, width, and height.
imfinfo('pout2.png')
InterlaceType: 'none'
Transparency: 'none'
SimpleTransparencyData: []
BackgroundColor: []
RenderingIntent: []
Chromaticities: []
Gamma: []
XResolution: []
YResolution: []
ResolutionUnit: []
XOffset: []
YOffset: []
OffsetUnit: []
SignificantBits: []
ImageModTime: '24 Feb 2021 03:40:45 +0000'
Title: []
Author: []
Description: []
Copyright: []
CreationTime: []
Software: []
Disclaimer: []
Warning: []
Source: []
Comment: []
OtherText: []
Correct Nonuniform Illumination and Analyze Foreground Objects
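This example works on an image of rice grains. A minimal sketch of the first step, which reads the image into the variable I used below (the file name rice.png refers to the standard toolbox sample image and is an assumption):
I = imread('rice.png');
imshow(I)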
The background illumination is brighter in the center of the image than at the bottom. Preprocess the
image to make the background illumination more uniform.
As a first step, remove all of the foreground (rice grains) using morphological opening. The opening
operation removes small objects that cannot completely contain the structuring element. Define a
disk-shaped structuring element with a radius of 15, which does not fit entirely inside a single grain
of rice, so the opening removes the grains and leaves only the background.
se = strel('disk',15)
se =
strel is a disk shaped structuring element with properties:
To perform the morphological opening, use imopen with the structuring element.
background = imopen(I,se);
imshow(background)
Subtract the background approximation image, background, from the original image, I, and view
the resulting image. After subtracting the adjusted background image from the original image, the
resulting image has a uniform background but is now a bit dark for analysis.
I2 = I - background;
imshow(I2)
Use imadjust to increase the contrast of the processed image I2 by saturating 1% of the data at
both low and high intensities and by stretching the intensity values to fill the uint8 dynamic range.
I3 = imadjust(I2);
imshow(I3)
Note that the prior two steps could be replaced by a single step using imtophat, which first
calculates the morphological opening and then subtracts it from the original image.
I2 = imtophat(I,strel('disk',15));
Create a binary version of the processed image so you can use toolbox functions for analysis. Use the
imbinarize function to convert the grayscale image into a binary image. Remove background noise
from the image with the bwareaopen function.
bw = imbinarize(I3);
bw = bwareaopen(bw,50);
imshow(bw)
Now that you have created a binary version of the original image, you can perform analysis of the
objects in the image.
Find all the connected components (objects) in the binary image. The accuracy of your results
depends on the size of the objects, the connectivity parameter (4, 8, or arbitrary), and whether or not
any objects are touching (in which case they could be labeled as one object). Some of the rice grains
in the binary image bw are touching.
cc = bwconncomp(bw,4)
cc.NumObjects
ans = 95
View a single rice grain by extracting the pixels of the 50th connected component.
grain = false(size(bw));
grain(cc.PixelIdxList{50}) = true;
imshow(grain)
Visualize all the connected components in the image by creating a label matrix and then displaying it
as a pseudocolor indexed image.
Use labelmatrix to create a label matrix from the output of bwconncomp. Note that labelmatrix
stores the label matrix in the smallest numeric class necessary for the number of objects.
labeled = labelmatrix(cc);
whos labeled
Use label2rgb to choose the colormap, the background color, and how objects in the label matrix
map to colors in the colormap. In the pseudocolor image, the label identifying each object in the label
matrix maps to a different color in an associated colormap matrix.
RGB_label = label2rgb(labeled,'spring','c','shuffle');
imshow(RGB_label)
Compute the area of each object in the image using regionprops. Each rice grain is one connected
component in the cc structure.
graindata = regionprops(cc,'basic')
Create a new vector grain_areas, which holds the area measurement for each grain.
grain_areas = [graindata.Area];
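The outputs that follow are consistent with querying the area of one grain and then finding the grain with the smallest area. A minimal sketch of commands that would produce them (the specific grain index 50 is an assumption):
grain_areas(50)
[min_area, idx] = min(grain_areas)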
ans = 194
min_area = 61
idx = 16
Create and display a mask of the grain with the smallest area.
grain = false(size(bw));
grain(cc.PixelIdxList{idx}) = true;
imshow(grain)
histogram(grain_areas)
title('Histogram of Rice Grain Area')
See Also
bwareaopen | bwconncomp | imadjust | imbinarize | imopen | imread | imshow | label2rgb |
labelmatrix | regionprops
Acknowledgments
This table lists the copyright owners of the images used in the Image Processing Toolbox
documentation.
Image Source
cameraman: Copyright Massachusetts Institute of Technology. Used with permission.
cell: Cancer cell from a rat's prostate, courtesy of Alan W. Partin, M.D., Ph.D., Johns Hopkins University School of Medicine.
circuit: Micrograph of 16-bit A/D converter circuit, courtesy of Steve Decker and Shujaat Nadeem, MIT, 1993.
concordaerial and westconcordaerial: Visible color aerial photographs courtesy of mPower3/Emerge.
concordorthophoto and westconcordorthophoto: Orthoregistered photographs courtesy of Massachusetts Executive Office of Environmental Affairs, MassGIS.
forest: Photograph of Carmanah Ancient Forest, British Columbia, Canada, courtesy of Susan Cohen.
LAN files: Permission to use Landsat data sets provided by Space Imaging, LLC, Denver, Colorado.
liftingbody: Picture of M2-F1 lifting body in tow, courtesy of NASA (Image number E-10962).
m83: M83 spiral galaxy astronomical image courtesy of Anglo-Australian Observatory, photography by David Malin.
moon: Copyright Michael Myers. Used with permission.
saturn: Voyager 2 image, 1981-08-24, NASA catalog #PIA01364.
solarspectra: Courtesy of Ann Walker. Used with permission.
tissue: Courtesy of Alan W. Partin, M.D., Ph.D., Johns Hopkins University School of Medicine.
trees: Trees with a View, watercolor and ink on paper, copyright Susan Cohen. Used with permission.
paviaU: University of Pavia hyperspectral data set, courtesy of Paolo Gamba, Ph.D., Remote Sensing Group at the University of Pavia. Used with permission.
2
Introduction
This chapter introduces you to the fundamentals of image processing using MATLAB and the Image
Processing Toolbox software.
Images in MATLAB
The basic data structure in MATLAB is the array, an ordered set of real or complex elements. This
object is naturally suited to the representation of images, real-valued ordered sets of color or
intensity data.
MATLAB stores most images as two-dimensional matrices, in which each element of the matrix
corresponds to a single discrete pixel in the displayed image. (Pixel is derived from picture element
and usually denotes a single dot on a computer display.) For example, an image composed of 200
rows and 300 columns of different colored dots would be stored in MATLAB as a 200-by-300 matrix.
Some images, such as truecolor images, require a three-dimensional array. In
truecolor images, the first plane in the third dimension represents the red pixel intensities, the
second plane represents the green pixel intensities, and the third plane represents the blue pixel
intensities. This convention makes working with images in MATLAB similar to working with any other
type of numeric data, and makes the full power of MATLAB available for image processing
applications.
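As an illustration, a minimal sketch of checking the array dimensions of a grayscale image and a truecolor image, using the sample files pout.tif and peppers.png that ship with the toolbox (the exact sizes reported depend on the files):
I = imread('pout.tif');      % grayscale image, stored as a 2-D matrix
size(I)                      % returns two dimensions, e.g. 291 240
RGB = imread('peppers.png'); % truecolor image, stored as a 3-D array
size(RGB)                    % returns three dimensions, e.g. 384 512 3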
For more information on how Image Processing Toolbox assigns pixel indices and how to relate pixel
indices to continuous spatial coordinates, see “Image Coordinate Systems” on page 2-3.
See Also
imread | imshow
Related Examples
• “Basic Image Import, Processing, and Export” on page 1-4
More About
• “Image Types in the Toolbox” on page 2-12
Image Coordinate Systems
Pixel Indices
As described in “Images in MATLAB” on page 2-2, MATLAB stores most images as arrays. Each (row,
column) index of the array corresponds to a single pixel in the displayed image.
There is a one-to-one correspondence between pixel indices and subscripts for the first two matrix
dimensions. Just like for array indexing in MATLAB, pixel indices are integer values and range from 1
to the length of the row or column. The indices are ordered from top to bottom, and from left to right.
For example, the data for the pixel in the fifth row, second column is stored in the matrix element
(5,2). You use normal MATLAB matrix subscripting to access values of individual pixels. For example,
the MATLAB code
I(2,15)
returns the value of the pixel at row 2, column 15 of the single-channel image I. Similarly, the
MATLAB code
RGB(2,15,:)
returns the color values of the pixel at row 2, column 15 of the multi-channel image RGB.
Spatial Coordinates
In a spatial coordinate system, locations in an image are positions on a continuous plane. Locations
are described in terms of Cartesian x and y coordinates (not row and column indices as in the pixel
indexing system). From this Cartesian perspective, an (x,y) location such as (3.2,5.3) is meaningful
and distinct from the coordinate (5,3).
The Image Processing Toolbox defines two types of spatial coordinate systems depending on the
frame of reference. Intrinsic coordinates specify locations with respect to the image's frame of
reference. World coordinates specify locations with respect to an external world observer.
Intrinsic Coordinates
By default, the toolbox defines spatial image coordinates using the intrinsic coordinate system. This
spatial coordinate system corresponds to the image’s pixel indices. The intrinsic coordinates (x,y) of
the center point of any pixel are identical to the column and row indices for that pixel. For example,
the center point of the pixel in row 5, column 3 has spatial coordinates x = 3.0, y = 5.0. Be aware,
however, that the order of the intrinsic coordinates (3.0, 5.0) is reversed relative to the pixel indices
(5,3).
The intrinsic coordinates of the center of every pixel are integer valued. The center of the upper left
pixel has intrinsic coordinates (1.0, 1.0). The center of the lower right pixel has intrinsic coordinates
(numCols, numRows), where numCols and numRows are the number of columns and rows in the
image. In general, the center of the pixel with pixel indices (m, n) falls at the point x = n, y = m in the
intrinsic coordinate system.
Because the size of each pixel in the intrinsic coordinate system is one unit, the boundaries of the
image have fractional coordinates. The upper left corner of the image is located at (0.5,0.5), not at
(0,0). Similarly, the lower right corner of the image is located at (numCols + 0.5, numRows + 0.5).
Several functions primarily work with spatial coordinates rather than pixel indices, but as long as you
are using the default spatial coordinate system (intrinsic coordinates), you can specify locations in
terms of their columns (x) and rows (y).
World Coordinates
In some situations, you might want to use a world coordinate system (also called a nondefault spatial
coordinate system). Some situations when you might want to use a world coordinate system include:
• When you perform a geometric operation, such as translation, on an image and want to preserve
information about how the new position relates to the original position.
• When pixels are not square. For example, in magnetic resonance imaging (MRI), you can collect
data such that pixels have a higher sampling rate in one direction than in the orthogonal
direction.
• When you know how the extent of pixels aligns with positions in the real world. For example, in an
aerial photograph, every pixel might cover a specific 5-by-5 meter patch on the ground.
• When you want to reverse the direction of the x-axis or y-axis. This is a common technique to use
with geospatial data.
There are several ways to define a world coordinate system. You can use spatial referencing objects,
which encode the location of the image in a world coordinate system, the image resolution, and how
the image extent relates to intrinsic and world coordinates. You can also specify the maximum and
minimum coordinate in each dimension. For more information, see “Define World Coordinate System
of Image” on page 2-6.
See Also
Related Examples
• “Shift X- and Y-Coordinate Range of Displayed Image” on page 2-9
More About
• “Images in MATLAB” on page 2-2
• “Define World Coordinate System of Image” on page 2-6
Define World Coordinate System of Image
Image Processing Toolbox includes two spatial referencing objects, imref2d and imref3d. The
table describes the properties of the 2-D spatial referencing object, imref2d. The 3-D spatial
referencing object, imref3d, includes these properties as well as corresponding properties for the Z
dimension.
Property Description
XWorldLimits: Upper and lower bounds along the X dimension in world coordinates (nondefault spatial coordinates)
YWorldLimits: Upper and lower bounds along the Y dimension in world coordinates (nondefault spatial coordinates)
ImageSize: Size of the image, as returned by the size function
PixelExtentInWorldX: Size of a pixel along the X dimension
PixelExtentInWorldY: Size of a pixel along the Y dimension
ImageExtentInWorldX: Size of the image along the X dimension
ImageExtentInWorldY: Size of the image along the Y dimension
XIntrinsicLimits: Upper and lower bounds along the X dimension in intrinsic coordinates (default spatial coordinates)
YIntrinsicLimits: Upper and lower bounds along the Y dimension in intrinsic coordinates (default spatial coordinates)
To illustrate spatial referencing, this sample code creates a spatial referencing object associated with
a 2-by-2 image. The code specifies the pixel extent in the horizontal and vertical directions as 4 units/
pixel and 2 units/pixel, respectively. The object calculates the world limits, image extent in world
coordinates, and image extent in intrinsic coordinates.
R = imref2d([2 2],4,2)
R =
XWorldLimits: [2 10]
YWorldLimits: [1 5]
ImageSize: [2 2]
PixelExtentInWorldX: 4
PixelExtentInWorldY: 2
ImageExtentInWorldX: 8
ImageExtentInWorldY: 4
XIntrinsicLimits: [0.5000 2.5000]
YIntrinsicLimits: [0.5000 2.5000]
The figure illustrates how these properties map to elements of the image.
By default, the intrinsic coordinates, world coordinates, and MATLAB axes coordinates of an image
coincide. For an image A, the default value of XData is [1 size(A,2)] and the default value of
YData is [1 size(A,1)]. For example, if A is a 100 row by 200 column image, the default XData is
[1 200] and the default YData is [1 100].
To define a nondefault world coordinate system for an image, specify the image XData and YData
properties with the range of coordinates spanned by the image in each dimension. When you do this,
the MATLAB axes coordinates become identical to the world coordinates and no longer coincide with
the intrinsic coordinates. For an example, see “Shift X- and Y-Coordinate Range of Displayed Image”
on page 2-9.
Note that the values in XData and YData are actually the coordinates for the center point of the
boundary pixels, not the outermost edge of the boundary pixels. Therefore, the actual coordinate
range spanned by the image is slightly larger. For instance, if XData is [1 200] and the image is 200
pixels wide, as for the intrinsic coordinate system, then each pixel is one unit wide and the interval in
X spanned by the image is [0.5 200.5]. Similarly, if XData is [1 200] and the image is 50 pixels wide,
as for a nondefault world coordinate system, then each pixel is four units wide and the interval in X
spanned by the image is [–1 202].
You can set XData or YData such that the x-axis or y-axis is reversed. You would do this by placing
the larger value first. For example, set the XData to [200 1].
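For example, a minimal sketch of displaying an image with a nondefault coordinate range and a reversed x-axis, assuming an image is already in the workspace as I (the specific ranges are placeholders):
imshow(I,'XData',[200 1],'YData',[1 100])
axis on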
See Also
imref2d | imref3d | imregister | imregtform | imshow | imwarp
Related Examples
• “Shift X- and Y-Coordinate Range of Displayed Image” on page 2-9
More About
• “Image Coordinate Systems” on page 2-3
Shift X- and Y-Coordinate Range of Displayed Image
Read an image.
I = imread("peppers.png");
Display the image using the intrinsic coordinate system, storing the image object returned by imshow
in ax. Turn on the axis to display the coordinate system.
figure
ax = imshow(I);
title('Image Displayed with Intrinsic Coordinates')
axis on
Check the range of the x- and y-coordinates, which are stored in the XData and YData properties of
ax. The ranges match the dimensions of the image.
xrange = ax.XData
xrange = 1×2
1 512
yrange = ax.YData
yrange = 1×2
1 384
Change the range of the x- and y-coordinates. This example shifts the image to the right by adding
100 to the x-coordinates and shifts the image up by subtracting 25 from the y-coordinates.
xrangeNew = xrange + 100;
yrangeNew = yrange - 25;
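Display the image again, specifying the shifted coordinate ranges; a minimal sketch of this step, which produces the object axNew queried below (the exact display call is an assumption):
figure
axNew = imshow(I,'XData',xrangeNew,'YData',yrangeNew);
title('Image Displayed with Nondefault Coordinates')
axis on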
Confirm that the ranges of the x- and y-coordinates of the new image match the shifted ranges
specified by xrangeNew and yrangeNew.
axNew.XData
ans = 1×2
101 612
axNew.YData
ans = 1×2
-24 359
See Also
More About
• “Image Coordinate Systems” on page 2-3
• “Define World Coordinate System of Image” on page 2-6
Image Types in the Toolbox
All images in Image Processing Toolbox are assumed to have nonsparse values. Numeric and logical
images are expected to be real-valued unless otherwise specified.
For indexed images, pixel values are indices into a color map with p entries:
• For single or double arrays, integer values range from [1, p].
• For logical, uint8, or uint16 arrays, values range from [0, p-1].
There are other models, called color spaces, that describe colors using three color channels. For
these color spaces, the range of each data type may differ from the range allowed by images in the
RGB color space. For example, pixel values in the L*a*b* color space of data type double can be
negative or greater than 1. For more information, see “Understanding Color Spaces and Color Space
Conversion” on page 15-15.
Binary Images
In a binary image, each pixel has one of only two discrete values: 1 or 0. Most functions in the toolbox
interpret pixels with value 1 as belonging to a region of interest, and pixels with value 0 as the
background. Binary images are frequently used in conjunction with other image types to indicate
which portions of the image to process.
The figure shows a binary image with a close-up view of some of the pixel values.
Indexed Images
An indexed image consists of an image matrix and a color map.
A color map is an m-by-3 matrix of class double containing values in the range [0, 1]. Each row of
the color map specifies the red, green, and blue components of a single color.
The pixel values in the image matrix are direct indices into the color map. Therefore, the color of
each pixel in the indexed image is determined by mapping the pixel value in the image matrix to the
corresponding color in the color map. The mapping depends on the class of the image matrix:
• If the image matrix is of class single or double, it normally contains integer values in the range
[1, p], where p is the length of the color map. The value 1 points to the first row in the color map,
the value 2 points to the second row, and so on.
• If the image matrix is of class logical, uint8 or uint16, it normally contains integer values in
the range [0, p–1]. The value 0 points to the first row in the color map, the value 1 points to the
second row, and so on.
A color map is often stored with an indexed image and is automatically loaded with the image when
you use the imread function. After you read the image and the color map into the workspace as
separate variables, you must keep track of the association between the image and color map.
However, you are not limited to using the default color map—you can use any color map that you
choose.
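For example, a minimal sketch of reading an indexed image together with its color map and displaying them, assuming the sample file trees.tif (which ships with the toolbox as an indexed image):
[X,map] = imread('trees.tif');  % X is the index matrix, map is the color map
imshow(X,map)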
The figure illustrates an indexed image, the image matrix, and the color map, respectively. The image
matrix is of class double, so the value 7 points to the seventh row of the color map.
Grayscale Images
A grayscale image is a data matrix whose values each represent the intensity of one image pixel. While
grayscale images are rarely saved with a color map, MATLAB uses a color map to display them.
You can obtain a grayscale image directly from a camera that acquires a single signal for each pixel.
You can also convert truecolor or multispectral images to grayscale to emphasize one particular
aspect of the images. For example, you can take a linear combination of the red, green, and blue
channels of an RGB image such that the resulting grayscale image indicates the brightness,
saturation, or hue of each pixel. You can process each channel of a truecolor or multispectral image
independently by splitting the channels into separate grayscale images.
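For instance, a minimal sketch of converting a truecolor image to grayscale, assuming an RGB array is already in the workspace:
gray = rgb2gray(RGB);   % weighted combination: 0.2989*R + 0.5870*G + 0.1140*B
imshow(gray)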
The figure depicts a grayscale image of class double whose pixel values are in the range [0, 1].
Truecolor Images
A truecolor image is an image in which each pixel has a color specified by three values. Graphics file
formats store truecolor images as 24-bit images, where three color channels are 8 bits each. This
yields a potential of 16 million colors. The precision with which a real-life image can be replicated has
led to the commonly used term truecolor image.
RGB images are the most common type of truecolor images. In RGB images, the three color channels
are red, green, and blue. For more information about the RGB color channels, see “Display Separated
Color Channels of RGB Image” on page 2-18.
There are other models, called color spaces, that describe colors using three different color channels.
For these color spaces, the range of each data type may differ from the range allowed by images in
the RGB color space. For example, pixel values in the L*a*b* color space of data type double can be
negative or greater than 1. For more information, see “Understanding Color Spaces and Color Space
Conversion” on page 15-15.
Truecolor images do not use a color map. The color of each pixel is determined by the combination of
the intensities stored in each color channel at the pixel's location.
The figure depicts the red, green, and blue channels of a floating-point RGB image. Observe that pixel
values are in the range [0, 1].
To determine the color of the pixel at (row, column) coordinate (2,3), you would look at the RGB
triplet stored in the vector (2,3,:). Suppose (2,3,1) contains the value 0.5176, (2,3,2) contains
0.1608, and (2,3,3) contains 0.0627. The color for the pixel at (2,3) is
0.5176 0.1608 0.0627
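A minimal sketch of extracting that triplet from the array, assuming the image is stored in a floating-point array named RGB:
pixelColor = squeeze(RGB(2,3,:))'   % [R G B] values of the pixel at row 2, column 3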
HDR Images
Dynamic range refers to the range of brightness levels. The dynamic range of real-world scenes can
be quite high. High dynamic range (HDR) images attempt to capture the whole tonal range of real-
world scenes (called scene-referred), using 32-bit floating-point values to store each color channel.
The figure depicts the red, green, and blue channels of a tone-mapped HDR image with original pixel
values in the range [0, 3.2813]. Tone mapping is a process that reduces the dynamic range of an HDR
image to the range expected by a computer monitor or screen.
Multispectral Images
The figure depicts a multispectral image with six channels consisting of red, green, blue color
channels (depicted as a single RGB image) and three infrared channels.
Label Images
A label image is an image in which each pixel specifies a class, object, or region of interest (ROI). You
can derive a label image from an image of a scene using segmentation techniques.
• A numeric label image enumerates objects or ROIs in the scene. Labels are nonnegative integers.
The background typically has the value 0. The pixels labeled 1 make up one object; the pixels
labeled 2 make up a second object; and so on.
• A categorical label image specifies the class of each pixel in the image. The background is
commonly assigned the value <undefined>.
The figure depicts a label image with three categories: petal, leaf, and dirt.
See Also
More About
• “Convert Between Image Types” on page 2-21
• “Understanding Color Spaces and Color Space Conversion” on page 15-15
• “Work with Image Sequences as Multidimensional Arrays” on page 2-48
Display Separated Color Channels of RGB Image
Create an RGB image with uninterrupted areas of red, green, and blue. Display the image.
imSize = 200;
RGB = reshape(ones(imSize,1)*reshape(jet(imSize),1,imSize*3),[imSize,imSize,3]);
imshow(RGB)
title('Original RGB Image')
Separate the image into its three color channels by using the imsplit function.
[R,G,B] = imsplit(RGB);
Display a grayscale representation of each color channel. Notice that each separated color plane in
the figure contains an area of white. The white corresponds to the highest values (purest shades) of
each separate color. For example, in the red channel image, the white represents the highest
concentration of pure red values. As red becomes mixed with green or blue, gray pixels appear. The
black region in the image shows pixel values that contain no red values, in other words, when R ==
0.
figure
subplot(1,3,1)
imshow(R)
title('Red Channel')
subplot(1,3,2)
imshow(G)
title('Green Channel')
subplot(1,3,3)
imshow(B)
title('Blue Channel')
Display a color representation of each color channel. In these images, the desired color channel
maintains its original intensity values and pixel values in the other two color channels are set to 0.
allBlack = zeros(size(RGB,1,2),class(RGB));
justR = cat(3,R,allBlack,allBlack);
justG = cat(3,allBlack,G,allBlack);
justB = cat(3,allBlack,allBlack,B);
figure
montage({justR,justG,justB},'Size',[1 3], ...
"BackgroundColor",'w',"BorderSize",10);
title('Color Representation of the Red, Green, and Blue Color Channels');
See Also
imsplit
More About
• “Image Types in the Toolbox” on page 2-12
Convert Between Image Types
You can perform certain conversions just using MATLAB syntax. For example, you can convert a
grayscale image to truecolor format by concatenating three copies of the original matrix along the
third dimension.
RGB = cat(3,I,I,I);
The resulting truecolor image has identical matrices for the red, green, and blue planes, so the image
displays as shades of gray.
In addition to these image type conversion functions, there are other functions that return a different
image type as part of the operation they perform. For example, the region of interest functions return
a binary image that you can use to mask an image for filtering or for other operations.
Note When you convert an image from one format to another, the resulting image might look
different from the original. For example, if you convert a color indexed image to a grayscale image,
the resulting image displays as shades of grays, not color.
Function Description
demosaic: Convert Bayer pattern encoded image to truecolor (RGB) image.
dither: Use dithering to convert a grayscale image to a binary image or to convert a truecolor image to an indexed image.
gray2ind: Convert a grayscale image to an indexed image.
grayslice: Convert a grayscale image to an indexed image by using multilevel thresholding.
ind2gray: Convert an indexed image to a grayscale image.
ind2rgb: Convert an indexed image to a truecolor image.
mat2gray: Convert a data matrix to a grayscale image, by scaling the data.
rgb2gray: Convert a truecolor image to a grayscale image. Note: To work with images that use other color spaces, such as HSV, first convert the image to RGB, process the image, and then convert it back to the original color space. For more information about color space conversion routines, see “Understanding Color Spaces and Color Space Conversion” on page 15-15.
rgb2ind: Convert a truecolor image to an indexed image.
Convert Image Data Between Classes
For easier conversion of classes, use one of these functions: im2uint8, im2uint16, im2int16,
im2single, or im2double. These functions automatically handle the rescaling and offsetting of the
original data of any image class. For example, this command converts a double-precision RGB image
with data in the range [0,1] to a uint8 RGB image with data in the range [0,255].
RGB2 = im2uint8(RGB1);
Converting indexed images between classes requires extra care because the pixel values are indices
into a color map rather than intensities. For example, a uint16 or double indexed image with 300 colors cannot be converted to uint8,
because uint8 arrays have only 256 distinct values. If you want to perform this conversion, you must
first reduce the number of the colors in the image using the imapprox function. This function
performs the quantization on the colors in the color map, to reduce the number of distinct colors in
the image. See “Reduce Colors of Indexed Image Using imapprox” on page 15-7 for more
information.
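A minimal sketch of this workflow, assuming an indexed image X with color map map is already in the workspace (the target of 256 colors matches the uint8 limit):
[Y,newmap] = imapprox(X,map,256);   % reduce the color map to at most 256 colors
Y8 = im2uint8(Y,'indexed');         % convert the index matrix to uint8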
Perform an Operation on a Sequence of Images
Create a list of the file names of the sample image sequence included with the toolbox.
fileFolder = fullfile(matlabroot,'toolbox','images','imdata');
dirOutput = dir(fullfile(fileFolder,'AT3_1m4_*.tif'));
fileNames = {dirOutput.name}'
numFrames = numel(fileNames)
fileNames =
{'AT3_1m4_01.tif'}
{'AT3_1m4_02.tif'}
{'AT3_1m4_03.tif'}
{'AT3_1m4_04.tif'}
{'AT3_1m4_05.tif'}
{'AT3_1m4_06.tif'}
{'AT3_1m4_07.tif'}
{'AT3_1m4_08.tif'}
{'AT3_1m4_09.tif'}
{'AT3_1m4_10.tif'}
numFrames =
10
Preallocate an m-by-n-by-p array and read images into the array.
I = imread(fileNames{1});
sequence = zeros([size(I) numFrames],class(I));
sequence(:,:,1) = I;
for p = 2:numFrames
sequence(:,:,p) = imread(fileNames{p});
end
Process each image in the sequence, performing standard deviation filtering. Note that, to use
stdfilt with an image sequence, you must specify the nhood argument, passing a 2-D
neighborhood.
sequenceNew = stdfilt(sequence,ones(3));
figure;
for k = 1:numFrames
imshow(sequence(:,:,k));
title(sprintf('Original Image # %d',k));
pause(1);
imshow(sequenceNew(:,:,k),[]);
title(sprintf('Processed Image # %d',k));
pause(1);
end
Detecting Cars in a Video of Traffic
The VideoReader function constructs a multimedia reader object that can read video data from a
multimedia file. See VideoReader for information on which formats are supported on your platform.
Use VideoReader to access the video and get basic information about it.
trafficVid = VideoReader('traffic.mj2')
trafficVid =
General Properties:
Name: 'traffic.mj2'
Path: 'B:\matlab\toolbox\images\imdata'
Duration: 8
CurrentTime: 0
NumFrames: 120
Video Properties:
Width: 160
Height: 120
FrameRate: 15
BitsPerPixel: 24
VideoFormat: 'RGB24'
The get method provides more information on the video such as its duration in seconds.
get(trafficVid)
obj =
General Properties:
Name: 'traffic.mj2'
Path: 'B:\matlab\toolbox\images\imdata'
Duration: 8
CurrentTime: 0
NumFrames: 120
Video Properties:
Width: 160
Height: 120
FrameRate: 15
BitsPerPixel: 24
VideoFormat: 'RGB24'
When working with video data, it can be helpful to select a representative frame from the video and
develop your algorithm on that frame. Then, this algorithm can be applied to the processing of all the
frames in the video.
For this car-tagging application, examine a frame that includes both light-colored and dark-colored
cars. When an image has many structures, like the traffic video frames, it is useful to simplify the
image as much as possible before trying to detect an object of interest. One way to do this for the car
tagging application is to suppress all objects in the image that are not light-colored cars (dark-colored
cars, lanes, grass, etc.). Typically, it takes a combination of techniques to remove these extraneous
objects.
One way to remove the dark-colored cars from the video frames is to use the imextendedmax
function. This function returns a binary image that identifies regions with intensity values above a
specified threshold, called regional maxima. All other objects in the image with pixel values below
this threshold become the background. To eliminate the dark-colored cars, determine the average
pixel value for these objects in the image. (Use rgb2gray to convert the original video from RGB to
grayscale.) You can use the pixel region tool in implay to view pixel values. Specify the average pixel
value (or a value slightly higher) as the threshold when you call imextendedmax. For this example,
set the value to 50.
darkCarValue = 50;
darkCar = rgb2gray(read(trafficVid,71));
noDarkCar = imextendedmax(darkCar, darkCarValue);
imshow(darkCar)
figure, imshow(noDarkCar)
In the processed image, note how most of the dark-colored car objects are removed but many other
extraneous objects remain, particularly the lane-markings. The regional maxima processing will not
remove the lane markings because their pixel values are above the threshold. To remove these
objects, you can use the morphological function imopen. This function uses morphological processing
to remove small objects from a binary image while preserving large objects. When using
morphological processing, you must decide on the size and shape of the structuring element used in
the operation. Because the lane-markings are long and thin objects, use a disk-shaped structuring
element with radius corresponding to the width of the lane markings. You can use the pixel region
tool in implay to estimate the width of these objects. For this example, set the value to 2.
sedisk = strel('disk',2);
noSmallStructures = imopen(noDarkCar, sedisk);
imshow(noSmallStructures)
To complete the algorithm, use regionprops to find the centroid of the objects in
noSmallStructures (should just be the light-colored cars). Use this information to position the tag
on the light-colored cars in the original video.
The car-tagging application processes the video one frame at a time in a loop. (Because a typical
video contains a large number of frames, it would take a lot of memory to read and process all the
frames at once.)
A small video (like the one in this example) could be processed at once, and there are many functions
that provide this capability. For more information, see “Process Image Sequences” on page 2-49.
For faster processing, preallocate the memory used to store the processed video.
nframes = trafficVid.NumberOfFrames;
I = read(trafficVid, 1);
taggedCars = zeros([size(I,1) size(I,2) 3 nframes], class(I));
for k = 1 : nframes
singleFrame = read(trafficVid, k);
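% A minimal sketch of the per-frame segmentation, assuming each frame is
% processed with the algorithm developed earlier in this example (the
% bwareaopen area threshold of 150 is an assumption):
frameGray = rgb2gray(singleFrame);
noDarkCarFrame = imextendedmax(frameGray, darkCarValue);
noSmallStructures = imopen(noDarkCarFrame, sedisk);
noSmallStructures = bwareaopen(noSmallStructures, 150);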
% Get the area and centroid of each remaining object in the frame. The
% object with the largest area is the light-colored car. Create a copy
% of the original frame and tag the car by changing the centroid pixel
% value to red.
stats = regionprops(noSmallStructures, {'Centroid','Area'}); % assumption: the call that produces stats
taggedCars(:,:,:,k) = singleFrame;
if ~isempty([stats.Area])
areaArray = [stats.Area];
[junk,idx] = max(areaArray);
c = stats(idx).Centroid;
c = floor(fliplr(c));
width = 2;
row = c(1)-width:c(1)+width;
col = c(2)-width:c(2)+width;
taggedCars(row,col,1,k) = 255;
taggedCars(row,col,2,k) = 0;
taggedCars(row,col,3,k) = 0;
end
end
Get the frame rate of the original video and use it to see taggedCars in implay.
frameRate = trafficVid.FrameRate;
implay(taggedCars,frameRate);
See Also
VideoReader | bwareaopen | imextendedmax | imopen | implay | regionprops | rgb2gray
More About
• “Work with Image Sequences as Multidimensional Arrays” on page 2-48
• “Perform an Operation on a Sequence of Images” on page 2-23
Process Folder of Images Using Image Batch Processor App
Create a new folder in an area where you have write permission and copy a set of 10 images from the
Image Processing Toolbox imdata folder into the new folder.
mkdir('cellprocessing');
copyfile(fullfile(matlabroot,'toolbox','images','imdata','AT3*.tif'),'cellprocessing','f');
Open the Image Batch Processor app from the MATLAB® toolstrip. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Batch Processor. You can also open the app
from the command line by using the imageBatchProcessor command.
Load images into the app. In the app toolstrip, click Load Images. In the Load Images from Folder
dialog box, specify the folder containing the images you want to load. For this example, specify the
folder that you created in the first step, cellprocessing. By default, the app includes images in
subfolders. To change this behavior, clear Include images in subfolders. Then, click Load.
The Image Batch Processor app creates thumbnails of the images in the folder and displays them in
a scrollable tab in the left pane. The app displays the first selected image (highlighted in blue) in
larger resolution in the Input Image tab in the right pane.
Specify the name of the function you want to use to process the images in the folder. To specify an
existing function, type the name in the Function Name box in the Batch Function section of the app
toolstrip. You can also click the folder icon next to the box to browse and select the function. To
create a new batch processing function, click New in the Batch Function section of the app toolstrip.
When you do this, the app opens the batch processing function template in the MATLAB® Editor. For
this example, click New to create a new function.
In the batch processing function template, enter code for the new function into the space reserved in
the template file and click Save. This example uses the default name for the batch processing
function, myimfcn, but you can specify any name. For this example, the code specifies a function that
creates a mask image, calculates the total number of cells in the image, and creates a thresholded
version of the original image.
function results = myimfcn(im)
%--------------------------------------------------------------------------
% Auto-generated by imageBatchProcessor App.
%
% When used by the App, this function will be called for every input image
% file automatically. IM contains the input image as a matrix. RESULTS is a
% scalar structure containing the results of this processing function.
%
%--------------------------------------------------------------------------

% Create a mask of the cells using local standard deviation filtering.
imstd = stdfilt(im,ones(27));
bw = imstd > 30;

% Create a thresholded version of the original image and count the cells.
% (These two calls are an assumption; they are one reasonable way to produce
% a thresholded overlay and a cell count.)
thresholdMask = imfuse(im,bw);
[~,n] = bwlabel(bw);

results.bw = bw;
results.thresholdMask = thresholdMask;
results.numCells = n;
end
Save the file. After saving, the app displays the name of this new function in the Function Name box
on the app toolstrip.
Test the new function by running the batch processor on one of your images. With one image selected
(highlighted in blue), click Process Selected to process the selected image. The app displays the
results of the processing in a new tab called Results. For this example, the app displays the binary
mask, a count of the number of objects (cells) in the image, and a thresholded version of the image.
To get a closer view of the image results, click Show for that particular result in the Results tab. The
app opens a larger-resolution version of the image in a new tab in a bottom-center pane. For this
example, view the binary mask results by clicking Show for bw in the Results tab. To explore the
results, move the cursor over the image to access the pan and zoom controls. When zooming and
panning, the app links the result image to the original image—panning or zooming on one image
causes the other image to move as well. If you do not want this behavior, clear Link Axes in the app
toolstrip.
If the results of the test run on one image are successful, then execute the function on all of the
images in the folder. To process all the images at once, on the app toolstrip, click Process Selected
and select Process All. To process only a subset of the images, click Process Selected. You can
select images to process either by pressing Ctrl and clicking the desired images or by clicking one
image to start, pressing Shift, and clicking another image to select all images in between the starting
and ending images. If you have Parallel Computing Toolbox™, you can click Use Parallel on the app
toolstrip to process the images on a local parallel pool. For this example, process all of the images.
The app processes all the images in the specified folder. A filled-in green square next to a thumbnail
indicates the app successfully processed that image. The Results tab contains the results of the
selected image (highlighted in blue). A status bar at the bottom-right of the app reports on the
number of images processed.
To save the results, click Export to view the options available. You can export results to the
workspace or to a file, or you can get the MATLAB® code the app used to generate the results.
Save the results in a workspace variable. On the app toolstrip, click Export and select the Export result
of all processed images to workspace option. In the dialog box that opens, select the results you
want to export. A common approach is to export the nonimage results to the workspace and save the
images that result from the processing in files. This example saves the cell count along with the name
of the input file to the workspace variable numCells.
By default, the app returns the results you select in a table named allresults. To store the results
in a structure instead of a table, select Struct Array in the Choose format section of the dialog box.
To specify another name for the result variable, change Variable name in the dialog box. If you
select Include input image file name, the app includes the name of the image associated with the
results in the structure or table. After specifying exporting details, click OK.
To get the MATLAB® code that the app used to process your files, on the app toolstrip, click Export
and select the Generate function option. The app generates a function that accepts the input folder name
and the output folder name as input arguments. By default, the function returns a table with the
results, but you can choose a structure instead. For image results, you can specify the file format and
whether you want the function to write the image to the specified output folder.
Process Large Set of Images Using MapReduce Framework and Hadoop
Download the BBBC005v1 data set from the Broad Bioimage Benchmark Collection. This data set is
an annotated biological image set designed for testing and validation. The image set provides
examples of in- and out-of-focus synthetic images, which can be used for validation of focus metrics.
The data set contains almost 20,000 files. For more information, see this introduction to the data set.
At the system prompt on a Linux system, use the wget command to download the zip file containing
the BBBC data set. Before running this command, make sure that your target location has enough
space to hold the zip file (1.8 GB) and the extracted images (2.6 GB).
wget https://data.broadinstitute.org/bbbc/BBBC005/BBBC005_v1_images.zip
At the system prompt on a Linux system, extract the files from the zip file.
unzip BBBC005_v1_images.zip
Examine the image file names in this data set. The names are constructed in a specific format to
contain useful information about each image. For example, the file name BBBC005_v1_images/
SIMCEPImages_A05_C18_F1_s16_w1.TIF indicates that the image contains 18 cells (C18) and was
filtered with a Gaussian low-pass filter with diameter 1 and a sigma of 0.25x diameter to simulate
focus blur (F1). The w1 identifies the stain used. For example, find the number of images in the data
set that use the w1 stain.
d = dir('C:\Temp\BBBCdata\BBBC005_v1_images\*w1*');
numel(d)
ans = 9600
View the files in the BBBC data set and test an algorithm on a small subset of the files using the
Image Batch Processor app. The example tests a simple algorithm that segments the cells in the
images. (The example uses a modified version of this cell segmentation algorithm to create the cell
counting algorithm used in the MapReduce implementation.)
Open the Image Batch Processor app. From the MATLAB toolstrip, on the Apps tab, in the Image
Processing and Computer Vision section, click Image Batch Processor. You can also open the app
from the command line using the imageBatchProcessor command.
In the Image Batch Processor app, click Load Images and navigate to the folder in which you
stored the downloaded data set.
The Image Batch Processor app displays thumbnails of the images in the folder in the left pane and
a higher-resolution version of the currently selected image in the Input Image tab. View some of the
images to get familiar with the data set.
Specify the name of the function that implements your cell segmentation algorithm. To specify an
existing function, type its name in the Function name field or click the folder icon to browse and
select the function. To create a new batch processing function, click New. The app opens the batch
function template in the MATLAB® editor. For this example, create a new function containing the
following image segmentation code. Click Save to create the batch function. The app updates to
display the name of the function you created in the Batch Function section of the app toolstrip.
Select the thumbnail of an image displayed in the app and click Process Selected to execute a test
run of your algorithm. For this example, choose only an image with the “w1” stain (identifiable in the
file name). The segmentation algorithm works best with these images.
Examine the results of running your algorithm to verify that your segmentation algorithm found the
correct number of cells in the image. The names of the images contain the cell count in the C number.
For example, the image named SIMCEPImages_A05_C18_F1_s05_w1.TIF contains 18 cells.
Compare this number to the results returned at the command line for the sample image.
After confirming that your segmentation code works as expected on one image, set up a small test version on your local system of the large-scale processing you want to perform. You should test your processing framework before running it on thousands of files.
First, create an image datastore containing a small subset of your images by using the imageDatastore function. MapReduce uses a datastore to process data in small chunks that individually fit into memory. Move to the folder containing the images and create an image datastore. Because the cell segmentation algorithm implemented in cellSegmenter.m works best with the cell body stain, select only the files with the indicator w1 in their file names.
localimds = imageDatastore(fullfile('/your_data/broad_data/BBBC005_v1_images','*w1*'));
Even after limiting the selection to files with "w1" in their names, the image datastore still contains over 9000 files. Subset the list of images further by selecting every 100th file from the thousands of files in the data set.
localimds.Files = localimds.Files(1:100:end);
Once you have created the image datastore, convert the sample subset of images into Hadoop
sequence files, a format used by the Hadoop cluster. Note that this step simply changes the data from
one storage format to another without changing the data value. For more information about sequence
files, see Getting Started with MapReduce (MATLAB).
To convert the image datastore to Hadoop sequence files, create a "map" function and a "reduce" function that you pass to the mapreduce function. To convert the image files to Hadoop sequence files, the map function should be a no-op. For this example, the map function simply saves the image data as is, using its file name as a key.
function identityMap(data, info, intermKVStore)
add(intermKVStore, info.Filename, data);
end
Create a reduce function that converts the image files into a key-value datastore backed by sequence
files.
function identityReduce(key, intermValueIter, outKVStore)
while hasnext(intermValueIter)
add(outKVStore, key, getnext(intermValueIter));
end
end
Call mapreduce, passing your map and reduce functions. The example first calls the mapreducer function to specify where the processing takes place. To test your setup and perform the processing on your local system, specify 0.
mapreducer(0);
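A minimal sketch of the conversion call, reusing the identityMap and identityReduce functions defined above (localmatds is a hypothetical variable name for the resulting key-value datastore backed by sequence files):
localmatds = mapreduce(localimds,@identityMap,@identityReduce, ...
    'OutputFolder',tempdir);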
After creating the subset of image files for testing, and converting them to a key-value datastore, you
are ready to test the algorithm. Modify your original cell segmentation algorithm to return the cell
count. (The Image Batch Processor app, where this example first tested the algorithm, can only
return processed images, not values such as the cell count.)
Modify the cell segmentation function to return a cell count and remove the display of the image.
function cellCount = cellCounter(im)
% Otsu thresholding
bw = imbinarize(im);
% (The intermediate segmentation steps, which measure the blob areas and
% estimate the number of cells per blob, cellsPerBlob, are omitted here.)
cellCount = sum(round(cellsPerBlob));
end
Create a map function that calculates the error count for a specific image. This function gets the
actual cell count for an image from the file name coding (the C number) and compares it to the cell
count returned by the segmentation algorithm.
function mapImageToMisCountError(data, ~, intermKVStore)
% Extract the image
im = data.Value{1};
% Call the cell counting algorithm
actCount = cellCounter(im);
% The original file name is available as the key
fileName = data.Key{1};
[~, name] = fileparts(fileName);
% Extract expected cell count and focus blur from the file name
strs = strsplit(name, '_');
expCount = str2double(strs{3}(2:end));
focusBlur = str2double(strs{4}(2:end));
diffCount = abs(actCount-expCount);
% Note: focus blur is the key
add(intermKVStore, focusBlur, diffCount);
end
Create a reduce function that computes the average error in cell count for each focus value.
function reduceErrorCount(key, intermValueIter, outKVStore)
focusBlur = key;
% Compute the sum of all differences in cell count for this value of
% focus blur
count = 0;
totalDiff = 0;
while hasnext(intermValueIter)
diffCount = getnext(intermValueIter);
count = count + 1;
totalDiff = totalDiff+diffCount;
end
% Average
meanDiff = totalDiff/count;
add(outKVStore, focusBlur, meanDiff);
end
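To produce the results gathered in the next step, run mapreduce with these functions over the local test data. A sketch, assuming the key-value datastore created earlier is named localmatds:
focusErrords = mapreduce(localmatds,@mapImageToMisCountError,@reduceErrorCount);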
focusErrorTbl = readall(focusErrords);
averageErrors = cell2mat(focusErrorTbl.Value);
The simple cell counting algorithm used here relies on the average area of a cell or a group of cells. Increasing focus blur diffuses cell boundaries and thus increases the apparent area. The expected result is that the error increases with focus blur, as seen in this plot of the results.
bar(focusErrorTbl.Key, averageErrors)
ha = gca;
ha.XTick = sort(focusErrorTbl.Key);
ha.XLim = [min(focusErrorTbl.Key)-2 max(focusErrorTbl.Key)+2];
title('Cell counting result on a test data set')
xlabel('Focus blur')
ylabel('Average error in cell count')
Now that you've verified the processing of your algorithm on a subset of your data, run your
algorithm on the full dataset on a Hadoop cluster.
Load all the image data into the Hadoop file system and run your MapReduce framework on a Hadoop cluster, using the following shell commands. To run these commands, replace your_data with the location of the downloaded images on your computer.
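A sketch of the shell commands, assuming the HDFS target folder used later in this example (/user/broad_data/BBBC005_v1_images):
hadoop fs -mkdir /user/broad_data
hadoop fs -copyFromLocal /your_data/broad_data/BBBC005_v1_images /user/broad_data/BBBC005_v1_images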
Set up access to the MATLAB Parallel Server cluster. To run this command, replace '/your/hadoop/install' with the location of your Hadoop installation.
setenv('HADOOP_HOME','/your/hadoop/install');
cluster = parallel.cluster.Hadoop;
cluster.HadoopProperties('mapred.job.tracker') = 'hadoop01glnxa64:54311';
cluster.HadoopProperties('fs.default.name') = 'hdfs://hadoop01glnxa64:54310';
disp(cluster);
mapreducer(cluster);
Convert all the image data into Hadoop sequence files. This is similar to what you did on your local
system when you converted a subset of the images for prototyping. You can reuse the map and reduce
functions you used previously. Use the internal Hadoop cluster.
broadFolder = 'hdfs://hadoop01glnxa64:54310/user/broad_data/BBBC005_v1_images';
Pick only the cell body stain (w1) files for processing.
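One way to build the file list is with a wildcard on the HDFS folder. This is a sketch and assumes both wildcard matching on HDFS locations and the w1Files variable name used in the next command:
w1Files = [broadFolder '/*w1*.TIF'];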
imageDS = imageDatastore(w1Files);
seqFolder = 'hdfs://hadoop01glnxa64:54310/user/datasets/images/broad_data/broad_sequence';
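A sketch of the conversion on the cluster, reusing the identityMap and identityReduce functions and writing the sequence files to seqFolder (seqds is a hypothetical variable name):
seqds = mapreduce(imageDS,@identityMap,@identityReduce, ...
    'OutputFolder',seqFolder);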
Run the cell counting algorithm on the entire data set stored in the Hadoop file system using the
MapReduce framework. The only change from running the framework on your local system is that
now the input and output locations are on the Hadoop file system.
output = 'hdfs://hadoop01glnxa64:54310/user/broad_data/BBBC005_focus_vs_errorCount';
Run your algorithm on the MapReduce framework. Use the tic and toc functions to record how long it takes to process the set of images.
tic
% The mapreduce call between tic and toc is a sketch consistent with the
% earlier steps; seqds is the sequence-file datastore created above.
focusErrords = mapreduce(seqds,@mapImageToMisCountError,@reduceErrorCount, ...
    'OutputFolder',output);
toc
Gather results.
focusErrorTbl = readall(focusErrords);
averageErrors = cell2mat(focusErrorTbl.Value);
See Also
ImageDatastore | mapreduce
More About
• “Getting Started with MapReduce”
• “Work with Remote Data”
Work with Image Sequences as Multidimensional Arrays
• If you have a sequence of 2-D grayscale, binary, or indexed images, then concatenate the images
in the third dimension to create a 3-D array of size m-by-n-by-p. Each of the p images has size m-
by-n.
• If you have a sequence of 2-D RGB images, then concatenate the images along the fourth
dimension to create a 4-D array of size m-by-n-by-3-by-p. Each of the p images has size m-by-n-
by-3.
Use the cat function to concatenate individual images. For example, this code concatenates a group
of RGB images along the fourth dimension.
A = cat(4,A1,A2,A3,A4,A5)
Note Some functions work with a particular type of multidimensional array, called a multiframe array.
In a multiframe array, images are concatenated along the fourth dimension regardless of the number
of color channels that the images have. A multiframe array of grayscale, binary, or indexed images
has size m-by-n-by-1-by-p. If you need to convert a multiframe array of grayscale images to a 3-D
array for use with other toolbox functions, then you can use the squeeze function to remove the
singleton dimension.
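For example, assuming I4 is an m-by-n-by-1-by-p multiframe array of grayscale images (hypothetical variable names):
I3 = squeeze(I4);   % I3 is an m-by-n-by-p array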
To animate an image sequence or provide navigation within the sequence, use the Video Viewer app
(implay). The Video Viewer app provides playback controls that you can use to navigate among the
frames in the sequence.
Some toolbox functions that accept multidimensional arrays, however, do not by default interpret an
m-by-n-by-p or an m-by-n-by-3-by-p array as an image sequence. To use these functions with image
sequences, you must use particular syntax and be aware of other limitations. The table lists these
toolbox functions and provides guidelines about how to use them to process image sequences.
See Also
More About
• “Perform an Operation on a Sequence of Images” on page 2-23
• “View Image Sequences in Video Viewer App” on page 4-65
• “Process Folder of Images Using Image Batch Processor App” on page 2-31
Image Arithmetic Functions
You can do image arithmetic using the MATLAB arithmetic operators. The Image Processing Toolbox
software also includes a set of functions that implement arithmetic operations for all numeric,
nonsparse data types. The toolbox arithmetic functions accept any numeric data type, including
uint8, uint16, and double, and return the result image in the same format. The functions perform
the operations in double precision, on an element-by-element basis, but do not convert images to
double-precision values in the MATLAB workspace. Overflow is handled automatically. The functions
clip return values to fit the data type.
Note On Intel® architecture processors, the image arithmetic functions can take advantage of the
Intel Integrated Performance Primitives (Intel IPP) library, thus accelerating their execution time. The
Intel IPP library is only activated, however, when the data passed to these functions is of specific data
types. See the reference pages for the individual arithmetic functions for more information.
See Also
More About
• “Image Arithmetic Clipping Rules” on page 2-52
• “Nest Calls to Image Arithmetic Functions” on page 2-53
MATLAB arithmetic operators and the Image Processing Toolbox arithmetic functions use these rules
for integer arithmetic:
• Values that exceed the range of the integer type are clipped, or truncated, to that range.
• Fractional values are rounded.
For example, if the data type is uint8, results greater than 255 (including Inf) are set to 255 and results less than 0 are set to 0, as shown in the examples below.
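These short examples illustrate the clipping and rounding behavior for uint8 data:
imadd(uint8(255),uint8(50))      % returns 255 (clipped at the top of the range)
imsubtract(uint8(50),uint8(75))  % returns 0 (clipped at the bottom of the range)
imdivide(uint8(7),uint8(2))      % returns 4 (3.5 is rounded)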
See Also
More About
• “Nest Calls to Image Arithmetic Functions” on page 2-53
Nest Calls to Image Arithmetic Functions
For example, suppose you want to compute the average of two images, C = (A + B)/2. You could implement this calculation by nesting calls to the toolbox arithmetic functions:
I = imread('rice.png');
I2 = imread('cameraman.tif');
K = imdivide(imadd(I,I2),2); % not recommended
When used with uint8 or uint16 data, each arithmetic function rounds and clips its result before
passing it on to the next operation. This can significantly reduce the precision of the calculation.
A better way to perform this calculation is to use the imlincomb function. imlincomb performs all
the arithmetic operations in the linear combination in double precision and only rounds and clips the
final result.
K = imlincomb(.5,I,.5,I2); % recommended
See Also
imlincomb
More About
• “Image Arithmetic Clipping Rules” on page 2-52
Find Vegetation in a Multispectral Image
Differences in image data between spectral channels can be used to distinguish surface features, which have varying reflectivity in different parts of the spectrum. By finding differences between the visible red and near infrared (NIR) channels, this example identifies areas containing significant vegetation.
This example finds vegetation in a LANDSAT Thematic Mapper image covering part of Paris, France,
made available courtesy of Space Imaging, LLC. Seven spectral channels (bands) are stored in one
file in the Erdas LAN format. The LAN file, paris.lan, contains a 7-channel 512-by-512 Landsat
image. A 128-byte header is followed by the pixel values, which are band interleaved by line (BIL) in
order of increasing band number. Pixel values are stored as unsigned 8-bit integers, in little-endian
byte order.
The first step is to read bands 4, 3, and 2 from the LAN file using the MATLAB® function
multibandread.
Channels 4, 3, and 2 cover the near infrared (NIR), the visible red, and the visible green parts of the
electromagnetic spectrum. When they are mapped to the red, green, and blue planes, respectively, of
an RGB image, the result is a standard color-infrared (CIR) composite. The final input argument to
multibandread specifies which bands to read, and in which order, so that you can construct a
composite in a single step.
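A call along these lines reads the three bands into the composite, using the geometry described above (512-by-512 pixels, 7 bands, a 128-byte header, BIL interleaving, uint8 samples, and little-endian byte order):
CIR = multibandread('paris.lan',[512 512 7],'uint8=>uint8', ...
    128,'bil','ieee-le',{'Band','Direct',[4 3 2]});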
Variable CIR is a 512-by-512-by-3 array of class uint8. It is an RGB image, but with false colors.
When the image is displayed, red pixel values signify the NIR channel, green values signify the visible
red channel, and blue values signify the visible green channel.
In the CIR image, water features are very dark (the Seine River) and green vegetation appears red
(parks and shade trees). Much of the image appearance is due to the fact that healthy, chlorophyll-
rich vegetation has a high reflectance in the near infrared. Because the NIR channel is mapped to the
red channel in the composite image, any area with a high vegetation density appears red in the
display. A noticeable example is the area of bright red on the left edge, a large park (the Bois de
Boulogne) located west of central Paris within a bend of the Seine River.
imshow(CIR)
title('CIR Composite')
text(size(CIR,2),size(CIR,1) + 15,...
'Image courtesy of Space Imaging, LLC',...
'FontSize',7,'HorizontalAlignment','right')
By analyzing differences between the NIR and red channels, you can quantify this contrast in spectral
content between vegetated areas and other surfaces such as pavement, bare soil, buildings, or water.
A scatter plot is a natural place to start when comparing the NIR channel (displayed as red pixel
values) with the visible red channel (displayed as green pixel values). It's convenient to extract these
channels from the original CIR composite into individual variables. It's also helpful to convert from
class uint8 to class single so that the same variables can be used in the NDVI computation below,
as well as in the scatter plot.
NIR = im2single(CIR(:,:,1));
R = im2single(CIR(:,:,2));
Viewing the two channels together as grayscale images, you can see how different they look.
imshow(R)
title('Visible Red Band')
imshow(NIR)
title('Near Infrared Band')
With one simple call to the plot command in MATLAB, you can create a scatter plot displaying one point per pixel (as a blue cross, in this case), with its x-coordinate determined by its value in the red channel and its y-coordinate determined by its value in the NIR channel.
plot(R,NIR,'+b')
ax = gca;
ax.XLim = [0 1];
ax.XTick = 0:0.2:1;
ax.YLim = [0 1];
ax.YTick = 0:0.2:1;
axis square
xlabel('red level')
ylabel('NIR level')
title('NIR vs. Red Scatter Plot')
The appearance of the scatter plot of the Paris scene is characteristic of a temperate urban area with
trees in summer foliage. There's a set of pixels near the diagonal for which the NIR and red values
are nearly equal. This "gray edge" includes features such as road surfaces and many rooftops. Above
and to the left is another set of pixels for which the NIR value is often well above the red value. This
zone encompasses essentially all of the green vegetation.
Observe from the scatter plot that taking the ratio of the NIR level to red level would be one way to
locate pixels containing dense vegetation. However, the result would be noisy for dark pixels with
small values in both channels. Also notice that the difference between the NIR and red channels
should be larger for greater chlorophyll density. The Normalized Difference Vegetation Index (NDVI)
is motivated by this second observation. It takes the (NIR - red) difference and normalizes it to help
balance out the effects of uneven illumination such as the shadows of clouds or hills. In other words,
on a pixel-by-pixel basis subtract the value of the red channel from the value of the NIR channel and
divide by their sum.
Notice how the array-arithmetic operators in MATLAB make it possible to compute an entire NDVI
image in one simple command. Recall that variables R and NIR have class single. This choice uses
less storage than class double but unlike an integer class also allows the resulting ratio to assume a
smooth gradation of values.
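The NDVI computation is the standard normalized difference of the two channels:
ndvi = (NIR - R) ./ (NIR + R);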
Variable ndvi is a 2-D array of class single with a theoretical maximum range of [-1 1]. You can
specify these theoretical limits when displaying ndvi as a grayscale image.
figure
imshow(ndvi,'DisplayRange',[-1 1])
title('Normalized Difference Vegetation Index')
The Seine River appears very dark in the NDVI image. The large light area near the left edge of the
image is the park (Bois de Boulogne) noted earlier.
In order to identify pixels most likely to contain significant vegetation, apply a simple threshold to the
NDVI image.
threshold = 0.4;
q = (ndvi > threshold);
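The value shown below is the percentage of the image area for which NDVI exceeds the threshold; one way to compute it is:
100*numel(NIR(q(:)))/numel(NIR)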
ans = 5.2204
or about 5 percent.
The park and other smaller areas of vegetation appear white by default when displaying the logical
(binary) image q.
imshow(q)
title('NDVI with Threshold Applied')
To link the spectral and spatial content, you can locate above-threshold pixels on the NIR-red scatter
plot, re-drawing the scatter plot with the above-threshold pixels in a contrasting color (green) and
then re-displaying the threshold NDVI image using the same blue-green color scheme. As expected,
the pixels having an NDVI value above the threshold appear to the upper left of the rest and
correspond to the redder pixels in the CIR composite displays.
figure
subplot(1,2,1)
plot(R,NIR,'+b')
hold on
plot(R(q(:)),NIR(q(:)),'g+')
axis square
xlabel('red level')
ylabel('NIR level')
title('NIR vs. Red Scatter Plot')
subplot(1,2,2)
imshow(q)
colormap([0 0 1; 0 1 0]);
title('NDVI with Threshold Applied')
See Also
im2single | imshow | multibandread
Related Examples
• “Enhance Multispectral Color Composite Images” on page 11-92
More About
• “Images in MATLAB” on page 2-2
Reading and Writing Image Data
3
This chapter describes how to get information about the contents of a graphics file, read image data
from a file, and write image data to a file, using standard graphics and medical file formats.
Read Image Data into the Workspace
Read a truecolor image into the workspace. The example reads the image data from a graphics file
that uses JPEG format.
RGB = imread('football.jpg');
If the image file format uses 8-bit pixels, imread returns the image data as an m-by-n-by-3 array of
uint8 values. For graphics file formats that support 16-bit data, such as PNG and TIFF, imread
returns an array of uint16 values.
whos
Read a grayscale image into the workspace. The example reads the image data from a graphics file
that uses the TIFF format. imread returns the grayscale image as an m-by-n array of uint8 values.
I = imread('cameraman.tif');
whos
Read an indexed image into the workspace. imread uses two variables to store an indexed image in
the workspace: one for the image and another for its associated colormap. imread always reads the
colormap into a matrix of class double, even though the image array itself may be of class uint8 or
uint16.
[X,map] = imread('trees.tif');
whos
In these examples, imread infers the file format to use from the contents of the file. You can also
specify the file format as an argument to imread. imread supports many common graphics file
formats, such as the Graphics Interchange Format (GIF), Joint Photographic Experts Group (JPEG),
Portable Network Graphics (PNG), and Tagged Image File Format (TIFF) formats. For the latest
information concerning the bit depths and image formats supported, see imread and imformats
reference pages.
pep = imread('peppers.png','png');
whos
See Also
imread
More About
• “Image Types in the Toolbox” on page 2-12
• “Read Multiple Images from a Single Graphics File” on page 3-4
• “Write Image Data to File in Graphics Format” on page 3-6
Read Multiple Images from a Single Graphics File
Read the images from the file, using a loop to read each image sequentially.
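For efficiency, you can preallocate the multiframe array before the loop. This sketch derives the frame size and count from the file metadata and assumes the frames are uint8:
info = imfinfo('mri.tif');
mri = zeros([info(1).Height info(1).Width 1 numel(info)],'uint8');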
for frame=1:27
[mri(:,:,:,frame),map] = imread('mri.tif',frame);
end
whos
See Also
imread
More About
• “Read Image Data into the Workspace” on page 3-2
Read and Write 1-Bit Binary Images
Check the bit depth of the graphics file containing a binary image, text.png. Note that the file
stores the binary image in 1-bit format.
info = imfinfo('text.png');
info.BitDepth
ans = 1
Read the binary image from the file into the workspace. When you read a binary image stored in 1-bit
format, imread represents the data in the workspace as a logical array.
BW = imread('text.png');
whos
Write the binary image to a file in 1-bit format. If the file format supports it, imwrite exports a
binary image as a 1-bit image, by default. To verify this, use imfinfo to get information about the
newly created file and check the BitDepth field. When writing binary files, imwrite sets the
ColorType field to grayscale.
imwrite(BW,'test.tif');
info = imfinfo('test.tif');
info.BitDepth
ans = 1
See Also
imfinfo | imread | imwrite
More About
• “Image Types in the Toolbox” on page 2-12
• “Read Image Data into the Workspace” on page 3-2
Write Image Data to File in Graphics Format
Load image data into the workspace. This example loads the indexed image X from a MAT-file,
trees.mat, along with the associated colormap map.
load trees
whos
Export the image data as a bitmap file using imwrite, specifying the name of the variable and the
name of the output file you want to create. If you include an extension in the filename, imwrite
attempts to infer the desired file format from it. For example, the file extension .bmp specifies the
Microsoft Windows Bitmap format. You can also specify the format explicitly as an argument to
imwrite.
imwrite(X,map,'trees.bmp')
Use format-specific parameters with imwrite to control aspects of the export process. For example,
with PNG files, you can specify the bit depth. To illustrate, read an image into the workspace in TIFF
format and note its bit depth.
I = imread('cameraman.tif');
s = imfinfo('cameraman.tif');
s.BitDepth
ans = 8
Write the image to a graphics file in PNG format, specifying a bit depth of 4.
imwrite(I,'cameraman.png','Bitdepth',4)
newfile = imfinfo('cameraman.png');
newfile.BitDepth
ans = 4
See Also
imwrite
More About
• “Image Types in the Toolbox” on page 2-12
• “Read Image Data into the Workspace” on page 3-2
DICOM Support in Image Processing Toolbox
MATLAB provides support for working with files in the DICOM format, including reading metadata, reading and writing image data, anonymizing files, and creating new DICOM series.
Note MATLAB supports working with DICOM files. There is no support for working with DICOM
network capabilities.
See Also
Apps
DICOM Browser
Functions
dicomCollection | dicomread | dicomreadVolume | dicomwrite
More About
• “Read Metadata from DICOM Files” on page 3-8
• “Read Image Data from DICOM Files” on page 3-10
• “Create New DICOM Series” on page 3-18
• “Remove Confidential Information from DICOM File” on page 3-14
• “Write Image Data to DICOM Files” on page 3-12
Read Metadata from DICOM Files
The following example reads the metadata from a sample DICOM file that is included with the
toolbox.
info = dicominfo('CT-MONO2-16-ankle.dcm')
info =
When dicominfo encounters a private metadata field in a DICOM file, it returns the metadata, creating a generic name for the field based on the group and element tags of the metadata. For example, if the file contains private metadata at group 0009 and element 0006, dicominfo creates the name Private_0009_0006. dicominfo attempts to interpret the private metadata, if it can. For example, if the metadata contains characters, dicominfo processes the data. If it cannot interpret the data, dicominfo returns a sequence of bytes.
If you need to process a DICOM file created by a manufacturer that uses private metadata, and you
prefer to view the correct name of the field as well as the data, you can create your own copy of the
DICOM data dictionary and update it to include definitions of the private metadata. You will need
information about the private metadata that vendors typically provide in DICOM compliance
statements. For more information about updating the DICOM dictionary, see “Create Your Own Copy of
DICOM Dictionary” on page 3-9.
Create Your Own Copy of DICOM Dictionary
1 Make a copy of the text version of the DICOM dictionary that is included with MATLAB. This file, called dicom-dict.txt, is located in matlabroot/toolbox/images/medformats or matlabroot/toolbox/images/iptformats, depending on which version of the Image Processing Toolbox software you are working with. Do not attempt to edit the MAT-file version of the dictionary, dicom-dict.mat.
2 Edit your copy of the DICOM dictionary, adding entries for the metadata. Insert the new
metadata field using the group and element tag, type, and other information. Follow the format of
the other entries in the file. The creator of the metadata (such as an equipment vendor) must
provide you with the information.
3 Save your copy of the dictionary.
4 Set MATLAB to use your copy of the DICOM dictionary by calling the dicomdict function.
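For example, assuming your edited dictionary is saved as dicom-dict-modified.txt (a hypothetical file name):
dicomdict('set','dicom-dict-modified.txt');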
See Also
Apps
DICOM Browser
Functions
dicomdict | dicomdisp | dicominfo | dicomread | dicomreadVolume
More About
• “Read Image Data from DICOM Files” on page 3-10
• “Create New DICOM Series” on page 3-18
• “Remove Confidential Information from DICOM File” on page 3-14
• “Write Image Data to DICOM Files” on page 3-12
Read Image Data from DICOM Files
When using dicomread, you can specify the file name as an argument, as in the following example.
The example reads the sample DICOM file that is included with the toolbox.
I = dicomread('CT-MONO2-16-ankle.dcm');
You can also use the metadata structure returned by dicominfo to specify the file you want to read,
as in the following example.
info = dicominfo('CT-MONO2-16-ankle.dcm');
I = dicomread(info);
imshow(I,'DisplayRange',[])
See Also
Apps
DICOM Browser
Functions
dicominfo | dicomread | dicomreadVolume
More About
• “Read Metadata from DICOM Files” on page 3-8
• “Create New DICOM Series” on page 3-18
Write Image Data to DICOM Files
dicomwrite can write many other types of DICOM data (such as X-ray, radiotherapy, or nuclear
medicine) to a file. However, dicomwrite does not perform any validation of this data.
You can also specify the metadata you want to write to the file by passing to dicomwrite an existing
DICOM metadata structure that you retrieved using dicominfo. In the following example, the
dicomwrite function writes the relevant information in the metadata structure info to the new
DICOM file.
info = dicominfo('CT-MONO2-16-ankle.dcm');
I = dicomread(info);
dicomwrite(I,'ankle.dcm',info)
Note that the metadata written to the file is not identical to the metadata in the info structure.
When writing metadata to a file, there are certain fields that dicomwrite must update. To illustrate,
look at the instance ID in the original metadata and compare it with the ID in the new file.
info.SOPInstanceUID
ans =
1.2.840.113619.2.1.2411.1031152382.365.1.736169244
Now, read the metadata from the newly created DICOM file, using dicominfo, and check the
SOPInstanceUID field.
info2 = dicominfo('ankle.dcm');
info2.SOPInstanceUID
ans =
1.2.841.113411.2.1.2411.10311244477.365.1.63874544
Note that the instance ID in the newly created file differs from the ID in the original file.
Each DICOM metadata attribute optionally includes a two-letter value representation (VR) that identifies the format of the attribute data. For example, the format can be a single-precision binary floating-point number, a character vector that represents a decimal integer, or a character vector in the format of a date-time.
To include the VR in the attribute when using dicomwrite, specify the 'VR' name-value pair
argument as 'explicit'. If you do not specify the VR, then dicomwrite infers the value
representation from the data dictionary.
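For example, a sketch that reuses the image and metadata from the previous example and writes the attributes with explicit VR:
dicomwrite(I,'ankle_explicit.dcm',info,'VR','explicit');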
See Also
Apps
DICOM Browser
Functions
dicomanon | dicominfo | dicomread | dicomuid | dicomwrite
More About
• “Read Metadata from DICOM Files” on page 3-8
• “Read Image Data from DICOM Files” on page 3-10
• “Create New DICOM Series” on page 3-18
• “Remove Confidential Information from DICOM File” on page 3-14
Remove Confidential Information from DICOM File
When using a DICOM file as part of a training set, blinded study, or a presentation, you might want to
remove confidential patient information, a process called anonymizing the file. To do this, use the
dicomanon function.
dicomFile = 'CT-MONO2-16-ankle.dcm';
I = dicomread(dicomFile);
Display the image. Because the DICOM image data is signed 16-bit data, automatically scale the
display range so that the minimum pixel value is black and the maximum pixel value is white.
imshow(I,'DisplayRange',[])
Read the metadata from the DICOM file.
info = dicominfo(dicomFile);
The DICOM file in this example has already been anonymized for patient privacy. To create an informative test DICOM file, set the PatientName field to an artificial value using the Person Name (PN) value representation.
info.PatientName = 'Doe^John';
dicomFileNotAnon = 'ankle_notAnon.dcm';
dicomwrite(I,dicomFileNotAnon,info);
Read the metadata from the non-anonymous DICOM file, then confirm that the patient name in the
new file is not anonymous.
infoNotAnon = dicominfo(dicomFileNotAnon);
infoNotAnon.PatientName
To identify the series to which the non-anonymous image belongs, display the value of the
SeriesInstanceUID property.
infoNotAnon.SeriesInstanceUID
ans =
'1.2.840.113619.2.1.2411.1031152382.365.736169244'
Anonymize the file using the dicomanon function. The function creates a new series with new study
values, changes some of the metadata, and then writes the image to a new file.
dicomFileAnon = 'ankle_anon.dcm'
dicomFileAnon =
'ankle_anon.dcm'
dicomanon(dicomFileNotAnon,dicomFileAnon);
infoAnon = dicominfo(dicomFileAnon);
infoAnon.PatientName
Confirm that the anonymous image belongs to a new study by displaying the value of the
SeriesInstanceUID property.
infoAnon.SeriesInstanceUID
ans =
'1.3.6.1.4.1.9590.100.1.2.23527108211086648741591392582894545242'
See Also
Apps
DICOM Browser
Functions
dicomanon | dicominfo | dicomread | dicomuid | dicomwrite
More About
• “Read Image Data from DICOM Files” on page 3-10
• “Read Metadata from DICOM Files” on page 3-8
• “Create New DICOM Series” on page 3-18
• “Write Image Data to DICOM Files” on page 3-12
• “DICOM Support in Image Processing Toolbox” on page 3-7
Create New DICOM Series
In the DICOM standard, images can be organized into series. By default, when you write an image
with metadata to a DICOM file, dicomwrite puts the image in the same series. You typically only
start a new DICOM series when you modify the image in some way. To make the modified image the
start of a new series, assign a new DICOM unique identifier to the SeriesInstanceUID metadata field.
I = dicomread('CT-MONO2-16-ankle.dcm');
Display the image. Because the DICOM image data is signed 16-bit data, automatically scale the
display range so that the minimum pixel value is black and the maximum pixel value is white.
imshow(I,'DisplayRange',[])
Read the metadata associated with the image by using the dicominfo function. To identify the series an image belongs to, view the value of the SeriesInstanceUID field.
info = dicominfo('CT-MONO2-16-ankle.dcm');
info.SeriesInstanceUID
ans =
'1.2.840.113619.2.1.2411.1031152382.365.736169244'
This example modifies the image by removing all of the text from the image. Text in the image
appears white. Find the maximum value of all pixels in the image, which corresponds to the white
text.
textValue = max(I(:));
The background of the image appears black. Find the minimum value of all pixels in the image, which
corresponds to the background.
backgroundValue = min(I(:));
To remove the text, set all pixels with the maximum value to the minimum value.
Imodified = I;
Imodified(Imodified == textValue) = backgroundValue;
imshow(Imodified,'DisplayRange',[])
To write the modified image as a new series, you need a new DICOM unique identifier (UID).
Generate a new UID using the dicomuid function. dicomuid is guaranteed to generate a unique
UID.
uid = dicomuid
uid =
'1.3.6.1.4.1.9590.100.1.2.380478983111216208316727402442736437608'
Set the value of the SeriesInstanceUID field in the metadata associated with the original DICOM file
to the generated value.
info.SeriesInstanceUID = uid;
Write the modified image to a new DICOM file, specifying the modified metadata structure, info, as an
argument. Because you set the SeriesInstanceUID value, the written image is part of a new series.
dicomwrite(Imodified,'ankle_newseries.dcm',info);
To verify this operation, view the image and the SeriesInstanceUID metadata field in the new file.
See Also
Apps
DICOM Browser
Functions
dicomanon | dicominfo | dicomread | dicomuid | dicomwrite
More About
• “Remove Confidential Information from DICOM File” on page 3-14
• “DICOM Support in Image Processing Toolbox” on page 3-7
Create Image Datastore Containing DICOM Images
dicomDir = fullfile(matlabroot,'toolbox/images/imdata/dog');
Create an imageDatastore, specifying the read function as a handle to the dicomread function.
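A sketch of the datastore creation, assuming the DICOM files in the folder use the .dcm file extension:
dicomds = imageDatastore(dicomDir, ...
    'FileExtensions','.dcm','ReadFcn',@(x) dicomread(x));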
I = read(dicomds);
Display the image. The image has signed 16-bit data, so scale the display range to the pixel values in
the image.
imshow(I,[])
See Also
dicomread | imageDatastore
More About
• “Create Image Datastore Containing Single and Multi-File DICOM Volumes” on page 3-24
Create Image Datastore Containing Single and Multi-File DICOM Volumes
Specify the location of a directory containing DICOM data. The data includes 2-D images, a 3-D
volume in a single file, and a multi-file 3-D volume.
dicomDir = fullfile(matlabroot,'toolbox/images/imdata');
Gather details about the DICOM files by using the dicomCollection function. This function returns
the details as a table, where each row represents a single study. For multi-file DICOM volumes, the
function aggregates the files into a single study.
collection = dicomCollection(dicomDir,"IncludeSubfolders",true)
collection = 6×14 table
(The table display includes variables such as StudyDateTime, SeriesDateTime, PatientName, and PatientSex.)
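The steps that follow run once per study, inside a loop over the rows of the collection table. A sketch of the loop skeleton and output folder setup (matFileDir and the folder name are assumptions):
matFileDir = fullfile(tempdir,'dicomMATFiles');
if ~exist(matFileDir,'dir')
    mkdir(matFileDir)
end
for idx = 1:height(collection)
    % ... the file name handling, reading, and saving steps shown below ...
end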
Get the file names that comprise the study. For multi-file DICOM volumes, the file names are listed as
a string array.
dicomFileName = collection.Filenames{idx};
if length(dicomFileName) > 1
matFileName = fileparts(dicomFileName(1));
matFileName = split(matFileName,filesep);
matFileName = replace(strtrim(matFileName(end))," ","_");
else
[~,matFileName] = fileparts(dicomFileName);
end
matFileName = fullfile(matFileDir,matFileName);
Read the data. Try different read functions because the images have a different number of
dimensions and are stored in different formats.
1) Try reading the data of the study by using the dicomreadVolume function.
• If the data is a multi-file volume, then dicomreadVolume runs successfully and returns the
complete volume in a single 4-D array. You can add this data to the datastore.
• If the data is contained in a single file, then dicomreadVolume does not run successfully.
2) Try reading the data of the study by using the dicomread function.
• If dicomread returns a 4-D array, then the study contains a complete 3-D volume. You can add
this data to the datastore.
• If dicomread returns a 2-D matrix or 3-D array, then the study contains a single 2-D image. Skip
this data and continue to the next study in the collection.
try
    data = dicomreadVolume(collection,collection.Row{idx});
catch ME
    data = dicomread(dicomFileName);
    if ndims(data)<4
        % Skip files that are not volumes
        continue;
    end
end
For complete volumes returned in a 4-D array, write the data and the absolute file name to a MAT file.
save(matFileName,'data','dicomFileName');
end
Create an imageDatastore from the MAT files containing the volumetric DICOM data. Specify the
ReadFcn property as the helper function matRead, defined at the end of this example.
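A sketch of the datastore creation, using the MAT-file folder and helper function names assumed in this example:
imdsdicom = imageDatastore(matFileDir, ...
    'FileExtensions','.mat','ReadFcn',@matRead);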
[V,Vinfo] = read(imdsdicom);
[~,VFileName] = fileparts(Vinfo.Filename);
The DICOM volume is grayscale. Remove the singleton channel dimension by using the squeeze
function, then display the volume by using the volshow function.
V = squeeze(V);
ViewPnl = uipanel(figure,'Title',VFileName);
volshow(V,'Parent',ViewPnl);
Supporting Functions
The matRead function loads data from the first variable of a MAT file with file name filename.
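A minimal implementation consistent with that description:
function data = matRead(filename)
% Load the MAT file and return the first variable it contains.
inp = load(filename);
f = fields(inp);
data = inp.(f{1});
end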
See Also
dicomCollection | dicominfo | dicomread | dicomreadVolume | imageDatastore | volshow
More About
• “Create Image Datastore Containing DICOM Images” on page 3-22
• “Preprocess Volumes for Deep Learning” (Deep Learning Toolbox)
Mayo Analyze 7.5 Files
Note The Analyze 7.5 format uses the same dual-file data set organization and the same file name
extensions as the Interfile format; however, the file formats are not interchangeable. To learn how to
read data from an Interfile data set, see “Interfile Files” on page 3-28.
The following example calls the analyze75info function to read the metadata from the Analyze 7.5
header file. The example then passes the info structure returned by analyze75info to the
analyze75read function to read the image data from the image file.
info = analyze75info('brainMRI.hdr');
X = analyze75read(info);
Interfile Files
Interfile is a file format that was developed for the exchange of nuclear medicine image data. For
more information, see interfileinfo or interfileread.
• Header file — Provides information about dimensions, identification and processing history. You
use the interfileinfo function to read the header information. The header file has the .hdr
file extension.
• Image file — Image data, whose data type and ordering are described by the header file. You use
interfileread to read the image data into the workspace. The image file has the .img file
extension.
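For example, with a hypothetical file name mydata.hdr:
info = interfileinfo('mydata.hdr');
X = interfileread('mydata.hdr');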
Note The Interfile format uses the same dual-file data set organization and the same file name
extensions as the Analyze 7.5 format; however, the file formats are not interchangeable. To learn how
to read data from an Analyze 7.5 data set, see “Mayo Analyze 7.5 Files” on page 3-27.
End-to-End Implementation of Digital Camera Processing Pipeline
Introduction
Digital single-lens reflex (DSLR) cameras, and many modern phone cameras, can save data collected
from the camera sensor directly to a RAW file. Each pixel of RAW data is the amount of light captured
by the corresponding camera photosensor. The data depends on fixed characteristics of the camera
hardware, such as the sensitivity of each photosensor to a particular range of wavelengths of the
electromagnetic spectrum. The data also depends on camera acquisition settings, such as exposure
time, and factors of the scene, such as the light source.
The main processing steps in a traditional camera processing pipeline are linearization, black-level subtraction and scaling, white balancing, demosaicing, and conversion to a standard output color space. You can also apply additional postprocessing steps such as denoising, highlight clipping, and contrast adjustment.
Cameras acquire images and create RAW files. These RAW files contain the CFA image recorded by the camera sensor along with metadata that describes the capture.
While a RAW file can also contain a camera-generated JPEG preview image, or a JPEG thumbnail, you
only need the CFA image and metadata to implement a camera pipeline.
Read a CFA test image from a file using the rawread function and display it.
fileName = "colorCheckerTestImage.NEF";
cfaImage = rawread(fileName);
whos cfaImage
imshow(cfaImage,[])
title("Linear CFA Image")
Many cameras mask a portion of the photosensor, typically at the edges, to prevent those sections
from capturing any light. This enables you to accurately determine the black level of the sensor. For
such cameras, the number of pixels in the image is smaller than the number of pixels in the sensor.
For example, use the rawinfo function to retrieve the metadata from the test CFA image. In the
metadata, note that the column value in the VisibleImageSize field is smaller than in the
CFAImageSize field. The rawread function, by default, reads only the visible portion of the CFA.
fileInfo = rawinfo(fileName);
if fileInfo.CFASensorType ~= "Bayer"
    error("The input file %s has a CFA image created by a %s sensor. " + ...
        "The camera pipeline in this example works only for a CFA image " + ...
        "created by a Bayer sensor.",fileName,fileInfo.CFASensorType);
end
disp(fileInfo.ImageSizeInfo)
Many cameras apply nonlinear range compression to captured signals before storing them in RAW
files. To generate linear data, in preparation for transforming the CFA data, you must reverse this
nonlinear range compression. Cameras typically store this range compression as a lookup table
(LUT). If the camera does not apply range compression, the LUT contains an identity mapping. The
rawread function automatically applies this LUT to the CFA data and returns linearized light values.
Plot a subset of the values in the LinearizationTable field of the test image metadata.
linTable = fileInfo.ColorInfo.LinearizationTable;
plot((0:length(linTable)-1), fileInfo.ColorInfo.LinearizationTable)
title("Linearization Table")
Scale Image
RAW images do not have a true black value. Even with the shutter closed, electricity flowing through
the sensors causes nonzero photon counts. Cameras use the value of the masked pixels to compute
the black level of the CFA image. Various RAW file formats report this black level differently. Some file
formats specify the black level as one scalar value per channel of the CFA image. Other formats, such
as DNG, specify black-level as a repeated m-by-n region that starts at the top-left corner of the visible
portion of the CFA.
To scale the image, subtract the black level from the CFA image. Use the provided helper functions to
perform these calculations.
blackLevel = fileInfo.ColorInfo.BlackLevel;
disp(blackLevel)
0 0 0 0
if isvector(fileInfo.ColorInfo.BlackLevel)
    cfaMultiChannel = performPerChannelBlackLevelSub(cfaImage,fileInfo);
else
    cfaMultiChannel = performRegionBlackLevelSub(cfaImage,fileInfo);
end
To correct for CFA data values less than the black-level value, clamp the values to 0.
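For example, using logical indexing on the black-level-subtracted data:
cfaMultiChannel(cfaMultiChannel < 0) = 0;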
RAW file metadata often represent the white level as the maximum value allowed by the data type. If
this white-level value is much higher than the highest value in the image, using this white-level value
for scaling results in an image that is darker than it should be. To avoid this, scale the CFA image
using the maximum pixel value found in the image.
cfaMultiChannel = double(cfaMultiChannel);
whiteLevel = max(cfaMultiChannel(:));
disp(whiteLevel)
3366
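A sketch of the scaling step, dividing by the observed white level (scaledCFAImage is a hypothetical variable name):
scaledCFAImage = cfaMultiChannel ./ whiteLevel;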
White balance is the process of removing unrealistic color casts from a rendered image, such that it
appears closer to how human eyes would see the subject.
camWB = fileInfo.ColorInfo.CameraAsTakenWhiteBalance;
Next, scale the multipliers so that the values of the G channel are 1.
gLoc = strfind(fileInfo.CFALayout,"G");
gLoc = gLoc(1);
camWB = camWB/camWB(gLoc);
disp(camWB)
cfaWB = planar2raw(wbCFAMultiChannel);
cfaWB = im2uint16(cfaWB);
Convert the Bayer-encoded CFA image into a truecolor image by demosaicing and rotating it. This
image is in linear camera space.
camspaceRGB = demosaic(cfaWB,fileInfo.CFALayout);
camspaceRGB = imrotate(camspaceRGB,fileInfo.ImageSizeInfo.ImageRotation);
imshow(camspaceRGB)
title("Rendered Image in Linear Camera Space")
You can convert a CFA image to an RGB image either by using the profile connection space (PCS)
conversion functions or by using the conversion matrix returned in the RAW file metadata.
Convert the CFA image to an Adobe RGB 1998 output color space. This conversion consists of these
steps:
• Convert the image from the linear camera space into a profile connection space, such as XYZ.
• Convert the image from the XYZ profile connection space to the Adobe RGB 1998 color space.
cam2xyz = computeXYZTransformation(fileInfo);
xyzImage = imapplymatrix(cam2xyz,im2double(camspaceRGB));
% xyz2rgb function applies Gamma Correction for Adobe RGB 1998 color space
adobeRGBImage = xyz2rgb(xyzImage,"ColorSpace","adobe-rgb-1998","OutputType","uint16");
imshow(adobeRGBImage);
title("Rendered RGB Image in Adobe RGB 1998 Color Space");
Use the transformation matrix in the fileInfo.ColorInfo.CameraTosRGB field of the CFA file
metadata to convert the image from the linear camera space to the linear sRGB space.
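A sketch of that conversion, using imapplymatrix with the metadata matrix and then applying the sRGB gamma curve with lin2rgb:
srgbImage = imapplymatrix(fileInfo.ColorInfo.CameraTosRGB, ...
    im2double(camspaceRGB),'uint16');
srgbImage = lin2rgb(srgbImage);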
imshow(srgbImage)
title("Rendered RGB Image in sRGB Colorspace")
Pipeline Comparison
As this example shows, you can use general image processing toolbox functions and RAW file
metadata to convert a CFA image into an sRGB image. You can also perform this conversion by using
the raw2rgb function. While using the raw2rgb function does not provide the same flexibility as
using the metadata, the results are comparable. The raw2rgb function uses the libraw 0.20.0
RAW file processing library. Compare the result of the raw2rgb conversion to that of the PCS
conversion, for the Adobe RGB 1998 color space, and to that of the metadata conversion, for the
sRGB color space.
adobeRGBReference = raw2rgb(fileName,"ColorSpace","adobe-rgb-1998");
srgbReference = raw2rgb(fileName,"ColorSpace","srgb");
montage({srgbReference, srgbImage});
title("sRGB Images: Left Image: raw2rgb, Right Image: MATLAB Pipeline");
Helper Functions
The performPerChannelBlackLevelSub function subtracts the per-channel black level from the CFA data. It reshapes the black-level vector so that it can be subtracted from the corresponding channels of the CFA image:
blackLevel = fileInfo.ColorInfo.BlackLevel;
blackLevel = reshape(blackLevel,[1 1 numel(blackLevel)]);
The computeXYZTransformation function scales the metadata matrix that specifies the
transformation between the linear camera space and the XYZ profile connection space. Scaling this
matrix avoids a strong, pink color cast in the rendered image.
% However, the order of the white balance multipliers follows
% fileInfo.CFALayout. Hence, reorder the multipliers to ensure that the
% correct row of the CameraToXYZ matrix is scaled.
wbIdx = strfind(fileInfo.CFALayout,"R");
gidx = strfind(fileInfo.CFALayout,"G"); wbIdx(2) = gidx(1);
wbIdx(3) = strfind(fileInfo.CFALayout,"B");
wbCoeffs = fileInfo.ColorInfo.D65WhiteBalance(wbIdx);
Work with High Dynamic Range Images
hdr_image = hdrread('office.hdr');
whos
The range of data exceeds the range [0, 1] expected of LDR data.
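For example, you can check the data range directly:
hdr_range = [min(hdr_image(:)) max(hdr_image(:))]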
hdr_range =
0 3.2813
MATLAB and Image Processing Toolbox functions handle data outside the range [0, 1] in different ways:
• Some functions clip values outside the expected range before continuing to process the data. These functions can return unexpected results because clipping causes a loss of information.
• Some functions expect data in the range [0, 1] but do not adjust the data before processing it.
These functions can return incorrect results.
• Some functions expect real data. If your HDR image contains values of Inf, then these functions
can return unexpected results.
• Some functions have no limitations for the range of input data. These functions accept and
process HDR data correctly.
To work with functions that require LDR data, you can reduce the dynamic range of an image using a
process called tone mapping. Tone mapping scales HDR data to the range [0, 1] while attempting to
preserve the appearance of the original image. Tone mapping functions such as tonemap,
tonemapfarbman, and localtonemap give more accurate results than simple linear rescaling such
as performed by the rescale function. However, note that tone mapping incurs a loss of subtle
information and detail.
To display HDR images, you must perform tone mapping. For an example, see “Display High Dynamic
Range Image” on page 3-40.
To create an HDR image from a group of low dynamic range images captured with different exposures, use the makehdr function.
hdr_image = makehdr(files);
To write an HDR image to disk, use the hdrwrite function.
hdrwrite(hdr,'filename');
See Also
hdrread | hdrwrite | localtonemap | makehdr | tonemap | tonemapfarbman
Related Examples
• “Display High Dynamic Range Image” on page 3-40
More About
• “Image Types in the Toolbox” on page 2-12
Display High Dynamic Range Image
Read a high dynamic range (HDR) image, using hdrread. If you try to display the HDR image, notice
that it does not display correctly.
hdr_image = hdrread('office.hdr');
imshow(hdr_image)
Convert the HDR image to a dynamic range that can be viewed on a computer, using the tonemap
function. This function converts the HDR image into an RGB image of class uint8 .
rgb = tonemap(hdr_image);
whos
imshow(rgb)
See Also
localtonemap | tonemap | tonemapfarbman
More About
• “Work with High Dynamic Range Images” on page 3-38
Displaying and Exploring Images
4
This section describes the image display and exploration tools provided by the Image Processing
Toolbox software.
imshow is the fundamental image display function. Use imshow when you want to display any of the
different image types supported by the toolbox, such as grayscale (intensity), truecolor (RGB), binary,
and indexed. For more information, see “Display an Image in Figure Window” on page 4-3. The
imshow function is also a key building block for image applications you can create using the toolbox
modular tools. For more information, see “Build Interactive Tools”.
The other toolbox display function, imtool, opens the Image Viewer app, which presents an
integrated environment for displaying images and performing some common image processing tasks.
Image Viewer provides all the image display capabilities of imshow but also provides access to
several other tools for navigating and exploring images, such as scroll bars, the Pixel Region tool, the
Image Information tool, and the Adjust Contrast tool. For more information, see “Get Started with
Image Viewer App” on page 4-17.
In general, using the toolbox functions to display images is preferable to using MATLAB image
display functions image and imagesc because the toolbox functions set certain graphics object
properties automatically to optimize the image display. The following table lists these properties and
their settings for each image type. In the table, X represents an indexed image, I represents a
grayscale image, BW represents a binary image, and RGB represents a truecolor image.
Note Both imshow and imtool can perform automatic scaling of image data. When called with the
syntax imshow(I,'DisplayRange',[]), and similarly for imtool, the functions set the axes CLim
property to [min(I(:)) max(I(:))]. CDataMapping is always scaled for grayscale images, so
that the value min(I(:)) is displayed using the first color map color, and the value max(I(:)) is
displayed using the last color map color.
Display an Image in Figure Window
Overview
To display image data, use the imshow function. The following example reads an image into the
workspace and then displays the image in a figure window using the imshow function.
moon = imread('moon.tif');
imshow(moon);
You can also pass imshow the name of a file containing an image.
imshow('moon.tif');
This syntax can be useful for scanning through images. Note, however, that when you use this syntax,
imread does not store the image data in the workspace. If you want to bring the image into the
workspace, you must use the getimage function, which retrieves the image data from the current
image object. This example assigns the image data from moon.tif to the variable moon, if the figure
window in which it is displayed is currently active.
moon = getimage;
For more information about using imshow to display the various image types supported by the
toolbox, see “Display Different Image Types” on page 4-73.
To override the default initial magnification behavior for a particular call to imshow, specify the
InitialMagnification parameter. For example, to view an image at 150% magnification, use this
code.
pout = imread('pout.tif');
imshow(pout, 'InitialMagnification', 150)
imshow attempts to honor the magnification you specify. However, if the image does not fit on the
screen at the specified magnification, imshow scales the image to fit. You can also specify 'fit' as the initial magnification value. In this case, imshow scales the image to fit the current size of the
figure window.
When imshow scales an image, it uses interpolation to determine the values for screen pixels that do
not directly correspond to elements in the image matrix. For more information about specifying
interpolation methods, see “Resize an Image with imresize Function” on page 6-2.
imshow('moon.tif','Border','tight')
The following figure shows the same image displayed with and without a border.
The 'Border' parameter affects only the image being displayed in the call to imshow. If you want all
the images that you display using imshow to appear without the gray border, set the Image
Processing Toolbox 'ImshowBorder' preference to 'tight'. You can also use preferences to
include visible axes in the figure. For more information about preferences, see iptprefs.
See Also
More About
• “Display Multiple Images” on page 4-7
Display Multiple Images
imshow always displays an image in the current figure. If you display two images in succession, the
second image replaces the first image. To view multiple figures with imshow, use the figure
command to explicitly create a new empty figure before calling imshow for the next image. The
following example views the first three frames in an array of grayscale images I.
imshow(I(:,:,:,1))
figure, imshow(I(:,:,:,2))
figure, imshow(I(:,:,:,3))
The images in the montage can be of different types and sizes. montage converts indexed images to
RGB using the color map present in the file.
By default, the montage function does not include any blank space between the images in the
montage. You can specify the amount of blank space between the images using the BorderSize
parameter. You can also specify the color of the space between images using the BackgroundColor
parameter.
This example shows how to view multiple frames in a multiframe array at one time, using the
montage function. montage displays all the image frames, arranging them into a rectangular grid.
The montage of images is a single image object. The image frames can be grayscale, indexed, or
truecolor images. If you specify indexed images, they all must use the same colormap.
onion = imread('onion.png');
onionArray = repmat(onion, [ 1 1 1 4 ]);
Display all the images at once, in a montage. By default, the montage function displays the images in
a grid. The first image frame is in the first position of the first row, the next frame is in the second
position of the first row, and so on.
montage(onionArray);
To specify a different number of rows and columns, use the 'size' parameter. For example, to
display the images in one horizontal row, specify the 'size' parameter with the value [1 NaN].
Using other montage parameters you can specify which images you want to display and adjust the
contrast of the images displayed.
montage(onionArray,'size',[1 NaN]);
Note The Image Viewer app (imtool) does not support this capability.
subplot divides a figure into multiple display regions. Using the syntax subplot(m,n,p), you
define an m-by-n matrix of display regions and specify which region, p, is active.
For example, you can use this syntax to display two images side by side.
[X1,map1]=imread('forest.tif');
[X2,map2]=imread('trees.tif');
subplot(1,2,1), imshow(X1,map1)
subplot(1,2,2), imshow(X2,map2)
To compare two images, use the imshowpair function. imshowpair supports several visualization methods, including:
• falsecolor, in which the two images are overlaid in different color bands. Gray regions indicate
where the images have the same intensity, and colored regions indicate where the image intensity
values differ. RGB images are converted to grayscale before display in falsecolor.
• alpha blending, in which the intensity of the display is the mean of the two input images. Alpha
blending supports grayscale and truecolor images.
• checkerboard, in which the output image consists of alternating rectangular regions from the two
input images.
• the difference of the two images. RGB images are converted to grayscale.
• montage, in which the two images are displayed alongside each other. This visualization mode is
similar to the display using the montage function.
imshowpair uses optional spatial referencing information to display the pair of images.
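For example, this sketch compares an image with a rotated copy of itself; the rotation is only a stand-in for a second image you want to compare.
A = imread('cameraman.tif');
B = imrotate(A,5,'bicubic','crop');     % simulate a misaligned second image
imshowpair(A,B,'falsecolor')            % overlay in different color bands
figure, imshowpair(A,B,'diff')          % absolute difference of the two images
figure, imshowpair(A,B,'montage')       % display the images side by side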
See Also
imshow | imshowpair | montage
More About
• “Display an Image in Figure Window” on page 4-3
• “Display Different Image Types” on page 4-73
View Thumbnails of Images in Folder or Datastore
To view thumbnails of all the images in a folder, open the Image Browser app from the MATLAB®
toolstrip. On the Apps tab, in the Image Processing and Computer Vision section, click Image
Browser.
Load images into the app by first clicking Load and then selecting Load folder of images. The app
displays a file explorer window. Navigate to the folder you want to view. For this example, select the
sample image folder, imdata.
You can also open the app at the command line using the imageBrowser function, specifying the
name of the folder you want to view. For example, to view all the images in the sample image folder,
use this command: imageBrowser(fullfile(matlabroot,'toolbox/images/imdata/'));.
The Image Browser app displays thumbnails of all the images in the folder. To adjust the size of the
image thumbnails, use the Thumbnail Size slider in the app toolstrip.
To view thumbnails of all the images in an image datastore, open the Image Browser app from the
MATLAB® toolstrip. On the Apps tab, in the Image Processing and Computer Vision section, click Image Browser.
Load images into the app by first clicking Load and then selecting Load image datastore from
workspace.
For this example, use the imageDatastore function to create an image datastore containing the images in the imdata folder.
imds = imageDatastore(fullfile(matlabroot,'toolbox/images/imdata/'));
In the dialog box, select the image datastore variable and click OK. The Image Browser app displays
thumbnails of all the images in the folder.
You can also open the app at the command line using the imageBrowser function, specifying the
name of the image datastore that you want to view. For example, to view all the images in the image
datastore created from the sample image folder, use this command: imageBrowser(imds);
To get a closer look at an image displayed in the Image Browser app, select the image and click
Preview. You can also get a preview by double-clicking an image. The app displays the image at a
higher resolution in a Preview tab. For example, view the blobs.png binary image in a Preview tab.
To explore the image displayed in the Preview tab, use the zoom and pan options that appear near the top-right corner of the image when you pause the pointer over the image.
If you are viewing images in a folder, you can export all the images to an image datastore. On the app toolstrip, click Export All and specify the name of the image datastore. The Export All option is not available if you are viewing images in an image datastore.
To export an individual image from the folder, right-click the image and choose Export image to
workspace. Specify the name of the workspace variable in which you want to store the image.
If you are viewing images in an image datastore, you can export an individual image into the
workspace. Select the image, right-click, and then choose the Export image to workspace option.
In the Export to workspace dialog box, specify the variable name you want to use for the image.
If you are viewing images in a folder in the Image Browser app, you can modify the display of
thumbnails. For example, you can delete some of the image thumbnails and then save the modified
display in an image datastore. Click Export All in the toolstrip and specify the variable name you
want to use for the datastore. When you open this image datastore in the future, the Image Browser
app displays only the images you saved. The Image Browser app does not delete the images from
the file system—it only removes the thumbnails from the display.
You can select an image in the Image Browser app and then open the image in another app. The
Image Browser app lets you open the Image Viewer app, the Color Thresholder app, the Image
Segmenter app, and the Image Region Analyzer app.
For example, in the Image Browser app select the blobs.png image. In the app toolstrip, click the
Image Region Analyzer app. The Image Region Analyzer app opens containing the blobs.png
image.
See Also
Image Browser | imageDatastore
More About
• “Getting Started with Datastore”
Get Started with Image Viewer App
The Image Viewer app presents an integrated environment for displaying images and performing
common image processing tasks. The workflow for using Image Viewer typically involves a combination of steps such as opening an image, navigating it, inspecting pixel values, adjusting contrast, cropping, and saving or exporting the results.
The figure shows an image displayed in Image Viewer with many of the related tools open and
active.
Note You can also access individual tools outside the Image Viewer app. To do so, display an image
in a figure window using a function such as imshow, then create one or more tools using toolbox
functions. For example, you can build an image processing app with custom layout and behavior
using a combination of individual tools. For more information, see “Interactive Tool Workflow” on
page 5-6
• You can open the Image Viewer app from the command line by using the imtool function. Use
this function when you want to control various aspects of the initial image display, such as the
initial magnification, color map, or display range. For example, this code opens Image Viewer
and loads the image with file name cameraman.tif.
imtool('cameraman.tif')
• You can open the Image Viewer app from the Apps tab, under Image Processing and Computer
Vision. To bring image data into Image Viewer from a file name, select Open from the File menu.
To bring image data into Image Viewer from the workspace, select Import from Workspace
from the File menu. Optionally filter the variables in the workspace to show only images of a
desired type, such as binary, indexed, intensity (grayscale), or truecolor (RGB) images.
• You can start a new Image Viewer from within an existing Image Viewer by using the New
option from the File menu.
Note When you specify a file name, Image Viewer does not save the image data in a workspace
variable. However, you can export the image from Image Viewer to the workspace. For more
information, see “Save and Export Results” on page 4-24.
By default, when you close Image Viewer, the app does not save the modified image data. However,
you can export the modified image to a file or save the modified data in a workspace variable. For
more information, see “Save and Export Results” on page 4-24.
Save and Export Results
You can export the image data in Image Viewer to the MATLAB workspace, save it to a file, or open it in a new figure window.
Destination: Create workspace variable
Procedure: There are three ways to create a workspace variable from the image data in Image Viewer.
• Use the Export to Workspace option on the Image Viewer File menu.
• If you start the app by using the imtool function and specify a handle to the tool, then you can use the getimage function and specify the handle to the tool. For example, this code opens the image with file name moon.tif in an Image Viewer and then exports the image to the variable moon.
t = imtool('moon.tif');
moon = getimage(t);
• If you start the app without specifying a handle to the tool, then you can use the getimage function and specify a handle to the image object within the figure. For example, this code opens the image with file name moon.tif in an Image Viewer and then exports the image to the variable moon.
imtool('moon.tif')
moon = getimage(imgca);
Destination: Save to file
Procedure: Use the Save Image tool by selecting the Save as option on the Image Viewer File menu. This tool enables you to navigate your file system to determine where to save the file, specify the file name, and choose the file format.
Destination: Open new figure window
Procedure: Select the Print to Figure option from the File menu. You can use this figure window to see a color bar and print the image. For more information, see “Add Color Bar to Displayed Grayscale Image” on page 4-78 and “Print Images” on page 4-80.
See Also
Image Viewer | imtool
Get Pixel Information in Image Viewer App
The Image Viewer app provides tools that enable you to see the pixel values and coordinates of
individual pixels and groups of pixels. You can save the pixel location and value information.
The figure shows Image Viewer with pixel location and grayscale pixel value displayed in the Pixel
Information tool.
Note You can also obtain pixel value information from a figure with imshow by using the
impixelinfo function.
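For example, a minimal sketch that adds a Pixel Information tool to a figure created with imshow:
figure
imshow('peppers.png')
impixelinfo    % pixel location and value appear below the image as you move the pointer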
To save the pixel location and value information displayed, right-click a pixel in the image and choose
the Copy pixel info option. Image Viewer copies the x- and y-coordinates and the pixel value to the
clipboard. You can paste this pixel information into the MATLAB workspace or another application by
right-clicking and selecting Paste from the context menu.
To view the values of pixels in a specific region of an image, use the Pixel Region tool. The figure shows Image Viewer with the Pixel Region tool. Note how the Pixel Region tool includes
its own Pixel Information tool in the display.
The following sections provide more information about using the Pixel Region tool.
Note You can also obtain pixel region information from a figure, such as displayed using imshow, by
using the impixelregion function.
Select a Region
To start the Pixel Region tool, click the Pixel Region button in the Image Viewer toolbar or
select the Pixel Region option from the Tools menu. Image Viewer displays the pixel region
rectangle in the center of the target image and opens the Pixel Region tool.
Note Scrolling the image can move the pixel region rectangle off the part of the image that is
currently displayed. To bring the pixel region rectangle back to the center of the part of the image
that is currently visible, click the Pixel Region button again. For help finding the Pixel Region tool in
large images, see “Determine the Location of the Pixel Region Rectangle” on page 4-30.
Using the mouse, position the pointer over the pixel region rectangle. The pointer changes to the
fleur shape.
Click the left mouse button and drag the pixel region rectangle to any part of the image. As you move
the pixel region rectangle over the image, the Pixel Region tool updates the pixel values displayed.
You can also move the pixel region rectangle by moving the scroll bars in the Pixel Region tool
window.
To get a closer view of image pixels, use the zoom buttons on the Pixel Region tool toolbar. As you
zoom in, the size of the pixels displayed in the Pixel Region tool increases and fewer pixels are visible. As you zoom out, the size of the pixels in the Pixel Region tool decreases and more pixels are visible.
To change the number of pixels displayed in the tool, without changing the magnification, resize the
Pixel Region tool using the mouse.
As you zoom in or out, note how the size of the pixel region rectangle changes according to the
magnification. You can resize the pixel region rectangle using the mouse. Resizing the pixel region
rectangle changes the magnification of pixels displayed in the Pixel Region tool.
If the magnification allows, the Pixel Region tool overlays each pixel with its numeric value. For RGB
images, this information includes three numeric values, one for each band of the image. For indexed
images, this information includes the index value and the associated RGB value. If you would rather
not see the numeric values in the display, go to the Pixel Region tool Edit menu and clear the
Superimpose Pixel Values option.
To determine the current location of the pixel region in the target image, you can use the pixel
information given at the bottom of the tool. This information includes the x- and y-coordinates of
pixels in the target image coordinate system. When you move the pixel region rectangle over the
target image, the pixel information given at the bottom of the tool is not updated until you move the
cursor back over the Pixel Region tool.
You can also retrieve the current position of the pixel region rectangle by selecting the Copy
Position option from the Pixel Region tool Edit menu. This option copies the position information to
the clipboard. The position information is a vector of the form [xmin ymin width height]. To
paste this position vector into the MATLAB workspace or another application, right-click and select
Paste from the context menu.
You can print the view of the image displayed in the Pixel Region tool. Select the Print to Figure
option from the Pixel Region tool File menu. See “Print Images” on page 4-80 for more information.
For grayscale images, Image Viewer displays the display range of the image in the Display Range tool at the bottom right corner of the window. Image Viewer does not show the Display Range tool for indexed,
truecolor, or binary images.
The figure shows an image with display range information in Image Viewer.
Note You can also obtain the image display range from a figure, such as displayed using imshow, by
using the imdisplayrange function.
You can change the image contrast and brightness by adjusting the display range. For more
information, see “Adjust Image Contrast in Image Viewer App” on page 4-36.
See Also
Image Viewer | imdisplayrange | impixelinfo | impixelregion | imtool
More About
• “Get Started with Image Viewer App” on page 4-17
Measure Distance Between Pixels in Image Viewer App
The Image Viewer app enables you to measure the Euclidean distance between two pixels using the
Distance tool. The Distance tool displays the line, the endpoints, and a label with the distance measurement. The tool specifies the distance in data units determined by the XData and YData properties, which are pixels by default. You can save the endpoint locations and distance information.
Measure distance with a click-and-drag approach. When you move the pointer over the image, the
pointer changes to cross hairs. Position the cross hairs at the first endpoint, hold down the mouse
button, drag the cross hairs to the second endpoint, and release the button.
The figure shows a distance line with the line endpoints and distance measurement label.
Note You can also use a Distance tool with a figure, such as displayed using imshow, by using the
imdistline function.
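For example, a minimal sketch that creates a Distance tool on an image displayed with imshow and reads the measured distance programmatically; the endpoint coordinates are arbitrary.
figure
imshow('pout.tif')
d = imdistline(gca,[20 100],[20 100]);   % Distance tool between two endpoints
api = iptgetapi(d);
api.getDistance()                        % distance in data units (pixels by default)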
To export the endpoint locations and distance measurement, use the Export to Workspace option on the Distance tool context menu. In the dialog box, specify names for the workspace variables. After you click OK, the Distance tool creates the variables in the workspace. Use the whos command to confirm that the new variables exist.
whos
You can customize the appearance of the Distance tool using other options on its context menu:
• Toggle the distance tool label on and off using the Show Distance Label option.
• Change the color used to display the Distance tool line using the Set color option.
• Constrain movement of the tool to either horizontal or vertical using the Constrain drag option.
• Delete the distance tool object using the Delete option.
See Also
Image Viewer | imdistline | imtool
More About
• “Get Started with Image Viewer App” on page 4-17
Adjust Image Contrast in Image Viewer App
An image lacks contrast when there are no sharp differences between black and white. Brightness
refers to the overall lightness or darkness of an image. Contrast adjustment works by manipulating
the display range of the image. Pixels with value equal to or less than the minimum value of the
display range appear black. Pixels with value equal to or greater than the maximum value of the
display range appear white. Pixels within the display range appear as a smooth gradient of shades of
gray. Adjusting the contrast spreads the pixel values across the display range, revealing much more
detail in the image.
The default display range of an image is determined by the image class. For instance, the display
range for images with data type uint8 is 0 to 255. If the data range, or actual minimum and
maximum pixel values of the image data, is narrower than the display range, then the displayed image does not use all shades of gray from black to white. In this case, you can improve the image
contrast by shrinking the display range to match the data range. To highlight certain image features,
you can further shrink the display range to be less than the data range.
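If you prefer to work programmatically, imshow accepts a display range directly; a minimal sketch, where the [100 160] range is an arbitrary example:
I = imread('pout.tif');
imshow(I,[])           % display range matches the data range of I
figure
imshow(I,[100 160])    % narrower display range emphasizes a band of gray levels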
The Image Viewer app offers two tools that enable you to change the contrast or brightness of an
image interactively. The Window/Level tool enables you to change contrast and brightness by simply
dragging the mouse over the image. The Adjust Contrast tool displays a histogram of image pixel
values and graphical representation of the display range so that you can see how the display range
relates to pixel values.
The Adjust Contrast tool displays a histogram of pixel values and information about the data range.
The data range is a fixed property of the image data and does not change when you adjust the display
range using the Adjust Contrast tool. The Adjust Contrast tool also displays a red-tinted rectangular
box, called a window, overlaid on the histogram. The window directly corresponds to the display
range of the image. The tool also shows the precise values of the range and the location of the window on the histogram.
For example, in the figure, the histogram for the image shows that the data range of the image is 7 to
253. The display range is the default display range for the uint8 data type, 0 to 255. Therefore, no
pixels appear as black or white. Further, the histogram shows that many pixel values are clustered in
the middle of the display range, which explains why it is challenging to distinguish between the
medium-gray pixel values.
To increase the contrast of the image, narrow the range of pixel values. To increase the brightness of
an image, shift the range towards large pixel values. There are three ways to adjust the display range
of the image.
• You can adjust the display range interactively by manipulating the window. Adjust the minimum
and maximum value of the display range by clicking and dragging the left and right edges of the
window. Change the position of the window by clicking and dragging the interior of the window.
• You can enter specific values for the minimum, maximum, width, and center of the window. You can also define
these values by clicking the dropper button associated with these fields. When you do this, the
pointer becomes an eye dropper shape. Position the eye dropper pointer over the pixel in the
image that you want to be the minimum (or maximum) value and click the mouse button.
• You can let the Adjust Contrast tool scale the display range automatically. When you select the
Match data range option, the tool makes the display range equal to the data range of the image.
When you select the Eliminate outliers option, the tool removes an equal percentage of pixels
from the top and bottom of the display range. By default, the tool eliminates 2% of pixels, in other
words, the top 1% and the bottom 1% of pixels. (You can perform this same operation using the
stretchlim function.)
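For example, a sketch of the equivalent command-line operation using stretchlim and imadjust:
I = imread('pout.tif');
lowhigh = stretchlim(I,[0.01 0.99]);   % clip the bottom 1% and top 1% of pixel values
J = imadjust(I,lowhigh,[]);            % map the remaining range to the full display range
imshowpair(I,J,'montage')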
The display range of the image displayed in the Image Viewer window updates in real time when you
change the display range. Image Viewer also updates the display range values displayed in the lower
right corner of the app window.
For example, in the figure, the window indicates that the display range of the image is 12 to 193.
Bright pixels appear much brighter than in the original image. There is better contrast between
pixels with similar gray values, such as for the pillars and roof of the domed buildings.
To start the Window/Level tool, click Window/Level in the Image Viewer toolbar or select the
Window/Level option from the Image Viewer Tools menu.
Move the pointer over the image. The pointer changes to the Window/Level cursor. To adjust the
image contrast, click and drag the mouse horizontally. To adjust image brightness, click and drag the
mouse vertically.
If you also have the Adjust Contrast tool open, then contrast adjustments you make using the
Window/Level tool immediately adjust the window in the Adjust Contrast tool. For example, if you
increase the brightness using the Window/Level tool, then the window in the Adjust Contrast tool
shifts to the right.
When you close the Adjust Contrast tool, the Window/Level tool remains active. To stop the Window/
Level tool, click the Window/Level button or any of the navigation buttons in the Image Viewer
toolbar.
When using the Adjust Contrast tool, you can modify pixel values in the image to reflect the contrast
adjustments by clicking the Adjust Data button. When you click the Adjust Data button, the
histogram updates. You can then adjust the contrast again, if necessary. If you have other interactive
modular tool windows open, they will update automatically to reflect the contrast adjustment.
Note The Adjust Data button is unavailable until you make a change to the contrast of the image.
When you close Image Viewer, the app does not save the modified image data. To save these
changed values, use the Save As option from the Image Viewer File menu to store the modified data
in a file or use the Export to Workspace option to save the modified data in a workspace variable.
For more information, see “Save and Export Results” on page 4-24.
See Also
Image Viewer | imcontrast | imtool
More About
• “Get Started with Image Viewer App” on page 4-17
Crop Image Using Image Viewer App
After you open an image in Image Viewer, start the Crop Image tool by clicking Crop Image in
the Image Viewer toolbar or by selecting Crop Image from the Image Viewer Tools menu. For
more information about opening an image in Image Viewer, see “Open Image Viewer App” on page
4-18.
When you move the pointer over the image, the pointer changes to cross hairs. Define the
rectangular crop region by clicking and dragging the mouse over the image. You can fine-tune the
crop rectangle by moving and resizing the crop rectangle using the mouse. Or, if you want to crop a
different region, move to the new location and click and drag again. To zoom in or out on the image
while the Crop Image tool is active, use Ctrl+Plus or Ctrl+Minus keys. Note that these are the
Plus(+) and Minus(-) keys on the numeric keypad of your keyboard.
The figure shows a crop rectangle being defined using the Crop Image tool.
When you are finished defining the crop region, perform the crop operation. Double-click the left
mouse button or right-click inside the region and select Crop Image from the context menu. Image
Viewer displays the cropped image. If you have other modular interactive tools open, then they will
update to show the newly cropped image.
By default, if you close Image Viewer, then the app does not save the modified image data. To save
the cropped image, you can use the Save As option from the Image Viewer File menu to store the
modified data in a file or use the Export to Workspace option to save the modified data in a workspace variable.
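You can perform the same operation programmatically with the imcrop function; a minimal sketch using an arbitrary crop rectangle:
I = imread('peppers.png');
J = imcrop(I,[60 40 100 90]);   % crop rectangle in [xmin ymin width height] form
imshow(J)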
See Also
Image Viewer | imcrop | imtool
More About
• “Get Started with Image Viewer App” on page 4-17
Explore 3-D Volumetric Data with Volume Viewer App
In this section...
“Load Volume Data into the Volume Viewer” on page 4-44
“View the Volume Data in the Volume Viewer” on page 4-46
“Adjust View of Volume Data in Volume Viewer” on page 4-49
“Refine the View with the Rendering Editor” on page 4-51
“Save Volume Viewer Rendering and Camera Configuration Settings” on page 4-56
Load the MRI data of a human head from a MAT-file into the workspace. This operation creates a
variable named D in your workspace that contains the volumetric data. Use the squeeze command to
remove the singleton dimension from the data.
load mri
D = squeeze(D);
whos
Open the Volume Viewer app. From the MATLAB toolstrip, open the Apps tab and, under Image Processing and Computer Vision, click Volume Viewer. You can also open the app using the volumeViewer command.
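For example, assuming D is the volume created in the previous step, this call opens the app with the volume already loaded:
volumeViewer(D)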
Load volumetric data into the Volume Viewer app. Click Import Volume. You can load an image by
specifying its file name or load a variable from the workspace. If you have volumetric data in a
DICOM format that uses multiple files to represent a volume, you can specify the DICOM folder
name. Choose the Import From Workspace option because the data is in the workspace.
Select the workspace variable in the Import from Workspace dialog box and click OK.
When you create a new session, this option deletes all the data currently in the viewer. Click Yes to
create the new session.
View the volume in the Volume Viewer app. By default, the Volume Viewer displays the data as a
volume but you can also view it as slice planes. The MRI data displayed as a volume is recognizable
as a human head. To explore the volume, zoom in and out on the image using the mouse wheel or a
right-click. You can also rotate the volume by positioning the cursor in the image window, pressing
and holding the mouse, and moving the cursor. You are always zooming or rotating around the center
of the volume. The position of the axes in the Orientation Axes window reflects the spatial
orientation of the image as you rotate it.
To change the background color used in the display window, click Background Color and select a
color.
View the MRI data as a set of slice planes. Click Slice Planes. You can also zoom in and rotate this
view of the data. Use the scroll bars in the three slice windows to view individual slices in any of the
planes.
Continue using Volume Viewer capabilities until you achieve the best view of your data.
Click Volume to return to viewing your data as a volume and use the capabilities of the Volume
Viewer to get the best visualization of your data. The Volume Viewer provides several spatial
referencing options that let you get a more realistic view of the head volume. (The head appears
flattened in the default view.)
• Upsample To Cube--The Volume Viewer calculates a scale factor that makes the number of
samples in each dimension the same as the largest dimension in the volume. This setting can
make non-isotropically sampled data appear scaled more correctly.
• Use Volume Metadata--If the data file includes resolution data in its metadata, the Volume
Viewer uses the metadata and displays the volume true to scale. The Volume Viewer selects the
Use Volume Metadata option, by default, if metadata is present.
Refine the view of the volume using the Rendering Editor. With the Rendering Editor, you can:
• Choose the overall viewing approach: Volume Rendering, Maximum Intensity Projection, or Isosurface.
• Modify the alphamap by specifying a preset alphamap, such as ct-bone, or by customizing the
alphamap using the Opacity/Image Intensity curve.
• Specify the color map used in the visualization.
• Specify the lighting in the visualization.
The Volume Viewer offers several viewing approaches for volumes. The Maximum Intensity
Projection (MIP) option looks for the voxel with the highest intensity value for each ray projected
through the data. MIP can be useful for revealing the highest intensity structure within a volume. You
can also view the volume as an Isosurface.
Volume rendering is highly dependent on defining an appropriate alphamap so that structures you
want to see are opaque and structures you do not want to see are transparent. The Rendering Editor
lets you define the opacity and transparency of voxel values throughout the volume. You can choose
from a set of alphamap presets that automatically achieve certain well-defined effects. For example,
to define a view that works well with CT bone data, select the CT Bone rendering preset. By default,
the Volume Viewer uses a simple linear relationship, but each preset changes the curve of the plot to
give certain data values more or less opacity. You can customize the alphamap by manipulating the plot directly.
Color, when used with voxel intensity and opacity, is an important element of volume visualization. In
the Rendering Editor, you can select from a list of predefined MATLAB color maps, such as jet and
parula. You can also specify a custom color map that you have defined as a variable in the
workspace. You can also change the color mapping for any color map by using the interactive color
bar scale. For example, to lighten the color values in a visualization, click on the color bar to create a
circular slider. To modify the color mapping so that more values map to lighter colors, move the slider
to the left. You can create multiple sliders on the color bar to define other color mappings.
Modify lighting effects. By default, the Volume Viewer uses certain lighting effects on the volume
display. You can turn off these lighting effects by clearing the Lighting check box.
To save rendering and camera configuration settings, click Export and click the Rendering and
Camera Configurations option.
Specify the name for the structure that the Volume Viewer creates or accept the default name
(config) and click OK.
Explore 3-D Labeled Volumetric Data with Volume Viewer App
In this section...
“Load Labeled Volume and Intensity Volume into Volume Viewer” on page 4-57
“View Labeled Volume in Volume Viewer” on page 4-59
“Embed Labeled Volume with Intensity Volume” on page 4-60
Load the MRI intensity data of a human brain and the labeled volume from MAT-files into the
workspace. This operation creates two variables in the workspace: vol and label.
load(fullfile(toolboxdir('images'),'imdata','BrainMRILabeled','images','vol_001.mat'));
load(fullfile(toolboxdir('images'),'imdata','BrainMRILabeled','labels','label_001.mat'));
whos
Open the Volume Viewer app. From the MATLAB toolstrip, open the Apps tab and under Image
Processing and Computer Vision, click Volume Viewer. You can also open the app using the volumeViewer command.
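For example, assuming vol and label are the variables loaded earlier, this call opens the app with both volumes loaded:
volumeViewer(vol,label)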
Load the labeled volume into the Volume Viewer app. Click Import Labeled Volume to open the
labeled volume. You can load an image by specifying its file name or load a variable from the workspace. (If you
have volumetric data in a DICOM format that uses multiple files to represent a volume, you can
specify the DICOM folder name.) For this example, choose the Import From Workspace option.
Select the workspace variable associated with the labeled volume data in the Import from
Workspace dialog box and click OK.
View the labeled volume in the Volume Viewer app. By default, the Volume Viewer displays the data
as a labeled volume but you can also view it as slice planes. To explore the labeled volume, zoom in
and out on the image using the mouse wheel or a right-click. You can also rotate the volume by
positioning the cursor in the image window, pressing and holding the mouse, and moving the cursor.
You are always zooming or rotating around the center of the volume. The position of the axes in the
Orientation Axes window reflects the spatial orientation of the labeled volume as you rotate it.
Refine your view of the labeled volume using the Rendering Editor part of the Volume Viewer. You
can use the Rendering Editor to view certain labels and hide others, change the color, and modify
the transparency of the labels. Label 000 is the background and it is typically not visible. When you
select the background label, the Show Labels check box is clear, by default. To select all the visible
labels at once, select the background label and click Invert Selection. To change the color of a label
(or all the labels), select the label in the Rendering Editor and specify the color in the Color
selector. You can also control the transparency of the label using the Opacity slider.
With the labeled volume already in the Volume Viewer, load the intensity volume into the app. Click
Import Volume and choose the Import From Workspace option.
Select the workspace variable associated with the intensity volume in the Import from Workspace
dialog box and click OK.
The Volume Viewer displays the labeled volume over the intensity volumetric data. By default, the
Volume Viewer displays the label data and the intensity data as volumes, but you can also view them as
slice planes. To explore the labeled and intensity volumes, zoom in and out using the mouse wheel or
a right-click. You can also rotate the volumes by positioning the cursor in the image window, pressing
and holding the mouse, and moving the cursor. To view only the intensity volume, and hide the
labeled volume, click View Volume.
Refine your view of the labeled volume and the intensity volume using options in the Rendering
Editor. To only view the labeled volume, and hide the intensity volume, clear the Embed Labels in
Volume check box. In the Labels area of the Rendering Editor, you can select any of the labels and
change its color or transparency. Label 000 is the background. By default, the background is set to
black and is not visible. The Show Labels check box is clear. To select all the labels, click on the
background and then click Invert Selection. If the intensity volume is visible, you can modify the
threshold value and transparency using the sliders in the Volume area of the Rendering Editor.
View Image Sequences in Video Viewer App
Load the image sequence into the MATLAB workspace. For this example, load the MRI data from the
file mristack.mat, which is included in the imdata folder.
load mristack
This places a variable named mristack in your workspace. The variable is an array of 21 grayscale
frames containing MRI images of the brain. Each frame is a 256-by-256 array of uint8 data.
Click the Video Viewer app in the apps gallery and select the Import from workspace option on the
File menu. You can also call implay, specifying the name of the image sequence variable as an
argument.
implay(mristack)
The Video Viewer opens, displaying the first frame of the image sequence. Note how the Video Viewer
displays information about the image sequence, such as the size of each frame and the total number
of frames, at the bottom of the window.
To view the image sequence or video as an animation, click the Play button in the Playback toolbar,
select Play from the Playback menu, or press P or the Space bar. By default, the Video Viewer plays
the image sequence forward, once in its entirety, but you can control playback in other ways by using the options on the Playback menu and toolbar. As you view an image sequence, note how the Video Viewer
updates the Status Bar at the bottom of the window.
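You can also set the playback rate when you open the viewer; a minimal sketch that plays the sequence at 10 frames per second:
implay(mristack,10)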
Change the view of the image sequence or examine a frame more closely. The Video Viewer supports several tools, listed in the Tools menu and on the toolbar, that you can use to examine the frames in the image sequence.
The Configuration dialog box contains four tabs: Core, Sources, Visuals, and Tools. On each tab, select a category and then click Properties to view and modify the configuration settings available for that category.
To save your configuration settings for future use, select File > Configuration Set > Save as.
Note By default, the Video Viewer uses the configuration settings from the file implay.cfg. If you
want to store your configuration settings in this file, you should first create a backup copy of the file.
If you want to increase the actual playback rate, but your system's hardware cannot keep up with the
desired rate, select the Allow frame drop to achieve desired playback rate check box. This
parameter enables the Video Viewer app to achieve the playback rate by dropping frames. When you
select this option, the Frame Rate dialog box displays several additional options that you can use to
specify the minimum and maximum refresh rates. If your hardware allows it, increase the refresh rate
to achieve a smoother playback. However, if you specify a small range for the refresh rate, the
computed frame replay schedule may lead to a choppy replay, and a warning will appear.
If you know that the pixel values do not use the entire data type range, you can select the Specify
range of displayed pixel values check box and enter the range for your data. The dialog box
automatically displays the range based on the data type of the pixel values.
To view basic information about the image data, click the Video Information button in the Video
Viewer toolbar or select Video Information from the Tools menu. The Video Viewer displays a dialog
box containing basic information about the image sequence, such as the size of each frame, the frame
rate, and the total number of frames.
Convert Multiframe Image to Movie
To create a MATLAB movie from a multiframe image array, use the immovie function. This call creates a movie from a multiframe indexed image X and its color map map.
mov = immovie(X,map);
In the example, X is a four-dimensional array of images that you want to use for the movie. To play the movie, use the implay function.
implay(mov);
This example loads the multiframe image mri.tif and makes a movie out of it.
mri = uint8(zeros(128,128,1,27));
for frame=1:27
[mri(:,:,:,frame),map] = imread('mri.tif',frame);
end
mov = immovie(mri,map);
implay(mov);
Note To view a MATLAB movie, you must have MATLAB software installed. To make a movie that can
be run outside the MATLAB environment, use the VideoWriter class to create a movie in a standard
video format, such as AVI.
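For example, a sketch that writes the mri frames created in the previous example to an AVI file; the file name is an arbitrary choice, and each indexed frame is converted to truecolor before writing.
v = VideoWriter('mri.avi');    % creates a Motion JPEG AVI file by default
open(v)
for frame = 1:size(mri,4)
    writeVideo(v,ind2rgb(mri(:,:,1,frame),map))   % convert indexed frame to RGB
end
close(v)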
Display Different Image Types
If you need help determining what type of image you are working with, see “Image Types in the
Toolbox” on page 2-12.
Display Indexed Images
To display an indexed image with imshow or in the Image Viewer app (imtool), specify both the image matrix and the color map.
imshow(X,map)
or
imtool(X,map)
For each pixel in X, these functions display the color stored in the corresponding row of map. If the
image matrix data is of class double, the value 1 points to the first row in the color map, the value 2
points to the second row, and so on. However, if the image matrix data is of class uint8 or uint16,
the value 0 (zero) points to the first row in the color map, the value 1 points to the second row, and so
on. This offset is handled automatically by the imtool and imshow functions.
If the color map contains a greater number of colors than the image, the functions ignore the extra
colors in the color map. If the color map contains fewer colors than the image requires, the functions
set all image pixels over the limits of the color map's capacity to the last color in the color map. For
example, if an image of class uint8 contains 256 colors, and you display it with a color map that
contains only 16 colors, all pixels with a value of 15 or higher are displayed with the last color in the
color map.
Display Grayscale Images
To display a grayscale image, call the imshow function or open the Image Viewer app. Both functions display the image by scaling the intensity values to serve as indices into a grayscale
color map.
If I is double, a pixel value of 0.0 is displayed as black, a pixel value of 1.0 is displayed as white, and
pixel values in between are displayed as shades of gray. If I is uint8, then a pixel value of 255 is
displayed as white. If I is uint16, then a pixel value of 65535 is displayed as white.
Grayscale images are similar to indexed images in that each uses an m-by-3 RGB color map, but you
normally do not specify a color map for a grayscale image. MATLAB displays grayscale images by
using a grayscale system color map (where R=G=B). By default, the number of levels of gray in the
color map is 256 on systems with 24-bit color, and 64 or 32 on other systems. (See “Display Colors”
on page 15-2 for a detailed explanation.)
In some cases, the image data you want to display as a grayscale image could have a display range
that is outside the conventional toolbox range (that is, [0, 1] for single or double arrays, [0, 255] for uint8 arrays, [0, 65535] for uint16 arrays, or [-32768, 32767] for int16 arrays). For example, if
you filter a grayscale image, some of the output data could fall outside the range of the original data.
To display unconventional range data as an image, you can specify the display range directly, using
this syntax for both the imshow and imtool functions.
imshow(I,'DisplayRange',[low high])
or
imtool(I,'DisplayRange',[low high])
If you use an empty matrix ([]) for the display range, these functions scale the data automatically,
setting low and high to the minimum and maximum values in the array.
The next example filters a grayscale image, creating unconventional range data. The example calls
imtool to display the image in Image Viewer, using the automatic scaling option. If you execute
this example, note the display range specified in the lower right corner of the Image Viewer window.
I = imread('testpat1.png');
J = filter2([1 2;-1 -2],I);
imtool(J,'DisplayRange',[]);
Display Binary Images
Note For the toolbox to interpret the image as binary, it must be of class logical. Grayscale images
that happen to contain only 0's and 1's are not binary images.
To display a binary image, call the imshow function or open the Image Viewer app. For example, this
code reads a binary image into the MATLAB workspace and then displays the image. This
documentation uses the variable name BW to represent a binary image in the workspace.
BW = imread('circles.png');
imshow(BW)
You might prefer to invert binary images when you display them, so that 0 values are displayed as
white and 1 values are displayed as black. To do this, use the NOT (~) operator in MATLAB. (In this
figure, a box is drawn around the image to show the image boundary.) For example:
imshow(~BW)
You can also display a binary image using the indexed image color map syntax. For example, the
following command specifies a two-row color map that displays 0's as red and 1's as blue.
imshow(BW,[1 0 0; 0 0 1])
Display Truecolor Images
To display a truecolor image, call the imshow function or open the Image Viewer app. For example,
this code reads a truecolor image into the MATLAB workspace and then displays the image. This
documentation uses the variable name RGB to represent a truecolor image in the workspace.
RGB = imread('peppers.png');
imshow(RGB)
Systems that use 24 bits per screen pixel can display truecolor images directly, because they allocate
8 bits (256 levels) each to the red, green, and blue color planes. On systems with fewer colors,
imshow displays the image using a combination of color approximation and dithering. See “Display
Colors” on page 15-2 for more information.
Note If you display a color image and it appears in black and white, check if the image is an indexed
image. With indexed images, you must specify the color map associated with the image. For more
information, see “Display Indexed Images” on page 4-73.
Add Color Bar to Displayed Grayscale Image
Read a grayscale image into the workspace.
I = imread('liftingbody.png');
Convert the image to data type double. Data is in the range [0, 1].
I = im2double(I);
dataRangeI = [min(I(:)) max(I(:))]
dataRangeI = 1×2
0 1
Filter the image using an edge detection filter. The filtered data exceeds the default range [0, 1]
because the filter is not normalized.
h = [1 2 1; 0 0 0; -1 -2 -1];
J = imfilter(I,h);
dataRangeJ = [min(J(:)) max(J(:))]
dataRangeJ = 1×2
-2.5961 2.5451
Display the filtered image using the full display range of the filtered data. imshow displays the
minimum data value as black and the maximum data value as white.
imshow(J,[])
Use the colorbar function to add the color bar to the image.
colorbar
See Also
imshow
More About
• “Display Grayscale Images” on page 4-73
Print Images
If you want to output a MATLAB image to use in another application (such as a word-processing
program or graphics editor), use imwrite to create a file in the appropriate format. See “Write
Image Data to File in Graphics Format” on page 3-6 for details.
If you want to print an image, use imshow to display the image in a MATLAB figure window. If you
are using Image Viewer, then you must use the Print to Figure option on the File menu. When you
choose this option, Image Viewer opens a separate figure window and displays the image in it. You
can access the standard MATLAB printing capabilities in this figure window. You can also use the
Print to Figure option to print the image displayed in the Overview tool and the Pixel Region tool.
Once the image is displayed in a figure window, you can use either the MATLAB print command or
the Print option from the File menu of the figure window to print the image. When you print from
the figure window, the output includes non-image elements such as labels, titles, and other
annotations.
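For example, a minimal sketch; the file name and resolution here are arbitrary choices, not required values.
imshow('peppers.png')
print                                     % send the current figure to the default printer
print('peppers_figure','-dpng','-r150')   % or save the figure to a PNG file instead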
Keep these graphics object properties in mind when you print images:
• Image colors print as shown on the screen. This means that images are not affected by the figure
object's InvertHardcopy property.
• To ensure that printed images have the proper size and aspect ratio, set the figure object's
PaperPositionMode property to auto. When PaperPositionMode is set to auto, the width
and height of the printed figure are determined by the figure's dimensions on the screen. By
default, the value of PaperPositionMode is manual. If you want the default value of
PaperPositionMode to be auto, you can add this line to your startup.m file.
set(0,'DefaultFigurePaperPositionMode','auto')
For detailed information about printing with File/Print or the print command, see “Print Figure
from File Menu”. For a complete list of options for the print command, enter help print at the
MATLAB command-line prompt or see the print command reference page.
Manage Display Preferences
In this section...
“Retrieve Values of Toolbox Preferences” on page 4-81
“Set Values of Toolbox Preferences” on page 4-81
You can use Image Processing Toolbox preferences to control certain characteristics of how imshow
and the Image Viewer app display images on your screen. For example, using toolbox preferences,
you can specify the initial magnification used.
To open the Preference dialog box, click Preferences in the Home tab in the MATLAB desktop. In
the Preferences dialog box, select Image Processing Toolbox. You can also access Image Processing
Toolbox preferences from the Image Viewer File menu, or by typing iptprefs at the command line.
Retrieve Values of Toolbox Preferences
To retrieve the values of Image Processing Toolbox preferences programmatically, use the iptgetpref function. The following example uses iptgetpref to determine the value of the ImtoolInitialMagnification preference.
iptgetpref('ImtoolInitialMagnification')
ans =
100
Preference names are case insensitive and can be abbreviated. For a complete list of toolbox
preferences, see the iptprefs reference page.
Set Values of Toolbox Preferences
You can set the values of toolbox preferences interactively in the Preferences dialog box, described above, or programmatically by using the iptsetpref function. This example calls iptsetpref to specify that imshow resize the figure window so that it fits tightly around displayed images.
iptsetpref('ImshowBorder', 'tight');
For a table of the available preferences, see the iptprefs reference page.
Building GUIs with Modular Tools
This chapter describes how to use interactive modular tools and create custom image processing applications.
Interactive Image Viewing and Processing Tools
You can use the tools independently or in combination. You can create custom image processing apps
that open a combination of tools and initialize their display and interactions. For more information,
see “Interactive Tool Workflow” on page 5-6.
You can also access all tools using the Image Viewer app.
See Also
More About
• “Interactive Tool Workflow” on page 5-6
• “Get Started with Image Viewer App” on page 4-17
Interactive Tool Workflow
Display Target Image in Figure Window
Some of the tools add themselves to the figure window containing the image. Prevent the tools from
displaying over the image by including a border. If you are using the imshow function, then make
sure that the Image Processing Toolbox ImshowBorder preference is set to 'loose' (this is the
default setting).
Create the Tool
When you create a tool, you can specify the target image or you can let the tool pick a suitable target
image.
• To specify the target image, provide a handle to the target image as an input argument to the tool
creation function. The handle can be a specific image object, or a figure, axes, or uipanel object
that contains an image.
• To let the tool pick the target image, call the tool creation function with no input arguments. By
default, the tool uses the image in the current figure as the target image. If the current figure
contains multiple images, then the tool associates with the first image in the figure object's
children (the last image created). Note that not all tools offer a no-argument syntax.
Some tools can work with multiple images in a figure. These are impixelinfo, impixelinfoval,
and imdisplayrange.
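For example, a minimal sketch showing both styles of tool creation:
hfig = figure;
himg = imshow('pout.tif');
impixelinfo(himg);   % explicitly specify the target image
imdisplayrange       % no argument: the tool uses the image in the current figure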
When you create a tool, you can optionally specify the object that you want to be the parent of the
tool. By specifying the parent, you determine where the tool appears on your screen. Using this
syntax of the tool creation functions, you can add the tool to the figure window containing the target
image, open the tool in a separate figure window, or create some other combination.
Specifying the parent is optional. When you do not specify the parent, the tools use default behavior.
• Some of the smaller tools, such as the Display Range tool and Pixel Information tool, use the
parent of the target image as their parent, inserting themselves in the same figure window as the
target image.
• Other tools, such as the Adjust Contrast tool and Choose Colormap tool, open in separate figures
of their own.
• Two tools, the Overview tool and Pixel Region tool, have different creation functions for specifying
the parent figure. Their primary creation functions, imoverview and impixelregion, open the
tools in a separate figure window. To specify a different parent, you must use the
imoverviewpanel and impixelregionpanel functions. For an example, see “Create Pixel
Region Tool” on page 5-15.
Note The Overview tool and the Pixel Region tool provide additional capabilities when created in
their own figure windows. For example, both tools include zoom buttons that are not part of their
uipanel versions.
Position Tools
Each tool has default positioning behavior. For example, the impixelinfo function creates the tool
as a uipanel object that is the full width of the figure window, positioned in the lower left corner of
the target image figure window.
Because the tools are constructed from graphics objects, such as uipanel objects, you can change
their default positioning or other characteristics by setting properties of the objects. To specify the
position of a tool or other graphics object, set the Position property as a four-element position
vector [left bottom width height]. The values of left and bottom specify the distance from
the lower left corner of the parent container object, such as a figure. The values of width and
height specify the dimensions of the object.
When you specify a position vector, you can specify the units of the values in the vector by setting the
value of the Units property of the object. To allow better resizing behavior, use normalized units
because they specify the relative position of the tool, not the exact location in pixels.
For example, when you first create an embedded Pixel Region tool in a figure, it appears to take over
the entire figure because, by default, the position vector is set to [0 0 1 1], in normalized units.
This position vector tells the tool to align itself with the bottom left corner of its parent and fill the
entire object. To accommodate the image and the Pixel Information and Display Range tools, change the position of the Pixel Region tool so that it occupies the lower half of the figure window, leaving room at the bottom for the Pixel Information and Display Range tools. Then reposition the target image so that it fits in the upper half of the figure window. To reposition the image, you must specify the Position property of the axes object that contains it; image objects do not have a Position property. The sketch that follows shows both adjustments.
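The handle names (hfig, himg, hax, hpixreg) and the exact position values in this sketch are illustrative assumptions, not fixed requirements.
hfig = figure('Toolbar','none','Menubar','none');
himg = imshow('pout.tif');
hax = ancestor(himg,'axes');                      % axes object that contains the image
hpixreg = impixelregionpanel(hfig,himg);          % embedded Pixel Region tool
% Place the Pixel Region tool in the lower half, leaving a strip at the bottom
set(hpixreg,'Units','normalized','Position',[0 0.08 1 0.42]);
% Move the image axes into the upper half of the figure
set(hax,'Units','normalized','Position',[0 0.5 1 0.5]);
impixelinfo(himg);                                % Pixel Information tool along the bottom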
Add Navigation Aids
The Scroll Panel is the primary navigation tool and is a prerequisite for the other navigation tools.
When you display an image in a Scroll Panel, the tool displays only a portion of the image, if it is too
big to fit into the figure window. When only a portion of the image is visible, the Scroll Panel adds
horizontal and vertical scroll bars, to enable viewing of the parts of the image that are not currently
visible.
Once you create a Scroll Panel, you can optionally add the other navigation tools: the Overview tool
and the Magnification tool. The Overview tool displays a view of the entire image, scaled to fit, with a
rectangle superimposed over it that indicates the part of the image that is currently visible in the
scroll panel. The Magnification Box displays the current magnification of the image and can be used
to change the magnification.
Adding a scroll panel to an image display changes the relationship of the graphics objects used in the
display. For more information, see “Add Scroll Panel to Figure” on page 5-10.
Note The toolbox navigation tools are incompatible with standard MATLAB figure window navigation
tools. When using these tools in a GUI, suppress the toolbar and menu bar in the figure windows to
avoid conflicts between the tools.
Customize Tool Interactivity
Some tools have a one-way connection to the target image. These tools get updated when you
interact with the target image, but you cannot use the tool to modify the target image. For example,
the Pixel Information tool receives information about the location and value of the pixel currently
under the pointer.
Other tools have a two-way connection to the target image. These tools get updated when you
interact with the target image, and you can update the target image by interacting with the tools. For
example, the Overview tool sets up a two-way connection to the target image. For this tool, if you
change the visible portion of the target image by scrolling, panning, or by changing the
magnification, then the Overview tool changes the size and location of the detail rectangle to match
the portion of the image that is now visible. Conversely, if you move the detail window in the
Overview tool, then the tool updates the visible portion of the target image in the scroll panel.
The tools accomplish this interactivity by using callback properties of the graphics objects. For example, the figure object supports a WindowButtonMotionFcn callback that executes whenever the pointer moves within the figure window. You can customize the connectivity of a tool by using the application programming interface (API) associated with the tool to set up callbacks to get notification of events.
For more information, see “Callbacks — Programmed Response to User Action” and “Overview
Events and Listeners”. For an example, see “Build Image Comparison Tool” on page 5-24.
For example, the Magnification box supports a single API function: setMagnification. You can use
this API function to set the magnification value displayed in the Magnification box. The Magnification
box automatically notifies the scroll panel to change the magnification of the image based on the
value. The scroll panel also supports an extensive set of API functions. To get information about these
APIs, see the reference page for each tool.
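For example, a minimal sketch (the image is arbitrary) that links a Magnification box to a scroll panel and then changes the magnification through the scroll panel API:

hfig = figure('Toolbar','none','Menubar','none');
him = imshow('moon.tif');
hsp = imscrollpanel(hfig,him);
hbox = immagbox(hfig,him);
api = iptgetapi(hsp);
api.setMagnification(2)    % 200 percent; the Magnification box updates to match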
See Also
Related Examples
• “Create Pixel Region Tool” on page 5-15
• “Build App for Navigating Large Images” on page 5-21
• “Build App to Display Pixel Information” on page 5-19
• “Build Image Comparison Tool” on page 5-24
More About
• “Interactive Image Viewing and Processing Tools” on page 5-2
• “Add Scroll Panel to Figure” on page 5-10
Add Scroll Panel to Figure
When you display an image in a scroll panel, the scroll panel changes the object hierarchy of the displayed image.
This diagram illustrates the typical object hierarchy for an image displayed in an axes object in a
figure object.
When you call the imscrollpanel function to put the target image in a scrollable window, this
object hierarchy changes. imscrollpanel inserts a new object into the hierarchy between the
figure object and the axes object containing the image. The figure shows the object hierarchy after
the call to imscrollpanel.
After you add a scroll panel to a figure, you can change the image data displayed in the scroll panel by using the replaceImage function in the imscrollpanel API.
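For example, assuming hsp is the handle returned by imscrollpanel and im2 is the new image data:

api = iptgetapi(hsp);
api.replaceImage(im2)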
The scroll panel navigation tool is not compatible with the figure window toolbar and menu bar. When
you add a scroll panel to an image displayed in a figure window, suppress the toolbar and menu bar
from the figure. This sample code demonstrates one way to do this.
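For example (a sketch; the image and handle names are arbitrary):

hfig = figure('Toolbar','none','Menubar','none');
him = imshow('moon.tif');
hsp = imscrollpanel(hfig,him);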
See Also
immagbox | imoverview | imscrollpanel
Related Examples
• “Build App for Navigating Large Images” on page 5-21
More About
• “Interactive Image Viewing and Processing Tools” on page 5-2
Get Handle to Target Image
Get the handle when you initially display the image in a figure window using the imshow syntax that
returns a handle.
hfig = figure;
himage = imshow('moon.tif')
himage =
Image with properties:
Get the handle after you have displayed the image in a figure window using the imhandles function.
You must specify a handle to the figure window as a parameter.
himage2 = imhandles(hfig)
himage2 =
Image with properties:
See Also
imhandles
More About
• “Interactive Tool Workflow” on page 5-6
Create Pixel Region Tool
I = imread("pout.tif");
Display the image in a figure window. Return a handle to the target image, himage.
himage = imshow('pout.tif');
To create the Pixel Region tool in a separate window, use the impixelregion function.
hpixreg = impixelregion(himage);
Display the image in a new figure window, keeping handles to the figure, axes, and image objects.
fig = figure;
ax = axes;
img = imshow(I);
To create the Pixel Region tool in the same figure as the target image, use the
impixelregionpanel function. Specify the target image's parent figure, fig, as the parent of the
Pixel Region tool.
pixregionobj = impixelregionpanel(fig,img);
The Pixel Region tool overlaps and hides the original image. To see both the image and the tool, shift
their positions so that they do not overlap.
set(ax,'Units','normalized','Position',[0 .5 1 .5]);
set(pixregionobj,'Units','normalized','Position',[0 .04 1 .4]);
See Also
impixelregion | impixelregionpanel
More About
• “Position Tools” on page 5-7
• “Interactive Image Viewing and Processing Tools” on page 5-2
Build App to Display Pixel Information
First, define a function that builds the app. This example uses a function called my_pixinfo_tool,
which is attached at the end of the example.
After you define the function that builds the app, test the app. Read an image into the workspace.
I = imread('pears.png');
Display the image with pixel information tools by calling the function.
my_pixinfo_tool(I)
The my_pixinfo_tool function accepts an image as an argument and displays the image in a figure
window with a Pixel Information tool, Display Range tool, Distance tool, and Pixel Region tool. Note
that the function suppresses the toolbar and menu bar in the figure window because scrollable
navigation is incompatible with standard MATLAB™ figure window navigation tools.
function my_pixinfo_tool(im)
% Create figure, setting up properties
fig = figure('Toolbar','none', ...
    'Menubar','none', ...
    'Name','My Pixel Info Tool', ...
    'NumberTitle','off');
% The remainder of the listing is abbreviated in this excerpt. A minimal
% completion (tool positions chosen arbitrarily) displays the image and
% adds the four modular tools described above.
ax = axes('Units','normalized','Position',[0 .5 1 .5]);
img = imshow(im,'Parent',ax);
impixelinfo(fig,img);
imdisplayrange(fig,img);
imdistline(ax);
hpixreg = impixelregionpanel(fig,img);
set(hpixreg,'Units','normalized','Position',[0 .04 1 .4]);
end
See Also
imdisplayrange | imdistline | impixelinfo | impixelregionpanel
Related Examples
• “Create Pixel Region Tool” on page 5-15
• “Build Image Comparison Tool” on page 5-24
• “Build App for Navigating Large Images” on page 5-21
More About
• “Interactive Tool Workflow” on page 5-6
• “Interactive Image Viewing and Processing Tools” on page 5-2
Build App for Navigating Large Images
First, define a function that builds the app. This example defines a function called
my_large_image_display at the end of the example.
After you define the function that builds the app, test the app. Read an image into the workspace.
I = imread('car1.jpg');
my_large_image_display(I)
The my_large_image_display function accepts an image as an argument and displays the image
in a figure window with scroll bars, an Overview tool, and a magnification box. Note that the function
suppresses the toolbar and menu bar in the figure window because scrollable navigation is
incompatible with standard MATLAB™ figure window navigation tools.
function my_large_image_display(im)
% The body of this listing is abbreviated in this excerpt. A minimal
% completion creates a figure with no toolbar or menu bar, displays the
% image in a scroll panel, and adds an Overview tool and a Magnification
% box (positions chosen arbitrarily).
fig = figure('Toolbar','none','Menubar','none');
him = imshow(im);
hsp = imscrollpanel(fig,him);
set(hsp,'Units','normalized','Position',[0 .1 1 .9]);
hmag = immagbox(fig,him);
set(hmag,'Position',[0 0 100 30]);
imoverview(him)
end
See Also
immagbox | imoverview | imscrollpanel
Related Examples
• “Build App to Display Pixel Information” on page 5-19
• “Create Image Comparison Tool Using ROIs” on page 12-37
More About
• “Interactive Tool Workflow” on page 5-6
• “Interactive Image Viewing and Processing Tools” on page 5-2
• “Add Scroll Panel to Figure” on page 5-10
Build Image Comparison Tool
First, define a function that builds the app. This example uses a function called
my_image_compare_tool, which is attached at the end of the example.
After you define the function that builds the app, test the app. Get two images.
I = imread('flamingos.jpg');
L = rgb2lightness(I);
Iedge = edge(L,'Canny');
Display the images in the app. When you move the detail rectangle in the Overview tool or change the
magnification in one image, both images respond.
my_image_compare_tool(I,Iedge);
The my_image_compare_tool function accepts two images as input arguments and displays the
images in scroll panels. The custom tool also includes an Overview tool and a Magnification box. Note
that the function suppresses the toolbar and menu bar in the figure window because scrollable
navigation is incompatible with standard MATLAB™ figure window navigation tools.
To synchronize the scroll panels, the function makes the connections between tools using callbacks
and the Scroll Panel API functions. The function specifies a callback function that executes every time
the magnification changes. The function specified is the setMagnification API function of the
other scroll panel. Thus, whenever the magnification changes in one of the scroll panels, the other
scroll panel changes its magnification to match. The tool sets up a similar connection for position
changes.
function my_image_compare_tool(left_image,right_image)
% The body of this listing is abbreviated in this excerpt. A minimal
% completion displays the two images in side-by-side scroll panels and
% uses the Scroll Panel API callbacks described above to keep their
% magnification and visible location synchronized.
fig = figure('Toolbar','none','Menubar','none');
subplot(1,2,1), imgL = imshow(left_image);
subplot(1,2,2), imgR = imshow(right_image);
spL = imscrollpanel(fig,imgL);
set(spL,'Units','normalized','Position',[0 0 .5 1]);
spR = imscrollpanel(fig,imgR);
set(spR,'Units','normalized','Position',[.5 0 .5 1]);
apiL = iptgetapi(spL);
apiR = iptgetapi(spR);
% When one panel changes, push the same value to the other panel.
apiL.addNewMagnificationCallback(apiR.setMagnification);
apiR.addNewMagnificationCallback(apiL.setMagnification);
apiL.addNewLocationCallback(apiR.setVisibleLocation);
apiR.addNewLocationCallback(apiL.setVisibleLocation);
% Add an Overview tool and a Magnification box for the left image.
imoverview(imgL)
immagbox(fig,imgL);
end
See Also
immagbox | imoverview | imscrollpanel
Related Examples
• “Build App to Display Pixel Information” on page 5-19
• “Build App for Navigating Large Images” on page 5-21
More About
• “Interactive Tool Workflow” on page 5-6
• “Interactive Image Viewing and Processing Tools” on page 5-2
• “Add Scroll Panel to Figure” on page 5-10
Create Angle Measurement Tool Using ROI Objects
Note The impoly function used in this example is not recommended. Use the new drawpolyline
function and Polyline ROI object instead. See “Use Polyline to Create Angle Measurement Tool” on
page 12-72.
This example shows how to create an angle measurement tool using interactive tools and ROI objects.
The example displays an image in a figure window and overlays a simple angle measurement tool
over the image. When you move the lines in the angle measurement tool, the function calculates the
angle formed by the two lines and displays the angle in a title.
Create a function that accepts an image as an argument and displays an angle measurement tool over
the image in a figure window. This code includes a second function used as a callback function that
calculates the angle and displays the angle in the figure.
function my_angle_measurement_tool(im)
% Create figure, setting up properties
figure('Name','My Angle Measurement Tool',...
    'NumberTitle','off',...
    'IntegerHandle','off')
% The rest of this listing is abbreviated in this excerpt. A minimal
% completion displays the image and overlays an open three-vertex
% impoly as the measurement tool (initial vertex positions arbitrary).
imshow(im)
[nr,nc,~] = size(im);
h = impoly(gca,[nc/4 nr/2; nc/2 nr/2; nc/2 nr/4],'Closed',false);
addNewPositionCallback(h,@update_angle);
update_angle(getPosition(h));
% Callback function that calculates the angle and updates the title.
% Function receives an array containing the current x,y position of
% the three vertices.
    function update_angle(p)
        v1 = p(1,:) - p(2,:);
        v2 = p(3,:) - p(2,:);
        theta = acosd(dot(v1,v2)/(norm(v1)*norm(v2)));
        title(sprintf('Angle = %0.0f degrees',theta))
    end
end
I = imread('gantrycrane.png');
Open the angle measurement tool, specifying the image as an argument. The tool opens a figure window, displaying the image with the angle measurement tool centered over the image, initially forming a right angle. Drag any of the vertices of the tool to measure an angle in the image. In the following figure, the tool is measuring an angle in the image. Note the size of the angle displayed in the title of the figure.
my_angle_measurement_tool(I);
See Also
More About
• “ROI Creation Overview” on page 12-5
• “ROI Migration” on page 12-16
• “Use Polyline to Create Angle Measurement Tool” on page 12-72
6
Geometric Transformations
A geometric transformation (also known as a spatial transformation) modifies the spatial relationship
between pixels in an image, mapping pixel locations in a moving image to new locations in an output
image. The toolbox includes functions that perform certain specialized geometric transformations,
such as resizing and rotating an image. In addition, the toolbox includes functions that you can use to
perform many types of 2-D and N-D geometric transformations, including custom transformations.
Resize an Image with imresize Function
Resize an image using the imresize function. In this example, you specify a magnification factor. To enlarge an image, specify a magnification factor greater than 1.
J = imresize(I,1.25);
Resize the image again, this time specifying the desired size of the output image, rather than a
magnification value. Pass imresize a vector that contains the number of rows and columns in the
output image. If the specified size does not produce the same aspect ratio as the input image, the
output image will be distorted. If you specify one of the elements in the vector as NaN, imresize
calculates the value for that dimension to preserve the aspect ratio of the image. To perform the
resizing required for multi-resolution processing, use impyramid.
K = imresize(I,[100 150]);
figure, imshow(K)
Resize the image again, this time specifying the interpolation method. When you enlarge an image,
the output image contains more pixels than the original image. imresize uses interpolation to
determine the values of these pixels, computing a weighted average of some set of pixels in the
vicinity of the pixel location. imresize bases the weightings on the distance each pixel is from the
point. By default, imresize uses bicubic interpolation, but you can specify other interpolation
methods or interpolation kernels. See the imresize reference page for a complete list. You can also
specify your own custom interpolation kernel. This example uses bilinear interpolation.
L = imresize(I,1.5,'bilinear');
figure, imshow(L)
Resize the image again, this time shrinking the image. When you reduce the size of an image, you
lose some of the original pixels because there are fewer pixels in the output image. This can
introduce artifacts, such as aliasing. The aliasing that occurs as a result of size reduction normally
appears as stair-step patterns (especially in high-contrast images), or as moire (ripple-effect) patterns
in the output image. By default, imresize uses antialiasing to limit the impact of aliasing on the
output image for all interpolation types except nearest neighbor. To turn off antialiasing, specify the
'Antialiasing' parameter and set the value to false. Even with antialiasing turned on, resizing can
introduce artifacts because information is always lost when you reduce the size of an image.
M = imresize(I,.75,'Antialiasing',false);
figure, imshow(M)
Rotate an Image
This example shows how to rotate an image using the imrotate function. When you rotate an image,
you specify the image to be rotated and the rotation angle, in degrees. If you specify a positive
rotation angle, the image rotates counterclockwise; if you specify a negative rotation angle, the image
rotates clockwise.
By default, the output image is large enough to include the entire original image. Pixels that fall
outside the boundaries of the original image are set to 0 and appear as a black background in the
output image. You can, however, specify that the output image be the same size as the input image,
using the 'crop' argument.
By default, imrotate uses nearest-neighbor interpolation to determine the value of pixels in the
output image, but you can specify other interpolation methods. See the imrotate reference page for
a list of supported interpolation methods.
I = imread('circuit.tif');
Rotate the image 35 degrees counterclockwise. In this example, specify bilinear interpolation.
J = imrotate(I,35,'bilinear');
figure
imshowpair(I,J,'montage')
Rotate the original image 35 degrees counterclockwise, specifying that the rotated image be cropped
to the same size as the original image.
K = imrotate(I,35,'bilinear','crop');
figure
imshowpair(I,K,'montage')
Crop an Image
Note You can also crop an image interactively using the Image Tool — see “Crop Image Using Image
Viewer App” on page 4-40.
To extract a rectangular portion of an image, use the imcrop function. Using imcrop, you can
specify the crop region interactively using the mouse or programmatically by specifying the size and
position of the crop region.
This example illustrates an interactive syntax. The example reads an image into the MATLAB
workspace and calls imcrop specifying the image as an argument. imcrop displays the image in a
figure window and waits for you to draw the crop rectangle on the image. When you move the pointer
over the image, the shape of the pointer changes to cross hairs. Click and drag the pointer to
specify the size and position of the crop rectangle. You can move and adjust the size of the crop
rectangle using the mouse. When you are satisfied with the crop rectangle, double-click to perform
the crop operation, or right-click inside the crop rectangle and select Crop Image from the context
menu. imcrop returns the cropped image in J.
I = imread('circuit.tif');
J = imcrop(I);
You can also specify the size and position of the crop rectangle as parameters when you call imcrop.
Specify the crop rectangle as a four-element position vector, [xmin ymin width height].
In this example, you call imcrop specifying the image to crop, I, and the crop rectangle. imcrop
returns the cropped image in J.
I = imread('circuit.tif');
J = imcrop(I,[60 40 100 90]);
Translate an Image using imtranslate Function
I = imread('cameraman.tif');
Display the image. The size of the image is 256-by-256 pixels. By default, imshow displays the image with the upper left corner at (0,0).
figure
imshow(I)
title('Original Image')
Translate the image, shifting the image by 15 pixels in the x-direction and 25 pixels in the y-direction. Note that, by default, imtranslate confines the translated image within the boundaries (or limits) of the original 256-by-256 image, so some of the translated image is clipped.
J = imtranslate(I,[15, 25]);
Display the translated image. The size of the image is 256-by-256 pixels.
figure
imshow(J)
title('Translated Image')
Use the 'OutputView' parameter set to 'full' to prevent clipping the translated image. The size
of the new image is 281-by-271 pixels.
K = imtranslate(I,[15, 25],'OutputView','full');
figure
imshow(K)
title('Translated Image, Unclipped')
2-D and 3-D Geometric Transformation Process Overview
The following list summarizes which transformation objects you can create with each approach.
• “Define Transformation Matrix” on page 6-14: rigid2d, affine2d, projective2d, rigid3d, affine3d
• “Define Custom Point-Wise Mapping Function” on page 6-14: geometricTransform2d, geometricTransform3d
• “Estimate Transformation from Control Point Pairs” on page 6-15: affine2d, projective2d, or one of the other transformation objects returned by fitgeotrans
• “Estimate Transformation Using Similarity Optimization” on page 6-16: affine2d, affine3d
• “Estimate Transformation Using Phase Correlation” on page 6-16: affine2d
If you know the transformation matrix for the geometric transformation you want to perform, then
you can create a rigid2d, affine2d, projective2d, rigid3d, or affine3d geometric
transformation object directly. For more information about creating a transformation matrix, see
“Matrix Representation of Geometric Transformations” on page 6-17.
The following example defines the transformation matrix for a 2-D translation and creates an
affine2d geometric transformation object.
xform = [ 1 0 0
0 1 0
40 40 1 ];
tform_translate = affine2d(xform)
tform_translate =
T: [3x3 double]
Dimensionality: 2
If you have an inverse point-wise mapping function, then you can define a custom 2-D and 3-D
geometric transformation using the geometricTransform2d and the geometricTransform3d
objects respectively.
The following example specifies an inverse mapping function that accepts and returns 2-D points in
packed (x,y) format. Then, the example creates a geometricTransform2d custom geometric
transformation object.
inversefn = @(c) [c(:,1)+c(:,2),c(:,1).^2]
inversefn =
@(c)[c(:,1)+c(:,2),c(:,1).^2]
tform = geometricTransform2d(inversefn)
tform =
InverseFcn: [function_handle]
ForwardFcn: []
Dimensionality: 2
Similarly, this example specifies an inverse mapping function that accepts and returns 3-D points in packed (x,y,z) format, and then creates a geometricTransform3d custom geometric transformation object.
inversefn = @(c) [c(:,1)+c(:,2),c(:,1)-c(:,2),c(:,3).^2]
inversefn =
@(c)[c(:,1)+c(:,2),c(:,1)-c(:,2),c(:,3).^2]
tform = geometricTransform3d(inversefn)
tform =
InverseFcn: [function_handle]
ForwardFcn: []
Dimensionality: 3
You can create a geometric transformation object by passing two sets of control point pairs to the
fitgeotrans function. The fitgeotrans function automatically estimates the transformation from
these points and returns one of the geometric transformation objects.
Different transformations require a varying number of points. For example, affine transformations
require three non-collinear points in each image (a triangle) and projective transformations require
four points (a quadrilateral).
This example passes two sets of control points to fitgeotrans, which returns an affine2d
geometric transformation object.
movingPoints = [11 11;21 11; 21 21];
fixedPoints = [51 51;61 51;61 61];
tform_cpp = fitgeotrans(movingPoints,fixedPoints,'affine')
tform_cpp =
T: [3x3 double]
Dimensionality: 2
If you have a fixed image and a moving image that are slightly misaligned, then you can use the
imregtform function to estimate an affine geometric transformation that aligns the images.
imregtform optimizes the mean squares or Mattes mutual information similarity metric of the two
images, using a regular step gradient descent or one-plus-one evolutionary optimizer. For more
information, see “Create an Optimizer and Metric for Intensity-Based Image Registration” on page 7-26.
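As a sketch of this workflow (the variable names are assumptions, with fixed and moving being grayscale images of the same scene):

[optimizer,metric] = imregconfig('monomodal');
tform = imregtform(moving,fixed,'affine',optimizer,metric);
registered = imwarp(moving,tform,'OutputView',imref2d(size(fixed)));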
If you have a fixed image and a moving image that are severely misaligned, then you can use the
imregcorr function to estimate an affine geometric transformation that improves the image
alignment. You can refine the resulting transformation by using similarity optimization.
imwarp uses the geometric transformation to map coordinates in the output image to the
corresponding coordinates in the input image (inverse mapping). Then, imwarp uses the coordinate
mapping to interpolate pixel values within the input image and compute the output pixel value.
See Also
affine2d | affine3d | fitgeotrans | geometricTransform2d | geometricTransform3d |
imwarp | projective2d | rigid2d | rigid3d
Related Examples
• “Perform Simple 2-D Translation Transformation” on page 6-25
More About
• “Matrix Representation of Geometric Transformations” on page 6-17
• “N-Dimensional Spatial Transformations” on page 6-29
Matrix Representation of Geometric Transformations
For a projective transformation, the transformation matrix is 3-by-3. Unlike affine transformations, there are no restrictions on the last column of the transformation matrix.
Projective transformations are frequently used to register images that are out of alignment. If you
have two images that you would like to align, first select control point pairs using cpselect. Then, fit
a projective transformation matrix to control point pairs using fitgeotrans and setting the
transformationType to 'projective'. This automatically creates a projective2d geometric
transformation object. The transformation matrix is stored as a property in the projective2d
object. The transformation can then be applied to other images using imwarp.
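A sketch of that registration workflow (image variables assumed):

[movingPoints,fixedPoints] = cpselect(moving,fixed,'Wait',true);
tform = fitgeotrans(movingPoints,fixedPoints,'projective');
registered = imwarp(moving,tform,'OutputView',imref2d(size(fixed)));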
This example shows how to create a composite of 2-D translation and rotation transformations.
Create a checkerboard image that will undergo transformation. Also create a spatial reference object
for the image.
cb = checkerboard(4,2);
cb_ref = imref2d(size(cb));
To illustrate the spatial position of the image, create a flat background image. Overlay the
checkerboard over the background, highlighting the position of the checkerboard in green.
background = zeros(150);
imshowpair(cb,cb_ref,background,imref2d(size(background)))
Create a translation matrix, and store it as an affine2d geometric transformation object. This
translation will shift the image horizontally by 100 pixels.
Create a rotation matrix, and store it as an affine2d geometric transformation object. This rotation matrix rotates the image 30 degrees clockwise about the origin.
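The translation and rotation matrices themselves are not shown in this excerpt; matrices consistent with the description would be (the shift amount and the sign of the angle are assumptions):

T = [1 0 0; 0 1 0; 100 0 1];                               % shift 100 pixels in x
R = [cosd(30) sind(30) 0; -sind(30) cosd(30) 0; 0 0 1];    % 30 degree rotation about the origin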
Perform translation first and rotation second. In the multiplication of the transformation matrices, the
translation matrix T is on the left, and the rotation matrix R is on the right.
TR = T*R;
tform_tr = affine2d(TR);
[out,out_ref] = imwarp(cb,cb_ref,tform_tr);
imshowpair(out,out_ref,background,imref2d(size(background)))
Reverse the order of the transformations: perform rotation first and translation second. In the
multiplication of the transformation matrices, the rotation matrix R is on the left, and the translation
matrix T is on the right.
RT = R*T;
tform_rt = affine2d(RT);
[out,out_ref] = imwarp(cb,cb_ref,tform_rt);
imshowpair(out,out_ref,background,imref2d(size(background)))
Notice how the spatial position of the transformed image is different than when translation was
followed by rotation.
Shear:

x′ = x + az          x′ = x + ay          x′ = x
y′ = y + bz          y′ = y               y′ = y + bx
z′ = z               z′ = z + cy          z′ = z + cx

T = [1 0 0 0         T = [1 0 0 0         T = [1 b c 0
     0 1 0 0              a 1 c 0              0 1 0 0
     a b 1 0              0 0 1 0              0 0 1 0
     0 0 0 1]             0 0 0 1]             0 0 0 1]

Rotation: about the x-axis, about the y-axis, or about the z-axis (the rotation matrices follow the same [x y z 1] row-vector convention).
For N-D affine transformations, the last column must contain [zeros(N,1); 1]. imwarp does not
support transformations of more than three dimensions.
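For example, a sketch that builds one of the shear matrices above as an affine3d object and applies it to a volume of random test data:

a = 0.5; b = 0.3;                           % shear amounts
T = [1 0 0 0; 0 1 0 0; a b 1 0; 0 0 0 1];   % x′ = x + az, y′ = y + bz
tform = affine3d(T);
V = rand(64,64,27);
Vsheared = imwarp(V,tform);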
See Also
affine2d | affine3d | fitgeotrans | imwarp | projective2d | rigid2d | rigid3d
Related Examples
• “Perform Simple 2-D Translation Transformation” on page 6-25
More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-13
Specify Fill Values in Geometric Transformation Output
rgb = imread('onion.png');
xform = [ 1 0 0
0 1 0
40 40 1 ];
Create the geometric transformation object. This example creates an affine2d object.
tform_translate = affine2d(xform)
tform_translate =
affine2d with properties:
T: [3x3 double]
Dimensionality: 2
Create a 2-D spatial referencing object. This object specifies aspects of the coordinate system of the output
space so that the area needing fill values is visible. By default, imwarp sizes the output image to be
just large enough to contain the entire transformed image but not the entire output coordinate space.
Rout = imref2d(size(rgb));
Rout.XWorldLimits(2) = Rout.XWorldLimits(2)+40;
Rout.YWorldLimits(2) = Rout.YWorldLimits(2)+40;
Rout.ImageSize = Rout.ImageSize+[40 40];
cb_rgb = imwarp(rgb,tform_translate,'OutputView',Rout);
figure, imshow(cb_rgb)
Warp the image again, this time specifying a fill value for the output pixels that fall outside the transformed image. Because the image is truecolor, 'FillValues' is a 3-by-1 vector that specifies an RGB color.
cb_fill = imwarp(rgb,tform_translate,'FillValues',[187;192;57],...
'OutputView',Rout);
figure, imshow(cb_fill)
See Also
affine2d | imref2d | imwarp
More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-13
• “Matrix Representation of Geometric Transformations” on page 6-17
Perform Simple 2-D Translation Transformation
Read the image to be transformed. This example creates a checkerboard image using the
checkerboard function.
cb = checkerboard;
imshow(cb)
Get spatial referencing information about the image. This information is useful when you want to
display the result of the transformation.
cb_ref = imref2d(size(cb))
cb_ref =
imref2d with properties:
Create a 3-by-3 transformation matrix, called T in this example, that defines the transformation. In
this matrix, T(3,1) specifies the number of pixels to shift the image in the horizontal direction and
T(3,2) specifies the number of pixels to shift the image in the vertical direction.
T = [1 0 0; 0 1 0; 20 30 1]
T = 3×3
1 0 0
0 1 0
20 30 1
Create a geometric transformation object that defines the translation you want to perform. Because
translation transformations are a special case of the affine transformation, the example uses an
affine2d geometric transformation object to represent translation. Create an affine2d object by
passing the 3-by-3 transformation matrix, T, to the affine2d constructor.
tform = affine2d(T);
Perform the transformation. Call the imwarp function specifying the image you want to transform
and the geometric transformation object. imwarp returns the transformed image, cb_translated.
This example also returns the optional spatial referencing object, cb_translated_ref, which
contains spatial referencing information about the transformed image.
[cb_translated,cb_translated_ref] = imwarp(cb,tform);
View the original and the transformed image side-by-side using the subplot function in conjunction with imshow. When viewing the translated image, it might appear that the transformation had no
effect. The transformed image looks identical to the original image. The reason that no change is
apparent in the visualization is because imwarp sizes the output image to be just large enough to
contain the entire transformed image but not the entire output coordinate space. Notice, however,
that the coordinate values have been changed by the transformation.
figure;
subplot(1,2,1);
imshow(cb,cb_ref);
subplot(1,2,2);
imshow(cb_translated,cb_translated_ref)
To see the entirety of the transformed image in the same relation to the origin of the coordinate space
as the original image, use imwarp with the 'OutputView' parameter, specifying a spatial
referencing object. The spatial referencing object specifies the size of the output image and how
much of the output coordinate space is included in the output image. To do this, the example makes a
copy of the spatial referencing object associated with the original image and modifies the world
coordinate limits to accommodate the full size of the transformed image. The example sets the limits of the output image in world coordinates to include the origin from the input image.
cb_translated_ref = cb_ref;
cb_translated_ref.XWorldLimits(2) = cb_translated_ref.XWorldLimits(2)+20;
cb_translated_ref.YWorldLimits(2) = cb_translated_ref.YWorldLimits(2)+20;
[cb_translated,cb_translated_ref] = imwarp(cb,tform,'OutputView',cb_translated_ref);
figure, subplot(1,2,1);
imshow(cb,cb_ref);
subplot(1,2,2);
imshow(cb_translated,cb_translated_ref)
See Also
affine2d | imref2d | imwarp
More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-13
• “Matrix Representation of Geometric Transformations” on page 6-17
N-Dimensional Spatial Transformations
• maketform
• fliptform
• tformfwd
• tforminv
• findbounds
• makeresampler
• tformarray
• imtransform
The imtransform, findbounds, and tformarray functions use the tformfwd and tforminv
functions internally to encapsulate the forward transformations needed to determine the extent of an
output image or array and/or to map the output pixels/array locations back to input locations. You can
use tformfwd and tforminv to explore the geometric effects of a transformation by applying them
to points and lines and plotting the results. They support a consistent handling of both image and
point-wise data.
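For example, a small sketch that maps the corners of the unit square through a simple shear and back:

t = maketform('affine',[1 0 0; 0.5 1 0; 0 0 1]);   % shear transformation
uv = [0 0; 1 0; 1 1; 0 1];                         % unit square corners
xy = tformfwd(t,uv)                                % forward-mapped corners
uv2 = tforminv(t,xy)                               % maps back to uv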
You can use tformarray to work with arbitrary-dimensional array transformations. The arrays do
not need to have the same dimensions. The output can have either a lower or higher number of
dimensions than the input. For example, if you are sampling 3-D data on a 2-D slice or manifold, the
input array might have a lower dimensionality. The output dimensionality might be higher, for
example, if you combine multiple 2-D transformations into a single 2-D to 3-D operation.
You can create a resampling structure using the makeresampler function to obtain special effects or
custom processing. For example, you could specify your own separable filtering/interpolation kernel,
build a custom resampler around the MATLAB interp2 or interp3 functions, or even implement an
advanced antialiasing technique.
The imtransform function options let you control many aspects of the transformation. For example, you can produce a transformed image that appears to contain multiple copies of the original image. This is accomplished by using the 'Size' option to make the output image larger than the input image, and then specifying a padding method that extends the input image by repeating the pixels in a circular
pattern. The Image Processing Toolbox Image Transformation demos provide more examples of using
the imtransform function and related functions to perform different types of spatial
transformations.
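A sketch of the kind of effect described (the image, transformation, and output limits here are assumptions, not the original example's values):

A = imread('cameraman.tif');
T = maketform('affine',[1 0 0; 0.4 1 0; 0 0 1]);
R = makeresampler('cubic','circular');
B = imtransform(A,T,R,'XData',[-128 384],'YData',[1 256]);
figure, imshow(B)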
See Also
More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-13
Register Two Images Using Spatial Referencing to Enhance Display
Read the two images of the same scene that are slightly misaligned.
fixed = imread('westconcordorthophoto.png');
moving = imread('westconcordaerial.png');
Load a MAT-file that contains preselected control points for the fixed and moving images and create
a geometric transformation fit to the control points, using fitgeotrans.
load westconcordpoints
tform = fitgeotrans(movingPoints, fixedPoints, 'projective');
Perform the transformation necessary to register the moving image with the fixed image, using imwarp. This example uses the optional 'FillValues' parameter to specify a fill value (white), which will help when displaying the fixed image over the transformed moving image, to check registration. Notice that the full content of the geometrically transformed moving image is present, now called registered. Also note that there are no blank rows or columns.
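The imwarp call itself is not shown in this excerpt; given the description (a white fill value), it would be:

registered = imwarp(moving,tform,'FillValues',255);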
Overlay the transformed image, registered, over the fixed image, using imshowpair. Notice how the two images appear misregistered. This happens because imshowpair assumes that the images are both in the default intrinsic coordinate system. The next steps provide two ways to remedy this display problem.
figure, imshowpair(fixed,registered,'blend');
Constrain the transformed image, registered, to the same number of rows and columns, and the
same spatial limits as the fixed image. This ensures that the registered image appears registered
with the fixed image but areas of the registered image that would extrapolate beyond the extent of
the fixed image are discarded. To do this, create a default spatial referencing object that specifies the
size and location of the fixed image, and use imwarp's 'OutputView' parameter to create a
constrained resampled image registered1. Display the registered image over the fixed image. In
this view, the images appear to have been registered, but not all of the unregistered image is visible.
Rfixed = imref2d(size(fixed));
registered1 = imwarp(moving,tform,'FillValues', 255,'OutputView',Rfixed);
figure, imshowpair(fixed,registered1,'blend');
As an alternative, use the optional imwarp syntax that returns the output spatial referencing object
that indicates the position of the full transformed image in the same default intrinsic coordinate
system as the fixed image. Display the registered image over the fixed image and note that now the
full registered image is visible.
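The code for this alternative is not shown in this excerpt; a sketch consistent with the description:

[registered2,Rregistered2] = imwarp(moving,tform,'FillValues',255);
figure, imshowpair(fixed,Rfixed,registered2,Rregistered2,'blend');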
Clean up.
iptsetpref('ImshowAxesVisible','off')
Create a Gallery of Transformed Images
Overview
checkerboard produces an image that has rectangular tiles and four unique corners, which makes it
easy to see how the checkerboard image gets distorted by geometric transformations.
After you have run this example once, try changing the image I to your favorite image.
sqsize = 60;
I = checkerboard(sqsize,4,4);
nrows = size(I,1);
ncols = size(I,2);
fill = 0.3;
imshow(I)
title('Original')
Nonreflective similarity transformations may include a rotation, a scaling, and a translation. Shapes
and angles are preserved. Parallel lines remain parallel. Straight lines remain straight.
[u v 1] = [x y 1] * T
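The parameter values are not shown in this excerpt; values such as the following (assumptions) complete the code below:

scale = 1.2;         % scale factor
angle = 40*pi/180;   % rotation angle, in radians
tx = 0;              % x translation
ty = 0;              % y translation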
sc = scale*cos(angle);
ss = scale*sin(angle);
T = [ sc -ss 0;
ss sc 0;
tx ty 1];
Since nonreflective similarities are a subset of affine transformations, create an affine2d object
using:
t_nonsim = affine2d(T);
I_nonreflective_similarity = imwarp(I,t_nonsim,'FillValues',fill);
imshow(I_nonreflective_similarity);
title('Nonreflective Similarity')
If you change either tx or ty to a non-zero value, you will notice that it has no effect on the output
image. If you want to see the coordinates that correspond to your transformation, including the
translation, include spatial referencing information:
[I_nonreflective_similarity,RI] = imwarp(I,t_nonsim,'FillValues',fill);
imshow(I_nonreflective_similarity,RI)
axis on
title('Nonreflective Similarity (Spatially Referenced)')
Notice that passing the output spatial referencing object RI from imwarp reveals the translation. To
specify what part of the output image you want to see, use the 'OutputView' name-value pair in the
imwarp function.
[u v 1] = [x y 1] * T
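As above, the parameter values are not shown in this excerpt; one possible set (assumptions), where a = -1 introduces the reflection:

scale = 1.5;         % scale factor
angle = 10*pi/180;   % rotation angle, in radians
tx = 0;              % x translation
ty = 0;              % y translation
a = -1;              % -1 reflects across the horizontal axis; 1 gives no reflection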
sc = scale*cos(angle);
ss = scale*sin(angle);
T = [ sc -ss 0;
a*ss a*sc 0;
tx ty 1];
Since similarities are a subset of affine transformations, create an affine2d object using:
t_sim = affine2d(T);
As in the translation example above, retrieve the output spatial referencing object RI from the
imwarp function, and pass RI to imshow to reveal the reflection.
[I_similarity,RI] = imwarp(I,t_sim,'FillValues',fill);
imshow(I_similarity,RI)
axis on
title('Similarity')
In an affine transformation, the x and y dimensions can be scaled or sheared independently and there
may be a translation, a reflection, and/or a rotation. Parallel lines remain parallel. Straight lines
remain straight. Similarities are a subset of affine transformations.
For an affine transformation, the equation is the same as for a similarity and nonreflective similarity:
[u v 1] = [x y 1] * T
T is a 3-by-3 matrix, where all six elements of the first and second columns can be different. The third column must be [0;0;1].
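The transformation matrix and imwarp call for the affine case are not shown in this excerpt; a sketch (matrix values are assumptions) that produces I_affine:

T = [2 0.33 0; 0 1 0; 0 0 1];   % independent scaling plus shear
t_aff = affine2d(T);
I_affine = imwarp(I,t_aff,'FillValues',fill);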
imshow(I_affine)
title('Affine')
In a projective transformation, quadrilaterals map to quadrilaterals. Straight lines remain straight but
parallel lines do not necessarily remain parallel. Affine transformations are a subset of projective
transformations.
[up vp wp] = [x y w] * T

u = up/wp
v = vp/wp
T = [A D G
     B E H
     C F I]

u = (Ax + By + C)/(Gx + Hy + I)
v = (Dx + Ey + F)/(Gx + Hy + I)
T = [1 0 0.002;
1 1 0.0002;
0 0 1 ];
t_proj = projective2d(T);
I_projective = imwarp(I,t_proj,'FillValues',fill);
imshow(I_projective)
title('Projective')
In a piecewise linear transformation, affine transformations are applied separately to regions of the
image. In this example, the top-left, top-right, and bottom-left points of the checkerboard remain
unchanged, but the triangular region at the lower-right of the image is stretched so that the bottom-
right corner of the transformed image is 50% further to the right and 20% lower than the original
coordinate.
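The control points and fitgeotrans call are not shown in this excerpt; a sketch using corner-based control points that matches the description:

movingPoints = [1 1; ncols 1; 1 nrows; ncols nrows];
fixedPoints = [1 1; ncols 1; 1 nrows; 1.5*ncols 1.2*nrows];
t_piecewise_linear = fitgeotrans(movingPoints,fixedPoints,'pwl');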
I_piecewise_linear = imwarp(I,t_piecewise_linear,'FillValues',fill);
imshow(I_piecewise_linear)
title('Piecewise Linear')
This example and the following two examples show how you can create an explicit mapping to associate each point in a regular grid (xi,yi) with a different point (ui,vi). This mapping is stored in a geometricTransform2d object, which is used by imwarp to transform the image.
In this sinusoidal transformation, the x-coordinate of each pixel is unchanged. The y-coordinate of
each row of pixels is shifted up or down following a sinusoidal pattern.
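The mapping itself is not shown in this excerpt; a sketch (the amplitude and period are assumptions):

a = 10;    % amplitude of the vertical shift, in pixels
ifcn = @(xy) [xy(:,1), xy(:,2) + a*sin(2*pi*xy(:,1)/90)];
tform = geometricTransform2d(ifcn);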
I_sinusoid = imwarp(I,tform,'FillValues',fill);
imshow(I_sinusoid);
title('Sinusoid')
Barrel distortion perturbs an image radially outward from its center. Distortion is greater farther
from the center, resulting in convex sides.
First, define a function that maps pixel indices to distance from the center. Use the meshgrid
function to create arrays of the x-coordinate and y-coordinate of each pixel, with the origin in the
upper-left corner of the image.
[xi,yi] = meshgrid(1:ncols,1:nrows);
Shift the origin to the center of the image. Then, convert the Cartesian x- and y-coordinates to
cylindrical angle (theta) and radius (r) coordinates using the cart2pol function. r changes linearly
as distance from the center pixel increases.
xt = xi - ncols/2;
yt = yi - nrows/2;
[theta,r] = cart2pol(xt,yt);
Define the amplitude, a, of the cubic term. This parameter is adjustable. Then, add a cubic term to r
so that r changes nonlinearly with distance from the center pixel.
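For example (the value of a is an assumption, chosen so the cubic term stays a modest fraction of r at the image corners):

a = 2e-6;          % amplitude of the cubic term
s1 = r + a*r.^3;   % nonlinear radial coordinate used below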
Convert back to the Cartesian coordinate system. Shift the origin back to the upper-left corner of the image.
[ut,vt] = pol2cart(theta,s1);
ui = ut + ncols/2;
vi = vt + nrows/2;
Store the mapping between (xi,yi) and (ui,vi) in a geometricTransform2d object. Use imwarp to transform the image according to the pixel mapping.
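One way to package this grid-based mapping as the inverse function of a geometricTransform2d object (a sketch, not necessarily the shipped example's code) is to interpolate ui and vi at the query points:

ifcn = @(xy) [interp2(xi,yi,ui,xy(:,1),xy(:,2)), interp2(xi,yi,vi,xy(:,1),xy(:,2))];
tform = geometricTransform2d(ifcn);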
I_barrel = imwarp(I,tform,'FillValues',fill);
imshow(I_barrel)
title('Barrel')
Pin-cushion distortion is the inverse of barrel distortion because the cubic term has a negative
amplitude. Distortion is still greater farther from the center but the distortion appears as concave
sides.
You can begin with the same theta and r values as for the barrel transformation. Define a different amplitude, b, of the cubic term. This parameter is adjustable. Then, subtract a cubic term from r so that r changes nonlinearly with distance from the center pixel.
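For example (the value of b is an assumption, kept small enough that s remains positive):

b = 2e-6;          % amplitude of the cubic term
s = r - b*r.^3;    % nonlinear radial coordinate used below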
Convert back to the Cartesian coordinate system. Shift the origin back to the upper-left corner of the image.
[ut,vt] = pol2cart(theta,s);
ui = ut + ncols/2;
vi = vt + nrows/2;
Store the mapping between (xi,yi) and (ui,vi) in a geometricTransform2d object. Use imwarp to transform the image according to the pixel mapping.
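As with the barrel transformation, a sketch that packages the mapping and creates the pin-cushion image used in the comparison below:

ifcn = @(xy) [interp2(xi,yi,ui,xy(:,1),xy(:,2)), interp2(xi,yi,vi,xy(:,1),xy(:,2))];
tform = geometricTransform2d(ifcn);
I_pin = imwarp(I,tform,'FillValues',fill);
imshow(I_pin)
title('Pin Cushion')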
figure
subplot(3,3,1),imshow(I),title('Original')
subplot(3,3,2),imshow(I_nonreflective_similarity),title('Nonreflective Similarity')
subplot(3,3,3),imshow(I_similarity),title('Similarity')
subplot(3,3,4),imshow(I_affine),title('Affine')
subplot(3,3,5),imshow(I_projective),title('Projective')
subplot(3,3,6),imshow(I_piecewise_linear),title('Piecewise Linear')
subplot(3,3,7),imshow(I_sinusoid),title('Sinusoid')
subplot(3,3,8),imshow(I_barrel),title('Barrel')
subplot(3,3,9),imshow(I_pin),title('Pin Cushion')
Note that subplot changes the scale of the images being displayed.
See Also
Functions
checkerboard | fitgeotrans | imwarp | makeresampler | tformarray
Objects
LocalWeightedMeanTransformation2D | PiecewiseLinearTransformation2D |
PolynomialTransformation2D | affine2d | projective2d
More About
• “Matrix Representation of Geometric Transformations” on page 6-17
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-13
Exploring a Conformal Mapping
Conformal transformations, or mappings, have many important properties and uses. One property
relevant to image transformation is the preservation of local shape (except sometimes at isolated
points).
This example uses a 2-D conformal transformation to warp an image. The mapping from output to
input, g: R^2 -> R^2, is defined in terms of a complex analytic function G: C -> C, where
G(z) = (z + 1/z) / 2.
We define g via a direct correspondence between each point (x,y) in R^2 (the Euclidean plane) and the point z = x + i*y in C (the complex plane), where g(x,y) = (u,v) and u + i*v = G(x + i*y).
This conformal mapping is important in fluid mechanics because it transforms lines of flow around a
circular disk (or cylinder, if we add a third dimension) to straight lines. (See pp. 340-341 in Strang,
Gilbert, Introduction to Applied Mathematics, Wellesley-Cambridge Press, Wellesley, MA, 1986.)
A note on the value of complex variables: although we could express the definition of g directly in
terms of x and y, that would obscure the underlying simplicity of the transformation. This
disadvantage would come back to haunt us in Step 3 below. There, if we worked purely in real
variables, we would need to solve a pair of simultaneous nonlinear equations instead of merely
applying the quadratic formula!
We start by loading the peppers image, extracting a 300-by-500 subimage, and displaying it.
A = imread('peppers.png');
A = A(31:330,1:500,:);
figure
imshow(A)
title('Original Image','FontSize',14)
Then use maketform to make a custom tform struct with a handle to function conformalInverse
as its INVERSE_FCN argument:
type conformalInverse.m
function U = conformalInverse(X, ~)
% conformalInverse Inverse conformal transformation.
%
% Supports conformal transformation example, ConformalMappingImageExample.m
% ("Exploring a Conformal Mapping").
Z = complex(X(:,1),X(:,2));
W = (Z + 1./Z)/2;
U(:,2) = imag(W);
U(:,1) = real(W);
Horizontal and vertical bounds are needed for mapping the original and transformed images to the
input and output complex planes. Note that the proportions in uData and vData match the height-to-
width ratio of the original image (3/5).
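The bounds and the maketform call are not shown in this excerpt; values with the stated proportions (the exact numbers are assumptions) would be:

uData = [-1.25 1.25];   % horizontal bounds, input (w-) plane
vData = [0.75 -0.75];   % vertical bounds, input plane (3/5 ratio)
xData = [-2.4 2.4];     % horizontal bounds, output (z-) plane
yData = [2.0 -2.0];     % vertical bounds, output plane (6/5 ratio)
conformal = maketform('custom',2,2,[],@conformalInverse,[]);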
We apply imtransform using the SIZE parameter to ensure an aspect ratio that matches the
proportions in xData and yData (6/5), and view the result.
B = imtransform( A, conformal, 'cubic', ...
'UData', uData,'VData', vData,...
'XData', xData,'YData', yData,...
'Size', [300 360], 'FillValues', 255 );
figure
imshow(B)
title('Transformed Image','FontSize',14)
Compare the original and transformed images. Except that the edges are now curved, the outer
boundary of the image is preserved by the transformation. Note that each feature from the original
image appears twice in the transformed image (look at the various peppers). And there is a hole in
the middle of the transformed image with four regular cusps around its edges.
In fact, every point in the input w-plane is mapped to two points in the output z-plane, one inside the
unit circle and one outside. The copies inside the unit circle are much smaller than those outside. It's
clear that the cusps around the central hole are just the copies of the four image corners that mapped
inside the unit circle.
If the transformation created with maketform has a forward function, then we can apply tformfwd
to regular geometric objects (in particular, to rectangular grids and uniform arrays of circles) to
obtain further insight into the transformation. In this example, because G maps two output points to
each input point, there is no unique forward transformation. But we can proceed if we are careful and
work with two different forward functions.
Solving w = (z + 1/z)/2 for z with the quadratic formula, we find that
z = w +/- sqrt(w^2 - 1).
The positive and the negative square roots lead to two separate forward transformations. We
construct the first using maketform and a handle to the function, conformalForward1.
t1 = maketform('custom', 2, 2, @conformalForward1, [], []);
function X = conformalForward1(U, ~)
% conformalForward1 Forward transformation with positive square root.
%
% Supports conformal transformation example, ConformalMappingImageExample.m
% ("Exploring a Conformal Mapping").
W = complex(U(:,1),U(:,2));
Z = W + sqrt(W.^2 - 1);
X(:,2) = imag(Z);
X(:,1) = real(Z);
type conformalForward2.m
function X = conformalForward2(U, ~)
% conformalForward2 Forward transformation with negative square root.
%
% Supports conformal transformation example, ConformalMappingImageExample.m
% ("Exploring a Conformal Mapping").
W = complex(U(:,1),U(:,2));
Z = W - sqrt(W.^2 - 1);
X(:,2) = imag(Z);
X(:,1) = real(Z);
With the two forward transformations, we can illustrate the mapping of a grid of lines, using
additional helper functions.
You can see that the grid lines are color-coded according to their quadrants in the input plane before
and after the transformations. The colors also follow the transformed grids to the output planes. Note
that each quadrant transforms to a region outside the unit circle and to a region inside the unit circle.
The right-angle intersections between grid lines are preserved under the transformation -- evidence
of the shape-preserving property of conformal mappings -- except for the points at +1 and -1 on the
real axis.
Under a conformal transformation, small circles should remain nearly circular, changing only in
position and size. Again applying the two forward transformations, this time we map a regular array
of uniformly-sized circles.
You can see that the transformation produces a circle packing in which tangencies have been preserved. In this
example, the color coding indicates use of the positive (green) or negative (blue) square root of w^2
- 1. Note that the circles change dramatically but that they remain circles (shape-preservation, once
again).
To further explore the conformal mapping, we can place the input and transformed images on the
pair of axes used in the preceding examples and superpose a set of curves as well.
First we display the input image, rendered semi-transparently, over the input axes of the conformal
map, along with a black ellipse and a red line along the real axis.
figure
axIn = conformalSetupInputAxes(axes);
conformalShowInput(axIn, A, uData, vData)
title('Original Image Superposed on Input Plane','FontSize',14)
Next we display the output image over the output axes of the conformal map, along with two black
circles and one red circle. Again, the image is semi-transparent.
figure
axOut = conformalSetupOutputAxes(axes);
conformalShowOutput(axOut, B, xData, yData)
title('Transformed Image Superposed on Output Plane','FontSize',14)
MATLAB® graphics made it easy to shift and scale the original and transformed images to superpose
them on the input (w-) and output (z-) planes, respectively. The use of semi-transparency makes it
easier to see the ellipse, line, and circles. The ellipse in the w-plane has intercepts at 5/4 and -5/4 on
the horizontal axis and 3/4 and -3/4 on the vertical axis. G maps two circles centered on the origin to
this ellipse: the one with radius 2 and the one with radius 1/2. And, as shown in red, G maps the unit
circle to the interval [-1 1] on the real axis.
If the inverse transform function within a custom tform struct returns a vector filled with NaN for a
given output image location, then imtransform (and also tformarray) assign the specified fill
value at that location. In this step we repeat Step 1, but modify our inverse transformation function
slightly to take advantage of this feature.
type conformalInverseClip.m
function U = conformalInverseClip( X, ~)
% conformalInverseClip Inverse conformal transformation with clipping.
%
% This is a modification of conformalInverse in which points in X
% inside the circle of radius 1/2 or outside the circle of radius 2 map to
% NaN + i*NaN.
%
% Supports conformal transformation example, ConformalMappingImageExample.m
% ("Exploring a Conformal Mapping").
Z = complex(X(:,1),X(:,2));
W = (Z + 1./Z)/2;
q = 0.5 <= abs(Z) & abs(Z) <= 2;
W(~q) = complex(NaN,NaN);
U(:,2) = imag(W);
U(:,1) = real(W);
This is the same as the function defined in Step 2, except for the two additional lines:
q = 0.5 <= abs(Z) & abs(Z) <= 2;
W(~q) = complex(NaN,NaN);
which cause the inverse transformation to return NaN at any point not between the two circles with radii of 1/2 and 2, centered on the origin. The result is to mask that portion of the output image with the specified fill value.
The result is identical to our initial transformation except that the outer corners and inner cusps have
been masked away to produce a ring effect.
Applying the "ring" transformation to an image of winter greens (hemlock and alder berries) leads to
an aesthetic special effect.
Load the image greens.jpg, which already has a 3/5 height-to-width ratio, and display it.
C = imread('greens.jpg');
figure
imshow(C)
title('Winter Greens Image','FontSize',14);
Transform the image and display the result, this time creating a square output image.
Notice that the local shapes of objects in the output image are preserved. The alder berries stayed
round!
Exploring Slices from a 3-Dimensional MRI Data Set
This example uses the MRI data set that comes with MATLAB® and that is used in the help examples
for both montage and immovie. Loading mri.mat adds two variables to the workspace: D (128-
by-128-by-1-by-27, class uint8) and a grayscale colormap, map (89-by-3, class double).
D comprises 27 128-by-128 horizontal slices from an MRI data scan of a human cranium. Values in D
range from 0 through 88, so the colormap is needed to generate a figure with a useful visual range.
The dimensionality of D makes it compatible with montage. The first two dimensions are spatial. The
third dimension is the color dimension, with size 1 because it indexes into the color map. (size(D,3)
would be 3 for an RGB image sequence.) The fourth dimension is temporal (as with any image
sequence), but in this particular case it is also spatial. So there are three spatial dimensions in D and
we can use imtransform or tformarray to convert the horizontal slices to sagittal slices (showing
the view from the side of the head) or coronal (frontal) slices (showing the view from the front or
back of the head).
An important factor is that the sampling intervals are not the same along the three dimensions:
samples along the vertical dimension (4) are spaced 2.5 times more widely than along the horizontal
dimensions.
Load the MRI data set and view the 27 horizontal slices as a montage.
load mri;
montage(D,map)
title('Horizontal Slices');
We can construct a mid-sagittal slice from the MRI data by taking a subset of D and transforming it to
account for the different sampling intervals and the spatial orientation of the dimensions of D.
The following statement extracts all the data needed for a midsagittal slice.
M1 = D(:,64,:,:); size(M1)
ans = 1×4
128 1 1 27
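The statement that produces M2 is not shown in this excerpt; it removes the singleton dimensions of M1, for example:

M2 = reshape(M1,[128 27]);
size(M2)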
ans = 1×2
128 27
figure, imshow(M2,map);
title('Sagittal - Raw Data');
We can obtain a much more satisfying view by transforming M2 to change its orientation and increase
the sampling along the vertical (inferior-superior) dimension by a factor of 2.5 -- making the sampling
interval equal in all three spatial dimensions. We could do this in steps starting with a transpose, but
the following affine transformation enables a single-step transformation and more economical use of
memory.
The upper 2-by-2 block of the matrix passed to maketform, [0 -2.5; 1 0], combines the rotation and scaling needed to reorient the slice and make the sampling interval equal in both dimensions.
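The maketform call is not shown in this excerpt; based on the description, it would look like the following (the translation terms are left at zero because imtransform chooses the output bounds automatically):

T0 = maketform('affine',[0 -2.5; 1 0; 0 0]);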
The call
imtransform(M2,T0,'cubic')
would suffice to apply T0 to M2 and provide good resolution while interpolating along the top to bottom
direction. However, there is no need for cubic interpolation in the front to back direction, since no
resampling will occur along (output) dimension 2. Therefore we specify nearest-neighbor resampling
in this dimension, with greater efficiency and identical results.
R2 = makeresampler({'cubic','nearest'},'fill');
M3 = imtransform(M2,T0,R2);
figure, imshow(M3,map);
title('Sagittal - IMTRANSFORM')
Step 3: Extract Sagittal Slice from the Horizontal Slices Using TFORMARRAY
In this step we obtain the same result as step 2, but use tformarray to go from three spatial
dimensions to two in a single operation. Step 2 does start with an array having three spatial
dimensions and end with an array having two spatial dimensions, but intermediate two-dimensional
images (M1 and M2) pave the way for the call to imtransform that creates M3. These intermediate
images are not necessary if we use tformarray instead of imtransform. imtransform is very
convenient for 2-D to 2-D transformations, but tformarray supports N-D to M-D transformations,
where M need not equal N.
Through its TDIMS_A argument, tformarray allows us to define a permutation for the input array.
Since we want to create an image whose rows come from the original dimension 4 (inferior-superior) and whose columns come from the original dimension 1, and extract just a single sagittal plane via the original dimension 2, we specify tdims_a = [4 1 2]. We
create a tform via composition starting with a 2-D affine transformation T1 that scales the (new)
dimension 1 by a factor of -2.5 and adds a shift of 68.5 to keep the array coordinates positive. The
second part of the composite is a custom transformation T2 that extracts the 64th sagittal plane using
a very simple INVERSE_FCN.
We use the same approach to resampling as before, but include a third dimension.
R3 = makeresampler({'cubic','nearest','nearest'},'fill');
tformarray transforms the three spatial dimensions of D to a 2-D output in a single step. Our output
image is 66-by-128, with the original 27 planes expanding to 66 in the vertical (inferior-superior)
direction.
figure, imshow(M4,map);
title('Sagittal - TFORMARRAY');
We create a 4-D array (the third dimension is the color dimension) that can be used to generate an
image sequence that goes from left to right, starts 30 planes in, skips every other plane, and has 35
frames in total. The transformed array has the vertical (inferior-superior) dimension first, then the side-to-side dimension, then the color dimension, and finally the left-to-right frame dimension.
As in the previous step, we permute the input array using TDIMS_A = [4 1 2], again flipping and
rescaling/resampling the vertical dimension. Our affine transformation is the same as the T1 above,
except that we add a third dimension with a (3,3) element of 0.5 and (4,3) element of -14 chosen to
map 30, 32, ... 98 to 1, 2, ..., 35. This centers our 35 frames on the mid-sagittal slice.
In our call to tformarray, TSIZE_B = [66 128 35] now includes the 35 frames in the 4th, left-to-
right dimension (which is the third transform dimension). The resampler remains the same.
View the sagittal slices as a montage (padding the array slightly to separate the elements of the
montage).
S2 = padarray(S,[6 0 0 0],0,'both');
figure, montage(S2,map)
title('Sagittal Slices');
Constructing coronal slices is almost the same as constructing sagittal slices. We change TDIMS_A
from [4 1 2] to [4 2 1]. We create a series of 45 frames, starting 8 planes in and moving from
back to front, skipping every other frame. The dimensions of the output array are ordered vertical, side-to-side, color, and front-to-back.
In our call to tformarray, TSIZE_B = [66 128 48] specifies the vertical, side-to-side, and front-to-
back dimensions, respectively. The resampler remains the same.
Note that all array permutations and flips in steps 3, 4, and 5 were handled as part of the
tformarray operation.
View the coronal slices as a montage (padding the array slightly to separate the elements of the
montage).
C2 = padarray(C,[6 0 0 0],0,'both');
figure, montage(C2,map)
title('Coronal Slices');
Padding and Shearing an Image Simultaneously
In two dimensions, a simple shear transformation that maps a pair of input coordinates [u v] to a
pair of output coordinates [x y] has the form
x = u + a*v
y = v
where a is a constant.
Any simple shear is a special case of an affine transformation. You can easily verify that

[x y 1] = [u v 1] * [1 0 0; a 1 0; 0 0 1]

yields the values for x and y given by the first two equations.
a = 0.45;
T = maketform('affine', [1 0 0; a 1 0; 0 0 1] );
A = imread('football.jpg');
h1 = figure; imshow(A); title('Original Image');
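The fill color orange used below is defined in a part of the example not shown here; if you run this excerpt on its own, a definition along these lines is needed (the exact RGB values are an assumption):
% Assumed orange fill color for the RGB (uint8) image.
orange = [255 127 0]';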
We could shear the image in a single call:
B = imtransform(A,T,'cubic','FillValues',orange);
However, this is wasteful, since we would apply cubic interpolation along both columns and rows. (With our
pure shear transform, we really only need to interpolate along each row.) Instead, we create and use
a resampler that applies cubic interpolation along the rows but simply uses nearest-neighbor
interpolation along the columns, then call imtransform and display the result.
R = makeresampler({'cubic','nearest'},'fill');
B = imtransform(A,T,R,'FillValues',orange);
h2 = figure; imshow(B);
title('Sheared Image');
Transforming a grid of straight lines or an array of circles with tformfwd is a good way to
understand a transformation (as long as it has both forward and inverse functions).
Define a grid of lines covering the original image, and display it over the image. Then use tformfwd
to apply the pure shear to each line in the grid, and display the result over the sheared image.
[U,V] = meshgrid(0:64:320,0:64:256);
[X,Y] = tformfwd(T,U,V);
gray = 0.65 * [1 1 1];
figure(h1);
hold on;
line(U, V, 'Color',gray);
line(U',V','Color',gray);
figure(h2);
hold on;
line(X, Y, 'Color',gray);
line(X',Y','Color',gray);
When we applied the shear transformation, imtransform filled in the orange triangles to the left and
right, where there was no data. That's because we specified a pad method of 'fill' when calling
makeresampler. There are a total of five different pad method choices ('fill', 'replicate',
'bound', 'circular', and 'symmetric'). Here we compare the first three.
First, to get a better look at how the 'fill' option worked, use the 'XData' and 'YData' options
in imtransform to force some additional space around the output image.
R = makeresampler({'cubic','nearest'},'fill');
Bf = imtransform(A,T,R,'XData',[-49 500],'YData',[-49 400],...
    'FillValues',orange);
figure, imshow(Bf);
title('Pad Method = ''fill''');
Now, try the 'replicate' method (no need to specify fill values in this case).
R = makeresampler({'cubic','nearest'},'replicate');
Br = imtransform(A,T,R,'XData',[-49 500],'YData', [-49 400]);
figure, imshow(Br);
title('Pad Method = ''replicate''');
R = makeresampler({'cubic','nearest'}, 'bound');
Bb = imtransform(A,T,R,'XData',[-49 500],'YData',[-49 400],...
'FillValues',orange);
figure, imshow(Bb);
title('Pad Method = ''bound''');
Results with 'fill' and 'bound' look very similar, but look closely and you'll see that the edges are
smoother with 'fill'. That's because the input image is padded with the fill values, then the cubic
interpolation is applied across the edge, mixing fill and image values. In contrast, 'bound'
recognizes a strict boundary between the inside and outside of the input image. Points falling outside
are filled. Points falling inside are interpolated, using replication when they're near the edge. A close-up
look helps show this more clearly. We choose XData and YData to bracket a point near the lower
right corner of the image, in the output image space, then resize with 'nearest' to preserve the
appearance of the individual pixels.
R = makeresampler({'cubic','nearest'},'fill');
Cf = imtransform(A,T,R,'XData',[423 439],'YData',[245 260],...
'FillValues',orange);
R = makeresampler({'cubic','nearest'},'bound');
Cb = imtransform(A,T,R,'XData',[423 439],'YData',[245 260],...
'FillValues',orange);
Cf = imresize(Cf,12,'nearest');
Cb = imresize(Cb,12,'nearest');
figure;
subplot(1,2,1); imshow(Cf); title('Pad Method = ''fill''');
subplot(1,2,2); imshow(Cb); title('Pad Method = ''bound''');
The remaining two pad methods are 'circular' (circular repetition in each dimension) and
'symmetric' (circular repetition of the image with an appended mirror image). To show more of the
pattern that emerges, we redefine the transformation to cut the scale in half.
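The redefined transformation Thalf is not shown in this excerpt; presumably it is the same shear with the overall scale halved, along these lines (an assumption, not the verbatim example code):
% Sketch: same shear as T, with the scale cut in half.
Thalf = maketform('affine',[1 0; a 1; 0 0]/2);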
R = makeresampler({'cubic','nearest'},'circular');
Bc = imtransform(A,Thalf,R,'XData',[-49 500],'YData',[-49 400],...
'FillValues',orange);
figure, imshow(Bc);
title('Pad Method = ''circular''');
R = makeresampler({'cubic','nearest'},'symmetric');
Bs = imtransform(A,Thalf,R,'XData',[-49 500],'YData',[-49 400],...
'FillValues',orange);
figure, imshow(Bs);
title('Pad Method = ''symmetric''');
7
Image Registration
This chapter describes the image registration capabilities of the Image Processing Toolbox software.

Approaches to Registering Images

Image registration is the process of aligning two or more images of the same scene. Image
registration is often used as a preliminary step in other image processing applications. For
example, you can use image registration to align satellite images or medical images captured with
different diagnostic modalities, such as MRI and SPECT. Image registration enables you to compare
common features in different images. For example, you might discover how a river has migrated, how
an area became flooded, or whether a tumor is visible in an MRI or SPECT image.
Image Processing Toolbox offers three image registration approaches: an interactive Registration
Estimator app, intensity-based automatic image registration, and control point registration. Computer
Vision Toolbox offers automated feature detection and matching.
The app provides a quantitative measure of quality, and it returns the registered image and the transformation matrix.
The app also generates code with your selected registration technique and settings, so you can apply
an identical transformation to multiple images.
Registration Estimator app offers six feature-based techniques, three intensity-based techniques, and
one nonrigid registration technique. For a more detailed comparison of the available techniques, see
“Techniques Supported by Registration Estimator App” on page 7-22.
To register images using an intensity-based technique, use imregister and specify the type of
geometric transformation to apply to the moving image. imregister iteratively adjusts the
transformation to optimize the similarity of the two images.
Alternatively, you can estimate a localized displacement field and apply a nonrigid transformation to
the moving image using imregdemons.
Control point registration is useful when:
• You want to prioritize the alignment of specific features, rather than the entire set of features
detected using automated feature detection. For example, when registering two medical images,
you can focus the alignment on desired anatomical features and disregard matched features that
correspond to less informative anatomical structures.
• Images have repeated patterns that provide an unclear mapping using automated feature
matching. For example, photographs of buildings with many windows, or aerial photographs of
gridded city streets, have many similar features that are challenging to map automatically. In this
case, manual selection of control point pairs can provide a clearer mapping of features, and thus a
better transformation to align the feature points.
Control point registration can apply many types of transformations to the moving image. Global
transformations, which act on the entire image uniformly, include affine, projective, and polynomial
geometric transformations. Nonrigid transformations, which act on local regions, include piecewise
linear and local weighted mean transformations.
Use the Control Point Selection Tool to select control points. Start the tool with cpselect.
For an example, see “Find Image Rotation and Scale Using Automated Feature Matching” (Computer
Vision Toolbox). You must have Computer Vision Toolbox to use this method.
Note The Registration Estimator app offers six feature-based techniques to register a single pair of
images. However, the app does not provide an automated workflow to register multiple images.
See Also
imregister | imwarp
Related Examples
• “Register Images Using Registration Estimator App” on page 7-6
• “Register Multimodal MRI Images” on page 7-32
• “Register Images with Projection Distortion Using Control Points” on page 7-80
Register Images Using Registration Estimator App
Create two misaligned images in the workspace. This example creates the moving image J by
rotating the fixed image I clockwise by 30 degrees.
I = imread('cameraman.tif');
J = imrotate(I,-30);
In this example, you can open Registration Estimator from the command window because the
images have no spatial referencing information or initial transformation estimate. Run this command
in the command window, and specify the moving image and the fixed image as the two input
arguments.
registrationEstimator(J,I)
If your images have spatial referencing information, or if you want to specify an initial transformation
estimate, then you must load the images using a dialog window. For more information, see “Load
Images, Spatial Referencing Information, and Initial Transformation” on page 7-14.
You can also open Registration Estimator from the MATLAB™ Toolstrip. Open the Apps tab and
click Registration Estimator under Image Processing and Computer Vision. If you open the app
from the toolstrip, you must load the images using a dialog window.
After you load the images, the app displays an overlay of the images and creates three registration
trials: Phase Correlation, MSER, and SURF. These trials appear as drafts in the history list. You
can click on each trial to adjust the registration settings. To create a trial for a different registration
technique, select a technique from the Technique menu.
The default Green-Magenta overlay style shows the fixed image in green and the moving image in
magenta. The overlay looks gray in areas where the two images have similar intensity. Additional
overlay styles assist with visualizing the results of the registration. When you click a feature-based
technique in the history list, the image overlay displays a set of red and green dots connected by
yellow lines. These points are the matched features used to align the images.
Run the three default registration trials with the default settings. Click each trial in the history list,
then click Register Images.
After the registration finishes, the trial displays a quality score and computation time. The quality
score is based loosely on the ssim function and provides an overall estimate of registration quality. A
score closer to 1 indicates a higher quality registration. Different registration techniques and settings
can yield similar quality scores but show error in different regions of the image. Inspect the image
overlay to confirm which registration technique is the most acceptable. Colors in the image overlay
indicate residual misalignment.
Note Due to randomness in the registration optimizer, the quality score, registered image, and
geometric transformation can vary slightly between trials despite identical registration settings.
After you have an initial registration estimate, adjust registration settings to improve the quality of
the alignment. For more information on available settings, see “Tune Registration Settings in
Registration Estimator App” on page 7-17. If you know the conditions under which the images were
obtained, then you can select a different transformation type or clear the Has Rotation option. Post-
processing using nonrigid transformations is available for advanced workflows.
Adjust the settings of the MSER trial. Try increasing the number of detected features and the quality
of matched features independently to see if either improves the quality of the registration.
To increase the number of detected features, click the MSER trial, numbered 2, in the history list. In
the Current Registration Settings panel, drag the Number of Detected Features slider to the right.
When you change the setting, the app creates a new trial, numbered 2.1, in the history list. The
image overlay shows more matched features, as expected.
To run the registration with these settings, click Register Images. The quality metric of this trial is
less than the quality of the original MSER trial with the default number of matched features. The
image overlay of this trial has an overall magenta tint and a thick green strip along the top of the
man's head and shoulder. Therefore, increasing the number of detected features does not necessarily
improve the quality of the registration.
To see the effect of increasing the quality of matched features, click the MSER trial 2 (not 2.1) in the
history list. In the Current Registration Settings panel, drag the Quality of Matched Features
slider to the right. When you change the setting, the app creates a new trial, numbered 2.2, in the
history list. The image overlay displays a smaller number of high quality matched points.
To see the registration with these settings, click Register Images. Compared to the other MSER
trials, this trial has the best quality score. There is not a noticeable difference in the visual quality of
the image compared to the original MSER trial with default settings. If you want to see which pixels
differ between the default MSER trial and this trial, change the overlay style to Difference and toggle
between the two trials.
When you find an acceptable registration, export the registered image and the geometric
transformation to the workspace. You can use the registration results to apply a similar registration
to multiple frames in an image sequence. To learn more, see “Export Results from Registration
Estimator App” on page 7-20.
This example exports trial 2.2 because it has the best quality score and no severe regions of
misalignment. Click trial 2.2 in the history list, then click Export and select Export Images. In the
Export to Workspace dialog box, assign a name to the registration output. The output is a structure
that contains the final registered image and the geometric transformation.
See Also
Registration Estimator
More About
• “Approaches to Registering Images” on page 7-2
• “Techniques Supported by Registration Estimator App” on page 7-22
Load Images, Spatial Referencing Information, and Initial Transformation
Loading images from file supports only BMP, JPG, JPEG, TIF, TIFF, PNG, and DCM file types. To work
with a wider range of file formats, load images from the workspace. Registration Estimator
supports any image read into the workspace by the imread function and any DICOM image read into the
workspace using the dicomread function.
Load images into the app by clicking the Load Images icon.
• To load images from a file, select the Load from file option. In the dialog box, specify the file
path of the moving and fixed images. Use Browse to navigate to a folder.
• To load variables from the workspace, select the Load from workspace option. In the dialog
box, select the name of the variable containing the moving image from the Moving Image menu
and the variable containing the fixed image from the Fixed Image menu.
Note If you load DICOM images into the workspace using dicomread, spatial referencing
information in the metadata is no longer associated with the image data. To preserve spatial
referencing information with DICOM images, either load the images from file or create an imref2d
object from the image metadata. For more information about DICOM metadata, see “Read Metadata
from DICOM Files” on page 3-8.
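For example, a spatial referencing object for a 2-D DICOM image might be constructed from the PixelSpacing attribute along these lines (a sketch; the file name and image variable are placeholders, and the metadata is assumed to include PixelSpacing):
% Build an imref2d object from DICOM pixel spacing metadata (millimeters).
% PixelSpacing lists row spacing first, then column spacing.
info = dicominfo('myImage.dcm');                              % placeholder file name
R = imref2d(size(moving),info.PixelSpacing(2),info.PixelSpacing(1));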
If you do not have spatial referencing information, then the Spatial Referencing Object and DICOM
Metadata radio buttons are inactive.
If you do not have a geometric transformation object in your workspace, the Initial Transformation
Object selection box is inactive.
See Also
Functions
dicomread | imread
Classes
affine2d | imref2d | projective2d
Related Examples
• “Register Images Using Registration Estimator App” on page 7-6
Tune Registration Settings in Registration Estimator App
Note Due to randomness in the registration optimizer, the quality metric, registered image, and
geometric transformation can vary slightly between trials despite identical registration settings.
• Translation transformations preserve the size and orientation of the image. Each pixel in the
image is displaced the same amount in the same direction.
• Rigid transformations include rotation and translation. Rigid transformations preserve length.
Note Although reflection is a type of rigid transformation, Registration Estimator app does not
support reflection.
• Similarity transformations include isotropic scaling, rotation, and translation. Similarity
transformations preserve shape, but not size. When used with a feature-based registration
technique, at least two matched pairs of points are required.
• Affine transformations include shear and all supported similarity transformations. Affine
transformations preserve parallel lines, but not necessarily angles between lines or distances
between points. When used with a feature-based registration technique, at least three matched
pairs of points are required.
• Projective transformations allow tilting in addition to all supported affine transformations. When
used with a feature-based registration technique, at least four matched pairs of points are
required.
• Number of Detected Features. The transformation type determines the minimum number of
matched features required to perform a registration. Similarity transformations require two or
more matched features. Affine transformations require three or more matched features. Projective
transformations require four or more matched features.
• Quality of matched features. The quality value is a combination of matched features options.
• Rotation. By default, feature-based registration allows the moving image to rotate. However, some
imaging scenarios, such as stereoscopy, produce images with identical rotation. If your images
have the same rotation, clearing this option can improve the accuracy of the registration.
• Normalize. This option scales the pixel values of both images to the same dynamic range.
• Apply Gaussian blur. Smoothing the images with a Gaussian blur can help the optimizer find the
global maximum or minimum of the solution surface. However, smoothing changes the shape of
the surface, and over-smoothing can shift the position of the extrema. Large amounts of blurring
are useful when the images are severely misaligned at the start of the registration, to help the
optimizer search the correct basin of attraction. Small amounts of blurring are useful when the
images start with close alignment.
• Align centers. This option provides an initial transformation that aligns the world coordinates of
the centers of the two images. The geometric option aligns the geometric centers, based on the
spatial referencing information of the images. The center of mass option aligns the centers of
mass, calculated from the weighted mean of pixel intensities.
Monomodal registration enables you to adjust the properties of the regular step gradient descent
optimizer. For more information about the properties of this optimizer, see
RegularStepGradientDescent.
Multimodal registration enables you to adjust the properties of the one plus one evolutionary
optimizer. For more information about the properties of this optimizer, see
OnePlusOneEvolutionary.
Phase correlation enables you to choose to window the frequency-domain representation of the
images. Windowing increases the stability of registration results. If the common features you are
trying to align in your images are oriented along the edges, clearing this option can improve
registration results. For more information about using phase correlation to transform an image, see
imregcorr.
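At the command line, the corresponding imregcorr option can be set directly, for example (a brief sketch; moving and fixed stand for images already in the workspace):
% Estimate a phase-correlation transformation with frequency-domain
% windowing disabled.
tformEstimate = imregcorr(moving,fixed,'Window',false);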
• Number of iterations. This value is the number of iterations on each pyramid level.
• Pyramid levels. The value represents the number of Gaussian pyramid reduction levels. The
maximum number of pyramid levels depends on the size of each dimension in the images. For
example, when the shortest dimension of the fixed and moving images is 256 pixels, at most eight
pyramid levels can be used. For more information about pyramid reduction, see impyramid.
• Smoothing. The value represents the standard deviation of Gaussian smoothing and remains the
same at each pyramid level. Values are in the range [0.5, 3]. Larger values result in smoother
output displacement fields. Smaller values result in more localized deformation in the output
displacement field.
Note Although isotropic scaling and shearing are nonrigid transformations from a mathematical
perspective, these transformations act globally on an image. Enable scaling and shearing in the
Registration Estimator app by selecting an affine or projective transformation type, not by applying a
nonrigid transformation.
See Also
imregcorr | imregdemons
Related Examples
• “Register Images Using Registration Estimator App” on page 7-6
More About
• “Techniques Supported by Registration Estimator App” on page 7-22
• “Matrix Representation of Geometric Transformations” on page 6-17
Export Results from Registration Estimator App
• Export the registered image and the geometric transformation to the workspace. Apply an
identical geometric transformation to other images using imwarp.
• Generate a function with the desired registration technique and settings. Call this function to
register other images using the same settings.
Generate a Function
To generate MATLAB code that registers images using the desired registration technique and
settings, click the corresponding trial in the history list, then click Export. Select the Generate
Function option. The app opens the MATLAB editor containing a function with the autogenerated
code. To save the code, click Save in the MATLAB editor.
Note If you generate a function using a feature-based registration technique, then you must have
Computer Vision Toolbox to run the function.
The generated function accepts a moving and a fixed image as inputs. The function returns a
structure that contains the final registered image, the spatial referencing object, and the geometric
transformation of the registered image. If you generate a function using a feature-based registration
technique, then the output structure has two additional fields for the moving matched features and
the fixed matched features.
See Also
imwarp
Related Examples
• “Register Images Using Registration Estimator App” on page 7-6
More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-13
Techniques Supported by Registration Estimator App
Feature-Based Registration
Feature-based registration techniques automatically detect distinct image features such as sharp
corners, blobs, or regions of uniform intensity. The moving image undergoes a single global
transformation to provide the best alignment of corresponding features with the fixed image.
FAST detects corner features, especially in scenes of human origin such as streets and indoor
rooms. FAST supports single-scale images and point-tracking.
MinEigen also detects corner features. MinEigen supports single-scale images and point-
tracking.
Harris also detects corner features, using a more efficient algorithm than MinEigen. Harris
supports single-scale images and point-tracking.
BRISK also detects corner features. Unlike the preceding algorithms, BRISK supports changes
in scale and rotation, and point-tracking.
SURF detects blobs in images and supports changes in scale and rotation.
KAZE detects multiscale blob features from a scale space constructed using nonlinear diffusion.
MSER detects regions of uniform intensity. MSER supports changes in scale and rotation, and is
more robust to affine transformations than the other feature-based algorithms.
In Registration Estimator, you can register images and generate functions for all feature-based
techniques without a Computer Vision Toolbox license. However, to run an autogenerated function
that uses a feature-based registration technique, you must have Computer Vision Toolbox. For more
information, see “Export Results from Registration Estimator App” on page 7-20.
Intensity-Based Registration
Intensity-based registration techniques correlate image intensity in the spatial or frequency domain.
The moving image undergoes a single global transformation to maximize the correlation of its
intensity with the intensity of the fixed image.
Monomodal intensity registers images with similar brightness and contrast that are captured on
the same type of scanner or sensor. For example, use monomodal intensity to register MRI scans
taken of similar subjects using the same imaging sequence.
Multimodal intensity registers images with different brightness and contrast. These images can
come from two different types of devices, such as two camera models or two types of medical imaging
systems (such as CT and MRI). These images can also come from a single device. For example, use
multimodal intensity to register images taken with the same camera using different exposure
settings, or to register MRI images acquired during a single session using different imaging
sequences.
Phase correlation registers images in the frequency domain. Like multimodal intensity, phase
correlation is invariant to image brightness. Phase correlation is more robust to noise than the other
intensity-based registration techniques.
Note Phase correlation provides better results when the aspect ratio of each image is square.
Nonrigid Registration
See Also
Related Examples
• “Register Images Using Registration Estimator App” on page 7-6
• “Approaches to Registering Images” on page 7-2
Intensity-Based Automatic Image Registration
The process begins with the transform type you specify and an internally determined transformation
matrix. Together, they determine the specific image transformation that is applied to the moving
image with bilinear interpolation.
Next, the metric compares the transformed moving image to the fixed image and a metric value is
computed.
Finally, the optimizer checks for a stop condition. A stop condition is anything that warrants the
termination of the process. In most cases, the process stops when it reaches a point of diminishing
returns or when it reaches the specified maximum number of iterations. If there is no stop condition,
the optimizer adjusts the transformation matrix to begin the next iteration.
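The following is a rough, self-contained illustration of the transform-then-compare step described above (a sketch only, not the toolbox's internal implementation; it assumes moving and fixed are same-size grayscale images in the workspace, and the candidate transformation and mean-squares metric are chosen purely for illustration):
% Apply a candidate transformation to the moving image using bilinear
% ('linear') interpolation, then compute a mean-squares metric value
% against the fixed image.
tformCandidate = affine2d([1 0 0; 0 1 0; 5 -3 1]);   % hypothetical candidate
warped = imwarp(moving,tformCandidate,'linear','OutputView',imref2d(size(fixed)));
metricValue = mean((double(warped(:)) - double(fixed(:))).^2);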
See Also
Related Examples
• “Register Multimodal MRI Images” on page 7-32
More About
• “Use Phase Correlation as Preprocessing Step in Registration” on page 7-27
• “Approaches to Registering Images” on page 7-2
Create an Optimizer and Metric for Intensity-Based Image Registration
imregister supports two techniques for optimizing the image metric:
• One-plus-one evolutionary
• Regular step gradient descent
You can pass any combination of metric and optimizer to imregister, but some pairs are better
suited for some image classes. Refer to the following table for help choosing an appropriate starting point.

Capture Scenario    Metric                     Optimizer
Monomodal           MeanSquares                RegularStepGradientDescent
Multimodal          MattesMutualInformation    OnePlusOneEvolutionary
Use imregconfig to create the default metric and optimizer for a capture scenario in one step. For
example, the following command returns the optimizer and metric objects suitable for registering
monomodal images.
[optimizer,metric] = imregconfig('monomodal');
Alternatively, you can create the objects individually. This enables you to create alternative
combinations to address specific registration issues. The following code creates the same monomodal
optimizer and metric combination.
optimizer = registration.optimizer.RegularStepGradientDescent();
metric = registration.metric.MeanSquares();
Getting good results from optimization-based image registration can require modifying optimizer or
metric settings. For an example of how to modify and use the metric and optimizer with imregister,
see “Register Multimodal MRI Images” on page 7-32.
See Also
imregconfig | imregister
Use Phase Correlation as Preprocessing Step in Registration
fixed = imread('cameraman.tif');
imshow(fixed);
Create an unregistered image by deliberately distorting this image using rotation, isotropic scaling,
and shearing in the y direction.
theta = 170;
rot = [cosd(theta) sind(theta) 0;...
      -sind(theta) cosd(theta) 0;...
       0 0 1];
sc = 2.3;
scale = [sc 0 0; 0 sc 0; 0 0 1];
sh = 0.5;
shear = [1 sh 0; 0 1 0; 0 0 1];
tform = affine2d(shear*scale*rot);
moving = imwarp(fixed,tform);
Estimate the registration required to bring these two images into alignment. imregcorr returns an
affine2d object that defines the transformation.
tformEstimate = imregcorr(moving,fixed);
Apply the estimated geometric transform to the misaligned image. Specify 'OutputView' to make
sure the registered image is the same size as the reference image. Display the original image and the
registered image side-by-side. You can see that imregcorr has done a good job handling the rotation
and scaling differences between the images. The registered image, movingReg, is very close to being
aligned with the original image, fixed. But some misalignment remains. imregcorr can handle
rotation and scale distortions well, but not shear distortion.
Rfixed = imref2d(size(fixed));
movingReg = imwarp(moving,tformEstimate,'OutputView',Rfixed);
imshowpair(fixed,movingReg,'montage');
View the aligned image overlaid on the original image, using imshowpair. In this view, imshowpair
uses color to highlight areas of misalignment.
imshowpair(fixed,movingReg,'falsecolor');
To finish the registration, use imregister, passing the estimated transformation returned by
imregcorr as the initial condition. imregister is more effective if the two images are roughly in
alignment at the start of the operation. The transformation estimated by imregcorr provides this
information for imregister. The example uses the default optimizer and metric values for a
registration of two images taken with the same sensor ( 'monomodal' ).
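The imregister call itself does not appear in this excerpt. Under the assumptions just described (monomodal defaults, an affine transformation, and tformEstimate as the initial transformation), it presumably looks like this sketch:
% Finish the registration, starting from the phase-correlation estimate.
[optimizer,metric] = imregconfig('monomodal');
movingRegistered = imregister(moving,fixed,'affine',optimizer,metric,...
    'InitialTransformation',tformEstimate);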
Display the result of this registration. Note that imregister has achieved a very accurate
registration, given the good initial condition provided by imregcorr.
imshowpair(fixed, movingRegistered,'Scaling','joint');
See Also
imregconfig | imregcorr | imregister | imwarp
Register Multimodal MRI Images
This example uses two MRI images of a knee. The fixed image is a spin echo image, while the moving
image is a spin echo image with inversion recovery. The two sagittal slices were acquired at the same
time but are slightly out of alignment.
fixed = dicomread('knee1.dcm');
moving = dicomread('knee2.dcm');
The imshowpair function is useful to visualize images during every part of the registration process.
Use it to see the two images individually in a montage fashion or display them stacked to show the
amount of misregistration.
imshowpair(moving,fixed,'montage')
title('Unregistered')
In the overlapping image from imshowpair, gray areas correspond to areas that have similar
intensities, while magenta and green areas show places where one image is brighter than the other.
In some image pairs, green and magenta areas do not always indicate misregistration, but in this
example it is easy to use the color information to see where they do.
imshowpair(moving,fixed)
title('Unregistered')
The imregconfig function makes it easy to pick the correct optimizer and metric configuration to
use with imregister. The optimizer and metric variables are objects whose properties control the
registration. For more information, see “Create an Optimizer and Metric for Intensity-Based Image
Registration” on page 7-26.
These two images have different intensity distributions, which suggests a multimodal configuration.
[optimizer,metric] = imregconfig('multimodal');
The distortion between the two images includes scaling, rotation, and possibly shear. Use an affine
transformation to register the images.
movingRegisteredDefault = imregister(moving,fixed,'affine',optimizer,metric);
Display the result. It is very rare that imregister will align images perfectly with the default
settings. Nevertheless, using them is a useful way to decide which properties to tune first.
imshowpair(movingRegisteredDefault,fixed)
title('A: Default Registration')
The initial registration is not very good. There are still significant regions of poor alignment,
particularly along the right edge. Try to improve the registration by adjusting the optimizer and
metric configuration properties.
disp(optimizer)
registration.optimizer.OnePlusOneEvolutionary
Properties:
GrowthFactor: 1.050000e+00
Epsilon: 1.500000e-06
InitialRadius: 6.250000e-03
MaximumIterations: 100
disp(metric)
registration.metric.MattesMutualInformation
Properties:
NumberOfSpatialSamples: 500
NumberOfHistogramBins: 50
UseAllPixels: 1
The InitialRadius property of the optimizer controls the initial step size used in parameter space
to refine the geometric transformation. When multimodal registration problems do not converge with
the default parameters, InitialRadius is a good first parameter to adjust. Start by reducing the
default value of InitialRadius by a scale factor of 3.5.
optimizer.InitialRadius = optimizer.InitialRadius/3.5;
movingRegisteredAdjustedInitialRadius = imregister(moving,fixed,'affine',optimizer,metric);
Display the result. Adjusting InitialRadius has a positive impact. There is a noticeable
improvement in the alignment of the images at the top and right edges.
imshowpair(movingRegisteredAdjustedInitialRadius,fixed)
title('B: Adjusted InitialRadius')
The MaximumIterations property of the optimizer controls the maximum number of iterations that
the optimizer will be allowed to take. Increasing MaximumIterations allows the registration search
to run longer and potentially find better registration results. Does the registration continue to
improve if the InitialRadius from the last step is used with a large number of iterations?
optimizer.MaximumIterations = 300;
movingRegisteredAdjustedInitialRadius300 = imregister(moving,fixed,'affine',optimizer,metric);
Display the results. Further improvement in registration was achieved by reusing the
InitialRadius optimizer setting from the previous registration and allowing the optimizer to take a
large number of iterations.
imshowpair(movingRegisteredAdjustedInitialRadius300,fixed)
title('C: Adjusted InitialRadius, MaximumIterations = 300')
Optimization based registration works best when a good initial condition can be given for the
registration that relates the moving and fixed images. A useful technique for getting improved
registration results is to start with more simple transformation types like 'rigid', and then use the
resulting transformation as an initial condition for more complicated transformation types like
'affine'.
The function imregtform uses the same algorithm as imregister, but returns a geometric
transformation object as output instead of a registered output image. Use imregtform to get an
initial transformation estimate based on a 'similarity' model (translation, rotation, and scale).
The previous registration results showed an improvement after modifying the MaximumIterations
and InitialRadius properties of the optimizer. Keep these optimizer settings while using an initial
condition to refine the registration further.
tformSimilarity = imregtform(moving,fixed,'similarity',optimizer,metric);
Because the registration is being solved in the default coordinate system, also known as the intrinsic
coordinate system, obtain the default spatial referencing object that defines the location and
resolution of the fixed image.
Rfixed = imref2d(size(fixed));
Use imwarp to apply the geometric transformation output from imregtform to the moving image to
align it with the fixed image. Use the 'OutputView' option in imwarp to specify the world limits and
resolution of the output resampled image. Specifying Rfixed as the 'OutputView' forces the
resampled moving image to have the same resolution and world limits as the fixed image.
movingRegisteredRigid = imwarp(moving,tformSimilarity,'OutputView',Rfixed);
imshowpair(movingRegisteredRigid, fixed)
title('D: Registration Based on Similarity Transformation Model')
The 'T' property of the output geometric transformation defines the transformation matrix that
maps points in moving to corresponding points in fixed.
tformSimilarity.T
ans = 3×3
1.0331 -0.1110 0
0.1110 1.0331 0
-51.1491 6.9891 1.0000
Pass tformSimilarity to imregister using the 'InitialTransformation' name-value argument as the starting point
for the geometric transformation. This refined estimate for the registration includes the possibility of
shear.
movingRegisteredAffineWithIC = imregister(moving,fixed,'affine',optimizer,metric,...
'InitialTransformation',tformSimilarity);
Display the result. Using the 'InitialTransformation' to refine the 'similarity' result of
imregtform with a full affine model yields a nice registration result.
imshowpair(movingRegisteredAffineWithIC,fixed)
title('E: Registration from Affine Model Based on Similarity Initial Condition')
Comparing the results of running imregister with different configurations and initial conditions, it
becomes apparent that there are a large number of input parameters that can be varied in imregister,
each of which may lead to different registration results.
It can be difficult to quantitatively compare registration results because there is no one quality metric
that accurately describes the alignment of two images. Often, registration results must be judged
qualitatively by visualizing the results. In the results above, the registration results in C) and E) are
both very good and are difficult to tell apart visually.
Often, as the quality of multimodal registrations improves, it becomes more difficult to judge the quality
of registration visually. This is because the intensity differences can obscure areas of misalignment.
Sometimes switching to a different display mode for imshowpair exposes hidden details. (This is not
always the case.)
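For example, the difference display mode sometimes reveals residual misalignment that the default falsecolor overlay hides (one option among the display modes that imshowpair supports):
% Compare the final registration result to the fixed image using the
% difference display mode.
figure
imshowpair(movingRegisteredAffineWithIC,fixed,'diff')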
See Also
MattesMutualInformation | MeanSquares | OnePlusOneEvolutionary |
RegularStepGradientDescent | imref2d | imregconfig | imregister | imwarp
Register Multimodal 3-D Medical Images
This example uses imregister, imregtform and imwarp to automatically align two volumetric
datasets: a CT image and a T1 weighted MR image collected from the same patient at different times.
Unlike some other techniques, imregister and imregtform do not find features or use control
points. Intensity-based registration is often well-suited for medical and remotely sensed imagery.
The 3-D CT and MRI datasets used in this example were provided by Dr. Michael Fitzpatrick as part
of The Retrospective Image Registration Evaluation (RIRE) Dataset.
This example uses two 3-D images of the same patient's head. In registration problems, we consider
one image to be the fixed image and the other image to be the moving image. The goal of registration
is to align the moving image with the fixed image. In this example, the fixed image is a T1 weighted
MRI image. The moving image that we want to register is a CT image. The data is stored in the file
format used by the Retrospective Image Registration Evaluation (RIRE) Project. Use multibandread
to read the binary files that contain image data. Use the helperReadHeaderRIRE function to obtain
the metadata associated with each image. You can use the following link to find more information
about the RIRE file format: RIRE data format
fixedHeader = helperReadHeaderRIRE('rirePatient007MRT1.header');
movingHeader = helperReadHeaderRIRE('rirePatient007CT.header');
fixedVolume = multibandread('rirePatient007MRT1.bin',...
[fixedHeader.Rows, fixedHeader.Columns, fixedHeader.Slices],...
'int16=>single', 0, 'bsq', 'ieee-be' );
movingVolume = multibandread('rirePatient007CT.bin',...
[movingHeader.Rows, movingHeader.Columns, movingHeader.Slices],...
'int16=>single', 0, 'bsq', 'ieee-be' );
The helperVolumeRegistration function is a helper function that is provided to help judge the
quality of 3-D registration results. You can interactively rotate the view and both axes will remain in
sync.
helperVolumeRegistration(fixedVolume,movingVolume);
You can also use imshowpair to look at single planes from the fixed and moving volumes to get a
sense of the overall alignment of the volumes. In the overlapping image from imshowpair, gray
areas correspond to areas that have similar intensities, while magenta and green areas show places
where one image is brighter than the other. Use imshowpair to observe the misregistration of the
image volumes along an axial slice taken through the center of each volume.
centerFixed = size(fixedVolume)/2;
centerMoving = size(movingVolume)/2;
figure
imshowpair(movingVolume(:,:,centerMoving(3)), fixedVolume(:,:,centerFixed(3)));
title('Unregistered Axial Slice')
The imregconfig function makes it easy to pick the correct optimizer and metric configuration to
use with imregister. These two images are from two different modalities, MRI and CT, so the
'multimodal' option is appropriate.
[optimizer,metric] = imregconfig('multimodal');
The algorithm used by imregister will converge to better results more quickly when spatial
referencing information about the resolution and/or location of the input imagery is specified. In this
case, the resolution of the CT and MRI datasets is defined in the image metadata. Use this metadata
to construct imref3d spatial referencing objects that we will pass as input arguments for
registration.
Rfixed = imref3d(size(fixedVolume),fixedHeader.PixelSize(2),fixedHeader.PixelSize(1),fixedHeader.PixelSize(3));
Rmoving = imref3d(size(movingVolume),movingHeader.PixelSize(2),movingHeader.PixelSize(1),movingHeader.PixelSize(3));
The properties of the spatial referencing objects define where the associated image volumes are in
the world coordinate system and what the pixel extent in each dimension is. The XWorldLimits
property of Rmoving defines the position of the moving volume in the X dimension. The
PixelExtentInWorld property defines the size of each pixel in world units in the X dimension (along
columns). The moving volume extends from 0.3269 to 334.97 in the world X coordinate system, and
each pixel has an extent of 0.6536 mm. Units are in millimeters because the header information used
to construct the spatial referencing was in millimeters. The ImageExtentInWorldX property
gives the full extent of the moving image volume in world units.
Rmoving.XWorldLimits
ans = 1×2
0.3268 334.9674
Rmoving.PixelExtentInWorldX
ans = 0.6536
Rmoving.ImageExtentInWorldX
ans = 334.6406
The misalignment between the two volumes includes translation and rotation. Use a rigid
transformation to register the images.
Start by using imregister to obtain a registered output image volume that you can view and
observe directly to assess the quality of registration results.
Specify a non-default setting for the InitialRadius property of the optimizer to achieve better
convergence in registration results.
optimizer.InitialRadius = 0.004;
movingRegisteredVolume = imregister(movingVolume,Rmoving, fixedVolume,Rfixed, 'rigid', optimizer, metric);
Use imshowpair again and repeat the process of examining the alignment of an axial slice taken
through the center of the registered volumes to get a sense of how successful the registration is.
figure
imshowpair(movingRegisteredVolume(:,:,centerFixed(3)), fixedVolume(:,:,centerFixed(3)));
title('Axial Slice of Registered Volume')
From the axial slice above, it looks like the registration was successful. Use
helperVolumeRegistration again to view the registered volume to continue judging the success
of registration.
helperVolumeRegistration(fixedVolume,movingRegisteredVolume);
Step 3: Get 3-D Geometric Transformation That Aligns Moving With Fixed.
The imregtform function can be used when you are interested in the geometric transformation
estimate that is used by imregister to form the registered output image. imregtform uses the
same algorithm as imregister and takes the same input arguments as imregister. Since visual
inspection of the resulting volume from imregister indicated that the registration was successful,
you can call imregtform with the same input arguments to get the geometric transformation
associated with this registration result.
geomtform = imregtform(movingVolume,Rmoving, fixedVolume,Rfixed, 'rigid', optimizer, metric)

geomtform = 
  affine3d with properties:
T: [4x4 double]
Dimensionality: 3
The result of imregtform is a geometric transformation object. This object includes a property, T,
that defines the 3-D affine transformation matrix.
geomtform.T
ans = 4×4
The transformPointsForward function can be used to determine where a point [u,v,w] in the
moving image maps as a result of the registration. Because spatially referenced inputs were specified
to imregtform, the geometric transformation maps points in the world coordinate system from
moving to fixed. The transformPointsForward function is used below to determine the
transformed location of the center of the moving image in the world coordinate system.
centerXWorld = mean(Rmoving.XWorldLimits);
centerYWorld = mean(Rmoving.YWorldLimits);
centerZWorld = mean(Rmoving.ZWorldLimits);
[xWorld,yWorld,zWorld] = transformPointsForward(geomtform,centerXWorld,centerYWorld,centerZWorld)
You can use the worldToSubscript function to determine the element of the fixed volume that
aligns with the center of the moving volume.
[r,c,p] = worldToSubscript(Rfixed,xWorld,yWorld,zWorld)
r = 116
c = 132
p = 13
The imwarp function can be used to apply the geometric transformation estimate from imregtform to
a 3-D volume. The 'OutputView' name-value argument is used to define a spatial referencing
argument that determines the world limits and resolution of the output resampled image. You can
produce the same results given by imregister by using the spatial referencing object associated
with the fixed image. This creates an output volume in which the world limits and resolution of the
fixed and moving image are the same. Once the world limits and resolution of both volumes are the
same, there is pixel to pixel correspondence between each sample of the moving and fixed volumes.
movingRegisteredVolume = imwarp(movingVolume,Rmoving,geomtform,'bicubic','OutputView',Rfixed);
Use imshowpair again to view an axial slice through the center of the registered volume produced
by imwarp.
figure
imshowpair(movingRegisteredVolume(:,:,centerFixed(3)), fixedVolume(:,:,centerFixed(3)));
title('Axial Slice of Registered Volume')
See Also
imref3d | imregconfig | imregister | imregtform | imwarp
Registering an Image Using Normalized Cross-Correlation
onion = imread('onion.png');
peppers = imread('peppers.png');
imshow(onion)
figure, imshow(peppers)
It is important to choose regions that are similar. The image sub_onion will be the template, and
must be smaller than the image sub_peppers. You can get these sub regions using either the non-
interactive script below or the interactive script.
% non-interactively
rect_onion = [111 33 65 58];
rect_peppers = [163 47 143 151];
sub_onion = imcrop(onion,rect_onion);
sub_peppers = imcrop(peppers,rect_peppers);
% OR
% interactively
%[sub_onion,rect_onion] = imcrop(onion); % choose the pepper below the onion
%[sub_peppers,rect_peppers] = imcrop(peppers); % choose the whole onion
figure, imshow(sub_peppers)
Calculate the normalized cross-correlation and display it as a surface plot. The peak of the cross-
correlation matrix occurs where the sub_images are best correlated. normxcorr2 only works on
grayscale images, so we pass it the red plane of each sub image.
c = normxcorr2(sub_onion(:,:,1),sub_peppers(:,:,1));
figure, surf(c), shading flat
The total offset or translation between images depends on the location of the peak in the cross-
correlation matrix, and on the size and position of the sub images.
% offset found by correlation
[max_c, imax] = max(abs(c(:)));
[ypeak, xpeak] = ind2sub(size(c),imax(1));
corr_offset = [(xpeak-size(sub_onion,2))
(ypeak-size(sub_onion,1))];
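% rect_offset is not defined in this excerpt; presumably it is the relative
% offset of the two crop rectangles' upper-left corners (an assumed
% reconstruction, not the verbatim example code):
rect_offset = [(rect_peppers(1)-rect_onion(1))
               (rect_peppers(2)-rect_onion(2))];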
% total offset
offset = corr_offset + rect_offset;
xoffset = offset(1);
yoffset = offset(2);
Step 5: See if the Onion Image was Extracted from the Peppers Image
Step 6: Pad the Onion Image to the Size of the Peppers Image
Pad the onion image to overlay on peppers, using the offset determined above.
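The variables xbegin, xend, ybegin, and yend used below are computed in a part of the example not shown here. Presumably they follow from the total offset found above, along these lines (an assumed reconstruction):
% Corner coordinates of the onion image within the peppers image.
xbegin = round(xoffset+1);
xend   = round(xoffset+size(onion,2));
ybegin = round(yoffset+1);
yend   = round(yoffset+size(onion,1));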
recovered_onion = uint8(zeros(size(peppers)));
recovered_onion(ybegin:yend,xbegin:xend,:) = onion;
figure, imshow(recovered_onion)
Display one plane of the peppers image with the recovered_onion image using alpha blending.
figure, imshowpair(peppers(:,:,1),recovered_onion,'blend')
Control Point Registration
Note You may need to perform several iterations of this process, experimenting with different types
of transformations, before you achieve a satisfactory result. Sometimes, you can perform successive
registrations, removing gross global distortions first, and then removing smaller local distortions in
subsequent passes.
The following figure provides a graphic illustration of this process. See “Register Images with
Projection Distortion Using Control Points” on page 7-80 for an extended example.
See Also
cpcorr | cpselect | fitgeotrans | imwarp
Related Examples
• “Register Images with Projection Distortion Using Control Points” on page 7-80
More About
• “Control Point Selection Procedure” on page 7-60
• “Approaches to Registering Images” on page 7-2
Geometric Transformation Types for Control Point Registration
For control point registration, the fitgeotrans function can infer the parameters for several
types of transformations, listed here in order of complexity: nonreflective similarity, affine, projective,
polynomial, piecewise linear, and local weighted mean.
Your choice of transformation type affects the number of control point pairs you must select. For
example, a nonreflective similarity transformation requires at least two control point pairs. A fourth
order polynomial transformation requires 15 control point pairs. For more information about these
transformation types, and the special syntaxes they require, see cpselect.
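As a minimal sketch of how selected control points feed into fitgeotrans (assuming movingPoints and fixedPoints were exported from cpselect, and moving and fixed are the corresponding images):
% Fit a projective transformation to the control point pairs, then
% resample the moving image onto the fixed image's grid.
tform = fitgeotrans(movingPoints,fixedPoints,'projective');
Rfixed = imref2d(size(fixed));
registered = imwarp(moving,tform,'OutputView',Rfixed);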
See Also
fitgeotrans
Related Examples
• “Register Images with Projection Distortion Using Control Points” on page 7-80
More About
• “Control Point Registration” on page 7-56
• “Control Point Selection Procedure” on page 7-60
• “Approaches to Registering Images” on page 7-2
Control Point Selection Procedure

The following steps summarize the control point selection procedure:
1 Start the tool on page 7-62, specifying the moving image and the fixed image.
2 Use navigation aids to explore the image on page 7-64, looking for visual elements that you can
identify in both images. cpselect provides many ways to navigate around the image. You can
pan and zoom to view areas of the image in more detail.
3 Specify matching control point pairs on page 7-68 in the moving image and the fixed image.
4 Save the control points on page 7-73 in the workspace.
The following figure shows the default appearance of the tool when you first start it.
See Also
cpselect
More About
• “Start the Control Point Selection Tool” on page 7-62
• “Control Point Registration” on page 7-56
Start the Control Point Selection Tool
The cpselect command has other optional arguments. You can import existing control points, so
that you can use the Control Point Selection Tool to modify, delete, or add to existing control points.
For example, you can restart a control point selection session by including a cpstruct structure as
the third argument. For more information about restarting sessions, see “Export Control Points to the
Workspace” on page 7-73.
For simplicity, this example uses the same image as the moving and the fixed image, and no prior
control points are imported. To walk through an example of an actual registration, see “Register
Images with Projection Distortion Using Control Points” on page 7-80.
moon_fixed = imread('moon.tif');
moon_moving = moon_fixed;
cpselect(moon_moving, moon_fixed);
When the Control Point Selection Tool starts, it contains three primary components:
• Details windows—The two windows displayed at the top of the tool are called the Detail
windows. These windows show a close-up view of a portion of the images you are working with.
The moving image is on the left and the fixed image is on the right.
• Overview windows—The two windows displayed at the bottom of the tool are called the Overview
windows. These windows show the images in their entirety, at the largest scale that fits the
window. The moving image is on the left and the fixed image is on the right. You can control
whether the Overview window appears by using the View menu.
• Details rectangle—Superimposed on the images displayed in the two Overview windows is a
rectangle, called the Detail rectangle. This rectangle controls the part of the image that is visible
in the Detail window. By default, at startup, the detail rectangle covers one quarter of the entire
image and is positioned over the center of the image. You can move the Detail rectangle to change
the portion of the image displayed in the Detail windows.
The following figure shows these components of the Control Point Selection Tool.
The next step is to use navigation aids to explore the image, looking for visual elements shared by
both images. For more information, see “Find Visual Elements Common to Both Images” on page 7-
64.
See Also
cpselect
More About
• “Find Visual Elements Common to Both Images” on page 7-64
• “Export Control Points to the Workspace” on page 7-73
• “Control Point Selection Procedure” on page 7-60
Find Visual Elements Common to Both Images
In this section...
“Use Scroll Bars to View Other Parts of an Image” on page 7-64
“Use the Detail Rectangle to Change the View” on page 7-64
“Pan the Image Displayed in the Detail Window” on page 7-64
“Zoom In and Out on an Image” on page 7-65
“Specify the Magnification of the Images” on page 7-65
“Lock the Relative Magnification of the Moving and Fixed Images” on page 7-66
Use Scroll Bars to View Other Parts of an Image
As you scroll the image in the Detail window, note how the Detail rectangle moves over the image in
the Overview window. The position of the Detail rectangle always shows the portion of the image in
the Detail window.
Use the Detail Rectangle to Change the View
1 Move the pointer into the Detail rectangle. The cursor changes to the fleur shape.
2 Press and hold the mouse button to drag the Detail rectangle anywhere on the image.
Note As you move the Detail rectangle over the image in the Overview window, the view of the image
displayed in the Detail window changes.
Pan the Image Displayed in the Detail Window
1 Click the Pan button in the Control Point Selection Tool toolbar or select Pan from the Tools menu.
2 Move the pointer over the image in the Detail window. The cursor changes to the hand shape.
3 Press and hold the mouse button. The cursor changes to a closed fist shape. Use the mouse to move the image in the Detail window.
Note As you move the image in the Detail window, the Detail rectangle in the Overview window
moves.
Zoom In and Out on an Image
1 Click the Zoom In or Zoom Out button in the Control Point Selection Tool toolbar, or select Zoom In or Zoom Out from the Tools menu.
2 Move the pointer over the image in the Detail window that you want to zoom in or out on. The cursor changes to the appropriate magnifying glass shape. Position the cursor over
a location in the image and click the mouse. With each click, the Control Point Selection Tool
changes the magnification of the image by a preset amount. (See “Specify the Magnification of
the Images” on page 7-65 for a list of some of these magnifications.) cpselect centers the new
view of the image on the spot where you clicked.
Another way to use the Zoom tool to zoom in on an image is to position the cursor over a location
in the image. While pressing and holding the mouse button, draw a rectangle defining the area
you want to zoom in on. The Control Point Selection Tool magnifies the image so that the chosen
section fills the Detail window. The tool resizes the detail rectangle in the Overview window as
well.
The size of the Detail rectangle in the Overview window changes as you zoom in or out on the
image in the Detail window.
To keep the relative magnifications of the fixed and moving images synchronized as you zoom in
or out, click the Lock ratio check box. See “Lock the Relative Magnification of the Moving and
Fixed Images” on page 7-66 for more information.
Specify the Magnification of the Images
1 Move the cursor into the magnification edit box of the window you want to change. The cursor
changes to the text entry cursor.
2 Type a new value in the magnification edit box and press Enter, or click the menu associated
with the edit box and choose from a list of preset magnifications. The Control Point Selection Tool
changes the magnification of the image and displays the new view in the appropriate window. To
keep the relative magnifications of the fixed and moving images synchronized as you change the
magnification, click the Lock ratio check box. See “Lock the Relative Magnification of the
Moving and Fixed Images” on page 7-66 for more information.
Lock the Relative Magnification of the Moving and Fixed Images
When the Lock Ratio check box is selected, the Control Point Selection Tool changes the
magnification of both the moving and fixed images when you zoom in or out on either one of the
images on page 7-65 or specify a magnification value on page 7-65 for either of the images.
The next step is to specify matching control point pairs. For more information, see “Select Matching
Control Point Pairs” on page 7-68.
See Also
More About
• “Start the Control Point Selection Tool” on page 7-62
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-60
Select Matching Control Point Pairs
You specify control points by pointing and clicking in the moving and fixed images, in either the Detail
or the Overview windows. Each point you specify in the moving image must have a match in the fixed
image. The following sections describe the ways you can use the Control Point Selection Tool to
choose control point pairs:
In this section...
“Pick Control Point Pairs Manually” on page 7-68
“Use Control Point Prediction” on page 7-69
“Move Control Points” on page 7-71
“Delete Control Points” on page 7-71
Pick Control Point Pairs Manually
1 Click the Control Point Selection button in the Control Point Selection Tool toolbar or select Add Points from the Tools menu. Control point selection mode is active by default. The cursor changes to a crosshair shape.
2 Position the cursor over a feature you have visually selected in any of the images displayed and click the mouse button. cpselect places a control point symbol at the position you specified, in both the Detail window and the corresponding Overview window. cpselect
numbers the points as you select them. The appearance of the control point symbol indicates its
current state. The circle around the point indicates that it is the currently selected point. The
number identifies control point pairs.
Note Depending on where in the image you select control points, the symbol for the point may
be visible in the Overview window, but not in the Detail window.
3 You can select another point in the same image or you can move to the corresponding image and
create a match for the point. To create the match for this control point, position the cursor over
the same feature in the corresponding Detail or Overview window and click the mouse button.
cpselect places a control point symbol at the position you specified, in both the Detail and
Overview windows. You can work in either direction: picking control points in either of the Detail
windows, moving or fixed, or in either of the Overview windows, moving or fixed.
To match an unmatched control point, select it, and then pick a point in the corresponding window.
You can move on page 7-71 or delete on page 7-71 control points after you create them.
Use Control Point Prediction
To illustrate point prediction, this figure shows four control points selected in the moving image,
where the points form the four corners of a square. (The control point selections in the figure do not
attempt to identify any landmarks in the image.) The figure shows the picking of a fourth point, in the
left window, and the corresponding predicted point in the right window. Note how the Control Point
Selection Tool places the predicted point at the same location relative to the other control points,
forming the bottom right corner of the square.
Note By default, the Control Point Selection Tool does not include predicted points in the set of valid
control points returned in movingPoints or fixedPoints. To include predicted points, you must
accept them by selecting the points and fine-tuning their position with the cursor. When you move a
predicted point, the Control Point Selection Tool changes the symbol to indicate that it has changed
to a standard control point. For more information, see “Move Control Points” on page 7-71.
1 Click the Control Point Prediction button.
Note The Control Point Selection Tool predicts control point locations based on the locations of
previous control points. You cannot use point prediction until you have a minimum of two pairs of
matched points. Until this minimum is met, the Control Point Prediction button is disabled.
2 Position the cursor anywhere in any of the images displayed. The cursor changes to the crosshair shape.
You can pick control points in either of the Detail windows, moving or fixed, or in either of the
Overview windows, moving or fixed. You also can work in either direction: moving-to-fixed image
or fixed-to-moving image.
3 Click either mouse button. The Control Point Selection Tool places a control point symbol at the
position you specified and places another control point symbol for a matching point in all the
other windows. The symbol for the predicted point contains the letter P, indicating that it is
a predicted control point.
4 To accept a predicted point, select it with the cursor and move it. The Control Point Selection
Tool removes the P from the point.
Move Control Points
1 Click the Control Point Selection button.
2 Position the cursor over the control point you want to move. The cursor changes to the fleur shape.
3 Press and hold the mouse button and drag the control point. The state of the control point
changes to selected when you move it.
If you move a predicted control point, the state of the control point changes to a regular
(nonpredicted) control point.
Delete Control Points
1 Click the Control Point Selection button.
2 Click the control point you want to delete. Its state changes to selected. If the control point has a
match, both points become active.
3 Delete the point (or points) by pressing the Backspace or Delete key, or by choosing one of the delete options from the Edit menu. Using this menu, you can delete individual points or pairs of matched points, in the moving or fixed images.
See Also
More About
• “Find Visual Elements Common to Both Images” on page 7-64
• “Export Control Points to the Workspace” on page 7-73
• “Control Point Selection Procedure” on page 7-60
Export Control Points to the Workspace
To save control points to the workspace, select File on the Control Point Selection Tool menu bar, then choose the Export Points to Workspace option. The Control Point Selection Tool opens the Export Points to Workspace dialog box.
By default, the Control Point Selection Tool saves the coordinates of valid control points. The Control
Point Selection Tool does not include unmatched and predicted points in the movingPoints and
fixedPoints arrays. The arrays are n-by-2 arrays, where n is the number of valid control point pairs
you selected. The two columns represent the x- and y-coordinates of the control points, respectively,
in the intrinsic coordinate system of the image.
This example shows the movingPoints array containing four valid pairs of control points.
movingPoints =
215.6667 262.3333
225.7778 311.3333
156.5556 340.1111
270.8889 368.8889
To save the current state of the Control Point Selection Tool, including unpaired and predicted control
points, select the Structure with all points check box.
This option saves the positions of all control points and their current state in a cpstruct structure.
You can use the cpstruct to restart a control point selection session at the point where you left off.
This option is useful if you are picking many points over a long time and want to preserve unmatched
and predicted points when you resume work.
To extract the arrays of valid control point coordinates from a cpstruct, use the cpstruct2pairs
function.
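For example, assuming a cpstruct variable saved from the tool is in the workspace, the extraction is a single call:
[movingPoints,fixedPoints] = cpstruct2pairs(cpstruct);   % keep only valid, matched pairs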
The Control Point Selection Tool also asks if you want to save your control points when you exit the
tool.
See Also
cpselect | cpstruct2pairs | fitgeotrans
More About
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-60
• “Image Coordinate Systems” on page 2-3
Find Image Rotation and Scale
original = imread('cameraman.tif');
imshow(original)
text(size(original,2),size(original,1)+15, ...
'Image courtesy of Massachusetts Institute of Technology', ...
'FontSize',7,'HorizontalAlignment','right')
Step 2: Resize and Rotate the Image
scale = 0.7;
distorted = imresize(original,scale); % Try varying the scale factor.
theta = 30;
distorted = imrotate(distorted,theta); % Try varying the angle, theta.
imshow(distorted)
Step 3: Select Control Points
Use the Control Point Selection Tool to pick at least two pairs of control points.
You can run the rest of the example with these pre-picked points, but try picking your own points to
see how the results vary.
cpselect(distorted,original,movingPoints,fixedPoints);
Save the control points by choosing the File menu, then the Export Points to Workspace option. Save the points, overwriting variables movingPoints and fixedPoints.
tform = fitgeotrans(movingPoints,fixedPoints,'NonreflectiveSimilarity');
After you have done Steps 5 and 6, repeat Steps 4 through 6 but try using 'affine' instead of
'NonreflectiveSimilarity'. What happens? Are the results as good as they were with
'NonreflectiveSimilarity'?
The geometric transformation, tform, contains a transformation matrix in tform.T. Since you know
that the transformation includes only rotation and scaling, the math is relatively simple to recover the
scale and angle.
Let sc = scale*cos(theta) and ss = scale*sin(theta). Then, with Tinv = invert(tform), the matrix Tinv.T has the form [sc -ss 0; ss sc 0; tx ty 1].
tformInv = invert(tform);
Tinv = tformInv.T;
ss = Tinv(2,1);
sc = Tinv(1,1);
scale_recovered = sqrt(ss*ss + sc*sc)
scale_recovered = 0.7000
theta_recovered = atan2(ss,sc)*180/pi
theta_recovered = 29.3741
The recovered values of scale_recovered and theta_recovered should match the values you set
in Step 2: Resize and Rotate the Image.
Recover the original image by transforming distorted, the rotated-and-scaled image, using the
geometric transformation tform and what you know about the spatial referencing of original. The
'OutputView' name-value pair is used to specify the resolution and grid size of the resampled output
image.
Roriginal = imref2d(size(original));
recovered = imwarp(distorted,tform,'OutputView',Roriginal);
montage({original,recovered})
The recovered (right) image quality does not match the original (left) image because of the
distortion and recovery process. In particular, the image shrinking causes loss of information. The
artifacts around the edges are due to the limited accuracy of the transformation. If you were to pick
more points in Step 3: Select Control Points, the transformation would be more accurate.
See Also
cpselect | fitgeotrans | imref2d | imresize | imrotate | imwarp
More About
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-60
• “Find Image Rotation and Scale Using Automated Feature Matching” (Computer Vision Toolbox)
Use Cross-Correlation to Improve Control Point Placement
To use cross-correlation, pass sets of control points in the moving and fixed images, along with the
images themselves, to the cpcorr function.
moving_pts_adj = cpcorr(movingPoints,fixedPoints,moving,fixed);
The cpcorr function defines 11-by-11 pixel regions around each control point in the moving image
and around the matching control point in the fixed image. The function then calculates the
correlation between the values at each pixel in the region. Next, the cpcorr function finds the
position with the highest correlation value and uses it as the optimal position of the control point. The
function only moves control points up to four pixels based on the results of the cross-correlation.
Note Features in the two images must be at the same scale and have the same orientation. They
cannot be rotated relative to each other.
If cpcorr cannot correlate some of the control points, it returns their unmodified values in
movingPoints.
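As a sketch of where cpcorr fits in a registration workflow, adjust the points and then fit the transformation with the adjusted set. The variable names follow the earlier examples and are assumed to exist; the projective transformation type is only an illustrative choice:
moving_pts_adj = cpcorr(movingPoints,fixedPoints,moving,fixed);   % fine-tune point placement
tform = fitgeotrans(moving_pts_adj,fixedPoints,'projective');     % fit using the adjusted points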
See Also
cpcorr | cpselect | cpstruct2pairs
More About
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-60
Register Images with Projection Distortion Using Control Points
Read Images
Read the image westconcordorthophoto.png into the workspace. This image is an orthophoto
that has already been registered to the ground.
ortho = imread('westconcordorthophoto.png');
imshow(ortho)
text(size(ortho,2),size(ortho,1)+15, ...
'Image courtesy of Massachusetts Executive Office of Environmental Affairs', ...
'FontSize',7,'HorizontalAlignment','right');
Read the image westconcordaerial.png into the workspace. This image was taken from an
airplane and is distorted relative to the orthophoto. Because the unregistered image was taken from a
distance and the topography is relatively flat, it is likely that most of the distortion is projective.
unregistered = imread('westconcordaerial.png');
imshow(unregistered)
text(size(unregistered,2),size(unregistered,1)+15, ...
'Image courtesy of mPower3/Emerge', ...
'FontSize',7,'HorizontalAlignment','right');
To select control points interactively, open the Control Point Selection tool by using the cpselect
function. Control points are landmarks that you can find in both images, such as a road intersection
or a natural feature. Select at least four pairs of control points so that cpselect can fit a projective
transformation to the control points. After you have selected corresponding moving and fixed points,
close the tool to return to the workspace.
[mp,fp] = cpselect(unregistered,ortho,'Wait',true);
Find the parameters of the projective transformation that best aligns the moving and fixed points by
using the fitgeotrans function.
t = fitgeotrans(mp,fp,'projective');
To apply the transformation to the unregistered aerial image, use the imwarp function. Specify that
the size and position of the transformed image match the size and position of the ortho image by
using the OutputView name-value pair argument.
Rfixed = imref2d(size(ortho));
registered = imwarp(unregistered,t,'OutputView',Rfixed);
See the result of the registration by overlaying the transformed image over the original orthophoto.
imshowpair(ortho,registered,'blend')
See Also
cpcorr | cpselect | cpstruct2pairs | fitgeotrans
More About
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-60
8 Designing and Implementing Linear Filters for Image Data
The Image Processing Toolbox software provides a number of functions for designing and
implementing two-dimensional linear filters for image data. This chapter describes these functions
and how to use them effectively.
What Is Image Filtering in the Spatial Domain?
Filtering is a neighborhood operation, in which the value of any given pixel in the output image is
determined by applying some algorithm to the values of the pixels in the neighborhood of the
corresponding input pixel. A pixel's neighborhood is some set of pixels, defined by their locations
relative to that pixel. (See “Neighborhood or Block Processing: An Overview” on page 17-2 for a
general discussion of neighborhood operations.) Linear filtering is filtering in which the value of an
output pixel is a linear combination of the values of the pixels in the input pixel's neighborhood.
Convolution
Linear filtering of an image is accomplished through an operation called convolution. Convolution is a
neighborhood operation in which each output pixel is the weighted sum of neighboring input pixels.
The matrix of weights is called the convolution kernel, also known as the filter. A convolution kernel
is a correlation kernel that has been rotated 180 degrees.
For example, suppose the image is
A = [17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9]
and the correlation kernel is
h = [8 1 6
3 5 7
4 9 2]
You would use the following steps to compute the output pixel at position (2,4):
1 Rotate the correlation kernel 180 degrees about its center element to create a convolution
kernel.
2 Slide the center element of the convolution kernel so that it lies on top of the (2,4) element of A.
3 Multiply each weight in the rotated convolution kernel by the pixel of A underneath.
4 Sum the individual products from step 3.
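For this example, the steps give a value of 575 for the (2,4) output pixel. One way to verify the value is with conv2, which performs true (kernel-rotating) convolution:
A = magic(5);                  % the image shown above
h = [8 1 6; 3 5 7; 4 9 2];     % the correlation kernel shown above
C = conv2(A,h,'same');         % conv2 rotates h by 180 degrees before sliding it
C(2,4)                         % returns 575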
Correlation
The operation called correlation is closely related to convolution. In correlation, the value of an
output pixel is also computed as a weighted sum of neighboring pixels. The difference is that the
matrix of weights, in this case called the correlation kernel, is not rotated during the computation.
The Image Processing Toolbox filter design functions return correlation kernels.
The following figure shows how to compute the (2,4) output pixel of the correlation of A, assuming h
is a correlation kernel instead of a convolution kernel, using these steps:
1 Slide the center element of the correlation kernel so that it lies on top of the (2,4) element of A.
2 Multiply each weight in the correlation kernel by the pixel of A underneath.
3 Sum the individual products.
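For the same A and h, the correlation value of the (2,4) output pixel is 585. One way to check it is with filter2, which applies the kernel without rotation:
A = magic(5);
h = [8 1 6; 3 5 7; 4 9 2];
C = filter2(h,A);              % correlation: h is not rotated
C(2,4)                         % returns 585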
See Also
conv2 | convn | imfilter
Related Examples
• “Design Linear Filters in the Frequency Domain” on page 8-46
• “Noise Removal” on page 11-124
Integral Image
In an integral image, every pixel is the summation of the pixels above and to the left of it.
To illustrate, the following shows an image and its corresponding integral image. The integral image is padded to the left and the top to allow for the calculation. The pixel value at (2,1) in the original image becomes the pixel value at (3,2) in the integral image after adding the pixel value above it (2+1) and to the left (3+0). Similarly, the pixel at (2,2) in the original image with the value 4 becomes the pixel at (3,3) in the integral image with the value 12 after adding the pixel value above it (4+5) and adding the pixel to the left of it (9+3).
Using an integral image, you can rapidly calculate summations over image subregions. Because an integral image makes the summation of pixels a constant-time operation, regardless of the neighborhood size, it is well suited to fast filtering. To sum a subregion of an image, you can use the corresponding region of its integral image. For example, in the input image, the summation of the shaded region becomes a simple calculation using four reference values of the rectangular region in the corresponding integral image: 46 – 22 – 20 + 10 = 14. The calculation subtracts the regions above and to the left of the shaded region, then adds back the area of overlap to compensate for the double subtraction.
In this way, you can calculate summations in rectangular regions rapidly, irrespective of the filter size.
Use of integral images was popularized by the Viola-Jones algorithm. To see the full citation for this
algorithm and learn how to create an integral image, see integralImage.
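A minimal sketch of the four-reference-value calculation, using integralImage on a small test matrix (the matrix and subregion are arbitrary choices for illustration):
I = magic(5);
intI = integralImage(I);       % (M+1)-by-(N+1), zero-padded along the top and left
r1 = 2; c1 = 2; r2 = 4; c2 = 4;                % subregion I(2:4,2:4)
blockSum = intI(r2+1,c2+1) - intI(r1,c2+1) - intI(r2+1,c1) + intI(r1,c1);
isequal(blockSum, sum(I(r1:r2,c1:c2),'all'))   % returns logical 1 (true)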
See Also
integralBoxFilter | integralBoxFilter3 | integralImage | integralImage3
Related Examples
• “Apply Multiple Filters to Integral Image” on page 8-33
Filter Grayscale and Truecolor (RGB) Images using imfilter Function
There are several MATLAB® functions that perform 2-D and multidimensional filtering that can be
compared to imfilter. The function filter2 performs two-dimensional correlation, conv2
performs two-dimensional convolution, and convn performs multidimensional convolution. However,
each of these filtering functions always converts the input to double, and the output is always
double. Also, these MATLAB® filtering functions always assume the input is zero padded, and they
do not support other padding options. In contrast, imfilter does not convert input images to
double. The imfilter function also offers a flexible set of boundary padding options.
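A small sketch of the difference in behavior, using an arbitrary uint8 matrix rather than an image file:
I = uint8(magic(5));
h = ones(3)/9;
C1 = conv2(double(I),h,'same');   % conv2 works in double and returns double
C2 = imfilter(I,h,'replicate');   % imfilter returns uint8, the class of the input
class(C1)                         % 'double'
class(C2)                         % 'uint8'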
I = imread('coins.png');
figure
imshow(I)
title('Original Image')
h = ones(5,5)/25;
I2 = imfilter(I,h);
figure
imshow(I2)
title('Filtered Image')
rgb = imread('peppers.png');
imshow(rgb);
Create a filter. This averaging filter contains equal weights, and causes the filtered image to look
more blurry than the original.
h = ones(5,5)/25;
rgb2 = imfilter(rgb,h);
figure
imshow(rgb2)
See Also
imfilter
More About
• “What Is Image Filtering in the Spatial Domain?” on page 8-2
imfilter Boundary Padding Options
The imfilter function normally fills in these off-the-edge image pixels by assuming that they are 0.
This is called zero padding and is illustrated in the following figure.
When you filter an image, zero padding can result in a dark band around the edge of the image, as
shown in this example.
I = imread('eight.tif');
h = ones(5,5) / 25;
I2 = imfilter(I,h);
To eliminate the zero-padding artifacts around the edge of the image, imfilter offers an alternative
boundary padding method called border replication. In border replication, the value of any pixel
outside the image is determined by replicating the value from the nearest border pixel. This is
illustrated in the following figure.
To filter using border replication, pass the additional optional argument 'replicate' to imfilter.
I3 = imfilter(I,h,'replicate');
figure, imshow(I3);
title('Filtered Image with Border Replication')
The imfilter function supports other boundary padding options, such as 'circular' and
'symmetric'. See the reference page for imfilter for details.
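Continuing with the image I and filter h from this section, a quick sketch of the other padding options and a side-by-side comparison:
I4 = imfilter(I,h,'circular');    % pad by treating the image as periodic
I5 = imfilter(I,h,'symmetric');   % pad by mirror-reflecting across the border
figure
montage({I2,I3,I4,I5})
title('Zero, Replicate, Circular, and Symmetric Padding')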
See Also
imfilter
Related Examples
• “Design Linear Filters in the Frequency Domain” on page 8-46
Filter Images Using Predefined Filter
Create a 7-by-7 LoG filter with a standard deviation of 0.4 using fspecial.
h = fspecial('log',7,0.4)
h = 7×7
I2 = imfilter(I,h);
imshow(I2)
See Also
fspecial | imfilter
Related Examples
• “Design Linear Filters in the Frequency Domain” on page 8-46
Generate HDL Code for Image Sharpening
Vision HDL Toolbox provides image and video processing algorithms designed to generate readable,
synthesizable code in VHDL and Verilog (with HDL Coder™). The generated HDL code when run on
an FPGA (for example, Xilinx XC7Z045) can process 1920x1080 full-resolution images at 60 frames
per second.
This example shows how to use Vision HDL Toolbox to generate HDL code that sharpens a blurred
image. Since Vision HDL Toolbox algorithms are available as MATLAB® System objects™ and
Simulink® blocks, HDL code can be generated from MATLAB or Simulink. This example shows both
workflows.
1. Develop a behavioral model of the image sharpening algorithm using functions from Image Processing Toolbox.
2. Replicate the design using algorithms, interfaces, and data types appropriate for FPGAs and supported for HDL code generation.
3. Simulate the two designs and compare the results to confirm that the HDL-optimized design meets the goals.
For Steps 2 and 3 in MATLAB, you must have MATLAB, Vision HDL Toolbox, and Fixed-Point
Designer™. In Simulink, you need Simulink, Vision HDL Toolbox, and Fixed-Point Designer. In both
cases, you must have HDL Coder to generate HDL code.
Behavioral Model
The input image imgBlur is shown on the left in the diagram below. On the right, the image is
sharpened using the Image Processing Toolbox™ function imfilter.
imgBlur = imread('riceblurred.png');
sharpCoeff = [0 0 0;0 1 0;0 0 0]-fspecial('laplacian',0.2);
f = @() imfilter(imgBlur,sharpCoeff,'symmetric');
fprintf('Elapsed time is %.6f seconds.\n',timeit(f));
imgSharp = imfilter(imgBlur,sharpCoeff,'symmetric');
figure
imshowpair(imgBlur,imgSharp,'montage')
title('Blurred Image and Sharpened Image')
• Use HDL-friendly algorithms: The functions in Image Processing Toolbox do not support HDL
Code generation. Vision HDL Toolbox provides image and video processing algorithms designed
for efficient HDL implementations. You can generate HDL code from these algorithms using
“Functions” (Vision HDL Toolbox) and “Blocks” (Vision HDL Toolbox). Both workflows are
provided in this example. To design an FPGA-based module, replace the functions from Image
Processing Toolbox with their HDL-friendly counterparts from Vision HDL Toolbox. This example
replaces imfilter in the behavioral model with the visionhdl.ImageFilter System object in
MATLAB, or the Image Filter block in Simulink.
• Use streaming pixel interface: The functions from Image Processing Toolbox model at a high
level of abstraction. They perform full-frame processing, operating on one image frame at a time.
FPGA and ASIC implementations, however, perform pixel-stream processing, operating on one
image pixel at a time. Vision HDL Toolbox blocks and System objects use a streaming pixel
interface. Use visionhdl.FrameToPixels System object in MATLAB or Frame To Pixels
block in Simulink to convert a full frame image or video to a pixel stream. The streaming pixel
interface includes control signals that indicate each pixel's position in the frame. Algorithms that
operate on a pixel neighborhood use internal memory to store a minimum number of lines. Vision
HDL Toolbox provides the streaming pixel interface and automatic memory implementation to
address common design issues when targeting FPGAs and ASICs. For more information on the
streaming pixel protocol used by System objects from the Vision HDL Toolbox, see “Streaming
Pixel Interface” (Vision HDL Toolbox).
• Use fixed-point data representation: Functions from Image Processing Toolbox perform video
processing algorithms in the floating-point or integer domain. The System objects and blocks from
Vision HDL Toolbox require fixed-point data to generate HDL code to target FPGAs and ASICs.
Converting a design to fixed-point can introduce quantization error. Therefore, the HDL-friendly
model might generate an output slightly different from that obtained from the behavioral model.
For most applications, small quantization errors within a tolerance are acceptable. You can tune
the fixed-point settings to suit your requirements.
In this example, we use a static image as the source. This model is also able to process continuous
video input.
To generate HDL from MATLAB, your code needs to be divided into two files: test bench and design.
The design file is used for implementing the algorithm in the FPGA or ASIC. The test bench file
provides the input data to the design file and receives the design output.
In this example, the design contains a System object visionhdl.ImageFilter. It is the HDL-
friendly counterpart of the imfilter function. Configure it with the same coefficients and padding
method as imfilter.
function [pixOut,ctrlOut] = ImageSharpeningHDLDesign(pixIn,ctrlIn)
%#codegen
persistent sharpeningFilter;
if isempty(sharpeningFilter)
    sharpCoeff = [0 0 0;0 1 0;0 0 0]-fspecial('laplacian',0.2);
    sharpeningFilter = visionhdl.ImageFilter(...
        'Coefficients',sharpCoeff,...
        'PaddingMethod','Symmetric',...
        'CoefficientsDataType','Custom',...
        'CustomCoefficientsDataType',numerictype(1,16,12));
end
[pixOut,ctrlOut] = step(sharpeningFilter,pixIn,ctrlIn);
end
The test bench ImageSharpeningHDLTestBench.m reads in the blurred image. The frm2pix object
converts the full image frame to a stream of pixels and control structures. The test bench calls the
design function ImageSharpeningHDLDesign to process one pixel at a time. After the entire pixel-
stream is processed, pix2frm converts the output pixel stream to a full-frame image. The test bench
compares the output image to the reference output imgSharp.
...
[pixInVec,ctrlInVec] = step(frm2pix,imgBlur);
for p = 1:numPixPerFrm
[pixOutVec(p),ctrlOutVec(p)] = ImageSharpeningHDLDesign(pixInVec(p),ctrlInVec(p));
end
imgOut = step(pix2frm,pixOutVec,ctrlOutVec);
Simulate the design with the test bench prior to HDL code generation to make sure there are no
runtime errors.
ImageSharpeningHDLTestBench
The test bench displays the comparison result and the time spent on simulation. Due to quantization
error and rounding error, out of a total of 256*256=65536 pixels, 38554 of imgOut are different from
imgSharp. However, the maximum difference in intensity is 1. On a 0 to 255 scale, this difference is
visually unnoticeable.
As we can see by comparing the simulation time in MATLAB with that of the behavioral model, the
pixel-streaming protocol introduces significant overhead. You can use MATLAB Coder™ to speed up
the pixel-streaming simulation in MATLAB. See “Accelerate a Pixel-Streaming Design Using MATLAB
Coder” (Vision HDL Toolbox).
Once you are satisfied with the results of the FPGA-targeted model, you can use HDL Coder to
generate HDL code from the design. You can run the generated HDL code in HDL simulators or load
it into an FPGA and run it in a physical system.
Make sure that the design and test bench files are located in the same writable directory. To generate
the HDL code, use the following command:
hdlcfg = coder.config('hdl');
hdlcfg.TestBenchName = 'ImageSharpeningHDLTestBench';
hdlcfg.TargetLanguage = 'Verilog';
hdlcfg.GenerateHDLTestBench = false;
codegen -config hdlcfg ImageSharpeningHDLDesign
For more detail on how to create and configure MATLAB to HDL projects, see the "Getting Started
with MATLAB to HDL Workflow" tutorial in the HDL Coder documentation.
modelname = 'ImageSharpeningHDLModel';
open_system(modelname);
set_param(modelname,'Open','on');
The model reads in the blurred image. The Frame To Pixels block converts a full-frame image to a
pixel stream, and the Pixels To Frame block converts the pixel stream back to a full-frame image. The
Image Sharpening HDL System contains an Image Filter block, which is the HDL-friendly counterpart
in Vision HDL Toolbox of the imfilter function presented in the behavioral model.
set_param(modelname,'Open','off');
set_param([modelname '/Image Sharpening HDL System'],'Open','on');
Configure the Image Filter block with the same sharpening coefficients and padding method as in the
behavioral model, as shown on the masks below.
Simulink takes advantage of C code generation to speed up the simulation. Therefore, it is much
faster than MATLAB simulation, although still slower than the behavioral model.
The simulation creates a new variable called imgOut in the workspace. Use the following commands to compare imgOut with imgSharp generated from the behavioral model.
imgDiff = imabsdiff(imgSharp,imgOut);
fprintf('The maximum difference between corresponding pixels is %d.\n',max(imgDiff(:)));
fprintf('A total of %d pixels are different.\n',nnz(imgDiff));
Due to quantization error and rounding error, out of a total of 256*256=65536 pixels, 38554 of
imgOut are different from imgSharp. However, the maximum difference in intensity is 1. On a 0 to
255 scale, this difference is visually unnoticeable. (This is the same explanation as that presented in
Step 3 in the "Generate HDL Code from MATLAB" Section.)
Once you are satisfied with the results of the FPGA-targeted model, you can use HDL Coder to
generate HDL code from the design. You can run the generated HDL code in HDL simulators or load
it into an FPGA and run it in a physical system.
Generate HDL code from the Image Sharpening HDL System subsystem by passing its block path to the makehdl function.
What is Guided Image Filtering?
If the guidance is the same as the image to be filtered, the structures are the same—an edge in
original image is the same in the guidance image. If the guidance image is different, structures in the
guidance image will impact the filtered image, in effect, imprinting these structures on the original
image. This effect is called structure transference.
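A minimal sketch of the two cases, assuming A is the image to filter and G is a same-size guidance image:
B_self  = imguidedfilter(A);      % A guides itself: edge-preserving smoothing
B_cross = imguidedfilter(A,G);    % structures in G can transfer into the result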
See Also
imguidedfilter
Related Examples
• “Perform Flash/No-flash Denoising with Guided Filter” on page 8-25
Perform Flash/No-flash Denoising with Guided Filter
Read the image that you want to filter into the workspace. This example uses an image of some toys
taken without a flash. Because of the low light conditions, the image contains a lot of noise.
A = imread('toysnoflash.png');
figure;
imshow(A);
title('Input Image - Camera Flash Off')
Read the image that you want to use as the guidance image into the workspace. In this example, the
guidance image is a picture of the same scene taken with a flash.
G = imread('toysflash.png');
figure;
imshow(G);
title('Guidance Image - Camera Flash On')
Perform the guided filtering operation. Using the imguidedfilter function, you can specify the size
of the neighborhood used for filtering. The default is a 5-by-5 square. This example uses a 3-by-3
neighborhood. You can also specify the amount of smoothing performed by the filter. The value can be
any positive number. One way to approach this is to use the default first and view the results. If you
want less smoothing and more edge preservation, use a lower value for this parameter. For more
smoothing, use a higher value. This example sets the value of the smoothing parameter.
nhoodSize = 3;
smoothValue = 0.001*diff(getrangefromclass(G)).^2;
B = imguidedfilter(A, G, 'NeighborhoodSize',nhoodSize, 'DegreeOfSmoothing',smoothValue);
figure, imshow(B), title('Filtered Image')
Examine a close up of an area of the original image and compare it to the filtered image to see the
effect of this edge-preserving smoothing filter.
figure;
h1 = subplot(1,2,1);
imshow(A), title('Region in Original Image'), axis on
h2 = subplot(1,2,2);
imshow(B), title('Region in Filtered Image'), axis on
linkaxes([h1 h2])
xlim([520 660])
ylim([150 250])
See Also
imguidedfilter
More About
• “What is Guided Image Filtering?” on page 8-24
Segment Thermographic Image after Edge-Preserving Filtering
Read a thermal image into the workspace and use whos to understand more about the image data.
I = imread('hotcoffee.tif');
whos I
Compute the dynamic range occupied by the data to see the range of temperatures occupied by the
image. The pixel values in this image correspond to actual temperatures on the Celsius scale.
range = [min(I(:)) max(I(:))]
22.4729 77.3727
Display the thermal image. Because the thermal image is a single-precision image with a dynamic
range outside 0 to 1, you must use the imshow auto-scaling capability to display the image.
figure
imshow(I,[])
colormap(gca,hot)
title('Original image')
Apply edge-preserving smoothing to the image to remove noise while still retaining image details.
This is a preprocessing step before segmentation. Use the imguidedfilter function to perform
smoothing under self-guidance. The 'DegreeOfSmoothing' parameter controls the amount of
smoothing and is dependent on the range of the image. Adjust the 'DegreeOfSmoothing' to
accommodate the range of the thermographic image. Display the filtered image.
smoothValue = 0.01*diff(range).^2;
J = imguidedfilter(I,'DegreeOfSmoothing',smoothValue);
figure
imshow(J,[])
colormap(gca,hot)
title('Guided filtered image')
Determine threshold values to use in segmentation. The image has 3 distinct regions - the person, the
hot object and the background - that appear well separated in intensity (temperature). Use
multithresh to compute a 2-level threshold for the image. This partitions the image into 3 regions
using Otsu's method.
thresh = multithresh(J,2)
27.0018 47.8220
Threshold the image using the values returned by multithresh. The threshold values are at 27 and
48 Celsius. The first threshold separates the background intensity from the person and the second
threshold separates the person from the hot object. Segment the image and fill holes.
L = imquantize(J,thresh);
L = imfill(L);
figure
imshow(label2rgb(L))
title('Label matrix from 3-level Otsu')
Draw a bounding box around the foreground regions in the image and put the mean temperature
value of the region in the box. The example assumes that the largest region is the background. Use
the regionprops function to get information about the regions in the segmented image.
props = regionprops(L,I,{'Area','BoundingBox','MeanIntensity','Centroid'});
% Assume the region with the largest area is the background
[~,idx] = max([props.Area]);
figure
imshow(I,[])
colormap(gca,hot)
title('Segmented regions with mean temperature')
for n = 1:numel(props)
    % If the region is not background
    if n ~= idx
        % Draw bounding box around region
        rectangle('Position',props(n).BoundingBox,'EdgeColor','c')
        % Label the region with its mean temperature
        text(props(n).Centroid(1),props(n).Centroid(2), ...
            num2str(props(n).MeanIntensity,'%.1f'),'Color','c')
    end
end
See Also
imfill | imguidedfilter | imquantize | multithresh
More About
• “What is Guided Image Filtering?” on page 8-24
Apply Multiple Filters to Integral Image
originalImage = imread('cameraman.tif');
figure
imshow(originalImage)
title('Original Image')
Pad the image to accommodate the size of the largest box filter. Pad each dimension by an amount
equal to half the size of the largest filter. Note the use of replicate-style padding to help reduce
boundary artifacts.
filterSizes = [7 7; 11 11; 15 15];
maxFilterSize = max(filterSizes);
padSize = (maxFilterSize - 1)/2;
paddedImage = padarray(originalImage,padSize,'replicate','both');
Compute the integral image representation of the padded image using the integralImage function
and display it. The integral image is monotonically non-decreasing from left to right and top to
bottom. Each pixel represents the sum of all pixel intensities to the top and left of the current pixel in
the image.
intImage = integralImage(paddedImage);
figure
imshow(intImage,[])
title('Integral Image Representation')
Apply three box filters of varying sizes to the integral image. The integralBoxFilter function can be used to apply a 2-D box filter to the integral image representation of an image.
filteredImage1 = integralBoxFilter(intImage,filterSizes(1,:));
filteredImage2 = integralBoxFilter(intImage,filterSizes(2,:));
filteredImage3 = integralBoxFilter(intImage,filterSizes(3,:));
The integralBoxFilter function returns only parts of the filtering that are computed without padding. Filtering the same integral image with different sized box filters results in different sized outputs. This is similar to the 'valid' option in the conv2 function.
whos filteredImage*
Because the image was padded to accommodate the largest box filter prior to computing the integral image, no image content is lost. filteredImage1 and filteredImage2 have additional padding that can be cropped.
extraPadding1 = (maxFilterSize - filterSizes(1,:))/2;
filteredImage1 = filteredImage1(1+extraPadding1(1):end-extraPadding1(1), ...
    1+extraPadding1(2):end-extraPadding1(2));
extraPadding2 = (maxFilterSize - filterSizes(2,:))/2;
filteredImage2 = filteredImage2(1+extraPadding2(1):end-extraPadding2(1), ...
    1+extraPadding2(2):end-extraPadding2(2));
figure
imshow(filteredImage1,[])
title('Image filtered with [7 7] box filter')
figure
imshow(filteredImage2,[])
title('Image filtered with [11 11] box filter')
figure
imshow(filteredImage3,[])
title('Image filtered with [15 15] box filter')
See Also
integralBoxFilter | integralBoxFilter3 | integralImage | integralImage3
More About
• “Integral Image” on page 8-5
Reduce Noise in Image Gradients
originalImage = imread('yellowlily.jpg');
originalImage = rgb2gray(originalImage);
imshow(originalImage)
To simulate noise for this example, add some Gaussian noise to the image.
noisyImage = imnoise(originalImage,'gaussian');
imshow(noisyImage)
Compute the magnitude of the gradient by using the imgradient and imgradientxy functions.
imgradient finds the gradient magnitude and direction, and imgradientxy finds directional image
gradients.
sobelGradient = imgradient(noisyImage);
imshow(sobelGradient,[])
title('Sobel Gradient Magnitude')
Looking at the gradient magnitude image, it is clear that the image gradient is very noisy. The effect
of noise can be minimized by smoothing before gradient computation. imgradient already offers
this capability for small amounts of noise by using the Sobel gradient operator. The Sobel gradient
operators are 3x3 filters as shown below. They can be generated using the fspecial function.
hy = -fspecial('sobel')
hy = 3×3
-1 -2 -1
0 0 0
1 2 1
hx = hy'
hx = 3×3
-1 0 1
-2 0 2
-1 0 1
The hy filter computes a gradient along the vertical direction while smoothing in the horizontal
direction. hx smooths in the vertical direction and computes a gradient along the horizontal direction.
The 'Prewitt' and 'Roberts' method options also provide this capability.
Even with the use of the Sobel, Roberts, or Prewitt gradient operators, the image gradient may be too
noisy. To overcome this, smooth the image using a Gaussian smoothing filter before computing image
gradients. Use the imgaussfilt function to smooth the image. The standard deviation of the
Gaussian filter varies the extent of smoothing. Since smoothing is taken care of by Gaussian filtering,
the central or intermediate differencing gradient operators can be used.
sigma = 2;
smoothImage = imgaussfilt(noisyImage,sigma);
smoothGradient = imgradient(smoothImage,'CentralDifference');
imshow(smoothGradient,[])
title('Smoothed Gradient Magnitude')
See Also
fspecial | imgaussfilt | imgradient | imnoise
More About
• “What Is Image Filtering in the Spatial Domain?” on page 8-2
Design Linear Filters in the Frequency Domain
This topic describes functions that perform filtering in the frequency domain. For information about
designing filters in the spatial domain, see “What Is Image Filtering in the Spatial Domain?” on page
8-2.
FIR filters have several characteristics that make them ideal for image processing in the MATLAB environment: they are easy to represent as matrices of coefficients, two-dimensional FIR filters are a natural extension of one-dimensional FIR filters, and they can be designed to have linear phase, which helps prevent distortion.
Another class of filter, the infinite impulse response (IIR) filter, is not as suitable for image processing
applications. It lacks the inherent stability and ease of design and implementation of the FIR filter.
Therefore, this toolbox does not provide IIR filter support.
Note Most of the design methods described in this section work by creating a two-dimensional filter
from a one-dimensional filter or window created using Signal Processing Toolbox functions. Although
this toolbox is not required, you might find it difficult to design filters if you do not have the Signal
Processing Toolbox software.
Frequency Transformation Method
The toolbox function ftrans2 implements the frequency transformation method, which creates a two-dimensional FIR filter from a one-dimensional filter. This function uses a transformation matrix, a set of elements that defines the frequency
transformation. This function's default transformation matrix produces filters with nearly circular
symmetry. By defining your own transformation matrix, you can obtain different symmetries. (See Jae
S. Lim, Two-Dimensional Signal and Image Processing, 1990, for details.)
For example, create a 1-D FIR filter using the firpm function from the Signal Processing Toolbox™ and then transform it into a 2-D filter.
b = firpm(10,[0 0.4 0.6 1],[1 1 0 0]);
h = ftrans2(b);
The result is an 11-by-11 filter h whose frequency response is a nearly circularly symmetric, two-dimensional extension of the frequency response of the 1-D filter b.
Frequency Sampling Method
The toolbox function fsamp2 implements frequency sampling design for two-dimensional FIR filters.
fsamp2 returns a filter h with a frequency response that passes through the points in the input
matrix Hd. The example below creates an 11-by-11 filter using fsamp2 and plots the frequency
response of the resulting filter. (The freqz2 function in this example calculates the two-dimensional
frequency response of a filter. See “Computing the Frequency Response of a Filter” on page 8-51 for
more information.)
Hd = zeros(11,11); Hd(4:8,4:8) = 1;
[f1,f2] = freqspace(11,'meshgrid');
mesh(f1,f2,Hd), axis([-1 1 -1 1 0 1.2]), colormap(jet(64))
h = fsamp2(Hd);
figure, freqz2(h,[32 32]), axis([-1 1 -1 1 0 1.2])
Notice the ripples in the actual frequency response, compared to the desired frequency response.
These ripples are a fundamental problem with the frequency sampling design method. They occur
wherever there are sharp transitions in the desired response.
You can reduce the spatial extent of the ripples by using a larger filter. However, a larger filter does
not reduce the height of the ripples, and requires more computation time for filtering. To achieve a
smoother approximation to the desired frequency response, consider using the frequency
transformation method or the windowing method.
Windowing Method
The windowing method involves multiplying the ideal impulse response with a window function to
generate a corresponding filter, which tapers the ideal impulse response. Like the frequency sampling
method, the windowing method produces a filter whose frequency response approximates a desired
frequency response. The windowing method, however, tends to produce better results than the
frequency sampling method.
The toolbox provides two functions for window-based filter design, fwind1 and fwind2. fwind1
designs a two-dimensional filter by using a two-dimensional window that it creates from one or two
one-dimensional windows that you specify. fwind2 designs a two-dimensional filter by using a
specified two-dimensional window directly.
fwind1 supports two different methods for making the two-dimensional windows it uses:
• Transforming a single one-dimensional window to create a two-dimensional window that is nearly circularly symmetric
• Creating a rectangular, separable window from two one-dimensional windows, by computing their outer product
The example below uses fwind1 to create an 11-by-11 filter from the desired frequency response Hd.
The example uses the Signal Processing Toolbox hamming function to create a one-dimensional
window, which fwind1 then extends to a two-dimensional window.
Hd = zeros(11,11); Hd(4:8,4:8) = 1;
[f1,f2] = freqspace(11,'meshgrid');
mesh(f1,f2,Hd), axis([-1 1 -1 1 0 1.2]), colormap(jet(64))
h = fwind1(Hd,hamming(11));
figure, freqz2(h,[32 32]), axis([-1 1 -1 1 0 1.2])
You can create an appropriate desired frequency response matrix using the freqspace function.
freqspace returns correct, evenly spaced frequency values for any size response. If you create a
desired frequency response matrix using frequency points other than those returned by freqspace,
you might get unexpected results, such as nonlinear phase.
For example, to create a circular ideal low-pass frequency response with cutoff at 0.5, use
[f1,f2] = freqspace(25,'meshgrid');
Hd = zeros(25,25); d = sqrt(f1.^2 + f2.^2) < 0.5;
Hd(d) = 1;
mesh(f1,f2,Hd)
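One way to turn this ideal response into a filter is with fwind2 and a two-dimensional window; the separable Hamming window here is an arbitrary choice (hamming requires Signal Processing Toolbox):
win = hamming(25)*hamming(25)';   % 25-by-25 separable window
h = fwind2(Hd,win);
freqz2(h)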
Note that for this frequency response, the filters produced by fsamp2, fwind1, and fwind2 are real.
This result is desirable for most image processing applications. To achieve this in general, the desired
frequency response should be symmetric about the frequency origin (f1 = 0, f2 = 0).
Computing the Frequency Response of a Filter
This command computes and displays the 64-by-64 point frequency response of h.
freqz2(h)
To obtain the frequency response matrix H and the frequency point vectors f1 and f2, use output
arguments
[H,f1,f2] = freqz2(h);
freqz2 normalizes the frequencies f1 and f2 so that the value 1.0 corresponds to half the sampling
frequency, or π radians.
For a simple m-by-n response, as shown above, freqz2 uses the two-dimensional fast Fourier
transform function fft2. You can also specify vectors of arbitrary frequency points, but in this case
freqz2 uses a slower algorithm.
See “Fourier Transform” on page 9-2 for more information about the fast Fourier transform and its
application to linear filtering and filter design.
9 Transforms
The usual mathematical representation of an image is a function of two spatial variables: f(x,y). The
value of the function at a particular location (x,y) represents the intensity of the image at that point.
This is called the spatial domain. The term transform refers to an alternative mathematical
representation of an image. For example, the Fourier transform is a representation of an image as a
sum of complex exponentials of varying magnitudes, frequencies, and phases. This is called the
frequency domain. Transforms are useful for a wide range of purposes, including convolution,
enhancement, feature detection, and compression.
This chapter defines several important transforms and shows examples of their application to image
processing.
Fourier Transform
In this section...
“Definition of Fourier Transform” on page 9-2
“Discrete Fourier Transform” on page 9-5
“Applications of the Fourier Transform” on page 9-8
If f(m,n) is a function of two discrete spatial variables m and n, then the two-dimensional Fourier
transform of f(m,n) is defined by the relationship
$$F(\omega_1, \omega_2) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} f(m,n)\, e^{-j\omega_1 m} e^{-j\omega_2 n}$$
The variables ω1 and ω2 are frequency variables; their units are radians per sample. F(ω1,ω2) is often
called the frequency-domain representation of f(m,n). F(ω1,ω2) is a complex-valued function that is
periodic both in ω1 and ω2, with period 2π. Because of the periodicity, usually only the range
−π ≤ ω1, ω2 ≤ π is displayed. Note that F(0,0) is the sum of all the values of f(m,n). For this reason,
F(0,0) is often called the constant component or DC component of the Fourier transform. (DC stands
for direct current; it is an electrical engineering term that refers to a constant-voltage power source,
as opposed to a power source whose voltage varies sinusoidally.)
The inverse of a transform is an operation that when performed on a transformed image produces the
original image. The inverse two-dimensional Fourier transform is given by
$$f(m,n) = \frac{1}{4\pi^2} \int_{\omega_1 = -\pi}^{\pi} \int_{\omega_2 = -\pi}^{\pi} F(\omega_1, \omega_2)\, e^{j\omega_1 m} e^{j\omega_2 n}\, d\omega_1\, d\omega_2$$
Roughly speaking, this equation means that f(m,n) can be represented as a sum of an infinite number
of complex exponentials (sinusoids) with different frequencies. The magnitude and phase of the
contribution at the frequencies (ω1,ω2) are given by F(ω1,ω2).
To illustrate, consider a function f(m,n) that equals 1 within a rectangular region and 0 everywhere
else. To simplify the diagram, f(m,n) is shown as a continuous function, even though the variables m
and n are discrete.
Rectangular Function
The following figure shows, as a mesh plot, the magnitude of the Fourier transform, $|F(\omega_1, \omega_2)|$, of the rectangular function shown in the preceding figure. The mesh plot of the magnitude is a common way to visualize the Fourier transform.
The peak at the center of the plot is F(0,0), which is the sum of all the values in f(m,n). The plot also
shows that F(ω1,ω2) has more energy at high horizontal frequencies than at high vertical frequencies.
This reflects the fact that horizontal cross sections of f(m,n) are narrow pulses, while vertical cross
sections are broad pulses. Narrow pulses have more high-frequency content than broad pulses.
Another common way to visualize the Fourier transform is to display $\log |F(\omega_1, \omega_2)|$ as an image, as shown.
Using the logarithm helps to bring out details of the Fourier transform in regions where F(ω1,ω2) is
very close to 0.
Examples of the Fourier transform for other simple shapes are shown below.
Discrete Fourier Transform
Working with the Fourier transform on a computer usually involves the discrete Fourier transform (DFT), for two principal reasons:
• The input and output of the DFT are both discrete, which makes it convenient for computer
manipulations.
• There is a fast algorithm for computing the DFT known as the fast Fourier transform (FFT).
The DFT is usually defined for a discrete function f(m,n) that is nonzero only over the finite region
0 ≤ m ≤ M − 1 and 0 ≤ n ≤ N − 1. The two-dimensional M-by-N DFT and inverse M-by-N DFT
relationships are given by
$$F(p,q) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n)\, e^{-j 2\pi p m / M}\, e^{-j 2\pi q n / N}, \qquad p = 0, 1, \ldots, M-1,\quad q = 0, 1, \ldots, N-1$$
and
$$f(m,n) = \frac{1}{MN} \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} F(p,q)\, e^{j 2\pi p m / M}\, e^{j 2\pi q n / N}, \qquad m = 0, 1, \ldots, M-1,\quad n = 0, 1, \ldots, N-1$$
The values F(p,q) are the DFT coefficients of f(m,n). The zero-frequency coefficient, F(0,0), is often
called the "DC component." DC is an electrical engineering term that stands for direct current. (Note
that matrix indices in MATLAB always start at 1 rather than 0; therefore, the matrix elements f(1,1)
and F(1,1) correspond to the mathematical quantities f(0,0) and F(0,0), respectively.)
The MATLAB functions fft, fft2, and fftn implement the fast Fourier transform algorithm for
computing the one-dimensional DFT, two-dimensional DFT, and N-dimensional DFT, respectively. The
functions ifft, ifft2, and ifftn compute the inverse DFT.
The DFT coefficients F(p,q) are samples of the Fourier transform F(ω1,ω2).
$$F(p,q) = F(\omega_1, \omega_2)\,\Big|_{\,\omega_1 = 2\pi p / M,\ \omega_2 = 2\pi q / N}, \qquad p = 0, 1, \ldots, M-1,\quad q = 0, 1, \ldots, N-1$$
1 Construct a matrix f that is similar to the function f(m,n) in the example in “Definition of Fourier
Transform” on page 9-2. Remember that f(m,n) is equal to 1 within the rectangular region and 0
elsewhere. Use a binary image to represent f(m,n).
f = zeros(30,30);
f(5:24,13:17) = 1;
imshow(f,'InitialMagnification','fit')
F = fft2(f);
F2 = log(abs(F));
imshow(F2,[-1 5],'InitialMagnification','fit');
colormap(jet); colorbar
This plot differs from the Fourier transform displayed in “Visualizing the Fourier Transform” on
page 9-2. First, the sampling of the Fourier transform is much coarser. Second, the zero-
frequency coefficient is displayed in the upper left corner instead of the traditional location in the
center.
3 To obtain a finer sampling of the Fourier transform, add zero padding to f when computing its
DFT. The zero padding and DFT computation can be performed in a single step with this
command.
F = fft2(f,256,256);
4 The zero-frequency coefficient, however, is still displayed in the upper left corner rather than in the center. Fix this by using the fftshift function, which swaps the quadrants of F so that the zero-frequency coefficient is in the center.
F2 = fftshift(F);
imshow(log(abs(F2)),[-1 5]); colormap(jet); colorbar
The resulting plot is identical to the one shown in “Visualizing the Fourier Transform” on page 9-
2.
The Fourier transform of the impulse response of a linear filter gives the frequency response of the
filter. The function freqz2 computes and displays a filter's frequency response. The frequency
response of the Gaussian convolution kernel shows that this filter passes low frequencies and
attenuates high frequencies.
h = fspecial('gaussian');
freqz2(h)
See “Design Linear Filters in the Frequency Domain” on page 8-46 for more information about linear
filtering, filter design, and frequency responses.
This example shows how to perform fast convolution of two matrices using the Fourier transform. A
key property of the Fourier transform is that the multiplication of two Fourier transforms corresponds
to the convolution of the associated spatial functions. This property, together with the fast Fourier
transform, forms the basis for a fast convolution algorithm.
Note: The FFT-based convolution method is most often used for large inputs. For small inputs it is
generally faster to use the imfilter function.
Create two simple matrices, A and B. A is an M-by-N matrix and B is a P-by-Q matrix.
A = magic(3);
B = ones(3);
Zero-pad A and B so that they are at least (M+P-1)-by-(N+Q-1). (Often A and B are zero-padded to a
size that is a power of 2 because fft2 is fastest for these sizes.) The example pads the matrices to be
8-by-8.
A(8,8) = 0;
B(8,8) = 0;
Compute the two-dimensional DFT of A and B using the fft2 function. Multiply the two DFTs
together and compute the inverse two-dimensional DFT of the result using the ifft2 function.
C = ifft2(fft2(A).*fft2(B));
Extract the nonzero portion of the result and remove the imaginary part caused by roundoff error.
C = C(1:5,1:5);
C = real(C)
C = 5×5

     8     9    15     7     6
    11    17    30    19    13
    15    30    45    30    15
     7    21    30    23     9
     4    13    15    11     2
This example shows how to use the Fourier transform to perform correlation, which is closely related
to convolution. Correlation can be used to locate features within an image. In this context, correlation
is often called template matching.
Read in the sample image.
bw = imread('text.png');
Create a template for matching by extracting the letter "a" from the image. Note that you can also
create the template by using the interactive syntax of the imcrop function.
a = bw(32:45,88:98);
Compute the correlation of the template image with the original image by rotating the template
image by 180 degrees and then using the FFT-based convolution technique. (Convolution is
equivalent to correlation if you rotate the convolution kernel by 180 degrees.) To match the template
to the image, use the fft2 and ifft2 functions. In the resulting image, bright peaks correspond to
occurrences of the letter.
C = real(ifft2(fft2(bw) .* fft2(rot90(a,2),256,256)));
figure
imshow(C,[]) % Scale image to appropriate display range.
To view the locations of the template in the image, find the maximum pixel value and then define a
threshold value that is less than this maximum. The thresholded image shows the locations of these
peaks as white spots in the thresholded correlation image. (To make the locations easier to see in this
figure, the example dilates the thresholded image to enlarge the size of the points.)
max(C(:))
ans = 68.0000
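A sketch of the thresholding and display step described above. The specific cutoff of 60 is an assumption, chosen just below the reported maximum of 68.
thresh = 60;                 % threshold a little less than max(C(:))
figure
imshow(C > thresh)           % white spots mark locations of the letter "a"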
Discrete Cosine Transform
In this section...
“DCT Definition” on page 9-12
“The DCT Transform Matrix” on page 9-13
“Image Compression with the Discrete Cosine Transform” on page 9-13
DCT Definition
The discrete cosine transform (DCT) represents an image as a sum of sinusoids of varying
magnitudes and frequencies. The dct2 function computes the two-dimensional discrete cosine
transform (DCT) of an image. The DCT has the property that, for a typical image, most of the visually
significant information about the image is concentrated in just a few coefficients of the DCT. For this
reason, the DCT is often used in image compression applications. For example, the DCT is at the
heart of the international standard lossy image compression algorithm known as JPEG. (The name
comes from the working group that developed the standard: the Joint Photographic Experts Group.)
The two-dimensional DCT of an M-by-N matrix A is defined as follows.

$$B_{pq} = \alpha_p \alpha_q \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} A_{mn} \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad 0 \le p \le M-1, \quad 0 \le q \le N-1$$

$$\alpha_p = \begin{cases} 1/\sqrt{M}, & p = 0 \\ \sqrt{2/M}, & 1 \le p \le M-1 \end{cases} \qquad \alpha_q = \begin{cases} 1/\sqrt{N}, & q = 0 \\ \sqrt{2/N}, & 1 \le q \le N-1 \end{cases}$$
The values Bpq are called the DCT coefficients of A. (Note that matrix indices in MATLAB always start
at 1 rather than 0; therefore, the MATLAB matrix elements A(1,1) and B(1,1) correspond to the
mathematical quantities A00 and B00, respectively.)
The DCT is an invertible transform, and its inverse is given by

$$A_{mn} = \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} \alpha_p \alpha_q B_{pq} \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad 0 \le m \le M-1, \quad 0 \le n \le N-1$$

where αp and αq are defined as above.
The inverse DCT equation can be interpreted as meaning that any M-by-N matrix A can be written as
a sum of MN functions of the form

$$\alpha_p \alpha_q \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad 0 \le p \le M-1, \quad 0 \le q \le N-1$$
These functions are called the basis functions of the DCT. The DCT coefficients Bpq, then, can be
regarded as the weights applied to each basis function. For 8-by-8 matrices, the 64 basis functions
are illustrated by this image.
Horizontal frequencies increase from left to right, and vertical frequencies increase from top to
bottom. The constant-valued basis function at the upper left is often called the DC basis function, and
the corresponding DCT coefficient B00 is often called the DC coefficient.
The DCT Transform Matrix
The dctmtx function returns the M-by-M DCT transform matrix T, whose entries are

$$T_{pq} = \begin{cases} \dfrac{1}{\sqrt{M}}, & p = 0,\;\; 0 \le q \le M-1 \\[1.5ex] \sqrt{\dfrac{2}{M}}\,\cos\dfrac{\pi(2q+1)p}{2M}, & 1 \le p \le M-1,\;\; 0 \le q \le M-1 \end{cases}$$
For an M-by-M matrix A, T*A is an M-by-M matrix whose columns contain the one-dimensional DCT of
the columns of A. The two-dimensional DCT of A can be computed as B=T*A*T'. Since T is a real
orthonormal matrix, its inverse is the same as its transpose. Therefore, the inverse two-dimensional
DCT of B is given by T'*B*T.
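As a quick check of this relationship, the following sketch computes the two-dimensional DCT of an arbitrary 8-by-8 test matrix with the transform matrix and compares the result against dct2 and the round-trip reconstruction.
A = magic(8);                    % arbitrary 8-by-8 test matrix
T = dctmtx(8);                   % 8-by-8 DCT transform matrix
B = T*A*T';                      % two-dimensional DCT of A
Arec = T'*B*T;                   % inverse two-dimensional DCT
max(max(abs(B - dct2(A))))       % difference from dct2 is near zero
max(max(abs(Arec - A)))          % round-trip error is near zero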
Image Compression with the Discrete Cosine Transform
The DCT is used in the JPEG image compression algorithm. The input image is divided into 8-by-8 or 16-
by-16 blocks, and the two-dimensional DCT is computed for each block. The DCT coefficients are then
quantized, coded, and transmitted. The JPEG receiver (or JPEG file reader) decodes the quantized
DCT coefficients, computes the inverse two-dimensional DCT of each block, and then puts the blocks
back together into a single image. For typical images, many of the DCT coefficients have values close
to zero. These coefficients can be discarded without seriously affecting the quality of the
reconstructed image.
Read an image into the workspace and convert it to class double.
I = imread('cameraman.tif');
I = im2double(I);
Compute the two-dimensional DCT of 8-by-8 blocks in the image. The function dctmtx returns the N-
by-N DCT transform matrix.
T = dctmtx(8);
dct = @(block_struct) T * block_struct.data * T';
B = blockproc(I,[8 8],dct);
Discard all but 10 of the 64 DCT coefficients in each block by multiplying each block, element-wise, with this mask.
mask = [1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0];
B2 = blockproc(B,[8 8],@(block_struct) mask .* block_struct.data);
Reconstruct the image using the two-dimensional inverse DCT of each block.
invdct = @(block_struct) T' * block_struct.data * T;
I2 = blockproc(B2,[8 8],invdct);
Display the original image and the reconstructed image, side-by-side. Although there is some loss of
quality in the reconstructed image, it is clearly recognizable, even though almost 85% of the DCT
coefficients were discarded.
imshow(I)
figure
imshow(I2)
Hough Transform
The Image Processing Toolbox supports functions that enable you to use the Hough transform to
detect lines in an image.
The hough function implements the Standard Hough Transform (SHT). The Hough transform is
designed to detect lines, using the parametric representation of a line:

rho = x*cos(theta) + y*sin(theta)
The variable rho is the distance from the origin to the line along a vector perpendicular to the line.
theta is the angle between the x-axis and this vector. The hough function generates a parameter
space matrix whose rows and columns correspond to these rho and theta values, respectively.
After you compute the Hough transform, you can use the houghpeaks function to find peak values in
the parameter space. These peaks represent potential lines in the input image.
After you identify the peaks in the Hough transform, you can use the houghlines function to find the
endpoints of the line segments corresponding to peaks in the Hough transform. This function
automatically fills in small gaps in the line segments.
Read an image into the workspace and, to make this example more illustrative, rotate the image.
Display the image.
I = imread('circuit.tif');
rotI = imrotate(I,33,'crop');
imshow(rotI)
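The binary image BW used below is an edge map of the rotated image. A minimal sketch of that step, assuming the Canny edge detector:
BW = edge(rotI,'canny');   % binary edge image of the rotated circuit image
Then compute the Hough transform of the edge image and display it, using the theta and rho vectors as axis data.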
[H,theta,rho] = hough(BW);
figure
imshow(imadjust(rescale(H)),[],...
'XData',theta,...
'YData',rho,...
'InitialMagnification','fit');
xlabel('\theta (degrees)')
ylabel('\rho')
axis on
axis normal
hold on
colormap(gca,hot)
Find the peaks in the Hough transform matrix, H, using the houghpeaks function.
P = houghpeaks(H,5,'threshold',ceil(0.3*max(H(:))));
Superimpose a plot on the image of the transform that identifies the peaks.
x = theta(P(:,2));
y = rho(P(:,1));
plot(x,y,'s','color','black');
Find lines in the image using the houghlines function.
lines = houghlines(BW,theta,rho,P,'FillGap',5,'MinLength',7);
Create a plot that displays the original image with the lines superimposed on it.
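One way to create that plot (a sketch; the colors and the highlighting of the longest segment are presentation choices, not requirements):
figure, imshow(rotI), hold on
max_len = 0;
for k = 1:length(lines)
   xy = [lines(k).point1; lines(k).point2];
   plot(xy(:,1),xy(:,2),'LineWidth',2,'Color','green');
   % Plot the beginnings and ends of the line segments
   plot(xy(1,1),xy(1,2),'x','LineWidth',2,'Color','yellow');
   plot(xy(2,1),xy(2,2),'x','LineWidth',2,'Color','red');
   % Keep track of the longest line segment
   len = norm(lines(k).point1 - lines(k).point2);
   if len > max_len
      max_len = len;
      xy_long = xy;
   end
end
% Highlight the longest line segment in red
if max_len > 0
   plot(xy_long(:,1),xy_long(:,2),'LineWidth',2,'Color','red');
end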
Radon Transform
In this section...
“Plot the Radon Transform of an Image” on page 9-23
“Viewing the Radon Transform as an Image” on page 9-25
Note For information about creating projection data from line integrals along paths that radiate from
a single source, called fan-beam projections, see “Fan-Beam Projection” on page 9-37. To convert
parallel-beam projection data to fan-beam projection data, use the para2fan function.
The radon function computes projections of an image matrix along specified directions.
A projection of a two-dimensional function f(x,y) is a set of line integrals. The radon function
computes the line integrals from multiple sources along parallel paths, or beams, in a certain
direction. The beams are spaced 1 pixel unit apart. To represent an image, the radon function takes
multiple, parallel-beam projections of the image from different angles by rotating the source around
the center of the image. The following figure shows a single projection at a specified rotation angle.
For example, the line integral of f(x,y) in the vertical direction is the projection of f(x,y) onto the x-
axis; the line integral in the horizontal direction is the projection of f(x,y) onto the y-axis. The
following figure shows horizontal and vertical projections for a simple two-dimensional function.
9-21
9 Transforms
Projections can be computed along any angle theta (θ). In general, the Radon transform of f(x,y) is
the line integral of f parallel to the y´-axis
$$R_{\theta}(x') = \int_{-\infty}^{\infty} f\!\left(x'\cos\theta - y'\sin\theta,\; x'\sin\theta + y'\cos\theta\right)\, dy'$$
where
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
Plot the Radon Transform of an Image
Create a small sample image for this example that consists of a single square object and display it.
I = zeros(100,100);
I(25:75,25:75) = 1;
imshow(I)
Calculate the Radon transform of the image for the angles 0 degrees and 45 degrees.
[R,xp] = radon(I,[0 45]);
figure
plot(xp,R(:,1));
title('Radon Transform of a Square Function at 0 degrees')
figure
plot(xp,R(:,2));
title('Radon Transform of a Square Function at 45 degrees')
Viewing the Radon Transform as an Image
The Radon transform for a large number of angles is often displayed as an image. Calculate the transform for angles from 0 to 180 degrees, in 1 degree increments, and display the result.
theta = 0:180;
[R,xp] = radon(I,theta);
imagesc(theta,xp,R);
title('R_{\theta} (X\prime)');
xlabel('\theta (degrees)');
ylabel('X\prime');
set(gca,'XTick',0:20:180);
colormap(hot);
colorbar
Detect Lines Using the Radon Transform
Read an image into the workspace and rescale its intensity values to the range [0, 1]. Display the image.
I = fitsread('solarspectra.fts');
I = rescale(I);
figure
imshow(I)
title('Original Image')
Compute a binary edge image using the edge function. Display the binary image returned by the
edge function.
BW = edge(I);
figure
imshow(BW)
title('Edges of Original Image')
Calculate the radon transform of the image, using the radon function, and display the transform. The
locations of peaks in the transform correspond to the locations of straight lines in the original image.
theta = 0:179;
[R,xp] = radon(BW,theta);
figure
imagesc(theta, xp, R); colormap(hot);
xlabel('\theta (degrees)');
ylabel('x^{\prime} (pixels from center)');
title('R_{\theta} (x^{\prime})');
colorbar
The strongest peak in R corresponds to θ = 1 degree and x' = -80 pixels from center.
To visualize this peak in the original figure, find the center of the image, indicated by the blue cross
overlaid on the image below. The red dashed line is the radial line that passes through the center at
an angle θ = 1 degree. If you travel along this line -80 pixels from center (towards the left), the radial
line perpendicularly intersects the solid red line. This solid red line is the straight line with the
strongest signal in the Radon transform.
To interpret the Radon transform further, examine the next four strongest peaks in R.
Two strong peaks in R are found at θ = 1 degree, at offsets of -84 and -87 pixels from center. These
peaks correspond to the two red lines to the left of the strongest line, overlaid on the image below.
Two other strong peaks are found near the center of R. These peaks are located at θ = 91 degrees,
with offsets of -8 and -44 pixels from center. The green dashed line in the image below is the radial
line passing through the center at an angle of 91 degrees. If you travel along the radial line a
distance of -8 and -44 pixels from center, then the radial line perpendicularly intersects the solid
green lines. These solid green lines correspond to the strong peaks in R.
The Inverse Radon Transformation
As described in “Radon Transform” on page 9-21, given an image I and a set of angles theta, the
radon function can be used to calculate the Radon transform.
R = radon(I,theta);
The function iradon can then be called to reconstruct the image I from projection data.
IR = iradon(R,theta);
In the example above, projections are calculated from the original image I.
Note, however, that in most application areas, there is no original image from which projections are
formed. For example, the inverse Radon transform is commonly used in tomography applications. In
X-ray absorption tomography, projections are formed by measuring the attenuation of radiation that
passes through a physical specimen at different angles. The original image can be thought of as a
cross section through the specimen, in which intensity values represent the density of the specimen.
Projections are collected using special purpose hardware, and then an internal image of the specimen
is reconstructed by iradon. This allows for noninvasive imaging of the inside of a living body or
another opaque object.
The following figure illustrates how parallel-beam geometry is applied in X-ray absorption
tomography. Note that there are n emitters and n sensors. Each sensor measures
the radiation emitted from its corresponding emitter, and the attenuation in the radiation gives a
measure of the integrated density, or mass, of the object. This corresponds to the line integral that is
calculated in the Radon transform.
The parallel-beam geometry used in the figure is the same as the geometry that was described in
“Radon Transform” on page 9-21. f(x,y) denotes the brightness of the image and Rθ(x′) is the
projection at angle theta.
Another geometry that is commonly used is fan-beam geometry, in which there is one source and n
sensors. For more information, see “Fan-Beam Projection” on page 9-37. To convert parallel-beam
projection data into fan-beam projection data, use the para2fan function.
iradon uses the filtered back projection algorithm to compute the inverse Radon transform. This
algorithm forms an approximation of the image I based on the projections in the columns of R. A
more accurate result can be obtained by using more projections in the reconstruction. As the number
of projections (the length of theta) increases, the reconstructed image IR more accurately
approximates the original image I. The vector theta must contain monotonically increasing angular
values with a constant incremental angle Dtheta. When the scalar Dtheta is known, it can be
passed to iradon instead of the array of theta values. Here is an example.
IR = iradon(R,Dtheta);
The filtered back projection algorithm filters the projections in R and then reconstructs the image
using the filtered projections. In some cases, noise can be present in the projections. To remove high
frequency noise, apply a window to the filter to attenuate the noise. Many such windowed filters are
available in iradon. The example call to iradon below applies a Hamming window to the filter. See
the iradon reference page for more information. To get unfiltered back projection data, specify
'none' for the filter parameter.
IR = iradon(R,theta,'Hamming');
iradon also enables you to specify a normalized frequency, D, above which the filter has zero
response. D must be a scalar in the range [0,1]. With this option, the frequency axis is rescaled so that
the whole filter is compressed to fit into the frequency range [0,D]. This can be useful in cases
where the projections contain little high-frequency information but there is high-frequency noise. In
this case, the noise can be completely suppressed without compromising the reconstruction. The
following call to iradon sets a normalized frequency value of 0.85.
IR = iradon(R,theta,0.85);
1 Create a Shepp-Logan head phantom image and display it.
P = phantom(256);
imshow(P)
2 Compute the Radon transform of the phantom brain for three different sets of theta values. R1
has 18 projections, R2 has 36 projections, and R3 has 90 projections.
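A sketch of these projection computations; the angle vectors are chosen to give the stated numbers of projections and match the later example in this chapter.
theta1 = 0:10:170; R1 = radon(P,theta1);        % 18 projections
theta2 = 0:5:175;  R2 = radon(P,theta2);        % 36 projections
theta3 = 0:2:178;  [R3,xp] = radon(P,theta3);   % 90 projections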
3 Display the projection data R3 as an image. Note how some of the features of the input image appear in this image of the transform. The first
column in the Radon transform corresponds to a projection at 0º that is integrating in the vertical
direction. The centermost column corresponds to a projection at 90º, which is integrating in the
horizontal direction. The projection at 90º has a wider profile than the projection at 0º due to the
larger vertical semi-axis of the outermost ellipse of the phantom.
4 Reconstruct the head phantom image from the projection data created in step 2 and display the
results.
I1 = iradon(R1,10);
I2 = iradon(R2,5);
I3 = iradon(R3,2);
imshow(I1)
figure, imshow(I2)
figure, imshow(I3)
The following figure shows the results of all three reconstructions. Notice how image I1, which
was reconstructed from only 18 projections, is the least accurate reconstruction. Image I2,
which was reconstructed from 36 projections, is better, but it is still not clear enough to discern
clearly the small ellipses in the lower portion of the image. I3, reconstructed using 90
projections, most closely resembles the original image. Notice that when the number of
projections is relatively small (as in I1 and I2), the reconstruction can include some artifacts
from the back projection.
Fan-Beam Projection
In this section...
“Image Reconstruction from Fan-Beam Projection Data” on page 9-39
“Reconstruct Image using Inverse Fanbeam Projection” on page 9-40
Note For information about creating projection data from line integrals along parallel paths, see
“Radon Transform” on page 9-21. To convert fan-beam projection data to parallel-beam projection
data, use the fan2para function.
The fanbeam function computes projections of an image matrix along specified directions. A
projection of a two-dimensional function f(x,y) is a set of line integrals. The fanbeam function
computes the line integrals along paths that radiate from a single source, forming a fan shape. To
represent an image, the fanbeam function takes multiple projections of the image from different
angles by rotating the source around the center of the image. The following figure shows a single fan-
beam projection at a specified rotation angle.
When you compute fan-beam projection data using the fanbeam function, you specify as arguments
an image and the distance between the vertex of the fan-beam projections and the center of rotation
(the center pixel in the image). The fanbeam function determines the number of beams, based on the
size of the image and the settings of fanbeam parameters.
The FanSensorGeometry parameter specifies how sensors are aligned: 'arc' or 'line'.
The FanRotationIncrement parameter specifies the rotation angle increment. By default, fanbeam
takes projections at different angles by rotating the source around the center pixel at 1 degree
intervals.
The following figures illustrate both these geometries. The first figure illustrates geometry used by
the fanbeam function when FanSensorGeometry is set to 'arc' (the default). Note how you specify
the distance between sensors by specifying the angular spacing of the beams.
The following figure illustrates the geometry used by the fanbeam function when
FanSensorGeometry is set to 'line'. In this figure, note how you specify the position of the
sensors by specifying the distance between them in pixels along the x´ axis.
Image Reconstruction from Fan-Beam Projection Data
To reconstruct an image from fan-beam projection data, use the ifanbeam function. With this function, you specify the projection data and the distance between the vertex of the fan-beam projections and the center of rotation that were used to create the projection data. For example, this command reconstructs the image I from the projection data P and distance D.
I = ifanbeam(P,D);
By default, the ifanbeam function assumes that the fan-beam projection data was created using the
arc fan sensor geometry, with beams spaced at 1 degree angles and projections taken at 1 degree
increments over a full 360 degree range. As with the fanbeam function, you can use ifanbeam
parameters to specify other values for these characteristics of the projection data. Use the same
values for these parameters that were used when the projection data was created. For more
information about these parameters, see ifanbeam.
The ifanbeam function converts the fan-beam projection data to parallel-beam projection data with
the fan2para function, and then calls the iradon function to perform the image reconstruction. For
this reason, the ifanbeam function supports certain iradon parameters, which it passes to the
iradon function. See “The Inverse Radon Transformation” on page 9-32 for more information about
the iradon function.
Reconstruct Image using Inverse Fanbeam Projection
Generate a test image and display it. The test image is the Shepp-Logan head phantom, which can be
generated by the phantom function. The phantom image illustrates many of the qualities that are
found in real-world tomographic imaging of human heads.
P = phantom(256);
imshow(P)
Compute fan-beam projection data of the test image, using the FanSensorSpacing parameter to
vary the sensor spacing. The example uses the fanbeam arc geometry, so you specify the spacing
between sensors by specifying the angular spacing of the beams. The first call spaces the beams at 2
degrees; the second at 1 degree; and the third at 0.25 degrees. In each call, the distance between the
center of rotation and vertex of the projections is constant at 250 pixels. In addition, fanbeam rotates
the projection around the center pixel at 1 degree increments.
D = 250;
dsensor1 = 2;
F1 = fanbeam(P,D,'FanSensorSpacing',dsensor1);
dsensor2 = 1;
F2 = fanbeam(P,D,'FanSensorSpacing',dsensor2);
dsensor3 = 0.25;
[F3, sensor_pos3, fan_rot_angles3] = fanbeam(P,D,...
'FanSensorSpacing',dsensor3);
Plot the projection data F3. Because fanbeam calculates projection data at rotation angles from 0 to
360 degrees, the same patterns occur at an offset of 180 degrees. The same features are being
sampled from both sides.
figure, imagesc(fan_rot_angles3, sensor_pos3, F3)
colormap(hot); colorbar
xlabel('Fan Rotation Angle (degrees)')
ylabel('Fan Sensor Position (degrees)')
Reconstruct the image from the fan-beam projection data using ifanbeam. In each reconstruction, match the fan sensor spacing with the spacing used when the projection data was created previously. The example uses the OutputSize parameter to constrain the output size of each reconstruction to be the same as the size of the original image P. In the output, note how the quality of the reconstruction gets better as the number of beams in the projection increases. The first image, Ifan1, was created using 2 degree spacing of the beams; the second image, Ifan2, was created using 1 degree spacing of the beams; and the third image, Ifan3, was created using 0.25 degree spacing of the beams.
output_size = max(size(P));
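The reconstruction calls themselves reuse the sensor spacings and distance defined above; the same calls appear in the later example in this chapter. The figure calls are simply one way to view the three results.
Ifan1 = ifanbeam(F1,D,'FanSensorSpacing',dsensor1,'OutputSize',output_size);
Ifan2 = ifanbeam(F2,D,'FanSensorSpacing',dsensor2,'OutputSize',output_size);
Ifan3 = ifanbeam(F3,D,'FanSensorSpacing',dsensor3,'OutputSize',output_size);
figure, imshow(Ifan1)
figure, imshow(Ifan2)
figure, imshow(Ifan3)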
Reconstructing an Image from Projection Data
A real-world application that requires image reconstruction is X-ray absorption tomography, where
projections are formed by measuring the attenuation of radiation that passes through a physical
specimen at different angles. The original image can be thought of as a cross section through the
specimen in which intensity values represent the density of the specimen. Projections are collected by
special medical imaging devices and then an internal image of the specimen is reconstructed using
iradon or ifanbeam.
See the Image Processing Toolbox™ User's Guide for diagrams that illustrate both geometries.
The test image is the Shepp-Logan head phantom which can be generated using the function
phantom. The phantom image illustrates many qualities that are found in real-world tomographic
imaging of human heads. The bright elliptical shell along the exterior is analogous to a skull and the
many ellipses inside are analogous to brain features or tumors.
P = phantom(256);
imshow(P)
Calculate synthetic projections using parallel-beam geometry and vary the number of projection
angles. For each of these calls to radon, the output is a matrix in which each column is the Radon
transform for one of the angles in the corresponding theta.
theta1 = 0:10:170;
[R1,~] = radon(P,theta1);
num_angles_R1 = size(R1,2)
num_angles_R1 = 18
theta2 = 0:5:175;
[R2,~] = radon(P,theta2);
num_angles_R2 = size(R2,2)
num_angles_R2 = 36
theta3 = 0:2:178;
[R3,xp] = radon(P,theta3);
num_angles_R3 = size(R3,2)
num_angles_R3 = 90
Note that for each angle, the projection is computed at N points along the xp-axis, where N is a
constant that depends on the diagonal distance of the image such that every pixel will be projected
for all possible projection angles.
N_R1 = size(R1,1)
N_R1 = 367
N_R2 = size(R2,1)
N_R2 = 367
N_R3 = size(R3,1)
N_R3 = 367
So, if you use a smaller head phantom, the projection needs to be computed at fewer points along the
xp-axis.
P_128 = phantom(128);
[R_128,xp_128] = radon(P_128,theta1);
N_128 = size(R_128,1)
N_128 = 185
Display the projection data R3. Some of the features of the original phantom image are visible in the
image of R3. The first column of R3 corresponds to a projection at 0 degrees, which is integrating in
the vertical direction. The centermost column corresponds to a projection at 90 degrees, which is
integrating in the horizontal directions. The projection at 90 degrees has a wider profile than the
projection at 0 degrees due to the large vertical semi-axis of the outermost ellipse of the phantom.
imagesc(theta3,xp,R3)
colormap(hot)
colorbar
xlabel('Parallel Rotation Angle - \theta (degrees)');
ylabel('Parallel Sensor Position - x\prime (pixels)');
Match the parallel rotation-increment, dtheta, in each reconstruction with that used above to create
the corresponding synthetic projections. In a real-world case, you would know the geometry of your
transmitters and sensors, but not the source image, P.
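A sketch of those reconstructions, deriving each rotation increment from the angle vectors defined earlier:
dtheta1 = theta1(2) - theta1(1);   % 10 degrees
I1 = iradon(R1,dtheta1);
dtheta2 = theta2(2) - theta2(1);   % 5 degrees
I2 = iradon(R2,dtheta2);
dtheta3 = theta3(2) - theta3(1);   % 2 degrees
I3 = iradon(R3,dtheta3);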
The following three reconstructions (I1, I2, and I3) show the effect of varying the number of angles
at which projections are made. For I1 and I2 some features that were visible in the original phantom
are not clear. Specifically, look at the three ellipses at the bottom of each image. The result in I3
closely resembles the original image, P.
Notice the significant artifacts present in I1 and I2. To avoid these artifacts, use a larger number of
angles.
figure
montage({I1,I2,I3},'Size',[1 3])
title('Reconstruction from Parallel Beam Projection with 18, 36, and 90 Projection Angles')
Calculate synthetic projections using fan-beam geometry and vary the 'FanSensorSpacing'.
D = 250;
dsensor1 = 2;
F1 = fanbeam(P,D,'FanSensorSpacing',dsensor1);
dsensor2 = 1;
F2 = fanbeam(P,D,'FanSensorSpacing',dsensor2);
dsensor3 = 0.25;
[F3, sensor_pos3, fan_rot_angles3] = fanbeam(P,D,...
'FanSensorSpacing',dsensor3);
Display the projection data F3. Notice that the fan rotation angles range from 0 to 360 degrees and
the same patterns occur at an offset of 180 degrees because the same features are being sampled
from both sides. You can correlate features in this image of fan-beam projections with the same
features in the image of parallel-beam projections, above.
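One way to display F3, mirroring the display used in the earlier fan-beam example:
figure, imagesc(fan_rot_angles3, sensor_pos3, F3)
colormap(hot), colorbar
xlabel('Fan Rotation Angle (degrees)')
ylabel('Fan Sensor Position (degrees)')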
Match the fan-sensor-spacing in each reconstruction with that used to create each of the synthetic
projections. In a real-world case, you would know the geometry of your transmitters and sensors, but
not the source image, P.
Changing the value of the 'FanSensorSpacing' effectively changes the number of sensors used at each
rotation angle. For each of these fan-beam reconstructions, the same rotation angles are used. This is
in contrast to the parallel-beam reconstructions which each used different rotation angles.
Note that 'FanSensorSpacing' is only one parameter of several that you can control when using
fanbeam and ifanbeam. You can also convert back and forth between parallel- and fan-beam
projection data using the functions fan2para and para2fan.
Ifan1 = ifanbeam(F1,D,'FanSensorSpacing',dsensor1,'OutputSize',output_size);
Ifan2 = ifanbeam(F2,D,'FanSensorSpacing',dsensor2,'OutputSize',output_size);
Ifan3 = ifanbeam(F3,D,'FanSensorSpacing',dsensor3,'OutputSize',output_size);
figure
montage({Ifan1,Ifan2,Ifan3},'Size',[1 3])
title('Reconstruction from Fan Beam Projection with 2, 1, and 0.25 Degree Sensor Spacing')
10
Morphological Operations
This chapter describes the Image Processing Toolbox morphological functions. You can use these
functions to perform common image processing tasks, such as contrast enhancement, noise removal,
thinning, skeletonization, filling, and segmentation.
Types of Morphological Operations
Morphology is a broad set of image processing operations that process images based on shapes.
Morphological operations apply a structuring element to an input image, creating an output image of
the same size. In a morphological operation, the value of each pixel in the output image is based on a
comparison of the corresponding pixel in the input image with its neighbors.
The following figure illustrates the dilation of a binary image. Note how the structuring element
defines the neighborhood of the pixel of interest, which is circled. The dilation function applies the
appropriate rule to the pixels in the neighborhood and assigns a value to the corresponding pixel in
the output image. In the figure, the morphological dilation function sets the value of the output pixel
to 1 because one of the elements in the neighborhood defined by the structuring element is on. For
more information, see “Structuring Elements” on page 10-9.
The following figure illustrates this processing for a grayscale image. The figure shows the
processing of a particular pixel in the input image. Note how the function applies the rule to the input
pixel's neighborhood and uses the highest value of all the pixels in the neighborhood as the value of
the corresponding pixel in the output image.
This table lists functions in the toolbox that perform common morphological operations that are
based on dilation and erosion.
Function: Morphological Definition
imdilate: Dilate image. The value of the output pixel is the maximum value of all pixels in the neighborhood.
imerode: Erode image. The value of the output pixel is the minimum value of all pixels in the neighborhood.
imopen: Morphologically open image (perform erosion followed by dilation, using the same structuring element).
imclose: Morphologically close image (perform dilation followed by erosion, using the same structuring element).
See Also
imclose | imdilate | imerode | imopen
More About
• “Structuring Elements” on page 10-9
• “Pixel Connectivity” on page 10-22
• “Border Padding for Morphology” on page 10-13
Structuring Elements
An essential part of the morphological dilation and erosion operations is the structuring element used
to probe the input image. A structuring element is a matrix that identifies the pixel in the image being
processed and defines the neighborhood used in the processing of each pixel. You typically choose a
structuring element the same size and shape as the objects you want to process in the input image.
For example, to find lines in an image, create a linear structuring element.
There are two types of structuring elements: flat and nonflat. A flat structuring element is a binary
valued neighborhood, either 2-D or multidimensional, in which the true pixels are included in the
morphological computation, and the false pixels are not. The center pixel of the structuring element,
called the origin, identifies the pixel in the image being processed. Use the strel function to create
a flat structuring element. You can use flat structuring elements with both binary and grayscale
images. The following figure illustrates a flat structuring element.
A nonflat structuring element is a matrix of type double that identifies the pixel in the image being
processed and defines the neighborhood used in the processing of that pixel. A nonflat structuring
element contains finite values used as additive offsets in the morphological computation. The center
pixel of the matrix, called the origin, identifies the pixel in the image that is being processed. Pixels in
the neighborhood with the value -Inf are not used in the computation. Use the offsetstrel
function to create a nonflat structuring element. You can use nonflat structuring elements only with
grayscale images.
The morphological functions use this code to get the coordinates of the origin of structuring elements of any size and dimension:
origin = floor((size(nhood)+1)/2)
where nhood is the neighborhood defining the structuring element. To see the neighborhood of a flat
structuring element, view the Neighborhood property of the strel object. To see the neighborhood
of a nonflat structuring element, view the Offset property of the offsetstrel object.
For example, the following illustrates the origin of a flat, diamond-shaped structuring element.
To improve performance, the strel and offsetstrel functions might break structuring elements into smaller pieces, a technique known as structuring element decomposition. For example, dilation by an 11-by-11 square structuring element can be accomplished by dilating first
with a 1-by-11 structuring element, and then with an 11-by-1 structuring element. This results in a
theoretical speed improvement of a factor of 5.5, although in practice the actual speed improvement
is somewhat less.
Structuring element decompositions used for the 'disk' and 'ball' shapes are approximations; all
other decompositions are exact. Decomposition is not used with an arbitrary structuring element
unless it is a flat structuring element whose neighborhood matrix is all 1's.
To see the sequence of structuring elements used in a decomposition, use the decompose method.
Both strel objects and offsetstrel objects support decompose methods. The decompose method
returns an array of the structuring elements that form the decomposition. For example, here are the
structuring elements created in the decomposition of a diamond-shaped structuring element.
SE = strel('diamond',4)
SE =
Call the decompose method. The method returns an array of structuring elements.
decompose(SE)
ans =
Neighborhood
Dimensionality
See Also
offsetstrel | strel
More About
• “Types of Morphological Operations” on page 10-2
• “Border Padding for Morphology” on page 10-13
Border Padding for Morphology
Morphological functions position the origin of the structuring element over each pixel of interest in the input image. For pixels near the image border, part of the neighborhood defined by the structuring element extends past the edge of the image, leaving some neighborhood positions undefined. To process border pixels, the morphological functions assign a value to these undefined pixels, as if
the functions had padded the image with additional rows and columns. The value of these padding
pixels varies for dilation and erosion operations. The table describes the padding rules for dilation
and erosion for both binary and grayscale images.
Operation: Rule
Dilation: Pixels beyond the image border are assigned the minimum value afforded by the data type. For binary images, these pixels are assumed to be set to 0. For grayscale images, the minimum value for uint8 images is 0.
Erosion: Pixels beyond the image border are assigned the maximum value afforded by the data type. For binary images, these pixels are assumed to be set to 1. For grayscale images, the maximum value for uint8 images is 255.
Note By using the minimum value for dilation operations and the maximum value for erosion
operations, the toolbox avoids border effects, where regions near the borders of the output image do
not appear to be homogeneous with the rest of the image. For example, if erosion padded with a
minimum value, eroding an image would result in a black border around the edge of the output
image.
See Also
imclose | imdilate | imerode | imopen | offsetstrel | strel
More About
• “Types of Morphological Operations” on page 10-2
• “Structuring Elements” on page 10-9
Morphological Reconstruction
Morphological reconstruction can be thought of conceptually as repeated dilations of an image, called
the marker image, until the contour of the marker image fits under a second image, called the mask
image. In morphological reconstruction, the peaks in the marker image “spread out,” or dilate.
This figure illustrates this processing in 1-D. Each successive dilation is constrained to lie underneath
the mask. When further dilation ceases to change the image, processing stops. The final dilation is
the reconstructed image. (Note: the actual implementation of this operation in the toolbox is done
much more efficiently. See the imreconstruct reference page for more details.) The figure shows
the successive dilations of the marker.
Morphological reconstruction is based on morphological dilation, but note the following unique
properties:
• Processing is based on two images, a marker and a mask, rather than one image and a structuring
element.
• Processing is based on the concept of pixel connectivity on page 10-22, rather than a structuring
element.
• Processing repeats until the image is stable and no longer changes.
To illustrate morphological reconstruction, consider this simple image. It contains two primary
regions, the blocks of pixels containing the values 14 and 18. The background is primarily all set to
10, with some pixels set to 11.
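The step that creates this mask image is not shown in this excerpt. Reconstructing it from the marker values listed below (marker = A - 2), one way to define it is:
A = 10*ones(10,10);                   % background set to 10
A(2:4,2:4) = 14;                      % first primary region
A(6:8,6:8) = 18;                      % second primary region
A([2 4],[7 9]) = 11; A(3,8) = 11;     % scattered pixels set to 11
A(6,2) = 11; A(7,4) = 11; A(8,3) = 11;
A(9,[2 4]) = 11; A(10,7) = 11;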
One way to create a marker image is to subtract a constant from the mask image, using
imsubtract.
marker = imsubtract(A,2)
marker =
8 8 8 8 8 8 8 8 8 8
8 12 12 12 8 8 9 8 9 8
8 12 12 12 8 8 8 9 8 8
8 12 12 12 8 8 9 8 9 8
8 8 8 8 8 8 8 8 8 8
8 9 8 8 8 16 16 16 8 8
8 8 8 9 8 16 16 16 8 8
8 8 9 8 8 16 16 16 8 8
8 9 8 9 8 8 8 8 8 8
8 8 8 8 8 8 9 8 8 8
2 Call the imreconstruct function to morphologically reconstruct the image. In the output
image, note how all the intensity fluctuations except the intensity peak have been removed.
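A minimal sketch of this call, using A as the mask image:
recon = imreconstruct(marker,A)   % marker must not exceed the mask, which A-2 satisfies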
For example, in an image of several spherical objects, points of high intensity could represent the
tops of the objects. Using morphological processing, these maxima can be used to identify objects in
an image.
Terminology
Term: Definition
global maxima: Highest regional maxima in the image. See the entry for regional maxima in this table for more information.
global minima: Lowest regional minima in the image. See the entry for regional minima in this table for more information.
regional maxima: Connected set of pixels of constant intensity from which it is impossible to reach a point with higher intensity without first descending; that is, a connected component of pixels with the same intensity value, t, surrounded by pixels that all have a value less than t.
regional minima: Connected set of pixels of constant intensity from which it is impossible to reach a point with lower intensity without first ascending; that is, a connected component of pixels with the same intensity value, t, surrounded by pixels that all have a value greater than t.
An image can have multiple regional maxima or minima but only a single global maximum or
minimum. Determining image peaks or valleys can be used to create marker images that are used in
morphological reconstruction.
The toolbox includes functions that you can use to find areas of high or low intensity in an image:
• The imregionalmax and imregionalmin functions identify all regional minima or maxima.
• The imextendedmax and imextendedmin functions identify regional minima or maxima that are
greater than or less than a specified threshold.
The functions accept a grayscale image as input and return a binary image as output. In the output
binary image, the regional minima or maxima are set to 1; all other pixels are set to 0.
For example, this simple image contains two primary regional maxima, the blocks of pixels containing
the value 13 and 18, and several smaller maxima, set to 11.
The binary image returned by imregionalmax pinpoints all these regional maxima.
B = imregionalmax(A)
You might want only to identify areas of the image where the change in intensity is extreme; that is,
the difference between the pixel and neighboring pixels is greater than (or less than) a certain
threshold. For example, to find only those regional maxima in the sample image, A, that are at least
two units higher than their neighbors, use imextendedmax.
B = imextendedmax(A,2)
In an image, every small fluctuation in intensity represents a regional minimum or maximum. You
might only be interested in significant minima or maxima and not in these smaller minima and
maxima caused by background texture.
To remove the less significant minima and maxima but retain the significant minima and maxima, use
the imhmax or imhmin function. With these functions, you can specify a contrast criteria or threshold
level, h, that suppresses all maxima whose height is less than h or whose minima are greater than h.
For example, this simple image contains two primary regional maxima, the blocks of pixels containing
the value 14 and 18, and several smaller maxima, set to 11.
To eliminate all regional maxima except the two significant maxima, use imhmax, specifying a
threshold value of 2. Note that imhmax only affects the maxima; none of the other pixel values are
changed. The two significant maxima remain, although their heights are reduced.
B = imhmax(A,2)
This figure takes the second row from the sample image to illustrate in 1-D how imhmax changes the
profile of the image.
Imposing a Minimum
You can emphasize specific minima (dark objects) in an image using the imimposemin function. The
imimposemin function uses morphological reconstruction to eliminate all minima from the image
except the minima you specify.
To illustrate the process of imposing a minimum, this code creates a simple image containing two
primary regional minima and several other regional minima.
mask = uint8(10*ones(10,10));
mask(6:8,6:8) = 2;
mask(2:4,2:4) = 7;
mask(3,3) = 5;
mask(2,9) = 9;
mask(3,8) = 9;
mask(9,2) = 9;
mask(8,3) = 9
To obtain an image that emphasizes the two deepest minima and removes all others, create a marker
image that pinpoints the two minima of interest. You can create the marker image by explicitly
setting certain pixels to specific values or by using other morphological functions to extract the
features you want to emphasize in the mask image.
This example uses imextendedmin to get a binary image that shows the locations of the two deepest
minima.
marker = imextendedmin(mask,1)
Now use imimposemin to create new minima in the mask image at the points specified by the
marker image. Note how imimposemin sets the values of pixels specified by the marker image to the
lowest value supported by the data type (0 for uint8 values). imimposemin also changes the values
of all the other pixels in the image to eliminate the other minima.
I = imimposemin(mask,marker)
I =

    11    11    11    11    11    11    11    11    11    11
    11     8     8     8    11    11    11    11    11    11
    11     8     0     8    11    11    11    11    11    11
    11     8     8     8    11    11    11    11    11    11
    11    11    11    11    11    11    11    11    11    11
    11    11    11    11    11     0     0     0    11    11
    11    11    11    11    11     0     0     0    11    11
    11    11    11    11    11     0     0     0    11    11
    11    11    11    11    11    11    11    11    11    11
    11    11    11    11    11    11    11    11    11    11
This figure illustrates in 1-D how imimposemin changes the profile of row 2 of the image.
Imposing a Minimum
See Also
imextendedmax | imextendedmin | imhmax | imhmin | imimposemin | imreconstruct | imregionalmax | imregionalmin
More About
• “Pixel Connectivity” on page 10-22
• “Flood-Fill Operations” on page 10-38
Pixel Connectivity
Morphological processing starts at the peaks in the marker image and spreads throughout the rest of
the image based on the connectivity of the pixels. Connectivity defines which pixels are connected to
other pixels. A set of pixels in a binary image that form a connected group is called an object or a
connected component.
Determining which pixels create a connected component depends on how pixel connectivity is
defined. For example, this binary image contains one foreground object or two, depending on the
connectivity. If the foreground is 4-connected, the image is all one object — there is no distinction
between a foreground object and the background. However, if the foreground is 8-connected, the
pixels set to 1 connect to form a closed loop and the image has two separate objects: the pixels in the
loop and the pixels outside the loop.
0 0 0 0 0 0 0 0
0 1 1 1 1 1 0 0
0 1 0 0 0 1 0 0
0 1 0 0 0 1 0 0
0 1 0 0 0 1 0 0
0 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Value: Meaning
Two-Dimensional Connectivities
4-connected: Pixels are connected if their edges touch. Two adjoining pixels are part of the same object if they are both on and are connected along the horizontal or vertical direction.
8-connected: Pixels are connected if their edges or corners touch. Two adjoining pixels are part of the same object if they are both on and are connected along the horizontal, vertical, or diagonal direction.
Three-Dimensional Connectivities
6-connected: Pixels are connected if their faces touch.
18-connected: Pixels are connected if their faces or edges touch.
26-connected: Pixels are connected if their faces, edges, or corners touch.
Choosing a Connectivity
The type of neighborhood you choose affects the number of objects found in an image and the
boundaries of those objects. For this reason, the results of many morphology operations often differ
depending upon the type of connectivity you specify.
For example, if you specify a 4-connected neighborhood, this binary image contains two objects; if
you specify an 8-connected neighborhood, the image has one object.
0 0 0 0 0 0
0 1 1 0 0 0
0 1 1 0 0 0
0 0 0 1 1 0
0 0 0 1 1 0
For example, this array defines a “North/South” connectivity which can be used to break up an image
into independent columns.
CONN = [ 0 1 0; 0 1 0; 0 1 0 ]
CONN =
0 1 0
0 1 0
0 1 0
Note Connectivity arrays must be symmetric about their center element. Also, you can use a 2-D
connectivity array with a 3-D image; the connectivity affects each "page" in the 3-D image.
See Also
boundarymask | bwareaopen | bwconncomp | conndef | imfill | iptcheckconn
More About
• “Morphological Reconstruction” on page 10-14
Lookup Table Operations
Certain binary image operations can be implemented most easily with lookup tables. A lookup table is a column vector in which each element represents the value to return for one possible combination of pixels in a neighborhood; the makelut function creates the table and the applylut function applies it to an image. For a 2-by-2 neighborhood, there are 16 possible permutations of the pixels in the neighborhood.
Therefore, the lookup table for this operation is a 16-element vector. For a 3-by-3 neighborhood, there
are 512 permutations, so the lookup table is a 512-element vector.
Note makelut and applylut support only 2-by-2 and 3-by-3 neighborhoods. Lookup tables larger
than 3-by-3 neighborhoods are not practical. For example, a lookup table for a 4-by-4 neighborhood
would have 65,536 entries.
The example below illustrates using lookup table operations to modify an image containing text. The
example creates an anonymous function that returns 1 if three or more pixels in the 3-by-3
neighborhood are 1; otherwise, it returns 0. The example then calls makelut, passing in this function
as the first argument, and using the second argument to specify a 3-by-3 lookup table.
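A sketch of those two steps; the name of the anonymous function is an assumption.
f = @(x) sum(x(:)) >= 3;   % return 1 when three or more neighborhood pixels are 1
lut = makelut(f,3);        % build the 512-element lookup table for 3-by-3 neighborhoods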
lut is returned as a 512-element vector of 1's and 0's. Each value is the output from the function for
one of the 512 possible permutations.
BW1 = imread('text.png');
BW2 = applylut(BW1,lut);
imshow(BW1)
figure, imshow(BW2)
For information about how applylut maps pixel combinations in the image to entries in the lookup
table, see the reference page for applylut.
Dilate an Image to Enlarge a Shape
Create a simple sample binary image containing one foreground object: the square region of 1's in
the middle of the image.
BW = zeros(9,10);
BW(4:6,4:7) = 1
BW = 9×10
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
imshow(imresize(BW,40,'nearest'))
Create a structuring element to use with imdilate. To dilate a geometric object, you typically create
a structuring element that is the same shape as the object.
SE = strel('square',3)
SE =
strel is a square shaped structuring element with properties:
Dilate the image, passing the input image and the structuring element to imdilate. Note how
dilation adds a rank of 1's to all sides of the foreground object.
BW2 = imdilate(BW,SE)
BW2 = 9×10
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
imshow(imresize(BW2,40,'nearest'))
For comparison, create a structuring element that is a different shape. Dilate the original image using
the new structuring element.
SE2 = strel('diamond',1);
BW3 = imdilate(BW,SE2);
imshow(imresize(BW3,40,'nearest'))
Remove Thin Lines Using Erosion
Read and display a binary image. The white lines that represent wires are approximately four or five
pixels wide. In some places, the wires are touching and the overall width is closer to ten or eleven
pixels.
BW1 = imread('circbw.tif');
imshow(BW1)
Define a neighborhood larger than the width of the lines. This example uses a disk-shaped structuring
element with a radius of 7 pixels so that the overall neighborhood size is 13-by-13 pixels.
SE = strel("disk",7)
SE =
strel is a disk shaped structuring element with properties:
Erode the image, specifying the input image and the structuring element as arguments to the
imerode function.
BW2 = imerode(BW1,SE);
imshow(BW2)
Use Morphological Opening to Extract Large Image Features
In this example, you use morphological opening on an image of a circuit board to remove all the
circuit lines from the image. The output image contains only the rectangular shapes of the
microchips.
You can use the imopen function to perform erosion and dilation in one step.
BW1 = imread('circbw.tif');
figure
imshow(BW1)
Create a structuring element. The structuring element should be large enough to remove the lines
when you erode the image, but not large enough to remove the rectangles. It should consist of all 1s,
so it removes everything but large contiguous patches of foreground pixels.
SE = strel('rectangle',[40 30]);
Erode the image with the structuring element. This removes all the lines, but also shrinks the
rectangles.
BW3 = imerode(BW1,SE);
imshow(BW3)
To restore the rectangles to their original sizes, dilate the eroded image using the same structuring
element, SE.
BW4 = imdilate(BW3,SE);
imshow(BW4)
By performing the operations sequentially, you have the flexibility to change the structuring element.
Create a different structuring element, and dilate the eroded image using the new structuring
element.
SE = strel('diamond',15);
BW5 = imdilate(BW3,SE);
imshow(BW5)
See Also
imclose | imdilate | imerode | imopen | strel
More About
• “Types of Morphological Operations” on page 10-2
Flood-Fill Operations
The imfill function performs a flood-fill operation on binary and grayscale images. This operation
can be useful in removing irrelevant artifacts from images.
• For binary images, imfill changes connected background pixels (0s) to foreground pixels (1s),
stopping when it reaches object boundaries.
• For grayscale images, imfill brings the intensity values of dark areas that are surrounded by
lighter areas up to the same intensity level as surrounding pixels. In effect, imfill removes
regional minima that are not connected to the image border. See “Finding Areas of High or Low Intensity” on page 10-17 for more information.
Specifying Connectivity
For both binary and grayscale images, the boundary of the fill operation is determined by the pixel
connectivity on page 10-22 that you specify.
Note imfill differs from the other object-based operations in that it operates on background pixels.
When you specify connectivity with imfill, you are specifying the connectivity of the background,
not the foreground.
If the background is 4-connected, this binary image contains two separate background elements (the
part inside the loop and the part outside). If the background is 8-connected, the pixels connect
diagonally, and there is only one background element.
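The binary image BW used in the calls below is not defined in this excerpt; one way to construct the loop image described above (the same 8-by-8 image shown in “Pixel Connectivity” on page 10-22) is:
BW = logical([0 0 0 0 0 0 0 0
              0 1 1 1 1 1 0 0
              0 1 0 0 0 1 0 0
              0 1 0 0 0 1 0 0
              0 1 0 0 0 1 0 0
              0 1 1 1 1 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0]);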
For example, if you call imfill, specifying the pixel BW(4,3) as the starting point, imfill only fills
the inside of the loop because, by default, the background is 4-connected.
imfill(BW,[4 3])
ans =

     0     0     0     0     0     0     0     0
     0     1     1     1     1     1     0     0
     0     1     1     1     1     1     0     0
     0     1     1     1     1     1     0     0
     0     1     1     1     1     1     0     0
     0     1     1     1     1     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
If you specify the same starting point, but use an 8-connected background connectivity, imfill fills
the entire image.
imfill(BW,[4 3],8)
ans =
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
Filling Holes
A common use of the flood-fill operation is to fill holes in images. For example, suppose you have an
image, binary or grayscale, in which the foreground objects represent spheres. In the image, these
objects should appear as disks, but instead are doughnut shaped because of reflections in the original
photograph. Before doing any further processing of the image, you might want to first fill in the
“doughnut holes” using imfill.
Because the use of flood-fill to fill holes is so common, imfill includes special syntax to support it
for both binary and grayscale images. In this syntax, you just specify the argument 'holes'; you do
not have to specify starting locations in each hole.
[X,map] = imread('spine.tif');
I = ind2gray(X,map);
Ifill = imfill(I,'holes');
imshow(I);figure, imshow(Ifill)
See Also
imfill
More About
• “Morphological Reconstruction” on page 10-14
Detect Cell Using Edge Detection and Morphology
Read in the cell.tif image, which is an image of a prostate cancer cell. Two cells are present in
this image, but only one cell can be seen in its entirety. The goal is to detect, or segment, the cell that
is completely visible.
I = imread('cell.tif');
imshow(I)
title('Original Image');
text(size(I,2),size(I,1)+15, ...
'Image courtesy of Alan Partin', ...
'FontSize',7,'HorizontalAlignment','right');
text(size(I,2),size(I,1)+25, ....
'Johns Hopkins University', ...
'FontSize',7,'HorizontalAlignment','right');
The object to be segmented differs greatly in contrast from the background image. Changes in
contrast can be detected by operators that calculate the gradient of an image. To create a binary
mask containing the segmented cell, calculate the gradient image and apply a threshold.
Use edge and the Sobel operator to calculate the threshold value. Tune the threshold value and use
edge again to obtain a binary mask that contains the segmented cell.
[~,threshold] = edge(I,'sobel');
fudgeFactor = 0.5;
BWs = edge(I,'sobel',threshold * fudgeFactor);
imshow(BWs)
title('Binary Gradient Mask')
The binary gradient mask shows lines of high contrast in the image. These lines do not quite
delineate the outline of the object of interest. Compared to the original image, there are gaps in the
lines surrounding the object in the gradient mask. These linear gaps will disappear if the Sobel image
is dilated using linear structuring elements. Create two perpendicular linear structuring elements by
using strel function.
se90 = strel('line',3,90);
se0 = strel('line',3,0);
Dilate the binary gradient mask using the vertical structuring element followed by the horizontal
structuring element. The imdilate function dilates the image.
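The dilation call itself is not reproduced here. A sketch consistent with the BWsdil mask used in the next step, dilating with both structuring elements in turn, might be:
BWsdil = imdilate(BWs,[se90 se0]);
imshow(BWsdil)
title('Dilated Gradient Mask')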
The dilated gradient mask shows the outline of the cell quite nicely, but there are still holes in the
interior of the cell. To fill these holes, use the imfill function.
BWdfill = imfill(BWsdil,'holes');
imshow(BWdfill)
title('Binary Image with Filled Holes')
The cell of interest has been successfully segmented, but it is not the only object that has been found.
Any objects that are connected to the border of the image can be removed using the imclearborder
function. To remove diagonal connections, set the connectivity in the imclearborder function to 4.
BWnobord = imclearborder(BWdfill,4);
imshow(BWnobord)
title('Cleared Border Image')
Finally, in order to make the segmented object look natural, smooth the object by eroding the image
twice with a diamond structuring element. Create the diamond structuring element using the strel
function.
seD = strel('diamond',1);
BWfinal = imerode(BWnobord,seD);
BWfinal = imerode(BWfinal,seD);
imshow(BWfinal)
title('Segmented Image');
You can use the labeloverlay function to display the mask over the original image.
imshow(labeloverlay(I,BWfinal))
title('Mask Over Original Image')
An alternate method to display the segmented object is to draw an outline around the segmented cell.
Draw an outline by using the bwperim function.
BWoutline = bwperim(BWfinal);
Segout = I;
Segout(BWoutline) = 255;
imshow(Segout)
title('Outlined Original Image')
See Also
bwperim | edge | imclearborder | imdilate | imerode | imfill | strel
More About
• “Types of Morphological Operations” on page 10-2
Granulometry of Snowflakes
This example shows how to calculate the size distribution of snowflakes in an image by using
granulometry. Granulometry determines the size distribution of objects in an image without explicitly
segmenting (detecting) each object first.
Read Image
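Read the snowflake image into the workspace and display it. The file name here is an assumption; any grayscale image of snowflakes works.
I = imread('snowflakes.png');
imshow(I)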
Enhance Contrast
Your first step is to maximize the intensity contrast in the image. You can do this using the
adapthisteq function, which performs contrast-limited adaptive histogram equalization. Rescale the
image intensity using the imadjust function so that it fills the data type's entire dynamic range.
claheI = adapthisteq(I,'NumTiles',[10 10]);
claheI = imadjust(claheI);
imshow(claheI)
Granulometry estimates the intensity surface area distribution of snowflakes as a function of size.
Granulometry likens image objects to stones whose sizes can be determined by sifting them through
screens of increasing size and collecting what remains after each pass. Image objects are sifted by
opening the image with a structuring element of increasing size and counting the remaining intensity
surface area (summation of pixel values in the image) after each opening.
Choose a counter limit so that the intensity surface area goes to zero as you increase the size of your
structuring element. For display purposes, leave the first entry in the surface area array empty.
radius_range = 0:22;
intensity_area = zeros(size(radius_range));
for counter = radius_range
remain = imopen(claheI, strel('disk', counter));
intensity_area(counter + 1) = sum(remain(:));
end
figure
plot(intensity_area, 'm - *')
grid on
title('Sum of pixel values in opened image versus radius')
xlabel('radius of opening (pixels)')
ylabel('pixel value sum of opened objects (intensity)')
A significant drop in intensity surface area between two consecutive openings indicates that the
image contains objects of comparable size to the smaller opening. This is equivalent to the first
derivative of the intensity surface area array, which contains the size distribution of the snowflakes in
the image. Calculate the first derivative with the diff function.
intensity_area_prime = diff(intensity_area);
plot(intensity_area_prime, 'm - *')
grid on
title('Granulometry (Size Distribution) of Snowflakes')
ax = gca;
ax.XTick = [0 2 4 6 8 10 12 14 16 18 20 22];
xlabel('radius of snowflakes (pixels)')
ylabel('Sum of pixel values in snowflakes as a function of radius')
Notice the minima and the radii where they occur in the graph. The minima tell you that snowflakes
in the image have those radii. The more negative the minimum point, the higher the snowflakes'
cumulative intensity at that radius. For example, the most negative minimum point occurs at the 5
pixel radius mark. You can extract the snowflakes having a 5 pixel radius with the following steps.
open5 = imopen(claheI,strel('disk',5));
open6 = imopen(claheI,strel('disk',6));
rad5 = imsubtract(open5,open6);
imshow(rad5,[])
See Also
adapthisteq | imadjust | imopen | imsubtract | strel
Distance Transform of a Binary Image
This example creates a binary image containing two intersecting circular objects.
center1 = -10;
center2 = -center1;
dist = sqrt(2*(2*center1)^2);
radius = dist/2 * 1.4;
lims = [floor(center1-1.2*radius) ceil(center2+1.2*radius)];
[x,y] = meshgrid(lims(1):lims(2));
bw1 = sqrt((x-center1).^2 + (y-center1).^2) <= radius;
bw2 = sqrt((x-center2).^2 + (y-center2).^2) <= radius;
bw = bw1 | bw2;
figure
imshow(bw)
To compute the distance transform of the complement of the binary image, use the bwdist function.
In the image of the distance transform, note how the centers of the two circular areas are white.
D = bwdist(~bw);
figure
imshow(D,[])
Label and Measure Connected Components in a Binary Image
Calculate connected components by using bwconncomp. In this sample code, BW is a small binary
matrix created explicitly below. Specify a connectivity of 4 so that two adjoining pixels are part of the
same object if they are both on and are connected along the horizontal or vertical direction. The
PixelIdxList field identifies the list of pixels belonging to each connected component.
BW = zeros(8,9);
BW(2:4,2:3) = 1;
BW(5:7,4:5) = 1;
BW(2,7:9) = 1;
BW(3,8:9) = 1;
BW
BW =
0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 1 1 1
0 1 1 0 0 0 0 1 1
0 1 1 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0
0 0 0 1 1 0 0 0 0
0 0 0 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0
cc = bwconncomp(BW,4)
cc =
Connectivity: 4
ImageSize: [8 9]
NumObjects: 3
PixelIdxList: {[6x1 double] [6x1 double] [5x1 double]}
Create a label matrix by using the labelmatrix function. This sample code continues with the
connected component structure, cc, defined in the preceding section.
labeled = labelmatrix(cc)
labeled =
0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 3 3 3
0 1 1 0 0 0 0 3 3
0 1 1 0 0 0 0 0 0
0 0 0 2 2 0 0 0 0
0 0 0 2 2 0 0 0 0
0 0 0 2 2 0 0 0 0
0 0 0 0 0 0 0 0 0
To visualize connected components, display the label matrix as a pseudo-color image by using the
label2rgb function. The label identifying each object in the label matrix maps to a different color in
the associated colormap. You can specify the colormap, background color, and how objects in the
label matrix map to colors in the colormap.
RGB_label = label2rgb(labeled,@copper,'c','shuffle');
imshow(RGB_label)
You can also select objects in a binary image interactively by using the bwselect function. For example, use this command to select objects in the image displayed in the current axes.
BW2 = bwselect;
The cursor changes to cross-hairs when it is over the image. Click the objects you want to select;
bwselect displays a small star over each pixel you click. When you are done, press Return.
bwselect returns a binary image consisting of the objects you selected, and removes the stars.
This example uses bwarea to determine the percentage area increase in circbw.tif that results
from a dilation operation. The area is a measure of the size of the foreground of the image and is
roughly equal to the number of on pixels in the image. However, bwarea does not simply count the
number of pixels set to on. Rather, bwarea weights different pixel patterns unequally when
computing the area. This weighting compensates for the distortion that is inherent in representing a
continuous image with discrete pixels. For example, a diagonal line of 50 pixels is longer than a
horizontal line of 50 pixels. As a result of the weighting bwarea uses, the horizontal line has an area of
50, but the diagonal line has an area of 62.5.
BW = imread('circbw.tif');
SE = ones(5);
BW2 = imdilate(BW,SE);
increase = (bwarea(BW2) - bwarea(BW))/bwarea(BW)
increase =
0.3456
See Also
bwconncomp | bwselect | label2rgb | labelmatrix | regionprops
Related Examples
• “Calculate Properties of Image Regions Using Image Region Analyzer” on page 12-28
• “Correct Nonuniform Illumination and Analyze Foreground Objects” on page 1-9
More About
• “Pixel Connectivity” on page 10-22
Analyzing and Enhancing Images
11
This topic describes functions that support a range of standard image processing operations for
analyzing and enhancing images.
Pixel Values
To determine the values of one or more pixels in an image and return the values in a variable, use the
impixel function. You can specify the pixels by passing their coordinates as input arguments or you
can select the pixels interactively using a mouse. impixel returns the value of specified pixels in a
variable in the MATLAB workspace.
Note You can also get pixel value information interactively using the Image Viewer app. See “Get Pixel
Information in Image Viewer App” on page 4-26.
Display an image.
imshow canoe.tif
Call impixel. When called with no input arguments, impixel associates itself with the image in the
current axes.
pixel_values = impixel
Select the points you want to examine in the image by clicking the mouse. impixel places a star at
each point you select.
When you are finished selecting points, press Return. impixel returns the pixel values in an n-by-3
array, where n is the number of points you selected. impixel removes the stars used to indicate
selected points.
pixel_values =
Intensity Profile of Images
I = fitsread('solarspectra.fts');
imshow(I,[]);
Create the intensity profile. Call improfile with no arguments. The cursor changes to cross-hairs
when you move it over the displayed image. Using the mouse, specify line segments by clicking the
endpoints. improfile draws a line between the endpoints. When you finish specifying the path,
press Return. In the following figure, the line is shown in red.
improfile
After you finish drawing the line over the image, improfile displays a plot of the data along the line.
Notice how the peaks and valleys in the plot correspond to the light and dark bands in the image.
imshow peppers.png
Call improfile without any arguments and trace a line segment in the image interactively. In the
figure, the black line indicates a line segment drawn from top to bottom. Double-click to end the line
segment.
improfile
The improfile function displays a plot of the intensity values along the line segment. The plot
includes separate lines for the red, green, and blue intensities. In the plot, notice how low the blue
values are at the beginning of the plot where the line traverses the orange pepper.
Contour Plot of Image Data
Read a grayscale image and display it. This example uses an image of grains of rice.
I = imread('rice.png');
imshow(I)
figure;
imcontour(I,3)
Measuring Regions in Grayscale Images
Use a helper function, propsSynthesizeImage, to create a grayscale image that contains five
distinct regions.
I = propsSynthesizeImage;
imshow(I)
title('Synthetic Image')
Segment the grayscale image by creating a binary image containing the objects in the image.
BW = I > 0;
imshow(BW)
title('Binary Image')
The regionprops function supports several properties that can be used with grayscale images,
including 'WeightedCentroid', 'MeanIntensity', 'MinIntensity', and 'MaxIntensity'. These properties
use the original pixel values of the objects for their calculations.
For example, you can use regionprops to calculate both the centroid and weighted centroid of
objects in the image. Note how you pass in the binary image (BW) containing your objects and the
original grayscale image (I) as arguments into regionprops.
s = regionprops(BW,I,{'Centroid','WeightedCentroid'});
To compare the weighted centroid locations with the unweighted centroid locations, display the
original image and then, using the hold and plot functions, superimpose the centroids on the
image.
imshow(I)
title('Weighted (red) and Unweighted (blue) Centroids');
hold on
numObj = numel(s);
for k = 1 : numObj
plot(s(k).WeightedCentroid(1), s(k).WeightedCentroid(2), 'r*')
plot(s(k).Centroid(1), s(k).Centroid(2), 'bo')
end
hold off
You can use the 'PixelValues' property to do custom calculations based on the pixel values of the
original grayscale image. The 'PixelValues' property returns a vector containing the grayscale values
of pixels in a region.
s = regionprops(BW,I,{'Centroid','PixelValues','BoundingBox'});
imshow(I)
title('Standard Deviation of Regions')
hold on
for k = 1 : numObj
s(k).StandardDeviation = std(double(s(k).PixelValues));
text(s(k).Centroid(1),s(k).Centroid(2), ...
sprintf('%2.1f', s(k).StandardDeviation), ...
'EdgeColor','b','Color','r');
end
hold off
This figure shows the standard deviation measurement superimposed on each object in the image.
You can also view the results in other ways, for example as a bar plot showing the standard deviation
by label number.
figure
bar(1:numObj,[s.StandardDeviation])
xlabel('Region Label Number')
ylabel('Standard Deviation')
You can use the plot to determine how to partition the data. For example, the following code identifies
objects with a standard deviation lower than 50.
sStd = [s.StandardDeviation];
lowStd = find(sStd < 50);
imshow(I)
title('Objects Having Standard Deviation < 50')
hold on
for k = 1 : length(lowStd)
rectangle('Position',s(lowStd(k)).BoundingBox,'EdgeColor','y');
end
hold off
See Also
regionprops
Finding the Length of a Pendulum in Motion
Load the image frames of a pendulum in motion. The frames in the MAT-file pendulum.mat were
acquired using the Image Acquisition Toolbox™.
% load MAT-file
load pendulum;
implay(frames);
You can see that the pendulum is swinging in the upper half of each frame in the image series. Create
a new series of frames that contains only the region where the pendulum is swinging.
To crop a series of frames using imcrop, first perform imcrop on one frame and store its output
image. Then use the previous output's size to create a series of frame regions. For convenience, use
the rect that was loaded by pendulum.mat in imcrop.
nFrames = size(frames,4);
first_frame = frames(:,:,:,1);
first_region = imcrop(first_frame,rect);
frame_regions = repmat(uint8(0), [size(first_region) nFrames]);
for count = 1:nFrames
frame_regions(:,:,:,count) = imcrop(frames(:,:,:,count),rect);
end
imshow(frames(:,:,:,1))
Notice that the pendulum is much darker than the background. You can segment the pendulum in
each frame by converting the frame to grayscale, thresholding it using imbinarize, and removing
background structures using imopen and imclearborder.
seg_pend = false([size(first_region,1) size(first_region,2) nFrames]);
for count = 1:nFrames
    fr = frame_regions(:,:,:,count);
    gfr = rgb2gray(fr);
    gfr = imcomplement(gfr);
    bw = imclearborder(imopen(imbinarize(gfr,0.7),strel('disk',3)));  % threshold and radius chosen for this data
    seg_pend(:,:,count) = bw;
    montage({fr,labeloverlay(gfr,bw)});
    pause(0.2)
end
You can see that the shape of the pendulum varied in different frames. This is not a serious issue
because you just need its center. You will use the pendulum centers to find the length of the
pendulum.
pend_centers = zeros(nFrames,2);
for count = 1:nFrames
property = regionprops(seg_pend(:,:,count), 'Centroid');
pend_centers(count,:) = property.Centroid;
end
x = pend_centers(:,1);
y = pend_centers(:,2);
figure
plot(x,y,'m.')
axis ij
axis equal
hold on;
xlabel('x');
ylabel('y');
title('Pendulum Centers');
The pendulum centers lie on a circular arc whose radius is the pendulum length. A circle can be written as x^2 + y^2 + a*x + b*y + c = 0. You can solve for the parameters a, b, and c using the least squares method by rewriting the equation as [x y 1]*[a; b; c] = -(x^2 + y^2) and stacking one such row for each pendulum center.
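The solve itself is not shown here; a minimal sketch using the backslash operator on the pendulum centers might be:
abc = [x y ones(length(x),1)] \ -(x.^2 + y.^2);
a = abc(1);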
b = abc(2);
c = abc(3);
xc = -a/2;
yc = -b/2;
circle_radius = sqrt((xc^2 + yc^2) - c);
pendulum_length = round(circle_radius)
pendulum_length =
253
circle_theta = pi/3:0.01:pi*2/3;
x_fit = circle_radius*cos(circle_theta)+xc;
y_fit = circle_radius*sin(circle_theta)+yc;
plot(x_fit,y_fit,'b-');
plot(xc,yc,'bx','LineWidth',2);
plot([xc x(1)],[yc y(1)],'b-');
text(xc-110,yc+100,sprintf('Pendulum length = %d pixels', pendulum_length));
See Also
imbinarize | imclearborder | imcomplement | imopen | labeloverlay | regionprops
Create Image Histogram
I = imread('rice.png');
imshow(I)
Create the histogram. For this image of grains of rice, imhist creates a histogram with 64 bins. By
default, imhist displays the histogram. The histogram shows a peak at around 100, corresponding to
the dark gray background in the image.
figure;
imhist(I);
The mean2, std2, and corr2 functions compute summary statistics about an image. They are
two-dimensional versions of the mean, std, and corrcoef functions described in the MATLAB
Function Reference.
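For example, you can compute the global mean and standard deviation of a grayscale image, and the correlation coefficient between the image and a smoothed copy of itself (the smoothing step here is only for illustration):
I = imread('liftingbody.png');
meanIntensity = mean2(I)
stdIntensity = std2(I)
r = corr2(I,imgaussfilt(I,2))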
Edge Detection
In an image, an edge is a curve that follows a path of rapid change in image intensity. Edges are often
associated with the boundaries of objects in a scene. Edge detection is used to identify the edges in
an image.
To find edges, you can use the edge function. This function looks for places in the image where the
intensity changes rapidly, using one of these two criteria:
• Places where the first derivative of the intensity is larger in magnitude than some threshold
• Places where the second derivative of the intensity has a zero crossing
edge provides several derivative estimators, each of which implements one of these definitions. For
some of these estimators, you can specify whether the operation should be sensitive to horizontal
edges, vertical edges, or both. edge returns a binary image containing 1's where edges are found and
0's elsewhere.
The most powerful edge-detection method that edge provides is the Canny method. The Canny
method differs from the other edge-detection methods in that it uses two different thresholds (to
detect strong and weak edges), and includes the weak edges in the output only if they are connected
to strong edges. This method is therefore less likely than the others to be affected by noise, and more
likely to detect true weak edges.
I = imread('coins.png');
imshow(I)
Apply both the Sobel and Canny edge detectors to the image and display them for comparison.
BW1 = edge(I,'sobel');
BW2 = edge(I,'canny');
figure;
imshowpair(BW1,BW2,'montage')
title('Sobel Filter Canny Filter');
Boundary Tracing in Images
The toolbox includes two functions you can use to find the boundaries of objects in a binary image:
• bwtraceboundary
• bwboundaries
The bwtraceboundary function returns the row and column coordinates of all the pixels on the
border of an object in an image. You must specify the location of a border pixel on the object as the
starting point for the trace.
The bwboundaries function returns the row and column coordinates of border pixels of all the
objects in an image.
For both functions, the nonzero pixels in the binary image belong to an object, and pixels with the
value 0 (zero) constitute the background.
I = imread('coins.png');
imshow(I)
Convert the image to a binary image. bwtraceboundary and bwboundaries only work with binary
images.
BW = im2bw(I);
imshow(BW)
Determine the row and column coordinates of a pixel on the border of the object you want to trace.
bwboundary uses this point as the starting location for the boundary tracing.
dim = size(BW)
dim = 1×2
246 300
col = round(dim(2)/2)-90;
row = min(find(BW(:,col)))
row = 27
Call bwtraceboundary to trace the boundary from the specified point. As required arguments, you
must specify a binary image, the row and column coordinates of the starting point, and the direction
of the first step. The example specifies north ( 'N' ).
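The call itself is not reproduced above; a sketch using the starting point found in the previous step might be:
boundary = bwtraceboundary(BW,[row, col],'N');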
Display the original grayscale image and use the coordinates returned by bwtraceboundary to plot
the border on the image.
imshow(I)
hold on;
plot(boundary(:,2),boundary(:,1),'g','LineWidth',3);
To trace the boundaries of all the coins in the image, use the bwboundaries function. By default,
bwboundaries finds the boundaries of all objects in an image, including objects inside other objects.
In the binary image used in this example, some of the coins contain black areas that bwboundaries
interprets as separate objects. To ensure that bwboundaries only traces the coins, use imfill to fill
the area inside each coin. bwboundaries returns a cell array, where each cell contains the row/
column coordinates for an object in the image.
BW_filled = imfill(BW,'holes');
boundaries = bwboundaries(BW_filled);
Plot the borders of all the coins on the original grayscale image using the coordinates returned by
bwboundaries .
for k=1:10
b = boundaries{k};
plot(b(:,2),b(:,1),'g','LineWidth',3);
end
For example, if an object contains a hole and you select a pixel on a thin part of the object as the
starting pixel, you can trace the outside border of the object or the inside border of the hole,
depending on the direction you choose for the first step. For filled objects, the direction you select for
the first step parameter is not as important.
To illustrate, this figure shows the pixels traced when the starting pixel is on a thin part of the object
and the first step is set to north and south. The connectivity is set to 8 (the default).
Quadtree Decomposition
Quadtree decomposition is an analysis technique that involves subdividing an image into blocks that
are more homogeneous than the image itself. This technique reveals information about the structure
of the image. It is also useful as the first step in adaptive compression algorithms.
You can perform quadtree decomposition using the qtdecomp function. This function works by
dividing a square image into four equal-sized square blocks, and then testing each block to see if it
meets some criterion of homogeneity (for example, if all the pixels in the block are within a specific
dynamic range). If a block meets the criterion, it is not divided any further. If it does not meet the
criterion, it is subdivided again into four blocks, and the test criterion is applied to those blocks. This
process is repeated iteratively until each block meets the criterion. The result might have blocks of
several different sizes. Blocks can be as small as 1-by-1, unless you specify otherwise.
qtdecomp returns the quadtree decomposition as a sparse matrix, the same size as I. The nonzero
elements represent the upper left corners of the blocks. The value of each nonzero element indicates
the block size.
I = imread('liftingbody.png');
Perform the quadtree decomposition by calling the qtdecomp function, specifying as arguments the
image and the test criteria used to determine the homogeneity of each block in the decomposition.
For example, the criterion might be a threshold calculation such as max(block(:)) -
min(block(:)) >= 0.27. You can also supply qtdecomp with a function (rather than a threshold
value) for deciding whether to split blocks. For example, you might base the decision on the variance
of the block.
S = qtdecomp(I,0.27);
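As a sketch of the function-based syntax mentioned above, you could split any block whose pixel standard deviation is too large; the threshold of 10 here is purely illustrative.
% split a block when the standard deviation of its pixels exceeds 10
splitFun = @(blocks) reshape(std(reshape(double(blocks),[],size(blocks,3)),0,1) > 10,[],1);
S2 = qtdecomp(I,splitFun);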
View a block representation of the quadtree decomposition. Each black square represents a
homogeneous block, and the white lines represent the boundaries between blocks. Notice how the
blocks are smaller in areas corresponding to large changes in intensity in the image.
blocks = repmat(uint8(0),size(S));
blocks(end,1:end) = 1;
blocks(1:end,end) = 1;
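The loop that paints each block outline is not shown above. A sketch using qtsetblk, assuming the 512-by-512 liftingbody image (so block sizes are powers of two up to 512), might be:
for dim = [512 256 128 64 32 16 8 4 2 1]
    numblocks = length(find(S==dim));
    if (numblocks > 0)
        % mark the upper-left edge of every block of this size
        values = repmat(uint8(1),[dim dim numblocks]);
        values(2:dim,2:dim,:) = 0;
        blocks = qtsetblk(blocks,S,dim,values);
    end
end
% reapply the image border so the block interiors do not cover it, then display
blocks(end,1:end) = 1;
blocks(1:end,end) = 1;
imshow(blocks,[])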
Detect and Measure Circular Objects in an Image
Read and display an image of round plastic chips of various colors. Besides having plenty of circles to
detect, there are a few interesting things going on in this image from a circle detection point-of-view:
1 There are chips of different colors, which have different contrasts with respect to the
background. On one end, the blue and red ones have strong contrast on this background. On the
other end, some of the yellow chips do not contrast well with the background.
2 Some chips are on top of each other and others are close together, almost touching. Overlapping
object boundaries and object occlusion are usually challenging scenarios for object detection.
rgb = imread('coloredChips.png');
imshow(rgb)
Find the appropriate radius range of the circles using the drawline function. Draw a line over the
approximate diameter of a chip.
d = drawline;
The length of the line ROI is the diameter of the chip. Typical chips have diameters in the range 40 to
50 pixels.
pos = d.Position;
diffPos = diff(pos);
diameter = hypot(diffPos(1),diffPos(2))
diameter = 45.3448
The imfindcircles function searches for circles with a range of radii. Search for circles with radii
in the range of 20 to 25 pixels. Before that, it is a good practice to ask whether the objects are
brighter or darker than the background. To answer that question, look at the grayscale version of this
image.
gray_image = rgb2gray(rgb);
imshow(gray_image)
The background is quite bright and most of the chips are darker than the background. But, by
default, imfindcircles finds circular objects that are brighter than the background. So, set the
parameter 'ObjectPolarity' to 'dark' in imfindcircles to search for dark circles.
[centers,radii] = imfindcircles(rgb,[20 25],'ObjectPolarity','dark')
centers =
[]
radii =
[]
Note that the outputs centers and radii are empty, which means that no circles were found. This
happens frequently because imfindcircles is a circle detector, and similar to most detectors,
imfindcircles has an internal detection threshold that determines its sensitivity. In simple terms it
means that the detector's confidence in a certain (circle) detection has to be greater than a certain
level before it is considered a valid detection. imfindcircles has a parameter 'Sensitivity' which
can be used to control this internal threshold, and consequently, the sensitivity of the algorithm. A
higher 'Sensitivity' value sets the detection threshold lower and leads to detecting more circles. This
is similar to the sensitivity control on the motion detectors used in home security systems.
Coming back to the chip image, it is possible that at the default sensitivity level all the circles are
lower than the internal threshold, which is why no circles were detected. By default, 'Sensitivity',
which is a number between 0 and 1, is set to 0.85. Increase 'Sensitivity' to 0.9.
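The detection call itself is not reproduced above; a sketch consistent with the output below might be:
[centers,radii] = imfindcircles(rgb,[20 25],'ObjectPolarity','dark', ...
    'Sensitivity',0.9)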
centers = 8×2
146.1895 198.5824
328.8132 135.5883
130.3134 43.8039
175.2698 297.0583
312.2831 192.3709
327.1316 297.0077
243.9893 166.4538
271.5873 280.8920
radii = 8×1
23.1604
22.5710
22.9576
23.7356
22.9551
22.9995
22.9055
23.0298
This time imfindcircles found some circles - eight to be precise. centers contains the locations
of circle centers and radii contains the estimated radii of those circles.
The function viscircles can be used to draw circles on the image. Output variables centers and
radii from imfindcircles can be passed directly to viscircles.
imshow(rgb)
h = viscircles(centers,radii);
The circle centers seem correctly positioned and their corresponding radii seem to match well to the
actual chips. But still quite a few chips were missed. Try increasing the 'Sensitivity' even more, to
0.92.
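Again, the call itself is omitted here; a sketch might be:
[centers,radii] = imfindcircles(rgb,[20 25],'ObjectPolarity','dark', ...
    'Sensitivity',0.92);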
length(centers)
ans = 16
So increasing 'Sensitivity' gets us even more circles. Plot these circles on the image again.
This result looks better. imfindcircles has two different methods for finding circles. So far the
default method, called the phase coding method, was used for detecting circles. There's another
method, popularly called the two-stage method, that is available in imfindcircles. Use the two-
stage method and show the results.
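The call that switches methods is not shown above; a sketch, assuming the sensitivity stays at 0.92, might be:
[centers,radii] = imfindcircles(rgb,[20 25],'ObjectPolarity','dark', ...
    'Sensitivity',0.92,'Method','TwoStage');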
delete(h)
h = viscircles(centers,radii);
The two-stage method detects more circles at the sensitivity of 0.92. In general, the two methods are
complementary in that they have different strengths. The phase coding method is
typically faster and slightly more robust to noise than the two-stage method. But it may also need
higher 'Sensitivity' levels to get the same number of detections as the two-stage method. For
example, the phase coding method also finds the same chips if the 'Sensitivity' level is raised higher,
say to 0.95.
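That call is not included above; a sketch using the default phase coding method might be:
[centers,radii] = imfindcircles(rgb,[20 25],'ObjectPolarity','dark', ...
    'Sensitivity',0.95);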
delete(h)
viscircles(centers,radii);
Note that both the methods in imfindcircles find the centers and radii of the partially visible
(occluded) chips accurately.
Looking at the last result, it is curious that imfindcircles does not find the yellow chips in the
image. The yellow chips do not have strong contrast with the background. In fact they seem to have
very similar intensities as the background. Is it possible that the yellow chips are not really 'darker'
than the background as was assumed? To confirm, show the grayscale version of this image again.
imshow(gray_image)
The yellow chips have almost the same intensity as the background, and some may even be brighter.
Therefore, to detect the yellow chips, change 'ObjectPolarity' to 'bright'.
Draw the bright circles in a different color, by changing the 'Color' parameter in viscircles.
imshow(rgb)
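The detection and drawing calls are not reproduced here; a sketch, assuming the 0.92 sensitivity is kept, might be:
[centersBright,radiiBright] = imfindcircles(rgb,[20 25], ...
    'ObjectPolarity','bright','Sensitivity',0.92);
hBright = viscircles(centersBright, radiiBright,'Color','b');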
Note that three of the missing yellow chips were found, but one yellow chip is still missing. These
yellow chips are hard to find because they don't stand out as well as others on this background.
There is another parameter in imfindcircles which may be useful here, namely 'EdgeThreshold'.
To find circles, imfindcircles uses only the edge pixels in the image. These edge pixels are
essentially pixels with high gradient value. The 'EdgeThreshold' parameter controls how high the
gradient value at a pixel has to be before it is considered an edge pixel and included in computation.
A high value (closer to 1) for this parameter will allow only the strong edges (higher gradient values)
to be included, whereas a low value (closer to 0) is more permissive and includes even the weaker
edges (lower gradient values) in computation. In case of the missing yellow chip, since the contrast is
low, some of the boundary pixels (on the circumference of the chip) are expected to have low gradient
values. Therefore, lower the 'EdgeThreshold' parameter to ensure that most of the edge pixels for
the yellow chip are included in the computation.
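The call with the lowered edge threshold is not shown above; a sketch in which the 0.1 value is an assumption might be:
[centersBright,radiiBright] = imfindcircles(rgb,[20 25],'ObjectPolarity','bright', ...
    'Sensitivity',0.92,'EdgeThreshold',0.1);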
delete(hBright)
hBright = viscircles(centersBright, radiiBright,'Color','b');
Now imfindcircles finds all of the yellow ones, and a green one too. Draw these chips in blue,
together with the other chips that were found earlier (with 'ObjectPolarity' set to 'dark'), in red.
h = viscircles(centers,radii);
11-45
11 Analyzing and Enhancing Images
All the circles are detected. Finally, note that changing the parameters to be more aggressive in
detection may find more circles, but it also increases the likelihood of detecting false
circles. There is a trade-off between the number of true circles that can be found (detection rate) and
the number of false circles that are found with them (false alarm rate).
See Also
imfindcircles | viscircles
Related Examples
• “Identifying Round Objects” on page 11-47
• “Measuring the Radius of a Roll of Tape” on page 11-61
Identifying Round Objects
Read in pillsetc.png.
RGB = imread('pillsetc.png');
imshow(RGB)
Convert the image to black and white in order to prepare for boundary tracing using bwboundaries.
I = rgb2gray(RGB);
bw = imbinarize(I);
imshow(bw)
Using morphology functions, remove pixels which do not belong to the objects of interest.
bw = bwareaopen(bw,30);
imshow(bw)
se = strel('disk',2);
bw = imclose(bw,se);
imshow(bw)
Fill any holes so that regionprops can be used to estimate the area enclosed by each of the boundaries.
bw = imfill(bw,'holes');
imshow(bw)
Concentrate only on the exterior boundaries. Option 'noholes' will accelerate the processing by
preventing bwboundaries from searching for inner contours.
[B,L] = bwboundaries(bw,'noholes');
imshow(label2rgb(L,@jet,[.5 .5 .5]))
hold on
for k = 1:length(B)
boundary = B{k};
plot(boundary(:,2),boundary(:,1),'w','LineWidth',2)
end
Estimate each object's area and perimeter. Use these results to form a simple metric indicating the
roundness of an object:
metric = 4*pi*area/perimeter^2
This metric is equal to 1 only for a circle and it is less than one for any other shape. The
discrimination process can be controlled by setting an appropriate threshold. In this example use a
threshold of 0.94 so that only the pills will be classified as round.
Use regionprops to obtain estimates of the area for all of the objects. Notice that the label matrix
returned by bwboundaries can be reused by regionprops.
stats = regionprops(L,'Area','Centroid');
threshold = 0.94;
for k = 1:length(B)
    boundary = B{k};
    perimeter = sum(sqrt(sum(diff(boundary).^2,2)));   % estimate the perimeter
    metric = 4*pi*stats(k).Area/perimeter^2;           % compare against threshold to classify round objects
    metric_string = sprintf('%2.2f',metric);
    text(boundary(1,2)-35,boundary(1,1)+13,metric_string,'Color','y',...
        'FontSize',14,'FontWeight','bold')
end
See Also
bwareaopen | bwboundaries | imbinarize | imclose | imfill | label2rgb | regionprops |
strel
Related Examples
• “Detect and Measure Circular Objects in an Image” on page 11-35
• “Measuring the Radius of a Roll of Tape” on page 11-61
Measuring Angle of Intersection
Read in gantrycrane.png and draw arrows pointing to two beams of interest. It is an image of a
gantry crane used to assemble a bridge.
RGB = imread('gantrycrane.png');
imshow(RGB);
Crop the image to obtain only the beams of the gantry crane chosen earlier. This step will make it
easier to extract the edges of the two metal beams.
start_col = 208;
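The cropping code is only partially visible above; a sketch, in which the starting row value is illustrative, might be:
start_row = 34;                                   % illustrative value
cropRGB = RGB(start_row:end, start_col:end, :);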
imshow(cropRGB)
% Store (X,Y) offsets for later use; subtract 1 so that each offset will
% correspond to the last pixel before the region of interest
offsetX = start_col-1;
offsetY = start_row-1;
Convert the image to black and white for subsequent extraction of the edge coordinates using
bwtraceboundary routine.
I = rgb2gray(cropRGB);
BW = imbinarize(I);
BW = ~BW; % complement the image (objects of interest must be white)
imshow(BW)
The bwtraceboundary routine requires that you specify a single point on a boundary. This point is
used as the starting location for the boundary tracing process.
To extract the edge of the lower beam, pick a column in the image and inspect it until a transition
from a background pixel to the object pixel occurs. Store this location for later use in
bwtraceboundary routine. Repeat this procedure for the other beam, but this time tracing
horizontally.
dim = size(BW);
% horizontal beam
col1 = 4;
row1 = find(BW(:,col1), 1);
% angled beam
row2 = 12;
col2 = find(BW(row2,:), 1);
The bwtraceboundary routine is used to extract (X, Y) locations of the boundary points. In order to
maximize the accuracy of the angle and point of intersection calculations, it is important to extract as
many points belonging to the beam edges as possible. You should determine the number of points
experimentally. Since the initial point for the horizontal bar was obtained by scanning from north to
south, it is safest to set the initial search step to point towards the outside of the object, i.e. 'North'.
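The tracing calls themselves are not included above; a sketch, in which the maximum numbers of boundary points (70 and 90) are assumptions chosen experimentally, might be:
boundary1 = bwtraceboundary(BW,[row1, col1],'N',8,70);
% trace the angled beam downward by searching counterclockwise from an eastward first step
boundary2 = bwtraceboundary(BW,[row2, col2],'E',8,90,'counterclockwise');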
Although (X,Y) coordinates pairs were obtained in the previous step, not all of the points lie exactly
on a line. Which ones should be used to compute the angle and point of intersection? Assuming that
all of the acquired points are equally important, fit lines to the boundary pixel locations.
The equation for a line is y = [x 1]*[a; b]. You can solve for parameters 'a' and 'b' in the least-squares
sense by using polyfit.
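The fits and the angle computation do not appear above; a sketch consistent with the value shown below (taking the larger of the two angles formed by the lines) might be:
ab1 = polyfit(boundary1(:,2), boundary1(:,1), 1);   % fit y = a*x + b to each edge
ab2 = polyfit(boundary2(:,2), boundary2(:,1), 1);
vect1 = [1 ab1(1)];                                 % direction vectors of the two lines
vect2 = [1 ab2(1)];
dp = dot(vect1, vect2);
angle = 180 - acos(dp/(norm(vect1)*norm(vect2)))*180/pi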
angle = 129.4971
Solve the system of two equations in order to obtain (X,Y) coordinates of the intersection point.
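The solution step is omitted above; a sketch consistent with the output order below (row coordinate first, then column) might be:
% each line satisfies y - a*x = b; solve for the unknowns ordered [y; x]
intersection = [1 -ab1(1); 1 -ab2(1)] \ [ab1(2); ab2(2)]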
intersection = 2×1
143.0917
295.7494
inter_x = intersection(2);
inter_y = intersection(1);
See Also
bwboundaries | bwtraceboundary | imbinarize | polyfit
Related Examples
• “Detect Lines Using the Radon Transform” on page 9-27
More About
• “Hough Transform” on page 9-16
• “Radon Transform” on page 9-21
Measuring the Radius of a Roll of Tape
Read in tape.png.
RGB = imread('tape.png');
imshow(RGB);
Find the center and the radius of the circle in the image using imfindcircles.
Rmin = 60;
Rmax = 100;
[center, radius] = imfindcircles(RGB,[Rmin Rmax],'Sensitivity',0.9)
center = 1×2
236.9291 172.4747
radius = 79.5305
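The code that draws the detected circle and creates the first text annotation (replaced below) is not shown; a sketch, in which the label string is illustrative, might be:
viscircles(center,radius);
hTxt = text(15,15,'Detected roll of tape','Color','y','FontWeight','bold');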
delete(hTxt);
message = sprintf('The estimated radius is %2.1f pixels', radius);
text(15,15,message,'Color','y','FontWeight','bold');
See Also
imfindcircles | viscircles
Related Examples
• “Identifying Round Objects” on page 11-47
• “Detect and Measure Circular Objects in an Image” on page 11-35
Calculate Statistical Measures of Texture
Function Description
rangefilt Calculates the local range of pixel intensities of an image.
stdfilt Calculates the local standard deviation of an image.
entropyfilt Calculates the local entropy of a grayscale image. Entropy is a
statistical measure of randomness.
The functions all operate in a similar way: they define a neighborhood around the pixel of interest,
calculate the statistic for that neighborhood, and use that value as the value of the pixel of interest in
the output image.
This example shows how the rangefilt function operates on a simple array.
A = [ 1 2 3 4 5; 6 7 8 9 10; 11 12 13 14 15; 16 17 18 19 20 ]
A =
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
B = rangefilt(A)
B =
6 7 7 7 6
11 12 12 12 11
11 12 12 12 11
6 7 7 7 6
The value of element B(2,4) is the range (maximum minus minimum) of the 3-by-3 neighborhood of
A(2,4) in the input. By default, the rangefilt function uses a 3-by-3 neighborhood, but you can
specify neighborhoods of different shapes and sizes.
The stdfilt and entropyfilt functions operate similarly, defining a neighborhood around the
pixel of interest and calculating the statistic for the neighborhood to determine the pixel value in the
output image. The stdfilt function calculates the standard deviation of all the values in the
neighborhood.
The entropyfilt function calculates the entropy of the neighborhood and assigns that value to the
output pixel. By default, the entropyfilt function defines a 9-by-9 neighborhood around the pixel of
interest. To calculate the entropy of an entire image, use the entropy function.
See Also
entropyfilt | rangefilt | stdfilt
More About
• “Texture Segmentation Using Texture Filters” on page 13-7
Texture Analysis Using the Gray-Level Co-Occurrence Matrix (GLCM)
After you create the GLCMs using graycomatrix, you can derive several statistics from them using
graycoprops. These statistics provide information about the texture of an image. The following
table lists the statistics.
Statistic Description
Contrast Measures the local variations in the gray-level co-occurrence matrix.
Correlation Measures the joint probability occurrence of the specified pixel pairs.
Energy Provides the sum of squared elements in the GLCM. Also known as uniformity
or the angular second moment.
Homogeneity Measures the closeness of the distribution of elements in the GLCM to the
GLCM diagonal.
See Also
Related Examples
• “Derive Statistics from GLCM and Plot Correlation” on page 11-69
More About
• “Create a Gray-Level Co-Occurrence Matrix” on page 11-67
Create a Gray-Level Co-Occurrence Matrix
The number of gray levels in the image determines the size of the GLCM. By default, graycomatrix
uses scaling to reduce the number of intensity values in an image to eight, but you can use the
NumLevels and the GrayLimits parameters to control this scaling of gray levels. See the
graycomatrix reference page for more information.
The gray-level co-occurrence matrix can reveal certain properties about the spatial distribution of the
gray levels in the texture image. For example, if most of the entries in the GLCM are concentrated
along the diagonal, the texture is coarse with respect to the specified offset. You can also derive
several statistical measures from the GLCM. See “Derive Statistics from GLCM and Plot Correlation”
on page 11-69 for more information.
To illustrate, the following figure shows how graycomatrix calculates the first three values in a
GLCM. In the output GLCM, element (1,1) contains the value 1 because there is only one instance in
the input image where two horizontally adjacent pixels have the values 1 and 1, respectively.
glcm(1,2) contains the value 2 because there are two instances where two horizontally adjacent
pixels have the values 1 and 2. Element (1,3) in the GLCM has the value 0 because there are no
instances of two horizontally adjacent pixels with the values 1 and 3. graycomatrix continues
processing the input image, scanning the image for other pixel pairs (i,j) and recording the sums in
the corresponding elements of the GLCM.
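You can reproduce this kind of counting on a small matrix. The matrix below is illustrative (not the one from the original figure); with the default horizontal offset, element (1,1) of the result is 1 and element (1,2) is 2.
I = [1 1 2; 1 2 3; 2 3 3];
glcm = graycomatrix(I,'GrayLimits',[1 3],'NumLevels',3)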
To create multiple GLCMs, specify an array of offsets to the graycomatrix function. These offsets
define pixel relationships of varying direction and distance. For example, you can define an array of
offsets that specify four directions (horizontal, vertical, and two diagonals) and four distances. In this
case, the input image is represented by 16 GLCMs. When you calculate statistics from these GLCMs,
you can take the average.
You specify these offsets as a p-by-2 array of integers. Each row in the array is a two-element vector,
[row_offset, col_offset], that specifies one offset. row_offset is the number of rows
between the pixel of interest and its neighbor. col_offset is the number of columns between the
pixel of interest and its neighbor. This example creates an offset that specifies four directions and
four distances for each direction. For more information about specifying offsets, see the
graycomatrix reference page.
offsets = [ 0 1; 0 2; 0 3; 0 4;...
-1 1; -2 2; -3 3; -4 4;...
-1 0; -2 0; -3 0; -4 0;...
-1 -1; -2 -2; -3 -3; -4 -4];
Each row of this array defines one direction at a distance D of 1, 2, 3, or 4 pixels from the pixel of
interest.
Derive Statistics from GLCM and Plot Correlation
Read an image into the workspace and display it. The example converts the truecolor image to a
grayscale image and then, for this example, rotates it 90 degrees.
circuitBoard = rot90(rgb2gray(imread('board.tif')));
imshow(circuitBoard)
Define offsets of varying direction and distance. Because the image contains objects of a variety of
shapes and sizes that are arranged in horizontal and vertical directions, the example specifies a set of
horizontal offsets that only vary in distance.
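The offsets are not defined in this excerpt; a sketch covering horizontal offsets at distances 1 through 40 pixels (the upper limit is an assumption) might be:
offsets0 = [zeros(40,1) (1:40)'];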
Create the GLCMs. Call the graycomatrix function specifying the offsets.
glcms = graycomatrix(circuitBoard,'Offset',offsets0);
Derive statistics from the GLCMs using the graycoprops function. The example calculates the
contrast and correlation.
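The call is not reproduced above; a sketch might be:
stats = graycoprops(glcms,{'Contrast','Correlation'});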
figure, plot([stats.Correlation]);
title('Texture Correlation as a function of offset');
xlabel('Horizontal Offset')
ylabel('Correlation')
The plot contains peaks at offsets 7, 15, 23, and 30. If you examine the input image closely, you can
see that certain vertical elements in the image have a periodic pattern that repeats every seven
pixels.
Adjust Image Intensity Values to Specified Range
I = imread('pout.tif');
J = imadjust(I);
Display the original image and the adjusted image, side-by-side. Note the increased contrast in the
adjusted image.
imshowpair(I,J,'montage')
Plot the histogram of the adjusted image. Note that the histogram of the adjusted image uses values
across the whole range.
figure
subplot(1,2,1)
imhist(I,64)
subplot(1,2,2)
imhist(J,64)
Gamma Correction
imadjust maps low to bottom, and high to top. By default, the values between low and high are
mapped linearly to values between bottom and top. For example, the value halfway between low
and high corresponds to the value halfway between bottom and top.
imadjust can accept an additional argument that specifies the gamma correction factor. Depending
on the value of gamma, the mapping between values in the input and output images might be
nonlinear. For example, the value halfway between low and high might map to a value either greater
than or less than the value halfway between bottom and top.
Gamma can be any value between 0 and infinity. If gamma is 1 (the default), the mapping is linear. If
gamma is less than 1, the mapping is weighted toward higher (brighter) output values. If gamma is
greater than 1, the mapping is weighted toward lower (darker) output values.
The figure illustrates this relationship. The three transformation curves show how values are mapped
when gamma is less than, equal to, and greater than 1. (In each graph, the x-axis represents the
intensity values in the input image, and the y-axis represents the intensity values in the output
image.)
Read an image into the workspace. This example reads an indexed image and then converts it into a
grayscale image.
[X,map] = imread('forest.tif');
I = ind2gray(X,map);
Adjust the contrast, specifying a gamma value of less than 1 (0.5). Notice that in the call to
imadjust, the example specifies the data ranges of the input and output images as empty matrices.
When you specify an empty matrix, imadjust uses the default range of [0,1]. In the example, both
ranges are left empty. This means that gamma correction is applied without any other adjustment of
the data.
J = imadjust(I,[],[],0.5);
imshowpair(I,J,'montage')
Contrast Enhancement Techniques
Using the default settings, compare the effectiveness of the following three techniques:
• imadjust increases the contrast of the image by mapping the values of the input intensity image
to new values such that, by default, 1% of the data is saturated at low and high intensities of the
input data.
• histeq performs histogram equalization. It enhances the contrast of images by transforming the
values in an intensity image so that the histogram of the output image approximately matches a
specified histogram (uniform distribution by default).
• adapthisteq performs contrast-limited adaptive histogram equalization. Unlike histeq, it
operates on small data regions (tiles) rather than the entire image. Each tile's contrast is
enhanced so that the histogram of each output region approximately matches the specified
histogram (uniform distribution by default). The contrast enhancement can be limited in order to
avoid amplifying the noise which might be present in the image.
Read a grayscale image into the workspace. Enhance the image using the three contrast adjustment
techniques.
pout = imread('pout.tif');
pout_imadjust = imadjust(pout);
pout_histeq = histeq(pout);
pout_adapthisteq = adapthisteq(pout);
Display the original image and the three contrast adjusted images as a montage.
montage({pout,pout_imadjust,pout_histeq,pout_adapthisteq},'Size',[1 4])
title("Original Image and Enhanced Images using imadjust, histeq, and adapthisteq")
Read a second grayscale image into the workspace and enhance the image using the three contrast
adjustment techniques.
tire = imread('tire.tif');
tire_imadjust = imadjust(tire);
tire_histeq = histeq(tire);
tire_adapthisteq = adapthisteq(tire);
Display the original image and the three contrast adjusted images as a montage.
montage({tire,tire_imadjust,tire_histeq,tire_adapthisteq},'Size',[1 4])
title("Original Image and Enhanced Images using imadjust, histeq, and adapthisteq")
Notice that imadjust had little effect on the image of the tire, but it caused a drastic change in the
case of pout. Plotting the histograms of pout.tif and tire.tif reveals that most of the pixels in
the first image are concentrated in the center of the histogram, while in the case of tire.tif, the
values are already spread out between the minimum of 0 and maximum of 255 thus preventing
imadjust from being effective in adjusting the contrast of the image.
figure
subplot(1,2,1)
imhist(pout)
title('Histogram of pout.tif')
subplot(1,2,2)
imhist(tire)
title('Histogram of tire.tif');
Histogram equalization, on the other hand, substantially changes both images. Many of the previously
hidden features are exposed, especially the debris particles on the tire. Unfortunately, at the same
time, the enhancement over-saturates several areas of both images. Notice how the center of the tire,
part of the child's face, and the jacket became washed out.
Concentrating on the image of the tire, it would be preferable for the center of the wheel to stay at
about the same brightness while enhancing the contrast in other areas of the image. In order for that
to happen, a different transformation would have to be applied to different portions of the image. The
Contrast-Limited Adaptive Histogram Equalization technique, implemented in adapthisteq, can
accomplish this. The algorithm analyzes portions of the image and computes the appropriate
transformations. A limit on the level of contrast enhancement can also be set, thus preventing the
over-saturation caused by the basic histogram equalization method of histeq. This is the most
sophisticated technique in this example.
Contrast enhancement of color images is typically done by converting the image to a color space that
has image luminosity as one of its components, such as the L*a*b* color space. Contrast adjustment
is performed on the luminosity layer 'L*' only, and then the image is converted back to the RGB color
space. Manipulating luminosity affects the intensity of the pixels, while preserving the original colors.
Read an image into the workspace. The 'shadow.tif' image is an indexed image, so convert the
image to a truecolor (RGB) image. Then, convert the image from the RGB color space to the L*a*b*
color space.
[X,map] = imread('shadow.tif');
shadow = ind2rgb(X,map);
shadow_lab = rgb2lab(shadow);
The values of luminosity span a range from 0 to 100. Scale the values to the range [0 1], which is the
expected range of images with data type double.
max_luminosity = 100;
L = shadow_lab(:,:,1)/max_luminosity;
Perform the three types of contrast adjustment on the luminosity channel, and keep the a* and b*
channels unchanged. Convert the images back to the RGB color space.
shadow_imadjust = shadow_lab;
shadow_imadjust(:,:,1) = imadjust(L)*max_luminosity;
shadow_imadjust = lab2rgb(shadow_imadjust);
shadow_histeq = shadow_lab;
shadow_histeq(:,:,1) = histeq(L)*max_luminosity;
shadow_histeq = lab2rgb(shadow_histeq);
shadow_adapthisteq = shadow_lab;
shadow_adapthisteq(:,:,1) = adapthisteq(L)*max_luminosity;
shadow_adapthisteq = lab2rgb(shadow_adapthisteq);
Display the original image and the three contrast adjusted images as a montage.
figure
montage({shadow,shadow_imadjust,shadow_histeq,shadow_adapthisteq},'Size',[1 4])
title("Original Image and Enhanced Images using imadjust, histeq, and adapthisteq")
See Also
adapthisteq | histeq | imadjust
More About
• “Adaptive Histogram Equalization” on page 11-84
• “Histogram Equalization” on page 11-81
Specify Contrast Adjustment Limits
Note You must specify the intensities as values between 0 and 1 regardless of the class of I. If I is
uint8, the values you supply are multiplied by 255 to determine the actual values to use; if I is
uint16, the values are multiplied by 65535. To learn about an alternative way to set these limits
automatically, see “Set Image Intensity Adjustment Limits Automatically” on page 11-80.
I = imread('cameraman.tif');
Adjust the contrast of the image, specifying the range of values used in the output image. In the
example below, the man's coat is too dark to reveal any detail. imadjust maps the range [0,51] in
the uint8 input image to [128,255] in the output image. This brightens the image considerably,
and also widens the dynamic range of the dark portions of the original image, making it much easier
to see the details in the coat. Note, however, that because all values above 51 in the original image
are mapped to 255 (white) in the adjusted image, the adjusted image appears washed out.
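The adjustment call does not appear above; a sketch consistent with the ranges described (51/255 = 0.2 and 128/255 ≈ 0.5) might be:
J = imadjust(I,[0 0.2],[0.5 1]);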
imshowpair(I,J,'montage')
Set Image Intensity Adjustment Limits Automatically
The stretchlim function calculates the histogram of the image and determines the adjustment limits
automatically. stretchlim returns these values as fractions in a vector that you can
pass as the [low_in high_in] argument to imadjust; for example:
I = imread('rice.png');
J = imadjust(I,stretchlim(I),[0 1]);
By default, stretchlim uses the intensity values that represent the bottom 1% (0.01) and the top
1% (0.99) of the range as the adjustment limits. By trimming the extremes at both ends of the
intensity range, stretchlim makes more room in the adjusted dynamic range for the remaining
intensities. But you can specify other range limits as an argument to stretchlim. See the
stretchlim reference page for more information.
Histogram Equalization
You can adjust the intensity values of image pixels automatically using histogram equalization.
Histogram equalization involves transforming the intensity values so that the histogram of the output
image approximately matches a specified histogram. By default, the histogram equalization function,
histeq, tries to match a flat histogram with 64 bins, but you can specify a different histogram
instead.
I = imread('pout.tif');
figure
subplot(1,2,1)
imshow(I)
subplot(1,2,2)
imhist(I,64)
Adjust the contrast using histogram equalization. In this example, the histogram equalization
function, histeq, tries to match a flat histogram with 64 bins, which is the default behavior. You can
specify a different histogram instead.
J = histeq(I);
figure
subplot(1,2,1)
imshow(J)
subplot(1,2,2)
imhist(J,64)
I = imread('pout.tif');
Adjust the contrast using histogram equalization, using the histeq function. Specify the gray scale
transformation return value, T, which is a vector that maps graylevels in the intensity image I to gray
levels in J.
[J,T] = histeq(I);
Plot the transformation curve. Notice how this curve reflects the histograms in the previous figure,
with the input values mostly between 0.3 and 0.6, while the output values are distributed evenly
between 0 and 1.
figure
plot((0:255)/255,T);
Adaptive Histogram Equalization
To avoid amplifying any noise that might be present in the image, you can use adapthisteq optional
parameters to limit the contrast, especially in homogeneous areas.
J = adapthisteq(I);
figure
subplot(1,2,1)
imshow(J)
subplot(1,2,2)
imhist(J,64)
See Also
More About
• “Histogram Equalization” on page 11-81
Enhance Color Separation Using Decorrelation Stretching
The number of color bands, NBANDS, in the image is usually three. But you can apply decorrelation
stretching regardless of the number of color bands.
The original color values of the image are mapped to a new set of color values with a wider range.
The color intensities of each pixel are transformed into the color eigenspace of the NBANDS-by-
NBANDS covariance or correlation matrix, stretched to equalize the band variances, then
transformed back to the original color bands.
To define the band-wise statistics, you can use the entire original image or, with the subset option,
any selected subset of it.
Read an image from the library of images available in the imdata folder. This example uses a
LANDSAT image of the Little Colorado River. The image has seven bands, but just read in the three
visible colors.
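The call that reads the three visible bands is not shown on this page. A minimal sketch follows, assuming the littlecoriver.lan file shipped with the toolbox and the same layout as the paris.lan file described in the next example (512-by-512, 7 bands, 8-bit, band interleaved by line, 128-byte header, little endian).
A = multibandread('littlecoriver.lan',[512 512 7],...
    'uint8=>uint8',128,'bil','ieee-le',{'Band','Direct',[3 2 1]});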
B = decorrstretch(A);
Display the original image and the processed image. Compare the two images. The original has a
strong violet (red-bluish) tint, while the transformed image has a somewhat expanded color range.
imshow(A)
title('Little Colorado River Image')
imshow(B)
title('Little Colorado River Image After Decorrelation Stretch')
Separate the three color channels of the original image.
[rA,gA,bA] = imsplit(A);
Separate the three color channels of the image after decorrelation stretching.
[rB,gB,bB] = imsplit(B);
Display the color scatterplot of the original image. Then display the color scatterplot of the image
after decorrelation stretching.
figure
plot3(rA(:),gA(:),bA(:),'.')
grid on
xlabel('Red (Band 3)')
ylabel('Green (Band 2)')
zlabel('Blue (Band 1)')
title('Color Scatterplot Before Decorrelation Stretch')
figure
plot3(rB(:),gB(:),bB(:),'.')
grid on
xlabel('Red (Band 3)')
ylabel('Green (Band 2)')
zlabel('Blue (Band 1)')
title('Color Scatterplot After Decorrelation Stretch')
See the stretchlim function reference page for more about calculating saturation limits.
Note You can apply a linear contrast stretch as a separate operation after performing a decorrelation
stretch, using stretchlim and imadjust. This alternative, however, often gives inferior results for
uint8 and uint16 images, because the pixel values must be clamped to [0 255] (or [0 65535]). The
Tol option in decorrstretch circumvents this limitation.
Apply decorrelation stretching, specifying the linear contrast stretch. Setting the value 'Tol' to 0.01
maps the transformed color range within each band to a normalized interval between 0.01 and 0.99,
saturating 2%.
C = decorrstretch(A,'Tol',0.01);
imshow(C)
title(['Little Colorado River After Decorrelation Stretch and ',...
'Linear Contrast Stretch'])
Enhance Multispectral Color Composite Images
The LAN file, paris.lan, contains a 7-band 512-by-512 Landsat image. A 128-byte header is
followed by the pixel values, which are band interleaved by line (BIL) in order of increasing band
number. They are stored as unsigned 8-bit integers, in little-endian byte order.
Read bands 3, 2, and 1 from the LAN file using the MATLAB® function multibandread. These
bands cover the visible part of the spectrum. When they are mapped to the red, green, and blue
planes, respectively, of an RGB image, the result is a standard truecolor composite. The final input
argument to multibandread specifies which bands to read, and in which order, so that you can
create an RGB composite in a single step.
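A minimal sketch of that call, consistent with the file layout described above; the variable name truecolor is the one used by the display code that follows.
truecolor = multibandread('paris.lan',[512 512 7],...
    'uint8=>uint8',128,'bil','ieee-le',{'Band','Direct',[3 2 1]});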
The truecolor composite has very little contrast and the colors are unbalanced.
imshow(truecolor)
title('Truecolor Composite (Un-Enhanced)')
text(size(truecolor,2),size(truecolor,1)+15,...
'Image courtesy of Space Imaging, LLC',...
'FontSize',7,'HorizontalAlignment','right')
By viewing a histogram of the red band, for example, you can see that the data is concentrated within
a small part of the available dynamic range. This is one reason why the truecolor composite appears
dull.
imhist(truecolor(:,:,1))
title('Histogram of the Red Band (Band 3)')
Another reason for the dull appearance of the composite is that the visible bands are highly
correlated with each other. Two- and three-band scatterplots are an excellent way to gauge the
degree of correlation among spectral bands. You can make them easily just by using plot. The linear
trend of the red-green-blue scatterplot indicates that the visible bands are highly correlated. This
helps explain the monochromatic look of the un-enhanced truecolor composite.
r = truecolor(:,:,1);
g = truecolor(:,:,2);
b = truecolor(:,:,3);
plot3(r(:),g(:),b(:),'.')
grid('on')
xlabel('Red (Band 3)')
ylabel('Green (Band 2)')
When you use imadjust to apply a linear contrast stretch to the truecolor composite image, the
surface features are easier to recognize.
stretched_truecolor = imadjust(truecolor,stretchlim(truecolor));
imshow(stretched_truecolor)
title('Truecolor Composite after Contrast Stretch')
A histogram of the red band after applying a contrast stretch shows that the data has been spread
over much more of the available dynamic range. Create a histogram of all red pixel values in the
image by using the imhist function.
imhist(stretched_truecolor(:,:,1))
title('Histogram of Red Band (Band 3) after Contrast Stretch')
Another way to enhance the truecolor composite is to use a decorrelation stretch, which enhances
color separation across highly correlated channels. Use decorrstretch to perform the
decorrelation stretch. Specify the optional name-value pair 'Tol',0.01 to perform a linear contrast
stretch after the decorrelation stretch. Again, surface features have become much more clearly
visible, but in a different way. The spectral differences across the scene have been exaggerated. A
noticeable example is the area of green on the left edge, which appears black in the contrast-
stretched composite. This green area is the Bois de Boulogne, a large park on the western edge of
Paris.
decorrstretched_truecolor = decorrstretch(truecolor,'Tol',0.01);
imshow(decorrstretched_truecolor)
title('Truecolor Composite after Decorrelation Stretch')
As expected, a scatterplot following the decorrelation stretch shows a strong decrease in correlation.
r = decorrstretched_truecolor(:,:,1);
g = decorrstretched_truecolor(:,:,2);
b = decorrstretched_truecolor(:,:,3);
plot3(r(:),g(:),b(:),'.')
grid('on')
xlabel('Red (Band 3)')
ylabel('Green (Band 2)')
zlabel('Blue (Band 1)')
title('Scatterplot of the Visible Bands after Decorrelation Stretch')
Just as with the visible bands, information from Landsat bands covering non-visible portions of the
spectrum can be viewed by constructing and enhancing RGB composite images. The near infrared
(NIR) band (Band 4) is important because of the high reflectance of chlorophyll in this part of the
spectrum. It is even more useful when combined with visible red and green (Bands 3 and 2,
respectively) to form a color infrared (CIR) composite image. Color infrared (CIR) composites are
commonly used to identify vegetation or assess its state of growth and/or health.
Construct a CIR composite by reading from the original LAN file and composing an RGB image that
maps bands 4, 3, and 2 to red, green, and blue, respectively.
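A minimal sketch of that step, reusing the layout of paris.lan described earlier and selecting bands 4, 3, and 2.
CIR = multibandread('paris.lan',[512 512 7],...
    'uint8=>uint8',128,'bil','ieee-le',{'Band','Direct',[4 3 2]});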
Even though the near infrared (NIR) band (Band 4) is less correlated with the visible bands than the
visible bands are with each other, a decorrelation stretch makes many features easier to see. A
property of color infrared composites is that they look red in areas with a high vegetation
(chlorophyll) density. Notice that the Bois de Boulogne park is red in the CIR composite, which is
consistent with its green appearance in the decorrelation-stretched truecolor composite.
stretched_CIR = decorrstretch(CIR,'Tol',0.01);
imshow(stretched_CIR)
title('CIR after Decorrelation Stretch')
See Also
decorrstretch | imadjust | imhist | multibandread | plot3 | stretchlim
Related Examples
• “Enhance Color Separation Using Decorrelation Stretching” on page 11-86
Low-Light Image Enhancement
Using haze removal techniques to enhance low-light images comprises three steps: invert the low-light image, apply haze removal to the inverted image, and then invert the result. Read and display a low-light image.
A = imread('lowlight_11.jpg');
figure, imshow(A);
Invert the image and notice how the low-light areas in the original image appear hazy.
AInv = imcomplement(A);
figure, imshow(AInv);
Reduce the haze in the inverted image by using the imreducehaze function.
BInv = imreducehaze(AInv);
figure, imshow(BInv);
Invert the result to obtain the enhanced low-light image.
B = imcomplement(BInv);
To get a better result, call imreducehaze on the inverted image again, this time specifying some
optional parameters.
A = imread('lowlight_21.jpg');
AInv = imcomplement(A);
BInv = imreducehaze(AInv,'Method','approxdcp','ContrastEnhancement','boost'); % restored call; the option values are assumptions
BImp = imcomplement(BInv);
Convert the input image from the RGB color space to the L*a*b* color space, so that you can apply the enhancement to the lightness channel only.
Lab = rgb2lab(A);
LInv = imcomplement(Lab(:,:,1)/100);      % invert the normalized lightness channel (restored step; scaling assumed)
LEnh = imcomplement(imreducehaze(LInv,'ContrastEnhancement','none'));
LabEnh = Lab;
LabEnh(:,:,1) = LEnh*100;                 % replace the lightness channel with its enhanced version (restored step)
Convert the image back to an RGB image and display the original and the enhanced image side-by-side.
AEnh = lab2rgb(LabEnh);
figure, montage({A, AEnh});
Low-light images can have high noise levels. Enhancing low-light images can increase this noise level.
Denoising can be a useful post-processing step.
Use the imguidedfilter function to remove noise from the enhanced image.
B = imguidedfilter(BImp);
figure, montage({BImp, B});
To visualize how the method works, compute the estimated illumination map, T.
A = imread('lowlight_21.jpg');
AInv = imcomplement(A);
[BInv,TInv] = imreducehaze(AInv);   % also return the estimated haze thickness map (restored call; options not shown)
T = imcomplement(TInv);
Display the original image next to the estimated illumination map in false color.
figure,
subplot(1,2,1);
imshow(A), title('Lowlight Image');
subplot(1,2,2);
imshow(T), title('Illumination Map');
colormap(gca, hot(256));
Limitations
This method can lose some details or get over-enhanced because of poor adaptability of the dark
channel in low-light conditions.
References
[1] Dong, X., G. Wang, Y. Pang, W. Li, J. Wen, W. Meng, and Y. Lu. "Fast efficient algorithm for
enhancement of low lighting video." Proceedings of IEEE® International Conference on
Multimedia and Expo (ICME). 2011, pp. 1–6.
See Also
imcomplement | imguidedfilter | imreducehaze | lab2rgb | rgb2lab
Increase Filter Strength Radially Outward
I = imread('peppers.png');
I = im2double(I);
imshow(I)
Create a blurry copy of the image using a Gaussian filter with standard deviation of 2.
Iblurred = imgaussfilt(I,2);
imshow(Iblurred)
Create a weight image as a Gaussian filter of the same size as the image. To increase the portion of
the image that appears sharp, increase the value of filterStrength.
filterStrength = 50;
weights = fspecial('gaussian',[size(I,1) size(I,2)],filterStrength);
imshow(weights,[])
Normalize the weight image to the range [0, 1] by using the rescale function.
weights = rescale(weights);
Create a weighted blurred image that is a weighted sum of the original image and the blurry image.
MATLAB automatically replicates the weight matrix for each of the R, G, and B color channels.
IweightedBlurred = I.*weights + Iblurred.*(1-weights); % restored step: blend the sharp and blurred images using the weights
Display the result. The image is sharp in the center and becomes more blurry radially outward. To
increase the portion of the image that appears sharp, increase the value of filterStrength.
imshow(IweightedBlurred)
Define the weighting function as the inverse of R, scaled to the range [0, 1].
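The computation of R2 is not shown on this page. A minimal sketch follows, assuming R2 is the Euclidean distance of each pixel from the image center.
[X,Y] = meshgrid(1:size(I,2),1:size(I,1));        % pixel coordinates
R2 = sqrt((X - size(I,2)/2).^2 + (Y - size(I,1)/2).^2);  % distance from the center (assumed definition)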
R2 = rescale(R2);
weights = (1-R2);
imshow(weights)
Apply the weighting function to the image and display the result.
I2 = I.*weights;
imshow(I2)
See Also
fspecial | imgaussfilt
Apply Gaussian Smoothing Filters to Images
I = imread('cameraman.tif');
Filter the image with isotropic Gaussian smoothing kernels of increasing standard deviations.
Gaussian filters are generally isotropic, that is, they have the same standard deviation along both
dimensions. An image can be filtered by an isotropic Gaussian filter by specifying a scalar value for
sigma.
Iblur1 = imgaussfilt(I,2);
Iblur2 = imgaussfilt(I,4);
Iblur3 = imgaussfilt(I,8);
figure
imshow(I)
title('Original image')
figure
imshow(Iblur1)
title('Smoothed image, \sigma = 2')
figure
imshow(Iblur2)
title('Smoothed image, \sigma = 4')
figure
imshow(Iblur3)
title('Smoothed image, \sigma = 8')
Filter the image with anisotropic Gaussian smoothing kernels. imgaussfilt allows the Gaussian
kernel to have different standard deviations along row and column dimensions. These are called axis-
aligned anisotropic Gaussian filters. Specify a 2-element vector for sigma when using anisotropic
filters.
figure
imshow(IblurX1)
title('Smoothed image, \sigma_x = 4, \sigma_y = 1')
figure
imshow(IblurX2)
title('Smoothed image, \sigma_x = 8, \sigma_y = 1')
figure
imshow(IblurY1)
title('Smoothed image, \sigma_x = 1, \sigma_y = 4')
figure
imshow(IblurY2)
title('Smoothed image, \sigma_x = 1, \sigma_y = 8')
Suppress the horizontal bands visible in the sky region of the original image. Anisotropic Gaussian
filters can suppress horizontal or vertical features in an image. Extract a section of the sky region of
the image and use a Gaussian filter with higher standard deviation along the X axis (direction of
increasing columns).
I_sky = imadjust(I(20:50,10:70));
IblurX1_sky = imadjust(IblurX1(20:50,10:70));
figure
imshow(I_sky), title('Sky in original image')
figure
imshow(IblurX1_sky), title('Sky in filtered image')
Noise Removal
In this section...
“Remove Noise by Linear Filtering” on page 11-124
“Remove Noise Using an Averaging Filter and a Median Filter” on page 11-124
“Remove Noise By Adaptive Filtering” on page 11-127
Digital images are prone to various types of noise. Noise is the result of errors in the image
acquisition process that result in pixel values that do not reflect the true intensities of the real scene.
There are several ways that noise can be introduced into an image, depending on how the image is
created. For example:
• If the image is scanned from a photograph made on film, the film grain is a source of noise. Noise
can also be the result of damage to the film, or be introduced by the scanner itself.
• If the image is acquired directly in a digital format, the mechanism for gathering the data (such as
a CCD detector) can introduce noise.
• Electronic transmission of image data can introduce noise.
To simulate the effects of some of the problems listed above, the toolbox provides the imnoise
function, which you can use to add various types of noise to an image. The examples in this section
use this function.
See “What Is Image Filtering in the Spatial Domain?” on page 8-2 for more information about linear
filtering using imfilter.
Note: Median filtering is a specific case of order-statistic filtering, also known as rank filtering. For
information about order-statistic filtering, see the reference page for the ordfilt2 function.
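For example, a 3-by-3 median filter can be expressed as an order-statistic filter that selects the 5th smallest value in each 3-by-3 neighborhood. A minimal sketch, using a sample image for illustration:
I2 = imread('cameraman.tif');
K2 = ordfilt2(I2,5,ones(3,3));   % equivalent to medfilt2(I2,[3 3])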
Remove Noise Using an Averaging Filter and a Median Filter
Read the image into the workspace and display it.
I = imread('eight.tif');
figure
imshow(I)
For this example, add salt and pepper noise to the image. This type of noise consists of random pixels
being set to black or white (the extremes of the data range).
J = imnoise(I,'salt & pepper',0.02);
figure
imshow(J)
Filter the noisy image, J, with an averaging filter and display the results. The example uses a 3-by-3
neighborhood.
Kaverage = filter2(fspecial('average',3),J)/255;
figure
imshow(Kaverage)
Now use a median filter to filter the noisy image, J. The example also uses a 3-by-3 neighborhood.
Display the two filtered images side-by-side for comparison. Notice that medfilt2 does a better job
of removing noise, with less blurring of edges of the coins.
Kmedian = medfilt2(J);
imshowpair(Kaverage,Kmedian,'montage')
Remove Noise By Adaptive Filtering
The wiener2 function applies a Wiener filter (a type of linear filter) to an image adaptively, tailoring
itself to the local image variance. This approach often produces better results than linear filtering. The adaptive filter is more selective
than a comparable linear filter, preserving edges and other high-frequency parts of an image. In
addition, there are no design tasks; the wiener2 function handles all preliminary computations and
implements the filter for an input image. wiener2, however, does require more computation time
than linear filtering.
wiener2 works best when the noise is constant-power ("white") additive noise, such as Gaussian
noise. The example below applies wiener2 to an image of Saturn with added Gaussian noise.
RGB = imread('saturn.png');
I = im2gray(RGB);
J = imnoise(I,'gaussian',0,0.025);
Display the noisy image. Because the image is quite large, display only a portion of the image.
imshow(J(600:1000,1:600));
title('Portion of the Image with Added Gaussian Noise');
K = wiener2(J,[5 5]);
Display the processed image. Because the image is quite large, display only a portion of the image.
figure
imshow(K(600:1000,1:600));
title('Portion of the Image with Noise Removed by Wiener Filter');
See Also
imbilatfilt | imfilter | imgaussfilt | imguidedfilter | locallapfilt | nlfilter
More About
• “What Is Image Filtering in the Spatial Domain?” on page 8-2
Compute 3-D Superpixels of Input Volumetric Intensity Image
load mri;
D = squeeze(D);
A = ind2gray(D,map);
Calculate the 3-D superpixels. Form an output image where each pixel is set to the mean color of its
corresponding superpixel region.
[L,N] = superpixels3(A,34);
imSize = size(A);
imPlusBoundaries = zeros(imSize(1),imSize(2),3,imSize(3),'uint8');
for plane = 1:imSize(3)
BW = boundarymask(L(:, :, plane));
% Create an RGB representation of this plane with boundary shown
% in cyan.
imPlusBoundaries(:, :, :, plane) = imoverlay(A(:, :, plane), BW, 'cyan');
end
implay(imPlusBoundaries,5)
Set the color of each pixel in output image to the mean intensity of the superpixel region. Show the
mean image next to the original. If you run this code, you can use implay to view each slice of the
MRI data.
pixelIdxList = label2idx(L);
meanA = zeros(size(A),'like',D);
for superpixel = 1:N
memberPixelIdx = pixelIdxList{superpixel};
meanA(memberPixelIdx) = mean(A(memberPixelIdx));
end
implay([A meanA],5);
Image Quality Metrics
Efforts have been made to create objective measures of quality. For many applications, a valuable
quality metric correlates well with the subjective perception of quality by a human observer. Quality
metrics can also track unperceived errors as they propagate through an image processing pipeline,
and can be used to compare image processing algorithms.
If an image without distortion is available, you can use it as a reference to measure the quality of
other images. For example, when evaluating the quality of compressed images, an uncompressed
version of the image provides a useful reference. In these cases, you can use full-reference quality
metrics to directly compare the target image and the reference image.
If a reference image without distortion is not available, you can use a no-reference image quality
metric instead. These metrics compute quality scores based on expected image statistics.
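For example, a minimal sketch that scores a distorted copy of an image with a full-reference metric and with a no-reference metric; the specific distortion applied here is an illustrative assumption.
ref = imread('pout.tif');
A = imnoise(ref,'gaussian',0,0.01);   % hypothetical distorted copy
fullRefScore = ssim(A,ref);           % full-reference: compares against the reference
noRefScore = niqe(A);                 % no-reference: uses only the target image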
immse: Mean-squared error (MSE). MSE measures the average squared difference between actual and ideal pixel values. This metric is simple to calculate but might not align well with the human perception of quality.
psnr: Peak signal-to-noise ratio (pSNR). pSNR is derived from the mean square error, and indicates the ratio of the maximum pixel intensity to the power of the distortion. Like MSE, the pSNR metric is simple to calculate but might not align well with perceived quality.
ssim: Structural similarity (SSIM) index. The SSIM metric combines local image structure, luminance, and contrast into a single local quality score. In this metric, structures are patterns of pixel intensities, especially among neighboring pixels, after normalizing for luminance and contrast. Because the human visual system is good at perceiving structure, the SSIM quality metric agrees more closely with the subjective quality score.
multissim, multissim3: Multi-scale structural similarity (MS-SSIM) index. The MS-SSIM metric expands on the SSIM index by combining luminance information at the highest resolution level with structure and contrast information at several downsampled resolutions, or scales. The multiple scales account for variability in the perception of image details caused by factors such as viewing distance from the image, distance from the scene to the sensor, and resolution of the image acquisition sensor.
Because structural similarity is computed locally, ssim, multissim, and multissim3 can generate a
map of quality over the image.
brisque: Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). A BRISQUE model is trained on a database of images with known distortions, and BRISQUE is limited to evaluating the quality of images with the same type of distortion. BRISQUE is opinion-aware, which means subjective quality scores accompany the training images.
niqe: Natural Image Quality Evaluator (NIQE). Although a NIQE model is trained on a database of pristine images, NIQE can measure the quality of images with arbitrary distortion. NIQE is opinion-unaware, and does not use subjective quality scores. The tradeoff is that the NIQE score of an image might not correlate as well as the BRISQUE score with human perception of quality.
piqe: Perception based Image Quality Evaluator (PIQE). The PIQE algorithm is opinion-unaware and unsupervised, which means it does not require a trained model. PIQE can measure the quality of images with arbitrary distortion and in most cases performs similar to NIQE. PIQE estimates block-wise distortion and measures the local variance of perceptibly distorted blocks to compute the quality score.
The BRISQUE and the NIQE algorithms calculate the quality score of an image with computational
efficiency after the model is trained. PIQE is less computationally efficient, but it provides local
measures of quality in addition to a global quality score. No-reference metrics are useful when no
undistorted reference is available, but they generally do not agree with subjective human quality
scores as closely as full-reference metrics do.
See Also
More About
• “Train and Use No-Reference Quality Assessment Model” on page 11-135
• “Obtain Local Structural Similarity Index” on page 11-145
• “Compare Image Quality at Various Compression Levels” on page 11-147
Train and Use No-Reference Quality Assessment Model
Both algorithms train a model using identical predictable statistical features, called natural scene
statistics (NSS). NSS are based on normalized luminance coefficients in the spatial domain, and are
modeled as a multidimensional Gaussian distribution. Distortions appear as perturbations to the
Gaussian distribution.
The algorithms differ in how they use the NSS features to train a model and compute a quality score.
NIQE Workflow
NIQE measures the quality of images with arbitrary distortion. A NIQE model is not trained using
subjective quality scores, but the tradeoff is that the NIQE score does not correlate as reliably as the
BRISQUE score with human perception of quality.
Note If the default NIQE model provides a sufficient quality score for your application, you do not
need to train a new model. You can skip to “Predict Image Quality Using a NIQE Model” on page 11-
135.
To train a NIQE model, pass a datastore of pristine images to the fitniqe function. The function
divides each image into blocks and computes the NSS for each block. The training process includes
only blocks with statistically significant features.
The returned model, niqeModel, stores the multivariate Gaussian mean and standard deviation
derived from the NSS features.
Use the niqe function to calculate an image quality score for an image with arbitrary distortion. The
niqe function extracts the NSS features from statistically significant blocks in the distorted image.
The function fits a multivariate Gaussian distribution to the image NSS features. The quality score is
the distance between the Gaussian distributions.
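A minimal sketch of this workflow; the folder and test image names are placeholders.
imds = imageDatastore('pristineImages');      % hypothetical folder of pristine images
model = fitniqe(imds);                        % train a custom NIQE model
score = niqe(imread('testImage.jpg'),model);  % score an arbitrary distorted image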
BRISQUE Workflow
BRISQUE is limited to measuring the quality of images with the same type of distortion as the model.
A BRISQUE model is trained using subjective opinion scores, with the advantage that the BRISQUE
score correlates well with human perception of quality.
Note If the default BRISQUE model provides a sufficient quality score for your application, you do
not need to train a new model. You can skip to “Predict Image Quality Using a BRISQUE Model” on
page 11-137.
To train a BRISQUE model, pass the following to the fitbrisque function:
• A datastore containing images with known distortions and pristine copies of those images
• A subjective opinion score for each distorted image in the database
The function computes the NSS features for each image, without dividing the image into blocks. The
function uses the NSS features and corresponding opinion scores to train a support vector machine
regression model. The returned model, brisqueModel, stores the parameters of the support vector
regressor.
Use the brisque function to calculate an image quality score for an image with the same type of
distortions as the model. The brisque function extracts the NSS features from the distorted image,
and predicts a quality score using support vector regression.
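A minimal sketch of this workflow; the file names and opinion scores below are placeholders.
imds = imageDatastore({'distorted1.jpg','distorted2.jpg','distorted3.jpg'});
opinionScores = [38.5 72.1 55.0];                 % one subjective score per image (hypothetical values)
model = fitbrisque(imds,opinionScores);
score = brisque(imread('testImage.jpg'),model);   % score an image with similar distortions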
See Also
brisque | brisqueModel | fitbrisque | fitniqe | niqe | niqeModel
More About
• “Image Quality Metrics” on page 11-133
• “Compare No Reference Image Quality Metrics” on page 11-139
Compare No Reference Image Quality Metrics
Evaluating the quality of an image is an important part of image acquisition, compression, and other
image enhancement workflows. It is desirable to have a fast, automated metric that closely mimics
subjective measures of image quality. This example compares the performance of three no-reference
quality metrics.
Each metric has different strengths depending on the images in the data set. To select the best metric
for your data, you can compare the performance of the three metrics on sample image data. This
example shows how to compare the performance in two different situations: varying levels of JPEG
compression on a single image and for a video stream.
Image compression is a tradeoff between visual quality and the compression ratio, or size of the
output data. The tradeoff also depends on the image content. For example, images with uniform areas
can compress to smaller file sizes and exhibit fewer artifacts than images with detailed features.
Image quality metrics can help analyze this tradeoff, while trying to minimize the impact of the image
content on the analysis.
im = imread('llama.jpg');
Write copies of the image with different JPG compression ratios. Read each compressed image back
into the workspace.
jpegQuality = 10:10:100;
numObservations = numel(jpegQuality);
compressedFrames = cell(1,numObservations);
for ind = 1:numObservations
q = jpegQuality(ind);
tempFile = ['llama_compression_',num2str(q),'.jpg'];
imwrite(im,tempFile,'Quality',q);
compressedFrames{ind} = imread(tempFile);
end
tiledlayout(1,3);
h1 = nexttile;
imshow(compressedFrames{1})
title('JPEG Quality: 10')
nexttile
imshow(compressedFrames{7})
title('JPEG Quality: 70')
nexttile
imshow(im)
title('Input Image')
linkaxes
Zoom in on the compressed image to see the nature of some specific artifacts. At JPEG quality 10, the
blocking artifacts are obvious.
For each compressed JPG image, calculate the quality score using the three quality metrics.
pQ = zeros(1, numObservations);
nQ = zeros(1, numObservations);
bQ = zeros(1, numObservations);
for ind=1:numObservations
bQ(ind) = brisque(compressedFrames{ind});
nQ(ind) = niqe(compressedFrames{ind});
pQ(ind) = piqe(compressedFrames{ind});
end
Visualize the score of the metrics as the JPEG quality increases. Normalize the scores so that each
score has the same value for the uncompressed image. For these three metrics, lower scores
correspond to higher image quality.
The BRISQUE score for JPEG quality values of 50, 60, and 70 is unrealistically lower than the score for the
original uncompressed image. Therefore, for images similar to this test image, NIQE and PIQE are
more reliable metrics.
figure
hold on
plot(jpegQuality,bQ/bQ(end),'*-');
plot(jpegQuality,nQ/nQ(end),'*-');
plot(jpegQuality,pQ/pQ(end),'*-');
legend('BRISQUE','NIQE','PIQE');
ylabel('Metric Score')
xlabel('JPEG Quality')
hold off
In applications such as streaming video, there is a need to evaluate quality metrics at the receiver,
which might not have access to the original pristine sample. Also, the content of each frame can vary
significantly. Simulate such a scenario to evaluate the performance characteristics of these
metrics.
Create a VideoReader object that reads frames from the video 'rhinos.avi'. This video has 114 frames.
vidObjR = VideoReader('rhinos.avi');
vidObjW = VideoWriter('varyingCompressed.avi');
open(vidObjW)
Create a varying compression ratio schedule to mimic real-time varying bitrate transmissions.
numFrames = vidObjR.NumFrames;
varyingQuality = sin(2*pi*(1:numFrames)*0.01);
varyingQuality = round(rescale(varyingQuality)*100);
varyingQuality = max(varyingQuality,1); % min JPEG quality is 1
figure
plot(varyingQuality);
title('JPEG Quality Schedule');
ylabel('JPEG Quality')
xlabel('Frame Index')
For each frame in the video, compress the frame according to the JPEG quality schedule. Compute
the metrics of the compressed frame and add the compressed frame to the output video for
validation.
pQ = zeros(1,numFrames);
nQ = zeros(1,numFrames);
bQ = zeros(1,numFrames);
ind = 1;
while hasFrame(vidObjR)
    im = readFrame(vidObjR);
    % Compress the frame according to the JPEG quality schedule, then read the
    % compressed frame back in (restored step; temporary file name assumed)
    imwrite(im,'compressedFrame.jpg','Quality',varyingQuality(ind));
    frame = imread('compressedFrame.jpg');
    writeVideo(vidObjW,frame);
    bQ(ind) = brisque(frame);
    nQ(ind) = niqe(frame);
    pQ(ind) = piqe(frame);
    ind = ind+1;
end
close(vidObjW);
Visualize the trend in the metrics, which is expected to mimic the compression schedule. Rescale the metrics to focus on the
trend, and invert the quality schedule to get the compression ratio trend. The quality metrics can still
give a useful indication of perceived quality without access to the original reference frame.
figure
hold on
plot(rescale(bQ));
plot(rescale(nQ));
plot(rescale(pQ));
% Invert JPEG Quality to get the compression ratio
plot(1-rescale(varyingQuality),'k','LineWidth',2)
legend('BRISQUE','NIQE','PIQE','Compression Ratio');
title('Trend of Quality Metrics with Varying Compression and Content');
ylabel('Metric Score')
xlabel('Frame Index')
hold off
See Also
brisque | brisqueModel | fitbrisque | fitniqe | niqe | niqeModel
More About
• “Image Quality Metrics” on page 11-133
• “Train and Use No-Reference Quality Assessment Model” on page 11-135
Obtain Local Structural Similarity Index
ref = imread('pout.tif');
Create an image whose quality is to be measured, by making a copy of the reference image and
adding noise. To illustrate local similarity, isolate the noise to half of the image. Display the reference
image and the noisy image side-by-side.
A = ref;
A(:,ceil(end/2):end) = imnoise(ref(:,ceil(end/2):end),'salt & pepper',0.1); % add noise to the right half (restored step; noise type and density assumed)
figure, imshowpair(A,ref,'montage')
Calculate the local Structural Similarity Index for the modified image (A) when compared to the
reference image (ref). Visualize the local structural similarity index map. Note how the left side of the image,
which is identical to the reference image, displays as white because all the local structural similarity
values are 1.
[ssimval,local_sim] = ssim(A,ref); % restored call that returns the local similarity map
figure, imshow(local_sim,[])
See Also
ssim
More About
• “Image Quality Metrics” on page 11-133
Compare Image Quality at Various Compression Levels
I = imread('cameraman.tif');
Write the image to a file using various quality values. The JPEG format supports the 'quality'
parameter. Use the ssim function to check the quality of each written image.
ssimValues = zeros(1,10);
qualityFactor = 10:10:100;
for i = 1:10
imwrite(I,'compressedImage.jpg','jpg','quality',qualityFactor(i));
ssimValues(i) = ssim(imread('compressedImage.jpg'),I);
end
Plot the results. Note how the image quality score improves as you increase the quality value
specified with imwrite.
plot(qualityFactor,ssimValues,'b-o');
See Also
ssim
More About
• “Image Quality Metrics” on page 11-133
Anatomy of Imatest Extended eSFR Chart
Image Processing Toolbox supports the Extended version of the eSFR chart. This version has a 16:9
aspect ratio and additional features to measure color accuracy. The esfrChart object does not
analyze other visual features in the chart, such as focusing targets and wedges.
The Extended eSFR test chart has 15 gray boxes tilted 5° away from vertical. The left, top, right, and
bottom edges of each box are used to measure:
• Local SFR, which indicates image sharpness. Sharp edges have better contrast than blurry edges,
and they more clearly show the actual position of the edge.
• In sharp edges, pixel intensity values quickly transition across boundaries in the scene. Most
pixels clearly belong to one side of the boundary or the other, and few pixels have intermediate
values. The contrast is high because adjacent pixels on either side of the actual edge have
large differences in intensity.
• In blurry edges, the transition happens gradually over many pixels, which makes it unclear
where the boundary actually occurs. The contrast is low because adjacent pixels have similar
intensity values.
Sharpness is higher toward the center of the imaged region and decreases toward the periphery.
Horizontal sharpness is usually higher than vertical sharpness. To measure SFR, use the
measureSharpness function.
• Local chromatic aberration, or color fringing, which indicates how uniformly the camera optical
system focuses light in the red, green, and blue color channels. In captured images, chromatic
aberration appears as an artificial strip of color along edges. Chromatic aberration also lowers
contrast in the luminance channel, and therefore reduces image sharpness.
Chromatic aberration increases radially from the center of the image. To measure chromatic
aberration, use the measureChromaticAberration function. This function also returns the edge
profiles for each color channel, which is the averaged projection of pixel intensity values along the
edge.
If you capture an image of the test chart at the full 16:9 aspect ratio, then esfrChart automatically
identifies and labels all 60 available slanted-edge ROIs. You can also image the test chart at the 4:3
and 3:2 aspect ratios, as indicated on the chart. At these ratios, fewer edges are available, and
esfrChart indexes edges according to the convention used by the 16:9 aspect ratio.
In a proper capture of the test chart, orient the chart without rotation, so that the angle of the slanted
edges is close to 5°. The contrast of the edges must be greater than 20%. If the contrast is less than
20%, adjust the scene lighting and the camera exposure.
The Extended eSFR test chart has 20 gray patches of increasing brightness, arranged in a ring
around the center of the image. The gray patches are used to measure:
• Scene illumination, which estimates the color of the light illuminating the scene. You can use the
measured illumination to white-balance images acquired under similar lighting conditions. To
measure scene illumination, use the measureIlluminant function. To white-balance the image,
use chromadapt.
• Noise, which quantifies how much the camera electronics generate error in pixel values. To
estimate noise in each color channel, use the measureNoise function.
In a proper capture of the test chart, set the scene lighting and camera exposure so that each gray
patch appears distinct from the other patches and no pixels are clipped. If the darkest patches appear
identical, or have values of 0, increase the scene lighting or the exposure. If the brightest patches
appear identical, or if the brightest patch is saturated, decrease the scene lighting or the exposure.
The Extended eSFR test chart has 16 color patches arranged in four groups. The color patches are
used to measure:
• Color accuracy, which indicates how well the measured red, green, and blue values agree with
expected color values. To measure color accuracy, use the measureColor function. This function
also returns a color correction matrix, which you can use to adjust the image colors toward the
expected values.
Registration Markers
The eSFR test charts have registration markers used to orient the image properly. When you import a
chart, esfrChart detects four black-and-white checkered circles and uses their position to define
regions of interest automatically. You can optionally specify the [x, y] coordinates of the circle
centers manually.
References
[1] Imatest. "Esfr". https://www.imatest.com/mathworks/esfr/.
[2] ISO 12233:2014. "Photography – Electronic still picture imaging – Resolution and spatial
frequency responses." International Organization for Standardization; ISO/TC 42
Photography. URL: https://www.iso.org/standard/59419.html.
See Also
displayChart | esfrChart
More About
• “Evaluate Quality Metrics on eSFR Test Chart” on page 11-153
Evaluate Quality Metrics on eSFR Test Chart
Read an image of an eSFR chart into the workspace. Display the chart.
I = imread('eSFRTestImage.jpg');
figure
imshow(I)
title('Captured Image of eSFR Chart')
text(size(I,2),size(I,1)+15, ...
['Chart courtesy of Imatest',char(174)],'FontSize',10,'HorizontalAlignment','right');
Create an eSFR test chart object that automatically defines regions of interest (ROIs) based on
detected registration markers.
chart = esfrChart(I);
Highlight and label the detected ROIs to visually confirm that the ROIs are suitable for
measurements.
displayChart(chart)
All 60 slanted edge ROIs (labeled in green) are visible and centered on appropriate edges. In
addition, 20 gray patch ROIs (labeled in red) and 16 color patch ROIs (labeled in white) are visible
and are contained within the boundary of each patch. The chart is correctly imported.
Measure the sharpness of all 60 slanted edge ROIs. Also measure the averaged horizontal and
vertical sharpness of these ROIs.
[sharpnessTable,aggregateSharpnessTable] = measureSharpness(chart);
plotSFR(sharpnessTable,'ROIIndex',1:4,'displayLegend',false,'displayTitle',true)
Display the average SFR of the averaged vertical and horizontal edges. The average vertical SFR
drops off more rapidly than the average horizontal SFR. Therefore, the average vertical edge is less
sharp than the average horizontal edge.
plotSFR(aggregateSharpnessTable)
Measure the chromatic aberration of the 60 slanted edge ROIs by using the measureChromaticAberration function.
chTable = measureChromaticAberration(chart);
Plot the normalized intensity profile of the three color channels in the first ROI. Store the normalized
edge profile in a separate variable, edgeProfile, for clarity.
roi_index = 1;
edgeProfile = chTable.normalizedEdgeProfile{roi_index};
figure
p = length(edgeProfile.normalizedEdgeProfile_R);
plot(1:p,edgeProfile.normalizedEdgeProfile_R,'r', ...
1:p,edgeProfile.normalizedEdgeProfile_G,'g', ...
1:p,edgeProfile.normalizedEdgeProfile_B,'b')
xlabel('Pixel')
ylabel('Normalized Intensity')
title(['ROI ' num2str(1) ' with Aberration ' num2str(chTable.aberration(1))])
The color channels have similar normalized intensity profiles, and not much color fringing is visible
along the edge.
Measure Noise
Measure the noise in each of the 20 gray patch ROIs by using the measureNoise function. Then plot the average raw signal and the signal-to-noise ratio (SNR) in each grayscale ROI.
noiseTable = measureNoise(chart);
figure
subplot(1,2,1)
plot(noiseTable.ROI,noiseTable.MeanIntensity_R,'r', ...
noiseTable.ROI,noiseTable.MeanIntensity_G,'g', ...
noiseTable.ROI,noiseTable.MeanIntensity_B,'b')
title('Signal')
ylabel('Intensity')
xlabel('Gray ROI Number')
grid on
subplot(1,2,2)
plot(noiseTable.ROI,noiseTable.SNR_R,'r', ...
noiseTable.ROI,noiseTable.SNR_G,'g', ...
noiseTable.ROI,noiseTable.SNR_B,'b')
title('SNR')
ylabel('dB')
Estimate Illuminant
Estimate the scene illumination using the 20 gray patch ROIs. The illuminant has a stronger blue
component and a weaker red component, which is consistent with the blue tint of the test chart image.
illum = measureIlluminant(chart)
illum = 1×3
Measure the color accuracy of the 16 color patch ROIs by using the measureColor function. The function also returns the color correction matrix, ccm.
[colorTable,ccm] = measureColor(chart);
Display the average measured color and the expected color of the ROIs. Display the color accuracy
measurement, Delta_E. The closer the Delta_E value is to 1, the less perceptible the color
difference is. Typical values of Delta_E range from 3 to 6 for printing, and up to 20 in other
commercial applications.
figure
displayColorPatch(colorTable)
Plot the measured and reference colors in the CIE 1976 L*a*b* color space on a chromaticity
diagram. Red circles indicate the reference color. Green circles indicate the measured color of each
color patch.
figure
plotChromaticity(colorTable)
You can use the color correction matrix, ccm, to color-correct the test chart images. For an example,
see “Correct Colors Using Color Correction Matrix” on page 11-165.
References
[1] Imatest. "Esfr". https://www.imatest.com/mathworks/esfr/.
See Also
displayChart | esfrChart | measureChromaticAberration | measureColor |
measureIlluminant | measureNoise | measureSharpness
More About
• “Anatomy of Imatest Extended eSFR Chart” on page 11-149
• “Correct Colors Using Color Correction Matrix” on page 11-165
Correct Colors Using Color Correction Matrix
I = imread('eSFRTestImage.jpg');
Create an esfrChart object that stores information about the test chart. Display the chart,
highlighting the 16 color patches. The image has a blue tint.
chart = esfrChart(I);
displayChart(chart,'displayEdgeROIs',false, ...
'displayGrayROIs',false,'displayRegistrationPoints',false)
Measure the color accuracy of the 16 color patches by using the measureColor function. The
function also returns the color correction matrix that is used to perform the color correction.
[colorTable,ccm] = measureColor(chart);
Compare the measured and reference colors on a color patch diagram. The closer the Delta_E value
is to 1, the less perceptible the color difference is.
displayColorPatch(colorTable)
Color-correct the original test chart image and display the result.
I_cc = imapplymatrix(ccm(1:3,:)',I,ccm(4,:));
imshow(I_cc)
title('Color-Corrected Image Using Color Patches')
Create an esfrChart object that stores information about the color-corrected test chart. Measure
the color accuracy of the 16 color-corrected color patches.
chart_cc = esfrChart(I_cc);
colorTable_cc = measureColor(chart_cc);
Compare the corrected and reference colors on a color patch diagram. The measured color errors,
delta_E, are smaller for the color-corrected image than for the original image. Therefore, the colors
in this image better agree with the reference colors. However, the chart now has an overall yellow
tint and the contrast of the image has decreased.
displayColorPatch(colorTable_cc)
You can improve the color correction by including the gray patches as well as the color patches in the
least squares fit. Display the original chart, highlighting the 20 gray patches and 16 color patches.
displayChart(chart,'displayEdgeROIs',false, ...
'displayRegistrationPoints',false)
Get the reference L*a*b* values of the color and grayscale patches, which are stored in the
ReferenceColorLab and ReferenceGrayLab properties of the eSFR chart object. Convert these
values to the RGB color space.
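The conversion code is not shown on this page. A minimal sketch follows; the variable names and the output type are assumptions, with uint8 chosen to match the 0 to 255 range of the measured intensities.
referenceColorRGB = lab2rgb(chart.ReferenceColorLab,'OutputType','uint8');
referenceGrayRGB = lab2rgb(chart.ReferenceGrayLab,'OutputType','uint8');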
Measure the mean gray value in each of the 20 gray patches by using the measureNoise function.
noiseTable = measureNoise(chart);
measuredGrayRGB = [noiseTable.MeanIntensity_R, ...
noiseTable.MeanIntensity_G, ...
noiseTable.MeanIntensity_B];
Concatenate all measured RGB color values of the color and grayscale patches.
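A minimal sketch of the concatenation and the combined least-squares fit; the variable names and the use of the colorTable measured values are assumptions, not the exact code of the original example.
measuredColorRGB = [colorTable.Measured_R,colorTable.Measured_G,colorTable.Measured_B];
measuredRGB = double([measuredColorRGB; measuredGrayRGB]);
referenceRGB = double([referenceColorRGB; referenceGrayRGB]);
% Solve [measuredRGB 1]*ccm_cc = referenceRGB in the least-squares sense
ccm_cc = [measuredRGB, ones(size(measuredRGB,1),1)] \ referenceRGB;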
Perform the color correction and display the result. The chart no longer has a yellow tint and the
overall appearance of the chart has improved.
I_cc2 = imapplymatrix(ccm_cc(1:3,:)',I,ccm_cc(4,:)');
imshow(I_cc2)
title('Color-Corrected Image Using Gray and Color Patches')
Compare the corrected and reference colors on a color patch diagram. Some of the measured color
errors have decreased, while others have increased.
chart_cc2 = esfrChart(I_cc2);
colorTable_cc2 = measureColor(chart_cc2);
displayColorPatch(colorTable_cc2)
References
[1] Imatest. "Esfr". https://www.imatest.com/mathworks/esfr/.
See Also
displayColorPatch | esfrChart | measureColor | plotChromaticity
More About
• “Evaluate Quality Metrics on eSFR Test Chart” on page 11-153
Install Sample Data Using Add-On Explorer
1 Select Get Add-ons from the Add-ons drop-down menu on the MATLAB desktop. The Add-on
Explorer opens.
2 In the Add-On Explorer, search for the data package Image Processing Toolbox Image Data.
The data package is a MathWorks Optional Feature.
3 Click the data package in the search results. On the data package page, click Install. Follow the
instructions presented by the installer.
12
ROI-Based Processing
Create a Binary Mask
Any binary image can be used as a mask, provided that the binary image is the same size as the
image being filtered. For example, to filter a grayscale image I, processing only those pixels whose
values are greater than 0.5, you can create the appropriate mask with this command:
BW = (I > 0.5);
Create Binary Mask Using an ROI Function
Read an image into the workspace and display it.
img = imread('pout.tif');
h_im = imshow(img);
Create an ROI on the image using one of the ROI creation functions.
circ = drawcircle('Center',[113,66],'Radius',60);
Create a binary mask from the ROI using createMask. The createMask function returns a binary
image the same size as the input image. The pixels inside the ROI are set to 1 and the pixel values
everywhere else are set to 0.
BW = createMask(circ);
imshow(BW)
ROI Creation Overview
You can control aspects of the ROI position and appearance. You can create masks from ROIs and
perform other operations. You can also specify how the ROI responds to events that occur within the
ROI, such as mouse clicks and movement.
• Create an ROI interactively by using a creation convenience function. The creation functions
enable you to draw the ROI on an image. Use this approach if you do not have prior knowledge of
the size and position of the ROI and want to use the image content to assist in the placement of
the ROI. For more information, see “Create ROI Using Creation Convenience Functions” on page
12-8.
• Create an ROI programmatically by specifying information about the size and shape of the ROI.
Use this approach if you already know details about the size and shape of the ROI, such as the
coordinates of polygon vertices or the center coordinates and radius of a circle.
• Create an ROI programmatically, then use the draw function to interactively draw the ROI on an
image. Use this approach if you want to set the display properties and behavior of the ROI before
you specify the size and position of the ROI. The draw function also enables you to redraw an
existing ROI, preserving the appearance of the ROI. For more information, see “Create ROI Using
draw Function” on page 12-10.
The table shows the supported ROIs and their respective creation convenience functions.
Freehand (drawfreehand): Freehand ROI that follows the path of the mouse
This example creates an ellipse ROI. You can use a similar process to create any ROI object.
Create an Ellipse ROI by using the drawellipse function. Customize the look of the ROI by
specifying the StripeColor name-value pair argument as yellow.
roi = drawellipse('StripeColor','y');
roi
roi =
Ellipse with properties:
You can set the display properties and behavior of the ROI before you specify
the size and position of the ROI. For example, you may want to create and customize an ROI before
you have an axes in which to display the ROI.
This example creates and draws an ellipse ROI. You can use a similar process to create any ROI
object.
roi = images.roi.Ellipse('Color','c','StripeColor','r');
roi
roi =
Ellipse with properties:
Center: []
SemiAxes: []
RotationAngle: 0
AspectRatio: 1.6180
Label: ''
Inspect the parent axes of the ROI. The ROI is not drawn so the parent axes is empty.
roi.Parent
ans =
0×0 empty GraphicsPlaceholder array.
I = imread('pears.png');
imshow(I)
Draw the ROI on the image by using the draw function. Click and drag the cursor over the image to
create the elliptical shape. The displayed ROI has the face color and stripe color that you specified
when you created the ROI.
draw(roi)
Inspect properties of the ROI. Several properties of the ROI are updated after drawing.
roi
roi =
Ellipse with properties:
The ROI now has a parent axes. Get all graphics objects that share the same parent axes. In this
example, the ROI has the same parent as the displayed image.
roi.Parent.Children
ans =
2×1 graphics array:
Ellipse
Image
ROIs in a UIAxes, such as in an app built with App Designer, have these limitations:
• The mouse cursor does not update when you hover over the ROI. The cursor is always an arrow.
• The ROI does not change color when you hover over it.
• The ROI right-click menu (UIContextMenu) is not supported.
The following code, while not a typical app-creation workflow, shows how to specify an ROI in a
UIAxes in an app (UIFigure).
1 Create a UIAxes. When you call the uiaxes function, it creates a UIFigure automatically.
uax = uiaxes;
2 Create the ROI in the UIAxes. Call any of the ROI creation functions, such as drawcircle, or
the ROI classes, such as images.roi.Circle. Specify the UIAxes as an argument. Move the
cursor over the axes, and click and drag the mouse to draw the ROI. The shape of the cursor does
not change when used with a UIAxes.
h = drawcircle(uax);
You can also create an ROI using the object creation function, such as images.roi.Circle. If
you use the objects, you must also call the draw function, specifying the ROI object as an
argument.
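For example, a minimal sketch, assuming the UIAxes uax created in step 1:
roi = images.roi.Circle('Parent',uax,'Color','c');
draw(roi)   % click and drag in the UIAxes to define the circle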
See Also
Related Examples
• “Create Binary Mask Using an ROI Function” on page 12-2
More About
• “Display Graphics in App Designer”
ROI Migration
Starting in R2018b, a new set of ROI objects replaced the previous set of ROI objects. The new
objects provide better performance and more functional capabilities, such as face color transparency.
With the new objects, you can also receive notification of interactions with the object, such as clicks
or movement, using events. Although there are no plans to remove the old ROI objects at this time,
switch to the new ROIs to take advantage of the additional capabilities and flexibility. For more
information on the new ROIs, see “ROI Creation Overview” on page 12-5.
getColor: Retrieve the value of the Color property of the ROI. For example, roi_color = roi.Color;
getPosition: Retrieve the value of the Position property of the ROI. For example, roi_pos = roi.Position;
getPositionConstraintFcn: Use the DrawingArea property to specify position constraints.
getVertices: Retrieve the value of the Vertices property of the ROI. For example, roi_vert = roi.Vertices;
makeConstrainToRectFcn: Use the DrawingArea property to specify position constraints.
removeNewPositionCallback: Use the addlistener object function to specify the function to be called when the ROI moves. To remove this callback function, delete the object returned by the addlistener object function.
resume: Use uiresume instead.
setClosed: Assign a value to the ROI Closed property. For example, roi.Closed = 'y'.
setColor: Assign a value to the new ROI Color property. For example, roi.Color = 'y'.
setConstrainedPosition: Use the DrawingArea property to specify position constraints.
setFixedAspectRatioMode: Use the FixedAspectRatio property of the new ROIs, setting the value to true.
setPosition: Assign a value to the new ROI Position property. The way to specify the position varies with each object. For example, roi.Position = [50 50].
setPositionConstraintFcn: Use the DrawingArea property to specify position constraints.
setResizable: Use the InteractionsAllowed property, setting the value to 'translate'.
setString: Assign a value to the new ROI Label property. For example, roi.Label = 'My Label';
setVerticesDraggable: Use the InteractionsAllowed property, setting the value to 'translate'.
ROI Events
With the previous ROIs, you could use the addNewPositionCallback object function to receive
notification when the ROI moves. You specify the object and the function that you want executed
when the event occurs: id = addNewPositionCallback(h,fcn).
With the new ROIs, you use the addlistener object function to receive notification when the ROI
moves. You specify the object, the name of the event you want to receive notification of, and the name of
the function you want executed when the event occurs: el =
addlistener(roi,'MovingROI',mycallbackfcn). With the new ROIs, you must specify the
name of the event because you can receive notification of many other events, such as when the ROI is
clicked.
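For example, a minimal sketch that attaches listeners to a new-style ROI; the callbacks here simply display messages.
roi = drawrectangle;   % draw an ROI interactively
addlistener(roi,'MovingROI',@(src,evt) disp('ROI is moving'));
addlistener(roi,'ROIMoved',@(src,evt) disp('ROI move finished'));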
See Also
More About
• “ROI Creation Overview” on page 12-5
Overview of ROI Filtering
To filter an ROI in an image, use the roifilt2 function. When you call roifilt2, specify the image to be filtered, the binary mask that defines the ROI, and the filter that you want to apply.
roifilt2 filters the input image and returns an image that consists of filtered values for pixels
where the binary mask contains 1s and unfiltered values for pixels where the binary mask contains
0s. This type of operation is called masked filtering.
roifilt2 is best suited for operations that return data in the same range as in the original image,
because the output image takes some of its data directly from the input image. Certain filtering
operations can result in values outside the normal image data range (that is, [0, 1] for images of class
double, [0, 255] for images of class uint8, and [0, 65535] for images of class uint16). For more
information, see the reference page for roifilt2.
Sharpen Region of Interest in an Image
I = imread('pout.tif');
imshow(I)
Draw a region of interest over the image to specify the area you want to filter. Use the drawcircle
function to create the region of interest, specifying the center of the circle and the radius of the
circle. Alternatively, if you want to draw the circle interactively, then do not specify the center or
radius of the circle.
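The drawcircle call used in the original example is not shown on this page. A minimal sketch follows, reusing values similar to the earlier binary mask example; the exact center and radius are assumptions.
hax = drawcircle('Center',[113 66],'Radius',60);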
Create the mask using the createMask function and specifying the ROI.
mask = createMask(hax);
Define the function you want to use as a filter. This function, named f, passes the input image x to the
imsharpen function and specifies the strength of the sharpening effect by using the 'Amount'
name-value pair argument.
f = @(x)imsharpen(x,'Amount',3)
Filter the ROI using the roifilt2 function and specifying the image, mask, and filtering function.
J = roifilt2(I,mask,f);
imshow(J)
See Also
Circle | createMask | drawcircle | imsharpen | roifilt2
Apply Custom Filter to Region of Interest in Image
I = imread('cameraman.tif');
figure
imshow(I)
Create the mask image. This example uses a binary image of text as the mask image. All the 1-valued
pixels define the regions of interest. The example crops the image because a mask image must be the
same size as the image to be filtered.
BW = imread('text.png');
mask = BW(1:256,1:256);
figure
imshow(mask)
Define the function you want to use as a filter. This function applies a gamma correction, with gamma equal to 0.3, to the input image x.
f = @(x) imadjust(x,[],[],0.3);
Filter the ROI, specifying the image to be filtered, the mask that defines the ROI, and the filter that
you want to use.
I2 = roifilt2(I,mask,f);
figure
imshow(I2)
See Also
imadjust | roifilt2
Fill Region of Interest in an Image
I = imread('eight.tif');
imshow(I)
Create a mask image to specify the region of interest (ROI) you want to fill. Use the roipoly
function to specify the region interactively. Call roipoly and move the pointer over the image. The
pointer shape changes to crosshairs. Define the ROI by clicking the mouse to specify the vertices
of a polygon. You can use the mouse to adjust the size and position of the ROI.
mask = roipoly(I);
Double-click to finish defining the region. roipoly creates a binary image with the region filled with
1-valued pixels.
figure
imshow(mask)
Fill the region, using regionfill, specifying the image to be filled and the mask image as inputs.
Display the result. Note the image contains one less coin.
J = regionfill(I,mask);
figure
imshow(J)
See Also
createMask | drawpolygon | regionfill | roipoly
Calculate Properties of Image Regions Using Image Region Analyzer
BW = imread('text.png');
Open the Image Region Analyzer app from the MATLAB toolstrip. On the Apps tab, in the Image Processing and Computer Vision section, click Image Region Analyzer.
On the app toolstrip, click Load Image, and then select Load Image from Workspace to load the
image from the workspace into the app. In the Import From Workspace dialog box, select the image
you read into the workspace, and then click OK.
You can also open the app from the command line by using the imageRegionAnalyzer function,
specifying the image you want to analyze: imageRegionAnalyzer(BW);.
The Image Region Analyzer app displays the image you selected and a table where each row is a
region identified in the image and each column is a property of that region, such as the area,
perimeter, and orientation. (The Image Region Analyzer app uses the regionprops function to
identify regions in the image and calculate properties of these regions.)
To explore the image, move the cursor over the image to access the pan and zoom controls.
The app calculates more properties than it initially includes in the table. To view additional properties
in the table, click Choose Properties and select the properties you want to view. The app updates
the table automatically, adding a new column to the table for each property.
To explore properties of the image, sort the information in the table. Initially, the app lists the
properties in the order it finds them, starting in the upper-left corner of the image. To change the
sorting order, in the Properties section of the app toolstrip, click Sort Table and select the property
on which you want to sort. For example, to find the largest region in the image, sort on the Area
property. The Image Region Analyzer app sorts the table by size.
To view the region in the image with the largest area, click the item in the table. The app highlights
the corresponding region in the image.
See Also
Image Region Analyzer | bwareafilt | bwpropfilt | regionprops
Filter Images on Properties Using Image Region Analyzer App
Open the Image Region Analyzer app from the MATLAB toolstrip. On the Apps tab, in the Image Processing and Computer Vision section, click Image Region Analyzer.
On the app toolstrip, click Load Image, and then select Load Image from Workspace to load the
image from the workspace into the app. In the Import from Workspace dialog box, select the image
you read into the workspace, and then click OK.
You can also open the app from the command line using the imageRegionAnalyzer function,
specifying the image you want to analyze: imageRegionAnalyzer(BW);.
The Image Region Analyzer app displays the image you selected and a table where each row is a
region identified in the image and each column is a property of that region, such as the area,
perimeter, and orientation. (The Image Region Analyzer app uses the regionprops command to
identify regions in the image and calculate properties of those regions.)
To filter on the value of a region property, on the app toolstrip, click Filter. Then, select the property
on which you want to filter.
Next, specify the filter criteria. For example, to create an image that removes all but the largest
regions, choose the greater than or equal to symbol (>=), and then specify the minimum value. To
identify the minimum value for the desired property, you can sort the values in the table by that
property. The app uses the bwpropfilt and bwareafilt functions to filter binary images.
To filter on another property, click Add. The app displays another row in which you can select a
property and specify filter criteria. The result is the intersection (logical AND) of the two filtering
operations.
If you are creating a mask image, you can optionally perform cleanup operations on the mask, such as
clearing all foreground pixels that touch the border and filling holes in objects. Filling holes can
change the area of regions in the image, so regions that had been filtered out because they fell below
the area threshold can become visible again.
When you are done filtering the image, you can save it. Click Export and select Export Image. In the
Export to Workspace dialog box, accept the default name for the mask image, or specify another
name. Then, click OK.
See Also
Image Region Analyzer | bwareafilt | bwpropfilt | regionprops
Create Image Comparison Tool Using ROIs
Read a sample image into the workspace and then create a grayscale version of the image. Use the
imshowpair function to view the two images. The montage option shows the images side-by-side.
im = imread('peppers.png');
imgray = rgb2gray(im);
figure;
imshowpair(im,imgray,'montage')
Using an ROI, set the alpha layer (transparency) of two stacked images so that one image shows
through only inside the ROI. This selective view follows the ROI so it can be moved interactively.
Create a new figure and an axes.
hFigure = figure();
hAxes = axes('Parent', hFigure);
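The pages that display the stacked images and create the Circle ROI are represented here by a brief sketch; the overlay approach and the center and radius values are illustrative assumptions, chosen so that the variable hC matches the listener code that follows.
% Display the RGB image, then display the grayscale image on top of it in
% the same axes. The callback controls the AlphaData of the top image.
imshow(im,'Parent',hAxes);
hold(hAxes,'on')
imshow(imgray,'Parent',hAxes);
hold(hAxes,'off')
% Create a Circle ROI that defines where the bottom image shows through.
% The center and radius are arbitrary starting values.
hC = images.roi.Circle('Parent',hAxes,'Center',[200 200],'Radius',80);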
Create a listener that listens to changes in the position of the ROI (the circle). The updateAlpha
function is defined at the end of this example.
addlistener(hC,'MovingROI', @updateAlpha);
updateAlpha(hC)
This file contains the source code for a function that implements this image comparison tool. The
code listens for two additional events: it responds when a user presses the 't' or 'T' key by switching
which image is on top, and it responds to the mouse scroll wheel by increasing or decreasing the
radius of the ROI.
edit helperImageComparer
Callback function to update the alpha layer as the ROI object is moved.
function updateAlpha(hC, ~)
hImages = findobj(hC.Parent,'Type','image');
% Create a BW mask from the Circle ROI
mask = hC.createMask(hImages(1).CData);
% Set the alpha data so that the underlying image shows through
% only inside the circle
set(hImages(1),'AlphaData', ~mask);
end
See Also
Circle | addlistener | createMask
More About
• “Callbacks — Programmed Response to User Action”
• “Overview Events and Listeners”
Use Freehand ROIs to Refine Segmentation Masks
Segmentation algorithms identify regions of interest in an image. To illustrate, this example uses
K-means clustering to segment bone and tissue in an MRI image.
im = dicomread('knee1.dcm');
segmentedLabels = imsegkmeans(im,3);
boneMask = segmentedLabels==2;
imshowpair(im, boneMask);
Often, the results of automated segmentation algorithms need additional post-processing to clean up
the masks. As a first step, select the two largest bones from the mask, the femur and the tibia.
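The selection step is not shown on this page; a minimal sketch that uses bwareafilt to keep the two largest connected components:
% Keep only the two largest regions (the femur and the tibia).
boneMask = bwareafilt(boneMask,2);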
To refine the edges of the automatic k-means segmentation, convert the two masks into interactive
freehand ROI objects. First, retrieve the locations of boundary pixels that delineate these two
segmented regions. Note that these ROI objects are densely sampled: their Position property has
the same resolution as the image pixels.
blocations = bwboundaries(boneMask,'noholes');
figure
imshow(im, []);
for ind = 1:numel(blocations)
% Convert to x,y order.
pos = blocations{ind};
pos = fliplr(pos);
% Create a freehand ROI.
drawfreehand('Position', pos);
end
The Freehand ROI object allows simple 'rubber-band' interactive edits. To edit the ROI, click and drag
any of the waypoints along the ROI boundary. You can add additional waypoints anywhere on the
boundary by double-clicking the ROI edge or by using the context menu accessible by right-clicking
the edge.
After editing the ROIs, convert these ROI objects back to binary masks using the ROI object's
createMask method. Note the additional step required to include the boundary pixels in the final
mask.
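The conversion code is not shown on this page; a sketch under the assumption that the edited ROIs are retrieved from the current axes (the variable names are illustrative):
% Retrieve the freehand ROI objects and combine their masks.
hfhs = findobj(gca,'Type','images.roi.Freehand');
editedMask = false(size(im));
for ind = 1:numel(hfhs)
editedMask = editedMask | createMask(hfhs(ind));
% Include the boundary pixels themselves, which createMask excludes.
boundaryLocation = round(hfhs(ind).Position);
bInds = sub2ind(size(im),boundaryLocation(:,2),boundaryLocation(:,1));
editedMask(bInds) = true;
end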
See Also
Freehand | bwareafilt | bwboundaries | createMask | dicomread | drawfreehand |
imsegkmeans
Rotate Image Interactively Using Rectangle ROI
Image rotation is a common preprocessing step. In this example, an image needs to be rotated by an
unknown amount to align the horizon with the x-axis. You can use the imrotate function to rotate
the image, but you need prior knowledge of the rotation angle. By using an interactive rotatable ROI,
you can rotate the image in real time to match the rotation of the ROI.
im = imread('baby.jpg');
hIm = imshow(im);
sz = size(im);
Determine the position and size of the Rectangle ROI as a 4-element vector of the form [x y w h]. The
ROI will be drawn at the center of the image and have half of the image width and height.
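A sketch of that computation follows; the exact offsets in the original example may differ slightly.
% Rectangle centered in the image, with half of the image width and height.
pos = [sz(2)/4 sz(1)/4 sz(2)/2 sz(1)/2];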
Create a rotatable Rectangle ROI at the specified position and set the Rotatable property to true.
You can then rotate the rectangle by clicking and dragging near the corners. As the ROI moves, it
broadcasts an event MovingROI. By adding a listener for that event and a callback function that
executes when the event occurs, you can rotate the image in response to movements of the ROI.
h = drawrectangle('Rotatable',true,...
'DrawingArea','unlimited',...
'Position',pos,...
'FaceAlpha',0);
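The listener itself is not shown on this page; a minimal sketch that connects the MovingROI event to the rotateImage callback defined later in this example:
addlistener(h,'MovingROI',@(src,evt) rotateImage(src,evt,hIm,im));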
Define a callback function that executes as the Rectangle ROI moves. This function retrieves the
current rotation angle of the ROI, calls imrotate on the image with that rotation angle, and updates
the display. The function also updates the label to display the current rotation angle.
function rotateImage(src,evt,hIm,im)
% Only rotate the image when the ROI is rotated. Determine if the
% RotationAngle has changed.
if evt.PreviousRotationAngle ~= evt.CurrentRotationAngle
% Update the label to display the current rotation angle.
src.Label = [num2str(evt.CurrentRotationAngle,'%.1f') ' degrees'];
% Rotate the image by the current angle and update the display.
hIm.CData = imrotate(im,evt.CurrentRotationAngle,'nearest','crop');
end
end
See Also
Rectangle | addlistener | drawrectangle | imrotate
More About
• “Callbacks — Programmed Response to User Action”
• “Overview Events and Listeners”
Subsample or Simplify a Freehand ROI
Introduction
When drawing interactively, mouse motion determines the density of points. For large complex ROIs,
the number of points used can be quite large.
The Smoothing property controls how the boundary looks. By default, the Freehand object uses a
Gaussian smoothing kernel with a sigma value of 1 and a filter size of 5. Changing this value only
changes how the boundary looks, it does not change the underlying Position property of the object.
Reducing the density of points can help reduce the space required to store the ROI data and may also
speed up any computation that depends on the number of these points. One way to reduce the density
of points is to subsample the points, for example, pick every other point.
Create a sample freehand ROI by converting a mask to an ROI. The ROI is very dense since every
boundary pixel will correspond to a point in the ROI.
im = imread('football.jpg');
bw = im(:,:,1)>200;
bw = bwareafilt(bw, 1);
bloc = bwboundaries(bw,'noholes');
roipos = fliplr(bloc{1});
imshow(im);
hfh = drawfreehand('Position', roipos);
To visualize the density of the points, turn every point in the ROI into a waypoint.
hfh.Waypoints(:) = true;
title('Original density');
snapnow
% Zoom in.
xlim([80 200]);
ylim([70 160]);
snapnow
Subsample the points that make up the Position property of the freehand ROI. Because the freehand
ROI is very dense, subsampling can substantially reduce the size without losing fidelity. Query the
initial, full, fine-grained position.
fpos = hfh.Position;
Subsample the position, keeping every other point.
cpos = fpos(1:2:end,:);
hfh.Position = cpos;
hfh.Waypoints(:) = true;
title('Simple Subsample');
snapnow
A better approach is to selectively remove points that have low curvature. It makes more sense to
remove a point along a relatively straight portion of the ROI than one near a curve. One simple way
to define a curvature value is to measure the rate of change of the position locations.
Measure the rate of change. The neighbor of the first point is the last point.
% Difference between consecutive points; the neighbor of the first point is
% the last point, so wrap around.
dfpos = diff([fpos(end,:); fpos]);
cm = sum(abs(conv2(dfpos, ones(3,2),'same')),2);
Sort by curvature.
[~,cmInds] = sort(cm);
Pick 3/4 of the points with lower curvature values to remove from the ROI.
numPointsToCull = round(0.25*size(fpos,1));
cpos = fpos;
cpos(cmInds(1:numPointsToCull),:) = [];
hfh.Position = cpos;
hfh.Waypoints(:) = true;
An even better approach to subsample the points would be to use the reduce method on the ROI
object. The reduce method operates directly on the Position property of the ROI object. You can
affect the number of points removed by specifying a tolerance value in the range [0, 1] as an optional
input argument. The default value of tolerance is 0.01.
Reset the Position property and call reduce on the ROI object.
hfh.Position = fpos;
reduce(hfh);
% View the updated ROI, turning all the points into waypoints to see the
% impact.
hfh.Waypoints(:) = true;
title('Subsampling using reduce method');
snapnow
Interactive Subsampling
Another way to subsample is to use events to make this process easier. First create a listener to
interactively change the number of points that the freehand ROI uses. Use the UserData property of
the Freehand object to cache the full resolution Position data, along with the current value of
tolerance. Then add a custom context menu to the ROI object by creating a new uimenu and
parenting it to the UIContextMenu of the Freehand object. This menu option allows you to finalize
the ROI, which deletes the temporary cache.
Restore the original ROI, and cache the original position, along with the current tolerance value, in
UserData.
hfh.Position = fpos;
hfh.Waypoints(:) = true;
hfh.UserData.fpos = fpos;
hfh.UserData.tol = 0;
h = gcf;
h.WindowScrollWheelFcn = @(h, evt) changeSampleDensity(hfh, evt);
Add a context menu to finalize the ROI and perform any clean up needed.
uimenu(hfh.UIContextMenu, 'Text','Finalize',...
'MenuSelectedFcn', @(varargin)finalize(hfh));
This function gets called on scroll action. Scrolling up increases the density, and scrolling down
decreases it. This allows you to interactively select the number of points to retain.
function changeSampleDensity(hfh, evt)
% Restore the cached full-resolution points before applying a new tolerance.
hfh.Position = hfh.UserData.fpos;
% Change the tolerance by a small amount per scroll click (scrolling up
% decreases the tolerance and so increases the density), keeping it in a
% useful range.
tol = hfh.UserData.tol + 0.01*evt.VerticalScrollCount;
tol = max(min(tol, 0.15), 0);
% Call |reduce| with the specified tolerance.
reduce(hfh,tol);
hfh.UserData.tol = tol;
% Update the ROI and turn all the points into waypoints to show the
% density.
hfh.Waypoints(:) = true;
end
Delete and create a new Freehand ROI with the subsampled points to save on space.
function finalize(hfh)
h = ancestor(hfh, 'figure');
% Reset the mouse scroll wheel callback.
h.WindowScrollWheelFcn = [];
% Save finalized set of points.
pos = hfh.Position;
% Delete and create a new Freehand ROI with the new |Position| value.
delete(hfh);
drawfreehand(gca, 'Position', pos);
end
See Also
Freehand | bwareafilt | bwboundaries | drawfreehand
More About
• “Callbacks — Programmed Response to User Action”
• “Overview Events and Listeners”
Measure Distances in an Image
im = imread('concordorthophoto.png');
Gather data about the image, such as its size, and store the data in a structure that you can pass to
callback functions.
sz = size(im);
myData.Units = 'pixels';
myData.MaxValue = hypot(sz(1),sz(2));
myData.Colormap = hot;
myData.ScaleFactor = 1;
hIm = imshow(im);
Specify a callback function for the ButtonDownFcn callback on the image. Pass the myData
structure to the callback function. This callback function creates the line objects and starts drawing
the ROIs.
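The assignment itself is not shown on this page; a minimal sketch that matches the startDrawing function defined below:
hIm.ButtonDownFcn = @(~,~) startDrawing(hIm.Parent,myData);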
Create the function used with the ButtonDownFcn callback to create line ROIs. This function:
1. Creates the line ROI object.
2. Sets up listeners that respond to movement of the ROI and to clicks on the ROI.
3. Adds a custom context menu to the ROIs that includes a 'Delete All' option.
4. Begins drawing the ROI, using the point clicked in the image as the starting point.
function startDrawing(hAx,myData)
% Create a line ROI object. Specify the initial color of the line and
% store the |myData| structure in the |UserData| property of the ROI.
h = images.roi.Line('Color',[0, 0, 0.5625],'UserData',myData);
% Set up a listener for movement of the line ROI. When the line ROI moves,
% the |updateLabel| callback updates the text in the line ROI label and
% changes the color of the line, based on its length.
addlistener(h,'MovingROI',@updateLabel);
% Set up a listener for clicks on the line ROI. When you click on the line
% ROI, the |updateUnits| callback opens a GUI that lets you specify the
% known distance in real-world units, such as, meters or feet.
addlistener(h,'ROIClicked',@updateUnits);
% Get the current mouse location from the |CurrentPoint| property of the
% axes and extract the _x_ and _y_ coordinates.
cp = hAx.CurrentPoint;
cp = [cp(1,1) cp(1,2)];
% Begin drawing the ROI from the current mouse location. Using the
% |beginDrawingFromPoint| method, you can draw multiple ROIs.
h.beginDrawingFromPoint(cp);
% Add a custom option to the line ROI context menu to delete all existing
% line ROIs.
c = h.UIContextMenu;
uimenu(c,'Label','Delete All','Callback',@deleteAll);
end
Create the function that is called whenever the line ROI is moving, that is, when the 'MovingROI'
event occurs. This function updates the ROI label with the length of the line and changes the color of
the line based on its length.
This function is called repeatedly when the ROI moves. If you want to update the ROI only when the
movement has finished, listen for the 'ROIMoved' event instead.
function updateLabel(src,evt)
% Get the current line position and determine the length of the line.
pos = evt.Source.Position;
diffPos = diff(pos);
mag = hypot(diffPos(1),diffPos(2));
% Choose a color from the color map based on the length of the line. The
% line changes color as it gets longer or shorter.
color = src.UserData.Colormap(ceil(64*(mag/src.UserData.MaxValue)),:);
% Apply the scale factor to calibrate the measurement, then update the
% label and color of the line.
set(src,'Color',color,'Label', ...
[num2str(mag*src.UserData.ScaleFactor,'%.1f') ' ' src.UserData.Units]);
end
Create the function that is called whenever you double-click the ROI label. This function opens a
popup dialog box in which you can enter information about the real-world distance and units.
This function listens for the 'ROIClicked' event, using event data to check the type of click and the
part of the ROI that was clicked.
The popup dialog box prompts you to enter the known distance and units for this measurement. With
this information, you can calibrate all the ROI measurements to real world units.
function updateUnits(src,evt)
% When you double-click the ROI label, the example opens a popup dialog box
% to get information about the actual distance. Use this information to
% scale all line ROI measurements.
if strcmp(evt.SelectionType,'double') && strcmp(evt.SelectedPart,'label')
% Display a dialog box to collect the known distance and the units.
% (The prompt strings and default values here are illustrative.)
answer = inputdlg({'Known distance','Distance units'}, ...
'Specify known distance',[1 20],{'10','meters'});
num = str2double(answer{1});
% Determine the current length of the line, measured in pixels.
pos = src.Position;
diffPos = diff(pos);
mag = hypot(diffPos(1),diffPos(2));
% Calculate the scale factor by dividing the known length value by the
% current length, measured in pixels.
scale = num/mag;
% Store the scale factor and the units information in the |myData|
% structure.
myData.Units = answer{2};
myData.MaxValue = src.UserData.MaxValue;
myData.Colormap = src.UserData.Colormap;
myData.ScaleFactor = scale;
% Reset the data stored in the |UserData| property of all existing line
% ROI objects. Use |findobj| to find all line ROI objects in the axes.
hAx = src.Parent;
hROIs = findobj(hAx,'Type','images.roi.Line');
set(hROIs,'UserData',myData);
% Update the label in each line ROI object, based on the information
% collected in the input dialog.
for i = 1:numel(hROIs)
pos = hROIs(i).Position;
diffPos = diff(pos);
mag = hypot(diffPos(1),diffPos(2));
set(hROIs(i),'Label',[num2str(mag*scale,'%.1f') ' ' answer{2}]);
end
end
end
Create the function to delete all ROIs. You added a custom context menu item to each line ROI in the
startDrawing callback function. This function is the callback associated with that custom context menu
item. It uses the findobj function to find all line ROI objects in the figure and deletes them.
function deleteAll(src,~)
hFig = ancestor(src,'figure');
hROIs = findobj(hFig,'Type','images.roi.Line');
delete(hROIs)
end
See Also
Line | addlistener | beginDrawingFromPoint | drawline
More About
• “Callbacks — Programmed Response to User Action”
• “Overview Events and Listeners”
Use Polyline to Create Angle Measurement Tool
You can change the angle by clicking and dragging the polyline vertices. When the ROI moves, it
broadcasts an event named MovingROI. By adding a listener for that event and a callback function
that executes when the event occurs, the tool can measure and display changes to the angle in real
time.
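The code that reads and displays the image is not shown on these pages; a minimal sketch, in which the file name is an assumption and any RGB image works:
im = imread('gantrycrane.png');
imshow(im)
% Get the image size; x is the number of columns and y is the number of rows.
[y,x,~] = size(im);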
Get the coordinates of the center of the image. The example places the vertex of the angle
measurement tool at the center of the image.
midy = ceil(y/2);
midx = ceil(x/2);
Specify the coordinates of the first point in the polyline ROI. This example places the first point in the
polyline directly above the image center.
firstx = midx;
firsty = midy - ceil(y/4);
Specify the coordinates of the third point in the polyline ROI. This example places the third point in
the polyline directly to the right of the image center.
lastx = midx + ceil(x/4);
lasty = midy;
Create an empty context menu to replace the default context menu of the polyline. This prevents the
addition or deletion of vertices through the menu.
c = uicontextmenu;
Draw the polyline in red over the image. Specify the coordinates of the three vertices, and add a label
with instructions to interact with the polyline.
h = drawpolyline("Parent",gca, ...
"Position",[firstx,firsty;midx,midy;lastx,lasty], ...
"Label","Modify angle to begin...", ...
"Color",[0.8,0.2,0.2], ...
"UIContextMenu",c);
Add a listener that listens for movement of the ROI. When the listener detects movement, it calls the
custom callback function updateAngle. This custom function is defined in the section "Update Angle
Label Using Callback Function".
addlistener(h,'MovingROI',@(src,evt) updateAngle(src,evt));
Polyline ROIs also support interactive addition and deletion of vertices. However, an angle
measurement tool requires exactly three vertices at any time, so the addition and deletion of vertices
are undesirable interactions with the ROI. Add listeners that listen for the addition or deletion of
vertices. When you attempt to change the number of vertices, the appropriate listener calls a custom
callback function to suppress the change. These custom functions, storePositionInUserData and
recallPositionInUserData, are defined in the section "Prevent Addition or Deletion of Vertices
Using Callback Functions".
addlistener(h,'AddingVertex',@(src,evt) storePositionInUserData(src,evt));
addlistener(h,'VertexAdded',@(src,evt) recallPositionInUserData(src,evt));
addlistener(h,'DeletingVertex',@(src,evt) storePositionInUserData(src,evt));
addlistener(h,'VertexDeleted',@(src,evt) recallPositionInUserData(src,evt));
Define a callback function that executes as the polyline ROI moves. This function retrieves the
current position of the three vertices, calculates the angle in degrees between the vertices, and
updates the label to display the current rotation angle.
function updateAngle(src,evt)
% Get the current position
p = evt.CurrentPosition;
v1 = [p(1,1)-p(2,1), p(1,2)-p(2,2)];
v2 = [p(3,1)-p(2,1), p(3,2)-p(2,2)];
% Calculate the angle between the two vectors, in degrees, and update the label.
theta = acos(dot(v1,v2)/(norm(v1)*norm(v2)));
src.Label = sprintf('(%1.0f) degrees',theta*(180/pi));
end
Define a callback function that executes when the listeners detect the 'AddingVertex' or
'DeletingVertex' events. These events occur immediately before the vertex of interest is added to or
deleted from the polyline. Store the current three polyline vertices in the UserData property.
function storePositionInUserData(src,~)
src.UserData = src.Position;
end
Define a callback function that executes when the listeners detect the 'VertexAdded' or
'VertexDeleted' events. These events occur immediately after the vertex of interest is added to or
deleted from the polyline. Restore the stored set of three polyline vertices in the UserData property.
function recallPositionInUserData(src,~)
src.Position = src.UserData;
end
See Also
Polyline | addlistener | drawpolyline
More About
• “Callbacks — Programmed Response to User Action”
• “Overview Events and Listeners”
Create Freehand ROI Editing Tool
Another way to edit the shape of freehand ROIs, offered by many popular image manipulation
programs, is an 'eraser' or 'brush' tool. This example implements one of these tools, using another
ROI object to edit the freehand ROI.
Create a Freehand ROI that follows the shape of a segmentation mask. For more details on this
process, see “Use Freehand ROIs to Refine Segmentation Masks” on page 12-44.
im = dicomread('knee1.dcm');
Segment the MRI image and select the two largest regions of the mask.
segmentedLabels = imsegkmeans(im,3);
boneMask = segmentedLabels==2;
boneMask = bwareafilt(boneMask, 1);
blocations = bwboundaries(boneMask,'noholes');
pos = blocations{1};
pos = fliplr(pos);
figure
hImage = imshow(im,[]);
hf = drawfreehand('Position', pos);
Create a Circle ROI that will be used as the eraser or brush ROI editing tool. (You can use any of the
images.roi.* classes by making a small change, mentioned below).
he = images.roi.Circle(...
'Center', [50 50],...
'Radius', 10,...
'Parent', gca,...
'Color','r');
Associate two event listeners with the Circle ROI. One listens for ROI movement and the other listens
for when movement stops. In the ROI moving callback, the example snaps the position of the editor
ROI to pixel locations and changes its color (red or green) to indicate whether the edit operation will
remove from or add to the target freehand ROI. Once the editor ROI stops moving, the example creates
binary masks for the editor ROI and the target freehand ROI, makes the required edit, and transforms
the updated mask back into a freehand ROI object.
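The wiring code itself is not shown on this page; a minimal sketch, using callback names editorROIMoving and editFreehand (these names are assumptions, chosen to match the callback bodies shown later in this example):
addlistener(he,'MovingROI',@(varargin) editorROIMoving(he,hf));
addlistener(he,'ROIMoved',@(varargin) editFreehand(hf,he));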
This is the ROI moving callback function. It ensures that the editor ROI snaps to the pixel grid and
changes the color of the editor ROI to indicate whether it will add a region to the freehand ROI or
remove a region from it. If the center of the editor ROI is outside the target freehand ROI, the edit
removes a region; otherwise, it adds a region.
function editorROIMoving(he, hf)
% Snap the editor ROI to the pixel grid.
he.Center = round(he.Center);
% Check if the circle ROI's center is inside or outside the freehand ROI.
center = he.Center;
isAdd = hf.inROI(center(1), center(2));
if isAdd
% Green if inside (since we will add to the freehand).
he.Color = 'g';
else
% Red otherwise.
he.Color = 'r';
end
end
This is the edit freehand ROI callback that adds or removes the region of the editor ROI that
intersects the target freehand ROI.
function editFreehand(hf, he)
% Create binary masks for the target freehand ROI and the editor ROI.
tmask = createMask(hf);
emask = createMask(he);
% Check if the center of the editor ROI is inside the target freehand. If
% you use a different editor ROI, update the center computation accordingly.
center = he.Center;
isAdd = hf.inROI(center(1), center(2));
if isAdd
% Add the editor mask to the freehand mask
newMask = tmask|emask;
else
% Remove the part of the freehand ROI that intersects the editor.
newMask = tmask&~emask;
end
% Update the freehand ROI by converting the new mask back to a boundary.
perimPos = bwboundaries(newMask,'noholes');
hf.Position = fliplr(perimPos{1});
end
See Also
Circle | Freehand | addlistener | bwareafilt | bwboundaries | createMask | dicomread |
drawfreehand | imsegkmeans | inROI
More About
• “Callbacks — Programmed Response to User Action”
• “Overview Events and Listeners”
Use Wait Function After Drawing ROI
Display an image.
imshow('pears.png')
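The drawing step is not shown on this page; a minimal sketch that creates the rectangle used below:
h = drawrectangle;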
Use a custom wait function to block the MATLAB command line while you interact with the rectangle.
This example specifies a function called customWait (defined at the end of the example).
While the command line is blocked, resize and reposition the rectangle so that it encompasses one
pear. Double-click on the rectangle to resume execution of the customWait function. The function
returns the final position of the rectangle.
pos = customWait(h)
pos = 1×4
This is the custom wait function that blocks program execution while you interact with the ROI. When
you have finished interacting with the ROI, the function returns the position of the ROI.
function pos = customWait(hROI)
% Listen for clicks on the ROI, then block execution until uiresume is called.
l = addlistener(hROI,'ROIClicked',@clickCallback);
uiwait
% Remove listener
delete(l);
% Return the final position of the ROI.
pos = hROI.Position;
end
This click callback function resumes program execution when you double-click the ROI. Note that
event data is passed to the callback function as an images.roi.ROIClickedEventData object,
which enables you to define callback functions that respond to different types of actions. For example,
you could define a callback function to resume program execution when you click the ROI while
pressing the Shift key or when you click a specific part of the ROI such as the label.
function clickCallback(~,evt)
if strcmp(evt.SelectionType,'double')
uiresume;
end
end
See Also
Rectangle | addlistener | drawrectangle | uiresume | uiwait
More About
• “Callbacks — Programmed Response to User Action”
• “Overview Events and Listeners”
Interactive Image Inpainting Using Exemplar Matching
In this example, you perform region filling and object removal by:
Read an image to inpaint into the workspace. The image has missing image regions to be filled
through inpainting.
I = imread('greensdistorted.png');
Create an interactive figure window to display the image to be inpainted. In the window, you can
select the region of interest (ROI), and dynamically update the parameter values.
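The pages that create the figure, axes, and parameter controls are not reproduced here. A minimal sketch of that setup follows; the control styles, positions, and default values are illustrative assumptions, chosen so that the variables ax and data match the code that follows.
hFig = figure('Name','Interactive Image Inpainting');
ax = axes('Parent',hFig);
% User controls for the inpainting parameters. The callback reads the
% current values from these controls each time an ROI is drawn.
data.patchSize = uicontrol('Parent',hFig,'Style','edit', ...
'String','9','Position',[10 10 60 25]);
data.fillOrder = uicontrol('Parent',hFig,'Style','popupmenu', ...
'String',{'gradient','tensor'},'Value',1,'Position',[80 10 100 25]);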
hImage = imshow(I,'Parent',ax);
Select ROIs interactively and dynamically inpaint the selected ROIs by using the callback function
clickCallback. Assign a function handle that references the clickCallback function to the
ButtonDownFcn property of the image object.
hImage.ButtonDownFcn = @(hImage,eventdata)clickCallback(hImage,eventdata,data);
Step 1: Choose the patch size and the fill order for inpainting. To inpaint with local parameter values,
modify the patch size and the fill order to desired values by using user controls in the interactive
figure window.
The choice of patch size and fill order influences the quality of inpainting, and the best values depend
on the characteristics of the image region to be inpainted.
• To inpaint regions with regular textures, choose a larger patch size to achieve seamless inpainting.
• To inpaint regions that are locally uniform with respect to a small neighborhood, choose a smaller
patch size.
The default fill order is 'gradient'. You can choose a 'gradient' or 'tensor' based fill order for
inpainting image regions. A 'tensor' based fill order is more suitable for inpainting image regions
with linear structures and regular textures.
Step 2: Create a freehand ROI interactively by using your mouse. Position the pointer on the axes
and click and drag to draw the ROI shape. Release the pointer to close the shape.
The callback function reads the parameter values that you specify in the user control interface and
inpaints the selected ROI. Repeat steps 1 and 2 to inpaint all the desired regions in the image.
Create the clickCallback to be used with the ButtonDownFcn to interactively select and inpaint
ROIs.
function clickCallback(src,~,data)
% Get the parameter values for inpainting.
fillOrder = data.fillOrder.String{data.fillOrder.Value};
pSize = data.patchSize.String;
patchSize = str2double(pSize);
% Select and draw freehand ROI.
h = drawfreehand('Parent',src.Parent);
% Create a binary mask of the selected ROI.
mask = h.createMask(src.CData);
% Run exemplar-based inpainting algorithm with user given parameters.
newImage = inpaintExemplar(src.CData,mask,'PatchSize',patchSize,'FillOrder',fillOrder);
% Update input image with output.
src.CData = newImage;
% Delete ROI handle.
delete(h);
end
See Also
Freehand | createMask | drawfreehand | inpaintExemplar
More About
• “Callbacks — Programmed Response to User Action”
• “Overview Events and Listeners”
13
Image Segmentation
This topic describes a range of techniques and apps that are used to segment images.
Texture Segmentation Using Gabor Filters
From experimentation, it is known that Gabor filters are a reasonable model of simple cells in the
mammalian vision system. Because of this, Gabor filters are thought to be a good model of how
humans distinguish texture, and are therefore a useful model to use when designing algorithms to
recognize texture. This example uses the basic approach described in (A. K. Jain and F. Farrokhnia,
"Unsupervised Texture Segmentation Using Gabor Filters",1991) to perform texture segmentation.
Read and display the input image. This example shrinks the image to make the example run more
quickly.
A = imread('kobi.png');
A = imresize(A,0.25);
Agray = rgb2gray(A);
figure
imshow(A)
Design an array of Gabor filters that are tuned to different frequencies and orientations. The set of
frequencies and orientations is designed to localize different, roughly orthogonal, subsets of
frequency and orientation information in the input image. Regularly sample orientations between
0 and 135 degrees in steps of 45 degrees. Sample wavelength in increasing powers of two starting from
4/sqrt(2) up to the hypotenuse length of the input image. These combinations of frequency and
orientation are taken from [Jain,1991] cited in the introduction.
imageSize = size(A);
numRows = imageSize(1);
numCols = imageSize(2);
wavelengthMin = 4/sqrt(2);
wavelengthMax = hypot(numRows,numCols);
n = floor(log2(wavelengthMax/wavelengthMin));
wavelength = 2.^(0:(n-2)) * wavelengthMin;
deltaTheta = 45;
orientation = 0:deltaTheta:(180-deltaTheta);
g = gabor(wavelength,orientation);
Extract Gabor magnitude features from source image. When working with Gabor filters, it is common
to work with the magnitude response of each filter. Gabor magnitude response is also sometimes
referred to as "Gabor Energy". Each MxN Gabor magnitude output image in gabormag(:,:,ind) is
the output of the corresponding Gabor filter g(ind).
gabormag = imgaborfilt(Agray,g);
To use Gabor magnitude responses as features for use in classification, some post-processing is
required. This post processing includes Gaussian smoothing, adding additional spatial information to
the feature set, reshaping our feature set to the form expected by the pca and kmeans functions, and
normalizing the feature information to a common variance and mean.
Each Gabor magnitude image contains some local variations, even within well segmented regions of
constant texture. These local variations will throw off the segmentation. We can compensate for these
variations using simple Gaussian low-pass filtering to smooth the Gabor magnitude information. We
choose a sigma that is matched to the Gabor filter that extracted each feature. We introduce a
smoothing term K that controls how much smoothing is applied to the Gabor magnitude responses.
for i = 1:length(g)
sigma = 0.5*g(i).Wavelength;
K = 3;
gabormag(:,:,i) = imgaussfilt(gabormag(:,:,i),K*sigma);
end
When constructing Gabor feature sets for classification, it is useful to add a map of spatial location
information in both X and Y. This additional information allows the classifier to prefer groupings
which are close together spatially.
X = 1:numCols;
Y = 1:numRows;
[X,Y] = meshgrid(X,Y);
featureSet = cat(3,gabormag,X);
featureSet = cat(3,featureSet,Y);
Reshape data into a matrix X of the form expected by the kmeans function. Each pixel in the image
grid is a separate datapoint, and each plane in the variable featureSet is a separate feature. In this
example, there is a separate feature for each filter in the Gabor filter bank, plus two additional
features from the spatial information that was added in the previous step. In total, there are 24 Gabor
features and 2 spatial features for each pixel in the input image.
numPoints = numRows*numCols;
X = reshape(featureSet,numRows*numCols,[]);
Visualize the feature set. To get a sense of what the Gabor magnitude features look like, Principal
Component Analysis can be used to move from a 26-D representation of each pixel in the input image
into a 1-D intensity value for each pixel.
coeff = pca(X);
feature2DImage = reshape(X*coeff(:,1),numRows,numCols);
figure
imshow(feature2DImage,[])
It is apparent in this visualization that there is sufficient variance in the Gabor feature information to
obtain a good segmentation for this image. The dog is very dark compared to the floor because of the
texture differences between the dog and the floor.
Repeat k-means clustering five times to avoid local minima when searching for means that minimize
the objective function. The only prior information assumed in this example is how many distinct regions
of texture are present in the image being segmented. There are two distinct regions in this case. This
part of the example requires the Statistics and Machine Learning Toolbox™.
L = kmeans(X,2,'Replicates',5);
L = reshape(L,[numRows numCols]);
figure
imshow(label2rgb(L))
Visualize the segmented image using imshowpair. Examine the foreground and background images
that result from the mask BW that is associated with the label matrix L.
Aseg1 = zeros(size(A),'like',A);
Aseg2 = zeros(size(A),'like',A);
BW = L == 2;
BW = repmat(BW,[1 1 3]);
Aseg1(BW) = A(BW);
Aseg2(~BW) = A(~BW);
figure
imshowpair(Aseg1,Aseg2,'montage');
Texture Segmentation Using Texture Filters
Read Image
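The code for this step is not shown on this page; a minimal sketch, in which the file name is an assumption and any grayscale image with distinct textures works:
I = imread('bag.png');
imshow(I)
title('Original Image')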
Use entropyfilt to create a texture image. The function entropyfilt returns an array where
each output pixel contains the entropy value of the 9-by-9 neighborhood around the corresponding
pixel in the input image I. Entropy is a statistical measure of randomness.
You can also use stdfilt and rangefilt to achieve similar segmentation results. For comparison
to the texture image of local entropy, create texture images S and R showing the local standard
deviation and local range, respectively.
E = entropyfilt(I);
S = stdfilt(I,ones(9));
R = rangefilt(I,ones(9));
Use rescale to rescale the texture images E and S so that pixel values are in the range [0, 1] as
expected of images of data type double.
Eim = rescale(E);
Sim = rescale(S);
montage({Eim,Sim,R},'Size',[1 3],'BackgroundColor','w',"BorderSize",20)
title('Texture Images Showing Local Entropy, Local Standard Deviation, and Local Range')
This example continues by processing the entropy texture image Eim. You can repeat a similar
process for the other two types of texture images with other morphological functions to achieve
similar segmentation results.
Threshold the rescaled image Eim to segment the textures. A threshold value of 0.8 is selected
because it is roughly the intensity value of pixels along the boundary between the textures.
BW1 = imbinarize(Eim,0.8);
imshow(BW1)
title('Thresholded Texture Image')
The segmented objects in the binary image BW1 are white. If you compare BW1 to I, you notice the
top texture is overly segmented (multiple white objects) and the bottom texture is segmented almost
in its entirety. Remove the objects in the top texture by using bwareaopen.
BWao = bwareaopen(BW1,2000);
imshow(BWao)
title('Area-Opened Texture Image')
Use imclose to smooth the edges and to close any open holes in the object in BWao. Specify the
same 9-by-9 neighborhood that was used by entropyfilt.
nhood = ones(9);
closeBWao = imclose(BWao,nhood);
imshow(closeBWao)
title('Closed Texture Image')
Use imfill to fill holes in the object in closeBWao. The mask for the bottom texture is not perfect
because the mask does not extend to the bottom of the image. However, you can use the mask to
segment the textures.
mask = imfill(closeBWao,'holes');
imshow(mask);
title('Mask of Bottom Texture')
textureTop = I;
textureTop(mask) = 0;
textureBottom = I;
textureBottom(~mask) = 0;
montage({textureTop,textureBottom},'Size',[1 2],'BackgroundColor','w',"BorderSize",20)
title('Segmented Top Texture (Left) and Segmented Bottom Texture (Right)')
Create a label matrix that has the label 1 where the mask is false and the label 2 where the mask is
true. Overlay label matrix on the original image.
L = mask+1;
imshow(labeloverlay(I,L))
title('Labeled Segmentation Regions')
boundary = bwperim(mask);
imshow(labeloverlay(I,boundary,"Colormap",[0 1 1]))
title('Boundary Between Textures')
See Also
bwareaopen | bwperim | entropyfilt | imbinarize | imclose | imfill | rangefilt
Color-Based Segmentation Using the L*a*b* Color Space
Read in the fabric.png image, which is an image of colorful fabric. Instead of using fabric.png,
you can acquire an image using the following functions in the Image Acquisition Toolbox.
% Open a live preview window. Point camera onto a piece of colorful fabric.
% preview(vidobj);
fabric = imread('fabric.png');
imshow(fabric)
title('Fabric')
Step 2: Calculate Sample Colors in L*a*b* Color Space for Each Region
You can see six major colors in the image: the background color, red, green, purple, yellow, and
magenta. Notice how easily you can visually distinguish these colors from one another. The L*a*b*
colorspace (also known as CIELAB or CIE L*a*b*) enables you to quantify these visual differences.
The L*a*b* color space is derived from the CIE XYZ tristimulus values. The L*a*b* space consists of a
luminosity 'L*' or brightness layer, chromaticity layer 'a*' indicating where color falls along the red-
green axis, and chromaticity layer 'b*' indicating where the color falls along the blue-yellow axis.
Your approach is to choose a small sample region for each color and to calculate each sample region's
average color in 'a*b*' space. You will use these color markers to classify each pixel.
To simplify this example, load the region coordinates that are stored in a MAT-file.
load regioncoordinates;
nColors = 6;
sample_regions = false([size(fabric,1) size(fabric,2) nColors]);
for count = 1:nColors
% The MAT-file loaded above provides the polygon vertices for each sample
% region (the variable name region_coordinates is an assumption).
sample_regions(:,:,count) = roipoly(fabric,region_coordinates(:,1,count), ...
region_coordinates(:,2,count));
end
imshow(sample_regions(:,:,2))
title('Sample Region for Red')
Convert your fabric RGB image into an L*a*b* image using rgb2lab .
lab_fabric = rgb2lab(fabric);
Calculate the mean 'a*' and 'b*' value for each area that you extracted with roipoly. These values
serve as your color markers in 'a*b*' space.
a = lab_fabric(:,:,2);
b = lab_fabric(:,:,3);
color_markers = zeros([nColors, 2]);
for count = 1:nColors
color_markers(count,1) = mean2(a(sample_regions(:,:,count)));
color_markers(count,2) = mean2(b(sample_regions(:,:,count)));
end
For example, the average color of the red sample region in 'a*b*' space is
fprintf('[%0.3f,%0.3f] \n',color_markers(2,1),color_markers(2,2));
[69.828,20.106]
Each color marker now has an 'a*' and a 'b*' value. You can classify each pixel in the lab_fabric
image by calculating the Euclidean distance between that pixel and each color marker. The smallest
distance will tell you that the pixel most closely matches that color marker. For example, if the
distance between a pixel and the red color marker is the smallest, then the pixel would be labeled as
a red pixel.
Create an array that contains your color labels, i.e., 0 = background, 1 = red, 2 = green, 3 = purple,
4 = magenta, and 5 = yellow.
color_labels = 0:nColors-1;
a = double(a);
b = double(b);
distance = zeros([size(a), nColors]);
for count = 1:nColors
distance(:,:,count) = ((a - color_markers(count,1)).^2 + ...
(b - color_markers(count,2)).^2).^0.5;
end
Perform the classification by finding, for each pixel, the color marker with the minimum distance.
[~,label] = min(distance,[],3);
label = color_labels(label);
clear distance;
The label matrix contains a color label for each pixel in the fabric image. Use the label matrix to
separate objects in the original fabric image by color.
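The code that builds the per-color images is not shown on this page; a minimal sketch that produces the segmented_images array displayed below:
segmented_images = zeros([size(fabric), nColors],'like',fabric);
% Replicate the label matrix across the three color channels.
rgb_label = repmat(label,[1 1 3]);
for count = 1:nColors
color = fabric;
% Zero out pixels that do not belong to this color label.
color(rgb_label ~= count-1) = 0;
segmented_images(:,:,:,count) = color;
end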
Display the five segmented colors as a montage. Also display the background pixels in the image that
are not classified as a color.
montage({segmented_images(:,:,:,2),segmented_images(:,:,:,3) ...
segmented_images(:,:,:,4),segmented_images(:,:,:,5) ...
segmented_images(:,:,:,6),segmented_images(:,:,:,1)});
title("Montage of Red, Green, Purple, Magenta, and Yellow Objects, and Background")
You can see how well the nearest neighbor classification separated the different color populations by
plotting the 'a*' and 'b*' values of pixels that were classified into separate colors. For display
purposes, label each point with its color label.
% Plot colors: black for background, then red, green, purple, magenta, and
% yellow to match the color labels. The RGB triplet for purple is approximate.
plot_labels = {'k','r','g',[0.6 0.3 0.7],'m','y'};
figure
for count = 1:nColors
plot(a(label==count-1),b(label==count-1),'.','MarkerEdgeColor', ...
plot_labels{count}, 'MarkerFaceColor', plot_labels{count});
hold on;
end
Color-Based Segmentation Using K-Means Clustering
Read in hestain.png, which is an image of tissue stained with hematoxylin and eosin (H&E). This
staining method helps pathologists distinguish different tissue types.
he = imread('hestain.png');
imshow(he), title('H&E image');
text(size(he,2),size(he,1)+15,...
'Image courtesy of Alan Partin, Johns Hopkins University', ...
'FontSize',7,'HorizontalAlignment','right');
Step 2: Convert Image from RGB Color Space to L*a*b* Color Space
How many colors do you see in the image if you ignore variations in brightness? There are three
colors: white, blue, and pink. Notice how easily you can visually distinguish these colors from one
another. The L*a*b* color space (also known as CIELAB or CIE L*a*b*) enables you to quantify these
visual differences.
The L*a*b* color space is derived from the CIE XYZ tristimulus values. The L*a*b* space consists of a
luminosity layer 'L*', chromaticity-layer 'a*' indicating where color falls along the red-green axis, and
chromaticity-layer 'b*' indicating where the color falls along the blue-yellow axis. All of the color
information is in the 'a*' and 'b*' layers. You can measure the difference between two colors using the
Euclidean distance metric.
lab_he = rgb2lab(he);
Clustering is a way to separate groups of objects. K-means clustering treats each object as having a
location in space. It finds partitions such that objects within each cluster are as close to each other as
possible, and as far from objects in other clusters as possible. K-means clustering requires that you
specify the number of clusters to be partitioned and a distance metric to quantify how close two
objects are to each other.
Since the color information exists in the 'a*b*' color space, your objects are pixels with 'a*' and 'b*'
values. Convert the data to data type single for use with imsegkmeans. Use imsegkmeans to
cluster the objects into three clusters.
ab = lab_he(:,:,2:3);
ab = im2single(ab);
nColors = 3;
% repeat the clustering 3 times to avoid local minima
pixel_labels = imsegkmeans(ab,nColors,'NumAttempts',3);
For every object in your input, imsegkmeans returns an index, or a label, corresponding to a cluster.
Label every pixel in the image with its pixel label.
imshow(pixel_labels,[])
title('Image Labeled by Cluster Index');
Using pixel_labels, you can separate objects in hestain.png by color, which will result in three
images.
mask1 = pixel_labels==1;
cluster1 = he .* uint8(mask1);
imshow(cluster1)
title('Objects in Cluster 1');
mask2 = pixel_labels==2;
cluster2 = he .* uint8(mask2);
imshow(cluster2)
title('Objects in Cluster 2');
mask3 = pixel_labels==3;
cluster3 = he .* uint8(mask3);
imshow(cluster3)
title('Objects in Cluster 3');
Cluster 3 contains the blue objects. Notice that there are dark and light blue objects. You can
separate dark blue from light blue using the 'L*' layer in the L*a*b* color space. The cell nuclei are
dark blue.
Recall that the 'L*' layer contains the brightness values of each color. Extract the brightness values of
the pixels in this cluster and threshold them with a global threshold using imbinarize. The mask
is_light_blue gives the indices of light blue pixels.
L = lab_he(:,:,1);
L_blue = L .* double(mask3);
L_blue = rescale(L_blue);
idx_light_blue = imbinarize(nonzeros(L_blue));
Copy the mask of blue objects, mask3, then remove the light blue pixels from the mask. Apply the
new mask to the original image and display the result. Only dark blue cell nuclei are visible.
blue_idx = find(mask3);
mask_dark_blue = mask3;
mask_dark_blue(blue_idx(idx_light_blue)) = 0;
blue_nuclei = he .* uint8(mask_dark_blue);
imshow(blue_nuclei)
title('Blue Nuclei');
Marker-Controlled Watershed Segmentation
Segmentation using the watershed transform works better if you can identify, or "mark," foreground
objects and background locations. Marker-controlled watershed segmentation follows this basic
procedure:
1 Compute a segmentation function. This is an image whose dark regions are the objects you are
trying to segment.
2 Compute foreground markers. These are connected blobs of pixels within each of the objects.
3 Compute background markers. These are pixels that are not part of any object.
4 Modify the segmentation function so that it only has minima at the foreground and background
marker locations.
5 Compute the watershed transform of the modified segmentation function.
rgb = imread('pears.png');
I = rgb2gray(rgb);
imshow(I)
Compute the gradient magnitude. The gradient is high at the borders of the objects and low (mostly)
inside the objects.
gmag = imgradient(I);
imshow(gmag,[])
title('Gradient Magnitude')
Can you segment the image by using the watershed transform directly on the gradient magnitude?
L = watershed(gmag);
Lrgb = label2rgb(L);
imshow(Lrgb)
title('Watershed Transform of Gradient Magnitude')
No. Without additional preprocessing such as the marker computations below, using the watershed
transform directly often results in "oversegmentation."
A variety of procedures could be applied here to find the foreground markers, which must be
connected blobs of pixels inside each of the foreground objects. In this example you'll use
morphological techniques called "opening-by-reconstruction" and "closing-by-reconstruction" to
"clean" up the image. These operations will create flat maxima inside each object that can be located
using imregionalmax.
se = strel('disk',20);
Io = imopen(I,se);
imshow(Io)
title('Opening')
Ie = imerode(I,se);
Iobr = imreconstruct(Ie,I);
imshow(Iobr)
title('Opening-by-Reconstruction')
Following the opening with a closing can remove the dark spots and stem marks. Compare a regular
morphological closing with a closing-by-reconstruction. First try imclose:
Ioc = imclose(Io,se);
imshow(Ioc)
title('Opening-Closing')
Now use imdilate followed by imreconstruct. Notice you must complement the image inputs and
output of imreconstruct.
Iobrd = imdilate(Iobr,se);
Iobrcbr = imreconstruct(imcomplement(Iobrd),imcomplement(Iobr));
Iobrcbr = imcomplement(Iobrcbr);
imshow(Iobrcbr)
title('Opening-Closing by Reconstruction')
As you can see by comparing Iobrcbr with Ioc, reconstruction-based opening and closing are more
effective than standard opening and closing at removing small blemishes without affecting the overall
shapes of the objects. Calculate the regional maxima of Iobrcbr to obtain good foreground markers.
fgm = imregionalmax(Iobrcbr);
imshow(fgm)
title('Regional Maxima of Opening-Closing by Reconstruction')
To help interpret the result, superimpose the foreground marker image on the original image.
I2 = labeloverlay(I,fgm);
imshow(I2)
title('Regional Maxima Superimposed on Original Image')
Notice that some of the mostly-occluded and shadowed objects are not marked, which means that
these objects will not be segmented properly in the end result. Also, the foreground markers in some
objects go right up to the objects' edge. That means you should clean the edges of the marker blobs
and then shrink them a bit. You can do this by a closing followed by an erosion.
se2 = strel(ones(5,5));
fgm2 = imclose(fgm,se2);
fgm3 = imerode(fgm2,se2);
This procedure tends to leave some stray isolated pixels that must be removed. You can do this using
bwareaopen, which removes all blobs that have fewer than a certain number of pixels.
fgm4 = bwareaopen(fgm3,20);
I3 = labeloverlay(I,fgm4);
imshow(I3)
title('Modified Regional Maxima Superimposed on Original Image')
Now you need to mark the background. In the cleaned-up image, Iobrcbr, the dark pixels belong to
the background, so you could start with a thresholding operation.
bw = imbinarize(Iobrcbr);
imshow(bw)
title('Thresholded Opening-Closing by Reconstruction')
The background pixels are in black, but ideally we don't want the background markers to be too close
to the edges of the objects we are trying to segment. We'll "thin" the background by computing the
"skeleton by influence zones", or SKIZ, of the foreground of bw. This can be done by computing the
watershed transform of the distance transform of bw, and then looking for the watershed ridge lines
(DL == 0) of the result.
D = bwdist(bw);
DL = watershed(D);
bgm = DL == 0;
imshow(bgm)
title('Watershed Ridge Lines')
The function imimposemin can be used to modify an image so that it has regional minima only in
certain desired locations. Here you can use imimposemin to modify the gradient magnitude image so
that its only regional minima occur at foreground and background marker pixels.
gmag2 = imimposemin(gmag, bgm | fgm4);
L = watershed(gmag2);
One visualization technique is to superimpose the foreground markers, background markers, and
segmented object boundaries on the original image. You can use dilation as needed to make certain
aspects, such as the object boundaries, more visible. Object boundaries are located where L == 0.
The binary foreground and background markers are scaled to different integer values so that they are
assigned different labels.
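The visualization code is not shown on this page; a sketch under the assumptions described in this paragraph (the dilation size and title text are illustrative):
% Dilate the object boundaries and combine them with the scaled markers.
labels = imdilate(L==0,ones(3,3)) + 2*bgm + 3*fgm4;
I4 = labeloverlay(I,labels);
imshow(I4)
title('Markers and Object Boundaries Superimposed on Original Image')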
This visualization illustrates how the locations of the foreground and background markers affect the
result. In a couple of locations, partially occluded darker objects were merged with their brighter
neighbor objects because the occluded objects did not have foreground markers.
Another useful visualization technique is to display the label matrix as a color image. Label matrices,
such as those produced by watershed and bwlabel, can be converted to truecolor images for
visualization purposes by using label2rgb.
Lrgb = label2rgb(L,'jet','w','shuffle');
imshow(Lrgb)
title('Colored Watershed Label Matrix')
You can use transparency to superimpose this pseudo-color label matrix on top of the original
intensity image.
figure
imshow(I)
hold on
himage = imshow(Lrgb);
himage.AlphaData = 0.3;
title('Colored Labels Superimposed Transparently on Original Image')
See Also
bwareaopen | bwdist | imclose | imcomplement | imdilate | imerode | imgradient | imopen |
imreconstruct | imregionalmax | label2rgb | labeloverlay | watershed
Segment Image and Create Mask Using Color Thresholder App
In the Color Thresholder app, image segmentation can be an iterative process. For example, try
segmenting the image in several of the color spaces supported by the app because one color space
might isolate a particular color better than another. In any of the supported color spaces, you can
initially perform an automatic segmentation by selecting a region in the foreground or background.
Then, you can refine the segmentation by using color component controls provided by the app.
The last part of this example shows how to save the results of your work, create a mask image, and
get the MATLAB® code the app used to perform the segmentation.
Open the Color Thresholder app from the MATLAB toolstrip. On the Apps tab, in the Image Processing and Computer Vision section, click Color Thresholder.
Load the image into the Color Thresholder app. Click Load Image, and then select Load Image
from Workspace. In the Import From Workspace dialog box, select the image from the workspace,
and then click OK.
You can also open the app from the command line by using the colorThresholder function, specifying the name of the image: colorThresholder(rgb);. You can also acquire an image from a camera. For more information, see "Acquire Live Images in the Color Thresholder App" on page 13-55.
The Color Thresholder app displays the image in the Choose a Color Space tab, with point clouds
representing the image in these color spaces: RGB, HSV, YCbCr, and L*a*b*. For color-based
segmentation, select the color space that provides the best color separation. Using the mouse, rotate
the point cloud representations to see how they isolate individual colors. Segmentation using the
Color Thresholder app can be an iterative process—try several different color spaces before you
achieve a segmentation that meets your needs. For this example, start the process by selecting the
YCbCr color space.
When you choose a color space, the app opens a new tab, displaying the image along with a set of
controls for each color component and the point cloud representation. The color controls vary
depending on the color space. For the YCbCr color space, the Color Thresholder app displays three
histograms representing the three color components: the Y component represents brightness, the Cb
component represents the blue-yellow spectrum, and the Cr component represents the red-green
spectrum.
To explore the image, move the cursor over the image to access the pan and zoom controls.
Automatic Thresholding
First, segment the image using automatic thresholding. Because the background color (purple cloth)
is close to a uniform color, segment it rather than the foreground objects (the peppers). You can
always invert the mask later using the Invert Mask option.
Define a region using the freehand ROI tool. Click the freehand ROI button in the upper-left corner of the image and draw an ROI on the background. You can draw multiple regions.
After drawing the region, the Color Thresholder app automatically thresholds the image based on
the colors you selected in the region you drew. The Y, Cb, and Cr color controls change to reflect the
segmentation. This automatic thresholding does not create a clean segmentation of the background
and foreground, especially at the lower border between the foreground and background. For this
example, the background color is lighter near the bottom of the image. If you want to delete a region
you drew and start over, right-click anywhere in the region and select Delete Freehand.
To fine-tune the automatic thresholding, use the color controls. For each Y, Cb, and Cr color control, you can set the range of values by dragging the lower and upper bounds in that histogram. Using these color controls, you can significantly improve the segmentation of the foreground.
Another approach to segmenting the image in the YCbCr color space is to draw an ROI on the point
cloud to select a range of colors.
On the app toolstrip, click Reset Thresholds to revert back to the original image. In the bottom-right
pane of the app, click and drag the point cloud to rotate until you isolate the view of the color you are
interested in thresholding. Click the button in the upper left corner of the point cloud. The Color
Thresholder app converts the 3-D point cloud into a 2-D representation and activates the polygon
ROI tool. Draw an ROI around the color you want to segment (purple). This method can create a
better segmentation than the initial automatic thresholding approach.
To segment the image in another color space, click New Color Space in the app toolstrip. In the
Choose a Color Space tab, choose the HSV color space.
The Color Thresholder app creates a new tab displaying the image and the color component
controls for the HSV color space. In this color space, H represents hue, S represents saturation, and
V represents value. The HSV color space uses a dual-direction knob for the H component and two
histogram sliders for the S and V components. The tab also contains the point cloud representation of
the colors in the image.
As in the previous iteration, you can use all of the same techniques: automatic thresholding and
interactive use of the color component controls, including the point cloud. When you use the color
controls, you can see the segmentation in progress. In the pane with the H control, change the range
of the hue by clicking and dragging one arrow at a time. Experiment with the controls until you have
a clean separation of the background from the foreground. You can clean up small imperfections after
you create the mask image using toolbox functions, such as morphological operators.
This part of the example shows how to create a mask image after segmentation. You can also get the
segmented image and the MATLAB code used to create the mask image.
Because the example segmented the background (the purple cloth) rather than the foreground
objects (the peppers), swap the foreground and background by clicking Invert Mask.
View the binary mask image that you created by clicking Show Binary on the app toolstrip.
Save the mask image in the workspace. On the mask toolstrip, click Export and select Export
Images.
In the Export To Workspace dialog box, specify variable names for the binary mask image. You can
also save the original input RGB image and the segmented version of the original image.
To save the MATLAB code required to recreate the segmentation, click Export and select Export
Function. The Color Thresholder app opens the MATLAB Editor with the code that creates the
segmentation. To save the code, click Save on the MATLAB Editor toolstrip. You can run this code,
passing it an RGB image, to create the same mask image programmatically.
For example, the end of the generated function includes the inversion that corresponds to the Invert Mask step:

% Invert mask
BW = ~BW;

end
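The full exported function typically follows a pattern along these lines: convert the input image to the chosen color space, apply the per-channel thresholds, and return both the binary mask and a masked RGB image. The sketch below is illustrative only; the function name createMask, the YCbCr conversion, and the threshold limits are assumptions standing in for whatever the app generated from your segmentation.

function [BW,maskedRGBImage] = createMask(RGB)
% Sketch of a Color Thresholder export (assumed function name and
% YCbCr threshold limits; the app fills in the values it computed).
I = rgb2ycbcr(RGB);

% Assumed threshold limits for each channel
channel1Min = 0.0;   channel1Max = 255.0;   % Y
channel2Min = 90.0;  channel2Max = 130.0;   % Cb
channel3Min = 130.0; channel3Max = 180.0;   % Cr

% Create mask based on the chosen histogram thresholds
BW = (I(:,:,1) >= channel1Min) & (I(:,:,1) <= channel1Max) & ...
     (I(:,:,2) >= channel2Min) & (I(:,:,2) <= channel2Max) & ...
     (I(:,:,3) >= channel3Min) & (I(:,:,3) <= channel3Max);

% Invert mask, matching the Invert Mask step in the app
BW = ~BW;

% Set background pixels of the output image to zero
maskedRGBImage = RGB;
maskedRGBImage(repmat(~BW,[1 1 3])) = 0;

end

You can then call the saved function on any RGB image of the same scene type to reproduce the mask.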
See Also
Color Thresholder
Acquire Live Images in the Color Thresholder App
To begin color thresholding, add images that you acquire live from a webcam using the MATLAB
Webcam support. Install the MATLAB Support Package for USB Webcams to use this feature. See
“Install the MATLAB Support Package for USB Webcams” (Image Acquisition Toolbox) for information
on installing the support package.
1 Open the Color Thresholder app.
2 To add a live image, click Load Image, and then select Acquire Image From Camera to open the Image Capture tab.
3 On the Image Capture tab, if you have only one webcam connected to your system, it is selected by default, and a live preview window opens. If you have multiple cameras connected and want to use a different one, select the camera in the Camera list.
4 Set properties for the camera to control the image. Click the Camera Properties field to open a
list of your camera’s properties. This list varies, depending on your device.
Use the sliders or drop-downs to change property settings. The Preview window updates
dynamically when you change a setting. When you are done setting properties, click outside of
the box to close the properties list.
5 Use the Preview window as a guide to set up the image you want to acquire. The Preview
window shows the live images streamed as RGB data.
6 Click Take Snapshot. The Choose a Color Space dialog box opens and displays the four color
space options: RGB, HSV, YCbCr, and L*a*b*.
7 Choose a color space by clicking the button of your choice. Your image displays in a new tab
using the selected color space.
8 You can now perform color thresholding on the image. See “Segment Image and Create Mask
Using Color Thresholder App” on page 13-42 for information about processing the image.
9 If you want to save the image that you captured, click Export and select Export Images.
In the Export To Workspace dialog box, you can save the binary mask, segmented image, and the
captured image. The Input RGB Image option saves the RGB image captured from the webcam.
10 You can take more snapshots, preview, or change camera properties at any time by clicking the
Camera tab.
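If you prefer to capture a frame programmatically rather than through the Image Capture tab, the same support package provides a webcam object. A minimal sketch, assuming a single connected camera:

cam = webcam;            % connect to the default webcam (requires the support package)
img = snapshot(cam);     % capture one RGB frame
colorThresholder(img)    % open the captured frame in the Color Thresholder app
clear cam                % release the camera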
See Also
Color Thresholder
More About
• “Segment Image and Create Mask Using Color Thresholder App” on page 13-42
Image Segmentation Using Point Clouds in the Color Thresholder App
Read an image into the workspace. For this example, read the sample image mandi.tif into the
workspace. The image is a Bayer pattern-encoded image. To work with the image in the Color
Thresholder, you must convert the image into an RGB image, using the demosaic function. After
converting the image, display it with the imshow function.
X = imread('mandi.tif');
rgb = demosaic(X,'bggr');
imshow(rgb)
From the MATLAB® Toolstrip, open the Apps tab and under Image Processing and Computer Vision, click the Color Thresholder icon. The Color Thresholder app opens.
To bring an image into the Color Thresholder app, click Load Image. Because the image is already in
the workspace, choose the Load Image from Workspace option. In the Import from Workspace
dialog box, select the variable you created and click OK. You can also load an image by specifying its
file name.
You can also open the app using the colorThresholder command, specifying the name of the
image you want to open: colorThresholder(rgb). For information about acquiring an image from
a camera, see “Acquire Live Images in the Color Thresholder App” on page 13-55.
When it opens, the Color Thresholder app displays the Choose a color space tab. This tab displays
the image, and point cloud representations of the image, in several popular color spaces: RGB, HSV,
YCbCr, and L*a*b*.
Explore the point cloud representations of the image in each color space. Rotate the 3-D depiction in
each color space to see how well the colors are differentiated. You select the color to segment from
this 3-D display, so it is important to choose a representation that allows you to select the colors of
the area you want to segment. For this example, choose the L*a*b* color space.
When you choose a color space, the app opens a new tab, displaying the image along with a set of
controls for each color component of the color space you chose. For the L*a*b* color space, the Color
Thresholder displays three histograms representing the three components in the color space. The tab
also includes a 3-D point cloud representation of the colors of the image in the color space. Other
color spaces use different types of controls.
To explore the image, move the cursor over the image and use the pan and zoom controls.
To segment the image, rotate the 3-D color cloud, using the mouse, to find a view of the color cloud
that isolates the colors that you want to segment. To select the colors in the image, click the drawing
tool in the upper-left corner of the point cloud. Then, using the mouse, draw a polygon around the
colors you want to segment. When you close the polygon, the Color Thresholder app performs the
segmentation based on the colors you selected. You can use the histograms to refine your
segmentation.
For information about creating a mask and saving it, see “Segment Image and Create Mask Using
Color Thresholder App” on page 13-42.
See Also
Color Thresholder
More About
• “Segment Image and Create Mask Using Color Thresholder App” on page 13-42
Getting Started with Image Segmenter App
You can open the Image Segmenter from the command line. Specify an image in the workspace or
the name of a file.
I = imread('coins.png');
imageSegmenter(I)
Alternatively, open the app from the Apps tab, under Image Processing and Computer Vision.
Then, from the Load menu, choose the name of a workspace variable or the name of the file
containing the image.
After you load an image, you can optionally load an existing binary mask. For example, you might
have previously created a mask of an RGB image in the Color Thresholder app and you want to
refine the segmentation. To load an existing mask, click Load Mask. The segmentation mask image
must be a logical image of the same size as the image you are segmenting.
To add segmented regions to an existing mask, use tools in the Add to Mask menu. The app displays
the steps you take while creating the segmentation in the History panel of the Data Browser.
The app provides these segmentation tools:

• Threshold: An automatic technique where you specify an intensity value that you want to isolate. This technique can be useful if the objects you want to segment in the image have similar pixel intensity values and these values are easily distinguished from other areas of the image, such as the background. For more information, see "Segment Image Using Thresholding in Image Segmenter" on page 13-71.
• Graph Cut: A semi-automatic technique that can segment foreground and background. This technique does not require careful placement of seed points and you can refine the segmentation interactively. For more information, see "Segment Image Using Graph Cut in Image Segmenter" on page 13-92.
• Auto Cluster: An automatic technique where the app groups image features into a binary segmentation. This option is only available if you have Statistics and Machine Learning Toolbox™. For more information, see "Segment Image Using Auto Cluster in Image Segmenter" on page 13-117.
• Find Circles: An automatic technique where you specify the minimum and maximum diameter of the circular objects you want to detect. For more information, see "Segment Image Using Find Circles in Image Segmenter" on page 13-110.
• Local Graph Cut (grabcut): A semi-automatic technique, similar to the Graph Cut method, that can segment foreground and background. With Local Graph Cut (grabcut), you first define an ROI that encompasses the object in the image that you want to segment. The Image Segmenter automatically segments the object in the ROI. You can refine the segmentation by drawing lines on the image to identify the foreground and the background within the ROI. Everything outside the ROI is considered background. For more information, see "Segment Image Using Local Graph Cut (Grabcut) in Image Segmenter" on page 13-101.
• Flood Fill: An automatic technique where you specify starting points and the method segments areas with similar intensity values.
• Draw ROI: A manual technique where you draw shapes that outline the regions of the objects you want to segment. Using the mouse, you can draw rectangles, ellipses, polygons, or freehand shapes. For more information, see "Segment Image By Drawing Regions Using Image Segmenter" on page 13-75.
When using the Auto Cluster, Graph Cut, and Flood Fill segmentation tools, you can also include
texture as an additional consideration in your segmentation. Texture filtering can help distinguish
foreground from background. To turn the texture option on and off, click Include Texture Features.
When enabled, Image Segmenter uses Gabor filters to analyze the texture of the image as a
preprocessing step in the segmentation. For more information about Gabor filters, see “Texture
Segmentation Using Gabor Filters” on page 13-2.
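The app's texture preprocessing is internal, but the idea can be illustrated at the command line with the gabor and imgaborfilt functions. In this sketch, the wavelengths and orientations are arbitrary values chosen only for illustration.

I = im2single(im2gray(imread('coins.png')));
g = gabor([4 8],[0 90]);            % assumed wavelengths (pixels) and orientations (degrees)
gabormag = imgaborfilt(I,g);        % magnitude response for each filter in the bank
imshow(rescale(gabormag(:,:,1)))    % view the first texture response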
After you create the initial mask, you can refine it using tools in the Refine Mask section of the toolstrip:

• Morphology: Many morphological techniques, such as dilation and erosion. For an example, view "Refine Segmentation Using Morphology in Image Segmenter" on page 13-87.
• Active contours (also known as snakes): An iterative method that grows or shrinks regions in an image. You identify the regions with seed points. For an example, view "Segment Image Using Active Contours in Image Segmenter" on page 13-81.
• Clear borders: A fast way to remove small regions on the edge of the image.
• Fill holes: A fast way to fill small holes in foreground regions. For an example, view "Refine Segmentation Using Morphology in Image Segmenter" on page 13-87.
• Invert mask: Sometimes the segmentation is easier to evaluate if you invert the foreground and background. For an example, view "Segment Image Using Auto Cluster in Image Segmenter" on page 13-117.
You can also generate the code used to perform the segmentation (requires Statistics and Machine Learning Toolbox). Use the code to apply the same segmentation algorithm to similar images. To get the code, click Export and select Generate Function. The app opens the MATLAB editor containing a function with the autogenerated code. To save the code, click Save in the MATLAB editor.
See Also
Image Segmenter
Segment Image Using Thresholding in Image Segmenter
The Image Segmenter app supports many different segmentation methods and using the app can be
an iterative process. You might try several different methods until you achieve the results you want.
Open the Image Segmenter app and load an image to be segmented. The app can open any file that
can be read by imread.
For this example, first read an image into the workspace. This example uses an MRI image of a knee.
The goal is to create a mask image that segments the bone from the soft tissue in the image.
I = dicomread('knee1');
knee = mat2gray(I);
From the MATLAB® toolstrip, open the Image Segmenter app. On the Apps tab, in the Image Processing and Computer Vision section, click Image Segmenter.
On the app toolstrip, click Load, and then select Load Image from Workspace. In the Import from
Workspace dialog box, select the image you read into the workspace. The Image Segmenter app
displays the image you selected.
You can also open the image in the Image Segmenter app using the imageSegmenter command, as
follows:
imageSegmenter(knee);
After you load an image, you can optionally load an existing binary mask. For example, you might
have previously created a mask of an RGB image in the Color Thresholder app and you want to
refine the segmentation. To load an existing mask, click Load Mask. The segmentation mask image
must be a logical image of the same size as the image you are segmenting.
Click Threshold in the Create Mask section of the Image Segmenter app toolstrip. The app displays
the thresholded image in the Threshold tab. By default, the app uses global thresholding.
You can also choose Manual or Adaptive thresholding. Each thresholding option supports controls
that you can use to fine-tune the thresholding. For example, with Manual thresholding, you can
choose the threshold value using the slider. With Adaptive thresholding, you can choose the
sensitivity using the slider. Try each option to see which thresholding method performs the best
segmentation.
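For reference, the three options correspond roughly to the following command-line operations; the manual threshold value here is only an assumed example.

bwGlobal   = imbinarize(knee);                               % Global (Otsu's method)
bwManual   = knee > 0.51;                                    % Manual (assumed threshold value)
bwAdaptive = imbinarize(knee,'adaptive','Sensitivity',0.5);  % Adaptive (local thresholding)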
The knee image does not have well-defined pixel intensity differences between foreground and
background and thresholding does not seem like the best choice to segment this image.
To save the segmentation, click Create Mask. If you want to try another segmentation method in the
Image Segmenter app, click Cancel to return to the main segmentation app window.
See Also
Image Segmenter
Related Examples
• “Getting Started with Image Segmenter App” on page 13-68
Segment Image By Drawing Regions Using Image Segmenter
The Image Segmenter app offers many different segmentation methods and using the app can be an
iterative process. You might try several different methods until you achieve the results you want.
Open the Image Segmenter app and load an image to be segmented. The app can open any file that
can be read by imread.
For this example, read an image into the workspace. This example uses an MRI image of a knee. The
goal is to create a mask image that segments the bone from the soft tissue in the image.
I = dicomread('knee1');
knee = mat2gray(I);
Open the Image Segmenter app from the MATLAB® toolstrip. On the Apps tab, in the Image Processing and Computer Vision section, click Image Segmenter.
On the app toolstrip, click Load, and then select Load Image from Workspace. In the Import from Workspace dialog box, select the image you read into the workspace. The Image Segmenter app displays the image you selected.
You can also open the image in the Image Segmenter app using the imageSegmenter command, as
follows:
imageSegmenter(knee);
After you load an image into the app, you can optionally load an existing binary mask. For example,
you might have previously created a mask of an RGB image in the Color Thresholder app and you
want to refine the segmentation. To load an existing mask, click Load Mask. The segmentation mask
image must be a logical image of the same size as the image you are segmenting.
Expand the Add to Mask group and click Draw ROIs. The app opens the ROI tab.
Select the type of ROI you want to draw. For this example, choose Assisted Freehand. As you move
the cursor over the image, it changes to the crosshairs shape. Press the mouse button, and begin
drawing a freehand shape over the area of the image that you want to segment. With the Assisted
Freehand ROI option, which is preselected, you can draw a freehand shape that automatically follows
edges in the underlying image to help you draw a more accurate ROI. As you draw, click the mouse to
create waypoints. Waypoints can help you make fine adjustments to the shape after you finish
drawing. To add additional waypoints after you finish drawing, double-click the ROI edge.
Continue drawing shapes until all the areas you want to segment are identified. To save the regions you have drawn, click Apply (their color changes to yellow). To return to the Segmentation tab, click Close ROI.
To view the mask image, click Show Binary on the Segmentation tab. To refine the mask image, use the tools in the Refine Mask section of the Image Segmenter app toolstrip, such as Clear Borders or Fill Holes. When you are done, click Export to save the mask image to the workspace.
See Also
Image Segmenter
Related Examples
• “Getting Started with Image Segmenter App” on page 13-68
Segment Image Using Active Contours in Image Segmenter
The Image Segmenter app offers many different segmentation methods and using the app can be an
iterative process. You might try several different methods until you achieve the results you want.
Open the Image Segmenter app and load an image to be segmented. The app can open any file that
can be read by imread.
For this example, read an image into the workspace. This example uses an MRI image of a knee. The
goal is to create a mask image that segments the bone from the soft tissue in the image.
I = dicomread('knee1');
knee = mat2gray(I);
From the MATLAB® toolstrip, open the Image Segmenter app. On the Apps tab, in the Image Processing and Computer Vision section, click Image Segmenter.
On the app toolstrip, click Load, and then select Load Image from Workspace. In the Import from
Workspace dialog box, select the image you read into the workspace. The Image Segmenter app
displays the image you selected.
You can also open the image in the Image Segmenter app using the imageSegmenter command, as
follows:
imageSegmenter(knee);
After you load an image, you can optionally load an existing binary mask. For example, you might
have previously created a mask of an RGB image in the Color Thresholder app and you want to
refine the segmentation. To load an existing mask, click Load Mask. The segmentation mask image
must be a logical image of the same size as the image you are segmenting.
To segment an image using Active Contours, you must first create a rough estimation of the
segmentation. For example, you can use the ROI tools to create a rough segmentation of the image
(see “Segment Image By Drawing Regions Using Image Segmenter” on page 13-75). You could also
load an existing binary mask image.
For this example, use the ROI tools to create seed shapes in the areas you want to segment. When
you are done drawing the regions, click Apply and then click Close ROI to return to the
Segmentation tab.
On the Segmentation tab, in the Refine Mask section of the toolstrip, click Active Contours. The Image Segmenter app opens the Active Contours tab.
To use active contours, click Evolve. The app starts performing iterations to grow the seed masks to
fill the objects to their borders. Initially, use the default active contours method (Region-based) and
the default number of iterations (100). The Image Segmenter displays the progress of the processing
in the lower right corner. Looking at the results, you can see that this approach worked well for two
of the three objects but the segmentation bled into the background for one of the objects. The object
boundary isn’t as well-defined in this area.
One way to get a better segmentation is to repeat active contours, reducing the number of iterations.
Change the number of iterations in the iterations box, specifying 35, and click Evolve again. This
time, the segmentation does not bleed into the background.
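The command-line counterpart of this step is the activecontour function, where the app's Region-based method corresponds to the 'Chan-Vese' option. A sketch, assuming mask is the logical seed mask created from the drawn ROIs:

BW = activecontour(knee,mask,35,'Chan-Vese');   % 35 iterations of region-based active contours
imshow(labeloverlay(knee,BW))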
To save the segmentation, click Apply. To return to the Segmentation tab, click Close Active
Contours.
To view the mask image, click Show Binary on the Segmentation tab. You can use other tools in the Image Segmenter app to refine the mask image, such as Clear Borders or Fill Holes. To save the mask image to the workspace, click Export.
See Also
Image Segmenter | activecontour
Related Examples
• “Getting Started with Image Segmenter App” on page 13-68
Refine Segmentation Using Morphology in Image Segmenter
This example creates a mask image using hand-drawn ROIs and active contours (see “Segment Image
Using Active Contours in Image Segmenter” on page 13-81).
Open the Image Segmenter app and load an image to be segmented. The Image Segmenter can
open any file that can be read by imread.
For this example, first read an image into the workspace. This example uses an MRI image of a knee.
The goal is to create a mask image that segments the bone from the soft tissue in the image.
I = dicomread('knee1');
knee = mat2gray(I);
From the MATLAB® toolstrip, open the Image Segmenter app. On the Apps tab, in the Image Processing and Computer Vision section, click Image Segmenter.
On the app toolstrip, click Load, and then select Load Image from Workspace. In the Import from Workspace dialog box, select the image you read into the workspace. The Image Segmenter app displays the image you selected.
You can also open the image in the Image Segmenter app using the imageSegmenter command, as
follows:
imageSegmenter(knee);
After you load an image, you can optionally load an existing binary mask. For example, you might
have previously created a mask by drawing ROIs. To load an existing mask, click Load Mask. The
segmentation mask image must be a logical image of the same size as the image you are segmenting.
Create a rough segmentation of the image using ROI drawing tools. Use active contours to finish the
segmentation. For more details on this process, see “Segment Image Using Active Contours in Image
Segmenter” on page 13-81.
After finishing the segmentation, click Show Binary on the Segmentation tab to view the mask
image. Upon close examination, you can see several small holes in the mask image.
The Image Segmenter includes morphological tools to refine the binary mask. Expand the Refine
Mask section of the app toolstrip and click Fill Holes.
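For reference, the Fill Holes refinement corresponds to this operation on the exported binary mask (assuming the mask variable is named BW):

BW = imfill(BW,'holes');   % fill holes that are not connected to the image border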
To save the binary mask, click Export and select Export Images.
See Also
Image Segmenter
Related Examples
• “Getting Started with Image Segmenter App” on page 13-68
Segment Image Using Graph Cut in Image Segmenter
The Graph Cut technique applies graph theory to image processing to achieve fast segmentation. The technique creates a graph of the image where each pixel is a node connected by weighted edges. The higher the probability that pixels are related, the higher the weight. The algorithm cuts along weak edges, achieving the segmentation of objects in the image. The Image Segmenter uses a particular variety of the Graph Cut algorithm called lazy snapping. For information about another segmentation technique that is related to Graph Cut, see "Segment Image Using Local Graph Cut (Grabcut) in Image Segmenter" on page 13-101.
Read an image into the workspace. For this example, read the sample image baby.jpg into the workspace.
b = imread('baby.jpg');
From the MATLAB® toolstrip, open the Image Segmenter app. On the Apps tab, in the Image Processing and Computer Vision section, click Image Segmenter.
On the app toolstrip, click Load Image, and then select Load Image from Workspace. In the
Import from Workspace dialog box, select the image you read into the workspace. The Image
Segmenter app displays the image you selected.
You can also open the app using the imageSegmenter command, specifying the image:
imageSegmenter(b);
The Image Segmenter opens a new tab for Graph Cut segmentation. As a first step in Graph Cut
segmentation, mark the elements of the image that you want to be in the foreground. When the
Image Segmenter opens the Graph Cut tab, it preselects the Mark Foreground option. To mark an
object as foreground, draw a line (also called a scribble) over the object. When you draw a line, try to
include all the different values in the object you want to segment. You can draw as many separate
lines as you like. If you are not satisfied with the lines you draw, you can always edit them. Click
Erase and move the cursor over any part of the line you want to remove. If you want to start over, click
Clear Markings.
Next, click Mark Background and draw scribbles to mark the elements of the image you want to be
the background. When you finish drawing the lines, the Image Segmenter immediately performs the
segmentation (shown in blue).
To refine the segmentation, continue drawing foreground and background lines. For example, there
are several areas near the bottom of the image that need to be removed from the foreground. To fix
these problems, draw additional background lines on these parts of the image.
When you are satisfied with the segmentation, click Create Mask in the toolstrip on the Graph Cut
tab. The app closes the Graph Cut tab and returns you to the Segmentation tab.
When you return to the main Segmentation tab, you can use tools to refine the mask image, such as
Morphology and Active Contours. To save the mask image, click Export. You can also use the Export
option to obtain the code the Image Segmenter app used to create the segmentation.
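The function behind this workflow is lazysnapping, which operates on a superpixel oversegmentation plus foreground and background scribbles. A minimal command-line sketch, assuming b is the image read in earlier and with scribble locations chosen arbitrarily for illustration:

L = superpixels(b,500);                                         % superpixel oversegmentation
sz = size(L);
foremask = false(sz);                                           % assumed foreground scribble near the center
foremask(round(sz(1)/2)+(0:40),round(sz(2)/2)+(0:40)) = true;
backmask = false(sz);                                           % assumed background scribble in a corner
backmask(1:40,1:40) = true;
BW = lazysnapping(b,L,foremask,backmask);
imshow(labeloverlay(b,BW))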
See Also
Image Segmenter | lazysnapping
Related Examples
• “Getting Started with Image Segmenter App” on page 13-68
Segment Image Using Local Graph Cut (Grabcut) in Image Segmenter
With Local Graph Cut (grabcut), you first draw an ROI around the object in the image that you want to segment; the app automatically segments the object within that ROI. Then, as with Graph Cut, you refine the automatic segmentation by drawing lines, called scribbles, on the image inside the ROI. The lines you draw identify what you want in the foreground and what you want in the background. The Local Graph Cut option only segments elements within the boundaries of the ROI.
The Local Graph Cut technique, similar to the Graph Cut technique, applies graph theory to image
processing to achieve fast segmentation. The algorithm creates a graph of the image where each
pixel is a node connected by weighted edges. The higher the probability that pixels are related, the
higher the weight. The algorithm cuts along weak edges, achieving the segmentation of objects in the
image. For information about the Graph Cut technique, see “Segment Image Using Graph Cut in
Image Segmenter” on page 13-92.
Read an image into the workspace.
car = imread('car2.jpg');
From the MATLAB® Toolstrip, open the Image Segmenter app. On the Apps tab, in the Image Processing and Computer Vision section, click Image Segmenter.
On the app toolstrip, click Load Image, and then select Load Image from Workspace. In the
Import from Workspace dialog box, select the image you read into the workspace. The Image
Segmenter app displays the image you selected.
You can also open the app using the imageSegmenter command, specifying the name of the image variable.
imageSegmenter(car);
The Image Segmenter app opens a new tab for Local Graph Cut segmentation. As a first step in
Local Graph Cut segmentation, draw an ROI around the object in the image that you want to
segment. When the Image Segmenter app opens the Local Graph Cut tab, it preselects the Draw
ROI button. Position the cursor over the image and draw an ROI that encompasses the entire object
you want to segment. To get a good initial segmentation, make sure the ROI you draw completely
surrounds the object, leaving a small amount of space between the object and the ROI boundary.
Make sure the object you want to segment is completely inside the ROI.
You can choose to draw a rectangular or polygonal ROI. Use the ROI Style menu to choose. To draw
a rectangle, position the cursor over the image and then click and drag. To draw a polygon, click and
drag the mouse, creating a vertex at each click. Double-click to finish the polygon. If you are not
satisfied with the shape you drew, you can always edit it. Right-click the ROI and choose Delete.
When you finish the ROI, the Image Segmenter app automatically segments the object in the ROI.
The blue shading indicates the segmented area.
To refine the automatic segmentation, draw lines (scribbles) to mark any parts of the foreground that
weren't included in the automatic segmentation. After you draw the ROI, the Image Segmenter
selects the Mark Foreground button automatically.
To remove areas from the segmentation that are not part of the foreground, mark those areas as
background. Select the Mark Background option and draw lines inside the ROI to identify parts of
the segmentation that should be in the background.
When you are satisfied with the segmentation, click Apply. The Image Segmenter app changes the
color of the segmented part of the image to yellow.
To view the mask image, click Show Binary. You can also view the binary mask image in the main
Segmentation tab. To return to the main Image Segmenter app, click Close Local Graph Cut.
When you are done segmenting the image, you can save the binary mask, using the Export option.
You can also obtain the code used for the segmentation.
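The function behind this workflow is grabcut, which takes the image, a superpixel label matrix, and a mask marking the ROI you drew. A minimal sketch, assuming car is the image read in earlier and with an ROI position assumed only for illustration:

L = superpixels(car,500);                                       % superpixel oversegmentation
sz = size(L);
roimask = false(sz);                                            % assumed rectangular ROI around the object
roimask(round(sz(1)*0.2):round(sz(1)*0.9),round(sz(2)*0.1):round(sz(2)*0.9)) = true;
BW = grabcut(car,L,roimask);
imshow(labeloverlay(car,BW))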
See Also
Image Segmenter | grabcut
Related Examples
• “Getting Started with Image Segmenter App” on page 13-68
Segment Image Using Find Circles in Image Segmenter

Read an image into the workspace.
coins = imread('coins.png');
From the MATLAB® Toolstrip, open the Image Segmenter app. On the Apps tab, in the Image Processing and Computer Vision section, click Image Segmenter.
On the app toolstrip, click Load Image, and then select Load Image from Workspace. In the
Import from Workspace dialog box, select the image you read into the workspace. The Image
Segmenter app displays the image you selected.
You can also open the app using the imageSegmenter command, specifying the image:
imageSegmenter(coins);
On the Image Segmenter app toolstrip, expand the Create Mask section and select Find Circles.
The Image Segmenter app opens a new tab for the Find Circles segmentation option.
In the Find Circles tab, first click Ruler and measure the diameters of some representative circles in
the image to determine the range of sizes. To find circles, you must specify lower and upper bounds on the diameters. For this example, set Min. Diameter to 50 and Max. Diameter to 150, values that appear to include all of the objects.
On the Find Circles tab, click Find Circles. The Image Segmenter app fills the circles it finds. However, Find Circles does not find two of the circles. Examining the image more closely, you discover that the diameters of these coins are slightly smaller than the specified minimum diameter.
Change the minimum value to accommodate the sizes of the objects that were not segmented and run
the find circles segmentation operation again. This time, Find Circles segments all the objects in the
image.
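The same detection is available at the command line through imfindcircles, which works with radii rather than diameters. The radius range below is an assumed equivalent of the final app settings:

[centers,radii] = imfindcircles(coins,[20 75],'ObjectPolarity','bright');
imshow(coins)
viscircles(centers,radii);   % overlay the detected circles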
When you are satisfied with the segmentation, click Create Mask on the Find Circles tab toolstrip
and create the mask image. The Image Segmenter app closes the Find Circles tab and returns to
the Segmentation tab. The color of the segmented circles changes to yellow. To view the mask image,
click Show Binary.
When you are done segmenting the image, save the mask image by using the Export option. You can
also obtain the code used for the segmentation.
See Also
Image Segmenter | imfindcircles
Related Examples
• “Getting Started with Image Segmenter App” on page 13-68
Segment Image Using Auto Cluster in Image Segmenter
Read an image into the workspace.
coins = imread('coins.png');
From the MATLAB® Toolstrip, open the Image Segmenter app. On the Apps tab, in the Image Processing and Computer Vision section, click Image Segmenter.
On the app toolstrip, click Load Image, and then select Load Image from Workspace. In the
Import from Workspace dialog box, select the image you read into the workspace. The Image
Segmenter app displays the image you selected.
You can also open the app using the imageSegmenter command, specifying the image:
imageSegmenter(coins);
On the Image Segmenter app toolstrip, expand the Create Mask section and choose Auto Cluster.
The Image Segmenter app automatically segments the image, displaying the result. The Auto Cluster
option has correctly segmented all the circles. However, some of the circles have holes.
Clean up the holes in the segmented image using the Fill Holes option in the Refine Mask toolstrip
group.
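For reference, a rough command-line analog uses imsegkmeans with two clusters, followed by hole filling. Which cluster corresponds to the coins is an assumption you would verify for your data:

L = imsegkmeans(coins,2);    % cluster pixel intensities into two groups
BW = L == 2;                 % assumed: cluster 2 is the foreground (coins)
BW = imfill(BW,'holes');     % fill holes, as in the Refine Mask step
imshow(BW)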
When you are satisfied with the segmentation, click Show Binary to view the mask image. To save
the binary mask, use the Export option. You can also obtain the code used for the segmentation.
See Also
Image Segmenter | imsegkmeans
Related Examples
• “Getting Started with Image Segmenter App” on page 13-68
Plot Land Classification with Color Features and Superpixels
Read an image into the workspace. For better performance, this example reduces the size of the
image by half. Visually, there are four types of land that are distinguishable in the blue marble image
based only on color features: forested regions, dry/desert regions, ice covered regions, and water.
A = imread('http://eoimages.gsfc.nasa.gov/images/imagerecords/74000/74192/world.200411.3x5400x270
A = imresize(A,0.5);
imshow(A)
Convert the image to the L*a*b* color space.
Alab = rgb2lab(A);
Compute the superpixel oversegmentation of the original image and display it.
[L,N] = superpixels(Alab,20000,'isInputLab',true);
BW = boundarymask(L);
imshow(imoverlay(A,BW,'cyan'))
Get the linear indices of the pixels in each superpixel region by using label2idx.
pixelIdxList = label2idx(L);
Determine the mean color of each superpixel region in the L*a*b* color space.
[m,n] = size(L);
meanColor = zeros(m,n,3,'single');
for i = 1:N
meanColor(pixelIdxList{i}) = mean(Alab(pixelIdxList{i}));
meanColor(pixelIdxList{i}+m*n) = mean(Alab(pixelIdxList{i}+m*n));
meanColor(pixelIdxList{i}+2*m*n) = mean(Alab(pixelIdxList{i}+2*m*n));
end
Cluster the color feature of each superpixel by using the imsegkmeans function.
numColors = 4;
[Lout,cmap] = imsegkmeans(meanColor,numColors,'numAttempts',2);
cmap = lab2rgb(cmap);
imshow(label2rgb(Lout))
Use cluster centers as the colormap for a thematic map. The mean colors found during K-means
clustering can be used directly as a colormap to give a more natural visual interpretation of the land
classification assignments of forest, ice, dry land, and water.
imshow(double(Lout),cmap)
Segment Lungs from 3-D Chest Scan
Load the human chest CT scan data into the workspace. To run this example, you must download the
sample data from MathWorks™ using the Add-On Explorer. See “Install Sample Data Using Add-On
Explorer” on page 11-172.
load chestVolume
whos
Convert the CT scan data from int16 to single to normalize the values to the range [0, 1].
V = im2single(V);
View the chest scans using the Volume Viewer app. Open the app from the MATLAB® Apps toolstrip.
You can also open the app by using the volumeViewer command and specifying the volume as an
argument: volumeViewer(V). Volume Viewer has preset alphamaps that are intended to provide the
best view of certain types of data. To get the best view of the chest scans, select the ct-bone preset.
Segment the lungs in the CT scan data using the active contour technique. Active contours is a region
growing algorithm which requires initial seed points. The example uses the Image Segmenter app to
create this seed mask by segmenting two orthogonal 2-D slices, one in the XY plane and the other in
the XZ plane. The example then inserts these two segmentations into a 3-D mask. The example passes
this mask to the activecontour function to create a 3-D segmentation of the lungs in the chest
cavity. (This example uses the active contour method but you could use other segmentation
techniques to accomplish the same goal, such as flood-fill.)
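The code that extracts the two orthogonal slices is not shown in this excerpt. Based on the mask assignments later in the example (XY slice 160 and XZ row 256), the slices can be pulled from the volume along these lines:

XY = V(:,:,160);             % axial (XY) slice used for the first 2-D segmentation
XZ = squeeze(V(256,:,:));    % XZ slice through row 256
imshow(XY,[],'Border','tight');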
imshow(XZ,[],'Border','tight');
You can perform the segmentation in the Image Segmenter app. Open the app from the MATLAB
Apps toolstrip or use the imageSegmenter command, specifying a 2-D slice as an argument,
imageSegmenter(XY).
To start the segmentation process, click Threshold to open the lung slice in the Threshold tab. On
the Threshold tab, select the Manual Threshold option and move the Threshold slider to specify a
threshold value that achieves a good segmentation of the lungs. Click Create Mask to accept the
thresholding and return to the Segmentation tab.
For this threshold value, the app generates the following code.
BW = XY > 5.098000e-01;
After this initial lung segmentation, clean up the mask using options on the Refine Mask menu.
In the app, you can click each option to invert the mask image so that the lungs are in the foreground
(Invert Mask), remove other segmented elements besides the lungs (Clear Borders), and fill holes
inside the lung segmentation (Fill Holes). Finally, use the Morphology option to smooth the edges of
the lung segmentation. On the Morphology tab, select the Erode Mask operation. After performing
these steps, select Show Binary and save the mask image to the workspace.
BW = imcomplement(BW);
BW = imclearborder(BW);
BW = imfill(BW, 'holes');
radius = 3;
decomposition = 0;
se = strel('disk',radius,decomposition);
BW = imerode(BW, se);
maskedImageXY = XY;
maskedImageXY(~BW) = 0;
imshow(maskedImageXY)
Perform the same operation on the XZ slice. Using Load Image, select the XZ variable. Use
thresholding to perform the initial segmentation of the lungs. For the XZ slice, the Global Threshold
option creates an adequate segmentation (the call to imbinarize in the following code). As with the
XY slice, use options on the Refine Mask menu to create a polished segmentation of the lungs. In the
erosion operation on the Morphology tab, specify a radius of 13 to remove small extraneous objects.
To segment the XZ slice and polish the result, the app executes the following code.
BW = imbinarize(XZ);
BW = imcomplement(BW);
BW = imclearborder(BW);
BW = imfill(BW,'holes');
radius = 13;
decomposition = 0;
se = strel('disk',radius,decomposition);
BW = imerode(BW, se);
maskedImageXZ = XZ;
maskedImageXZ(~BW) = 0;
imshow(maskedImageXZ)
Create the 3-D seed mask that you can use with the activecontour function to segment the lungs. Create a logical 3-D volume the same size as the input volume and insert the XY and XZ masks (maskedImageXY and maskedImageXZ) at the appropriate spatial locations.
mask = false(size(V));
mask(:,:,160) = maskedImageXY;
mask(256,:,:) = mask(256,:,:)|reshape(maskedImageXZ,[1,512,318]);
Using this 3-D seed mask, segment the lungs in the 3-D volume using the active contour method. This
operation can take a few minutes. To get a quality segmentation, use histeq to spread voxel values
over the available range.
V = histeq(V);
BW = activecontour(V,mask,100,'Chan-Vese');
segmentedImage = V.*single(BW);
You can view the segmented lungs in the Volume Viewer app by running the command
volumeViewer(segmentedImage). By manipulating the alphamap settings in the Rendering Editor,
you can get a good view of just the lungs.
Use the regionprops3 function with the 'volume' option to calculate the volume of the lungs.
volLungsPixels = regionprops3(logical(BW),'volume');
Specify the spacing of the voxels in the x, y, and z dimensions, which was gathered from the original
file metadata. The metadata is not included with the image data that you download from the Add-On
Explorer.
spacingx = 0.76;                       % voxel width, in mm
spacingy = 0.76;                       % voxel height, in mm
spacingz = 1.26*1e-6;                  % voxel depth in mm, times 1e-6 to convert mm^3 to liters
unitvol = spacingx*spacingy*spacingz;  % volume of one voxel, in liters
volLungs1 = volLungsPixels.Volume(1)*unitvol;
volLungs2 = volLungsPixels.Volume(2)*unitvol;
volLungsLiters = volLungs1 + volLungs2
volLungsLiters = 5.7726
See Also
activecontour | histeq | regionprops3
Create Binary Mask Using Volume Segmenter
Load a volume into the workspace. This example uses a stack of MRI brain images, stored in the MAT-
file vol_001.mat.
load(fullfile(toolboxdir('images'),'imdata',...
'BrainMRILabeled','images','vol_001.mat'));
whos vol
Open the Volume Segmenter app. Click the Apps tab on the MATLAB® toolstrip. In the Image
Processing and Computer Vision section, click Volume Segmenter.
To load the volume in the Volume Segmenter app, click Open Volume in the app toolstrip. For this
example, select Open from Workspace. In the Import Volume dialog box, select the volume you
loaded into the workspace, vol, and click OK. (You can also specify a volume when you open the app
by using the volumeSegmenter command: volumeSegmenter(vol).)
The Volume Segmenter app displays a 3-D representation of the volume in the 3-D Display pane
and displays individual slices of the data set in the Slice pane.
By default, the Slice pane displays the first slice of your data. The app displays the number of the
slice displayed at the top of the image, for example, 1/155. In this dataset, the first few slices do not
contain images of the brain.
The app also automatically creates a label for the segmentation in the Labels pane, using the default
name Label1. You can define multiple labels in the Labels pane. However, to create a binary mask,
you must use only one label.
To change the name of the label, double-click the label name. To change the color associated with the
label, double-click the color square displayed in the Labels pane. You can optionally load an existing
set of labels into the app using the Open Labels button.
To determine what you want to segment, explore the volume using the 3-D Display pane and the
Slice pane.
In the 3-D Display pane, you can rotate the volume to examine the data from every angle, using the
mouse. You can also customize the display of the volume in the 3-D Display tab in the app toolstrip.
For example, if you have metadata that describes the relative size of the voxels, you can specify it in
the Spatial Referencing part of the 3-D Display tab in the app toolstrip. To improve your view of
the data, you can change the background color used in the 3-D display, modify the threshold and
opacity of the display, and include orientation axes with the display, as shown in this figure. With the
brain MRI data, you can see the tumor in the temporal lobe that you want to segment.
You can also view each slice of the volume in the Slice pane. Use the slider at the bottom of the pane
to move from slice to slice. You can see the tumor on slice 35 through slice 88. By default, the Slice
pane displays the volume oriented along the X-Y axis, but you can change this using buttons in the
Orientation section of the toolstrip on the Segmenter tab. The Slice pane is also where you use
drawing tools to define the mask.
Once you have identified the object you want to segment, you can use the tools on the Draw tab in the app toolstrip to define the region. Select the drawing tool you want to use: one of the ROI tools (Freehand, Assisted Freehand, or Polygon) or the Paint Brush tool.
In the Slice pane, navigate to the slice where the object first appears, slice 35, and draw an outline
around the object. For this example, use the Polygon drawing tool. Click to create a vertex, then
move the cursor and click again to create a second vertex with a straight line connecting them.
Continue this process to create a connected line. To add additional vertices after you finish drawing,
double-click on the ROI edge.
You could move through the volume, slice-by-slice, and draw an ROI on each slice where the object
appears. However, the Volume Segmenter app provides several automated interpolation tools that
can help with segmenting an object across slices.
To use interpolation, you must first manually define the region on two slices. You have already defined
the region on the first slice where the object appears, slice 35. Use the same process to define the
region on the last slice where it appears, slice 88. The app places two bars on top of the slider, using
the color associated with the label, to indicate the slices with ROIs.
With the object defined on two slices, click Auto Interpolate. The app automatically defines the ROI on all the intervening slices. The app uses blue bars to indicate all the slices that have ROIs, which now appear as a solid bar from slice 35 to slice 88.
Alternatively, after defining an ROI on two slices, you can click Manually Interpolate. With this
option, the app opens the Manually Interpolate dialog box. You select the two regions from which you
want to interpolate, Region One and Region Two. To select the first region, use the slider at the
bottom of the dialog box to navigate to the first slice with an ROI, slice 35, and then click inside the
ROI displayed. To select the second region, click Region Two, navigate to slide 88, and click inside the
ROI displayed. After selecting both regions, click Run to interpolate the ROI on all intervening slices.
After using interpolation, check the individual slices to see if the interpolation created satisfactory ROIs. Note that the ROI on slice 71 does not fill the entire object that you want to segment. You can manually adjust the ROI using the Paint Brush tool. Alternatively, you can use one of the tools on the Automate tab. For example, you can use Active Contours to grow the ROIs on the slices where the ROI does not fill the full size of the tumor.
You can also use your own algorithm to operate on the ROIs. On the Automate tab, click Add Algorithm. Choose whether you want your processing to operate on each 2-D slice (Slice-based) or on the entire 3-D volume (Volume-based).
For this example, under Slice-Based, select the New option and click Function Template to create a
new function that operates on each 2-D slice. The app opens the template in the MATLAB editor.
Replace the sample code in the template with code that you want to use. Your function must accept
two arguments: each slice as a separate image and a mask. Your function must also return a mask
image.
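As a sketch, a slice-based automation function has a shape like the following. The function name and the operations inside are placeholders that you would replace with your own processing.

function BW = mySliceAlgorithm(I,BW)
% Placeholder slice-based automation function for the Volume Segmenter app.
% I is the current 2-D slice and BW is the current mask for that slice.
% The function must return the updated mask.
BW = imbinarize(im2single(I));   % example operation only
BW = imfill(BW,'holes');
end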
When you are done editing the template, save the file. The Volume Segmenter app automatically
creates a button in the Automate tab toolstrip for your function. To test your function on one slice,
click Run. By default, the app applies the function to only the current slice.
After testing your function on a single slice, you can run it on all of the slices or a subset of the slices.
You can run it from the current slice to the end (the highest numbered slice) or from the current slice
back to the beginning (slice 1). You can also specify a range of slices by specifying the starting slice
and the ending slice.
When you choose one of the directional options, the app updates the slice numbers in the display. You
can use this display to view the progress of processing.
To create the binary mask volume, click Save Labels on the Segmenter tab. You can save the mask to a MAT-file or to a workspace variable. For this example, click Save As Workspace Variable. In the Save to workspace dialog box, specify whether you want to save the segmentation as a logical or categorical mask. Choose logical (the default when there is only one label), give the variable a name, my_mask_volume, and click OK. The app creates a 3-D volume of class logical with the same dimensions as the original volume.
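For example, once my_mask_volume is in the workspace, you can use it to keep only the labeled voxels and inspect the result; volshow is one way to view the masked volume.

segmentedVol = vol;                  % copy of the original volume
segmentedVol(~my_mask_volume) = 0;   % zero out everything outside the mask
volshow(segmentedVol);               % view the masked volume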
See Also
Volume Segmenter
Related Examples
• “Create Semantic Segmentation Using Volume Segmenter” on page 13-148
Create Semantic Segmentation Using Volume Segmenter
Load a volume into the workspace. This example uses a stack of MRI brain images, stored in the MAT-
file vol_001.mat.
load(fullfile(toolboxdir('images'),'imdata', ...
'BrainMRILabeled','images','vol_001.mat'));
This command loads a 240-by-240-by-155 volume named vol into the workspace.
whos vol
Open the Volume Segmenter app. Click the Apps tab on the MATLAB® toolstrip. In the Image
Processing and Computer Vision section, click Volume Segmenter.
To load the volume in the Volume Segmenter app, click Open Volume in the app toolstrip. For this
example, select Open from Workspace. In the Import Volume dialog box, select the volume you
loaded into the workspace, vol, and click OK. (You can also specify a volume when you open the app
by using the volumeSegmenter command: volumeSegmenter(vol).)
The Volume Segmenter app displays a 3-D representation of the volume in the 3-D Display pane
and displays individual slices of the data set in the Slice pane.
By default, the Slice pane displays the first slice of your data. The app displays the number of the
slice displayed at the top of the image, for example, 1/155. In this dataset, the first few slices do not
contain images of the brain.
The app also automatically creates a label for the segmentation in the Labels pane, using the default
name Label1. You can define multiple labels in the Labels pane. However, to create a binary mask,
you must use only one label.
To change the name of the label, double-click the label name. To change the color associated with the
label, double-click the color square displayed in the Labels pane. You can optionally load an existing
set of labels into the app using the Open Labels button.
To determine what you want to segment, explore the volume using the 3-D Display pane and the
Slice pane.
In the 3-D Display pane, you can rotate the volume to examine the data from every angle, using the
mouse. You can also customize the display of the volume in the 3-D Display tab in the app toolstrip.
13-150
Create Semantic Segmentation Using Volume Segmenter
For example, if you have metadata that describes the relative size of the voxels, you can specify it in
the Spatial Referencing part of the 3-D Display tab in the app toolstrip. To improve your view of
the data, you can change the background color used in the 3-D display, modify the threshold and
opacity of the display, and include orientation axes with the display, as shown in this figure. With the
brain MRI data, you can see the tumor in the temporal lobe that you want to segment.
You can also view each slice of the volume in the Slice pane. Use the slider at the bottom of the pane
to move from slice to slice. You can see the tumor on slice 35 through slice 88. By default, the Slice
pane displays the volume oriented along the X-Y axis, but you can change this using buttons in the
Orientation section of the toolstrip on the Segmenter tab. The Slice pane is also where you use
drawing tools to define the mask.
13-151
13 Image Segmentation
Once you have identified the object you want to segment, you can use the tools on the Draw tab in
the app toolstrip to define the region. Select the drawing tool you want to use: the Freehand,
Assisted Freehand, and Polygon ROI tools, or the Paint Brush tool.
Start by labeling the brain. When one object is nested in another object, as the tumor appears over
the brain on slices, label the larger region first. The first step is to create a label in the Labels pane.
The app provides one label by default, named Label1. To change the name of the label to be more
descriptive for your application, double-click in the label and type in the new name. To change the
default color associated with the label, double-click the colored square in the label identifier and
select a color from the Color dialog box.
13-152
Create Semantic Segmentation Using Volume Segmenter
In the Slice pane, navigate to the slice where the object first appears and use a drawing tool to label
the object. In the following figure, this example uses the Paint Brush tool to label the brain, but you
can use any of the drawing tools.
13-153
13 Image Segmentation
You could move through the volume, slice-by-slice, and draw an ROI on each slice where the object
appears. However, the Volume Segmenter app provides several automated interpolation tools that
can help with segmenting an object across slices.
To use interpolation, you must first manually define the region on two slices. You have already defined
the region on the first slice where the object appears, slice 35. Use the same process to define the
region on the last slice where it appears, slice 88. The app places two bars on top of the slider, using
the color associated with the label, to indicate the slices with ROIs.
13-154
Create Semantic Segmentation Using Volume Segmenter
With the object defined on two slices, click Auto Interpolate. The app automatically defines the ROI
on all the intervening slices. The app uses blue bars to indicate all the slices that have ROIs, which
now appear as a solid bar from slice 35 to slice 88.
Alternatively, after defining an ROI on two slices, you can click Manually Interpolate. With this
option, the app opens the Manually Interpolate dialog box. You select the two regions from which you
want to interpolate, Region One and Region Two. To select the first region, use the slider at the
bottom of the dialog box to navigate to the first slice with an ROI, slice 35, and then click inside the
ROI displayed. To select the second region, click Region Two, navigate to slice 88, and click inside the
ROI displayed. After selecting both regions, click Run to interpolate the ROI on all intervening slices.
13-155
13 Image Segmentation
After using interpolation, check the individual slices to see if the interpolation created satisfactory
ROIs. Note that the ROI on slice 71 does not fill the entire object that you want to segment. You can
manually adjust the ROI using the Paint Brush tool. Alternatively, you can use one of the tools on the
Automate tab. For example, you can use Active Contours to grow the ROIs on the slices where they
do not cover the full extent of the tumor.
13-156
Create Semantic Segmentation Using Volume Segmenter
You can also add your own algorithm to operate on the ROIs. On the Automate tab, click Add
Algorithm. Choose whether you want your processing to operate on each 2-D slice (Slice-based) or
on the entire 3-D volume (Volume-based).
13-157
13 Image Segmentation
For this example, under Slice-Based, select the New option and click Function Template to create a
new function that operates on each 2-D slice. The app opens the template in the MATLAB editor.
Replace the sample code in the template with code that you want to use. Your function must accept
two arguments: each slice as a separate image and a mask. Your function must also return a mask
image.
When you are done editing the template, save the file. The Volume Segmenter app automatically
creates a button in the Automate tab toolstrip for your function. To test your function on one slice,
click Run. By default, the app applies the function to only the current slice.
13-158
Create Semantic Segmentation Using Volume Segmenter
After testing your function on a single slice, you can run it on all of the slices or a subset of the slices.
You can run it from the current slice to the end (the highest numbered slice) or from the current slice
back to the beginning (slice 1). You can also specify a range of slices by specifying the starting slice
and the ending slice.
When you choose one of the directional options, the app updates the slice numbers in the display. You
can use this display to view the progress of processing.
After labeling the brain on each slice, label the tumor wherever it appears on a slice, repeating the
process described previously.
First, define a new label in the Labels pane. Click the Plus sign in the Labels pane to create a new
label.
13-159
13 Image Segmentation
In the Slice pane, navigate to the slice where the object first appears and start labeling the object on
each slice using a drawing tool. In the following figure, this example uses the Paint Brush tool to label
the tumor. As described previously, you can draw the object on each slice where it appears or use the
interpolation tools to draw on multiple slices automatically. After interpolation, you can use drawing
tools, such as the Eraser, to modify the automated segmentation on each slice.
13-160
Create Semantic Segmentation Using Volume Segmenter
When you complete labeling the brain and the tumor in the volume, save the segmentation. Click
Save Labels on the Segmenter tab and choose from several options. You can save the labeled MRI
data as a MAT-file or as a variable in the workspace. For this example, choose a workspace variable.
When you define multiple labels, the Volume Segmenter app creates a categorical volume in the
workspace that is the same size as the input volume. After you save the segmentation, you can
optionally turn on Autosave, which periodically saves the segmentation automatically.
13-161
13 Image Segmentation
After labeling the brain and the tumor, and saving the segmentation to the workspace as a categorical
volume, you might notice that the background voxels all have the value <undefined>. To label the
background voxels so that they have a recognizable categorization as well, follow a similar process to
that previously described:
1 Define a new label in the Labels pane, give the label a descriptive name, and select the color you
want for the background.
2 Label the background on each slice. Navigate to a slice, select Fill Region on the Draw tab, and
click anywhere in the background. Repeat this process on each slice.
13-162
Create Semantic Segmentation Using Volume Segmenter
When you add a background, it can obscure the other labels in the visualization of the volume in the
3-D Display pane. To view the other labeled regions in the 3-D Display pane, disable the visibility of
the background label. Click Show Labels in the 3-D Display tab, click Customize, and deselect the
visibility of the background label.
See Also
Volume Segmenter
Related Examples
• “Create Binary Mask Using Volume Segmenter” on page 13-136
13-163
13 Image Segmentation
To use the Volume Segmenter app with a blocked image, you must create a blocked image from the
original volume and open the blocked image in the app. Once in the app, working with the blocked
image is very similar to working with any volume.
• Explore the blocked image just as you would any volume, by viewing each slice individually or
manipulating the 3-D representation of the volume. However, with a blocked image, you view the
volume one block at a time. The app includes navigation aids you can use to view each block in the
blocked image.
• Segment the blocked image just as you would any volume, drawing labels on areas of the volume.
However, with a blocked image, you draw labels on the volume a block at a time. To label the
blocked image, use the drawing tools in the app to create ROIs. You can also use interpolation to
automatically label intermediate slices in a block. As you view each block, you segment the part of
the object you find in that block. You can also use automated methods to segment a blocked
image. When using automation, you can process all blocks at the same time.
When working with blocked images in the Volume Segmenter app, create all the labels you want to
use and then save the segmentation. This is more efficient than adding or removing labels
individually. Also, as you finish processing a block, before you begin processing the next block, you
must save the processed block in a file. When you are done, the blockedImage object combines the
individually processed block files into one volume.
If you want to segment a volume that does not fit into memory, create a blockedImage object to
represent the volume. This example uses a stack of MRI brain images as a volume, stored in the MAT
file vol_001.mat. In this MRI data, you can see the tumor that you want to segment in the temporal
lobe.
load(fullfile(toolboxdir('images'),'imdata','BrainMRILabeled','images','vol_001.mat'));
Reading the file loads a 240-by-240-by-155 volume named vol into the workspace.
whos vol
Create a blocked image from the volume, specifying the size of the blocks. Note: If you have a volume
that does not fit in memory, you can specify the file name to blockedImage instead of the in-memory array.
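The creation step itself looks like the following sketch; the block size is an assumption chosen so that a 240-by-240-by-155 volume splits into two blocks along each dimension:

bim = blockedImage(vol, 'BlockSize', [120 120 78]);   % two blocks per dimension for a 240-by-240-by-155 volume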
Given the specified block size, the blocked image creates two blocks in each dimension.
13-164
Work with Blocked Images Using the Volume Segmenter
Open the Volume Segmenter app. Select the Apps tab on the MATLAB® toolstrip. In the Image
Processing and Computer Vision section, select Volume Segmenter.
13-165
13 Image Segmentation
To load the blocked image into the Volume Segmenter app, select Open Volume on the app
toolstrip. For this example, select Open Blocked Image from Workspace. In the Import Volume
dialog box, select the blocked image you created in the workspace, bim, and click OK. Alternatively,
you can specify a blocked image when you open the app by using the volumeSegmenter command:
volumeSegmenter(bim).
The app loads the volume and displays its content. When working with a blocked image, the app
displays the contents of one block at a time. The Overview tab indicates which block you are
currently viewing in the context of the entire volume.
13-166
Work with Blocked Images Using the Volume Segmenter
Using the Volume Segmenter app, explore the volume to determine what you want to segment. With
a blocked image, the app includes several navigational aids that help you explore each block.
Current Block -- View a 3-D representation of the block contents in the Current Block tab. To add
orientation axes and a wireframe to the display, use the options on the 3-D Display tab of the app
toolstrip. To view the block from different angles, use the mouse to rotate the display.
13-167
13 Image Segmentation
Overview -- Shows the location of the current block in relation to the other blocks in the blocked
image. To add orientation axes and a wireframe to the display, use the options on the 3-D Display tab
of the app toolstrip. To view the block from different angles, use the mouse to rotate the display. As
you explore blocks, the display updates to show which block you currently have selected, as well as
which you have visited and which you have marked as done. The current block is shown in red.
Visited blocks or processed blocks are yellow. Blocks that you mark as done are green.
13-168
Work with Blocked Images Using the Volume Segmenter
You can also customize the display of the volume in the 3-D Display tab in the app toolstrip. For
example, if you have metadata that describes the relative size of the voxels, you can specify it in the
Spatial Referencing section of the 3-D Display tab. To improve your view of the data, you can
change the background color used in the 3-D display, modify the threshold and opacity of the display,
and include orientation axes with the display.
Blocked Image tab -- For blocked images, the app adds a Blocked Image tab to the app toolstrip.
This tab contains navigation aids that help you move among the blocks in the blocked image. For
example, to move to the next unprocessed block, click Next Block. You can also move to a particular
block by specifying block coordinates along the X-, Y-, and Z-axes. To indicate that you are done
processing a block, click Mark Block Complete. When you mark a block complete, the app
calculates the completion percentage for the entire volume.
13-169
13 Image Segmentation
Slice pane -- View each slice of the volume in the Slice pane. Use the slider at the bottom of the pane
to move from slice to slice. By default, the Slice pane displays the volume oriented along the X-Y axis,
but you can change this by using buttons in the Orientation section of the toolstrip on the
Segmenter tab. The Slice pane is also where you use drawing tools to define ROIs. With blocked
images, the slice view shows only the current block. The object you want to segment may span
several blocks. The app displays the number of the current slice, out of the total number of slices, at
the top of the pane. For example, 50/120.
13-170
Work with Blocked Images Using the Volume Segmenter
Once you have identified the object that you want to segment, use the tools on the Draw tab in the
app toolstrip to label the object in each block where it appears. You can use any of the drawing tools with
blocked images: the Paint Brush tool, the Fill Region tool, the Eraser tool, and the Freehand,
Assisted Freehand, and Polygon region-of-interest (ROI) shapes.
13-171
13 Image Segmentation
As with any volume, to start labeling the brain, first create all the labels you want to use in the
segmentation. In the Labels pane, the app provides one label by default, named Label1. To change
the name of the label to be more descriptive for your application, double-click the label and type in
the new name. To change the default color associated with the label, double-click the colored square
associated with the label and select a color from the Color dialog box. When one object is nested in
another object, as the tumor appears over the brain on slices, label the larger region first. Click the
plus button to create additional labels.
In the Slice pane, navigate to a slice where the object appears in the block and use a drawing tool to
label the object. This figure shows the Paint Brush tool, but you can use any of the drawing tools.
13-172
Work with Blocked Images Using the Volume Segmenter
You could move through a block slice-by-slice, drawing an ROI on each slice where the object
appears. However, the Volume Segmenter app provides several automated tools that can help with
segmenting an object across slices. These automated options process only the slices within a block.
To use interpolation to speed up labeling, you must first manually label the region on two slices. For
example, create a label on one slice and use the same process to define the label on another slice.
The app places two bars on the slider, using the color associated with the label, to indicate the slices
with defined ROIs.
13-173
13 Image Segmentation
With the object defined on two slices, select Auto Interpolate. The app automatically defines the
ROI on all the intervening slices. The app uses a solid blue bar to indicate that all the slices in that
range have ROIs.
Alternatively, after defining an ROI on two slices, click Manually Interpolate. With this option, the
app opens the Manually Interpolate dialog box. You select the two regions from which you want to
interpolate, Region One and Region Two. By default, the dialog box opens on a slice on which you
have defined a region. To select the first region, click Region One. Navigate to the other slice on
which you have defined a region, using the slider or by clicking the blue indicator above the slider. To
select the second region, click Region Two. After selecting both regions, click Run to interpolate the
ROI on all intervening slices.
13-174
Work with Blocked Images Using the Volume Segmenter
You can use an algorithm to refine label definitions and perform other processing of blocked images
automatically. The app includes several slice-based and volume-based algorithms on the Automate
tab. First, select the algorithm. For example, select the volume-based algorithm Otsu's Threshold in
the Algorithm section of the Automate tab toolstrip. Once you select the algorithm, select
Algorithm Parameters to specify values for any algorithm-specific parameters that might be
associated with the algorithm. Because Otsu's threshold algorithm does not support any parameters,
this option is not enabled. For slice-based algorithms, you can specify which slices you want to
process: the current slice, a set of slices from the current slice back to the beginning or from the
current slice to the end. After selecting the algorithm, specifying algorithm-specific parameters, if
available, and choosing the slices to operate on, click Run.
13-175
13 Image Segmentation
When working with blocked images, you have several other options for automated processing. For
blocked images, by default, automation algorithms operate on the slices in the current block.
However, to perform automated processing on all the blocks in the blocked image at one time, click
Automate On All Blocks. If you have already marked some blocks completed, make sure Skip
Completed is not enabled. To enable parallel processing of blocks, click Use Parallel.
To review the results of the processing and accept or reject each block, click Review Results. The
app displays the Review and accept automation results dialog box. Select the check box for each
block you accept and click Accept Selected to finish.
13-176
Work with Blocked Images Using the Volume Segmenter
You can also add your own algorithm to operate on the ROIs. On the Automate tab, click Add
Algorithm. Choose whether you want your processing to operate on each 2-D slice (Slice-based) or
on the entire 3-D volume (Volume-based).
13-177
13 Image Segmentation
For this example, under Slice-Based, select the New option and click Function Template to create a
new function that operates on each 2-D slice. The app opens the template in the MATLAB editor.
Replace the sample code in the template with code that you want to use. Your function must accept
two arguments: each slice as a separate image and a mask. Your function must also return a mask
image.
When you are done editing the template, save the file. The Volume Segmenter app automatically
creates a button in the Automate tab toolstrip for your function. To test your function on one slice,
click Run. By default, the app applies the function to only the current slice.
13-178
Work with Blocked Images Using the Volume Segmenter
When you complete labeling the brain and the tumor in the volume, save the segmentation. Click
Save Labels on the Segmenter tab and choose from several options. You can save the labeled MRI
data as a MAT-file or as a variable in the workspace. For this example, choose a workspace variable.
When you define multiple labels, the Volume Segmenter app creates a categorical volume in the
workspace that is the same size as the input volume. After you save the segmentation, you can
optionally turn on Autosave, which periodically saves the segmentation automatically.
See Also
Volume Segmenter
13-179
13 Image Segmentation
Related Examples
• “Create Binary Mask Using Volume Segmenter” on page 13-136
13-180
14
Image Deblurring
This chapter describes how to deblur an image using the toolbox deblurring functions.
Image Deblurring
The blurring, or degradation, of an image can be caused by many factors:
• Movement during the image capture process, by the camera or, when long exposure times are
used, by the subject
• Out-of-focus optics, use of a wide-angle lens, atmospheric turbulence, or a short exposure time,
which reduces the number of photons captured
• Scattered light distortion in confocal microscopy
A blurred, degraded image can be approximately described by the equation g = Hf + n, where:
• g is the blurred image
• H is the distortion operator, also called the point-spread function (PSF), which describes the degree
to which the optical system blurs (spreads) a point of light
• f is the original true image
Note The image f does not really exist. This image represents what you would have if you
had perfect image acquisition conditions.
• n is additive noise, introduced during image acquisition, that corrupts the image
Based on this model, the fundamental task of deblurring is to deconvolve the blurred image with the
PSF that exactly describes the distortion. Deconvolution is the process of reversing the effect of
convolution.
Note The quality of the deblurred image is mainly determined by knowledge of the PSF.
To illustrate, this example takes a clear image and deliberately blurs it by convolving it with a PSF.
The example uses the fspecial function to create a PSF that simulates a motion blur, specifying the
length of the blur in pixels (LEN = 31) and the angle of the blur in degrees (THETA = 11). Once the PSF
is created, the example uses the imfilter function to convolve the PSF with the original image, I,
to create the blurred image, Blurred. To see how deblurring is the reverse of this process, using the
same images, see “Deblur Images Using a Wiener Filter” on page 14-5.
I = imread('peppers.png');
I = I(60+[1:256],222+[1:256],:); % crop the image
figure; imshow(I); title('Original Image');
14-2
Image Deblurring
LEN = 31;
THETA = 11;
PSF = fspecial('motion',LEN,THETA); % create PSF
Blurred = imfilter(I,PSF,'circular','conv');
figure; imshow(Blurred); title('Blurred Image');
Deblurring Functions
The toolbox includes four deblurring functions, listed here in order of complexity. All the functions
accept a PSF and the blurred image as their primary arguments.
deconvwnr -- Implements a least squares solution. You should provide some information about
the noise to reduce possible noise amplification during deblurring. See “Deblur Images Using a
Wiener Filter” on page 14-5 for more information.
deconvreg -- Implements a constrained least squares solution, where you can place constraints on
the output image (the smoothness requirement is the default). You should provide some information
about the noise to reduce possible noise amplification during deblurring. See “Deblur Images Using a
Regularized Filter” on page 14-12 for more information.
deconvlucy -- Implements an accelerated, damped Lucy-Richardson algorithm. This function
performs multiple iterations, using optimization techniques and Poisson statistics. You do not need to
provide information about the additive noise in the corrupted image. See “Adapt the Lucy-Richardson
Deconvolution for Various Image Distortions” on page 14-22 for more information.
deconvblind -- Implements the blind deconvolution algorithm, which performs deblurring without
knowledge of the PSF. You pass your initial guess of the PSF as an argument; the function returns a
restored PSF along with the restored image. See “Adapt Blind Deconvolution for Various Image
Distortions” on page 14-37 for more information.
14-3
14 Image Deblurring
• Deblurring is an iterative process. You might need to repeat the deblurring process multiple times,
varying the parameters you specify to the deblurring functions with each iteration, until you
achieve an image that, based on the limits of your information, is the best approximation of the
original scene. Along the way, you must make numerous judgments about whether newly
uncovered features in the image are features of the original scene or simply artifacts of the
deblurring process.
• To avoid "ringing" in a deblurred image, you can use the edgetaper function to preprocess your
image before passing it to the deblurring functions. See “Avoid Ringing in Deblurred Images” on
page 14-55 for more information.
• For information about creating your own deblurring functions, see “Create Your Own Deblurring
Functions” on page 14-54.
14-4
Deblur Images Using a Wiener Filter
Read and display a pristine image that does not have blur or noise.
Ioriginal = imread('cameraman.tif');
imshow(Ioriginal)
title('Original Image')
Simulate a blurred image that might result from camera motion. First, create a point-spread function,
PSF, by using the fspecial function and specifying linear motion across 21 pixels at an angle of 11
degrees. Then, convolve the point-spread function with the image by using imfilter.
The original image has data type uint8. If you pass a uint8 image to imfilter, then the function
will quantize the output in order to return another uint8 image. To reduce quantization errors,
convert the image to double before calling imfilter.
PSF = fspecial('motion',21,11);
Idouble = im2double(Ioriginal);
blurred = imfilter(Idouble,PSF,'conv','circular');
imshow(blurred)
title('Blurred Image')
14-5
14 Image Deblurring
Restore the blurred image by using the deconvwnr function. The blurred image does not have noise
so you can omit the noise-to-signal (NSR) input argument.
wnr1 = deconvwnr(blurred,PSF);
imshow(wnr1)
title('Restored Blurred Image')
14-6
Deblur Images Using a Wiener Filter
Add zero-mean Gaussian noise to the blurred image by using the imnoise function.
noise_mean = 0;
noise_var = 0.0001;
blurred_noisy = imnoise(blurred,'gaussian',noise_mean,noise_var);
imshow(blurred_noisy)
title('Blurred and Noisy Image')
Try to restore the blurred noisy image by using deconvwnr without providing a noise estimate. By
default, the Wiener restoration filter assumes the NSR is equal to 0. In this case, the Wiener
restoration filter is equivalent to an ideal inverse filter, which can be extremely sensitive to noise in
the input image.
In this example, the noise in this restoration is amplified to such a degree that the image content is
lost.
wnr2 = deconvwnr(blurred_noisy,PSF);
imshow(wnr2)
title('Restoration of Blurred Noisy Image (NSR = 0)')
14-7
14 Image Deblurring
Try to restore the blurred noisy image by using deconvwnr with a more realistic value of the
estimated noise.
signal_var = var(Idouble(:));
NSR = noise_var / signal_var;
wnr3 = deconvwnr(blurred_noisy,PSF,NSR);
imshow(wnr3)
title('Restoration of Blurred Noisy Image (Estimated NSR)')
14-8
Deblur Images Using a Wiener Filter
Even a visually imperceptible amount of noise can affect the result. One source of noise is
quantization errors from working with images in uint8 representation. Earlier, to avoid quantization
errors, this example simulated a blurred image from a pristine image in data type double. Now, to
explore the impact of quantization errors on the restoration, simulate a blurred image from the
pristine image in the original uint8 data type.
blurred_quantized = imfilter(Ioriginal,PSF,'conv','circular');
imshow(blurred_quantized)
title('Blurred Quantized Image')
14-9
14 Image Deblurring
Try to restore the blurred quantized image by using deconvwnr without providing a noise estimate.
Even though no additional noise was added, this restoration is degraded compared to the restoration
of the blurred image in data type double.
wnr4 = deconvwnr(blurred_quantized,PSF);
imshow(wnr4)
title('Restoration of Blurred Quantized Image (NSR = 0)');
14-10
Deblur Images Using a Wiener Filter
Try to restore the blurred quantized image by using deconvwnr with a more realistic value of the
estimated noise.
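The code for this final step is not shown above. A sketch follows, assuming the quantization noise of a uint8 image is modeled as uniformly distributed over one quantization level, giving a variance of (1/256)^2/12:

uniform_quantization_var = (1/256)^2/12;   % variance of uniform quantization noise
signal_var = var(Idouble(:));
NSR = uniform_quantization_var/signal_var;
wnr5 = deconvwnr(blurred_quantized,PSF,NSR);
imshow(wnr5)
title('Restoration of Blurred Quantized Image (Estimated NSR)')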
See Also
deconvwnr | fspecial | imfilter | imnoise
More About
• “Image Deblurring” on page 14-2
14-11
14 Image Deblurring
Read and display a pristine image that does not have blur or noise.
I = im2double(imread('tissue.png'));
imshow(I);
title('Original Image');
text(size(I,2),size(I,1)+15, ...
'Image courtesy of Alan Partin, Johns Hopkins University', ...
'FontSize',7,'HorizontalAlignment','right');
Simulate a blurred image that might result from an out-of-focus lens. First, create a point-spread
function, PSF, by using the fspecial function and specifying a Gaussian filter of size 11-by-11 and
standard deviation 5. Then, convolve the point-spread function with the image by using imfilter.
PSF = fspecial('gaussian',11,5);
blurred = imfilter(I,PSF,'conv');
Add zero-mean Gaussian noise to the blurred image by using the imnoise function.
14-12
Deblur Images Using a Regularized Filter
noise_mean = 0;
noise_var = 0.02;
blurred_noisy = imnoise(blurred,'gaussian',noise_mean,noise_var);
imshow(blurred_noisy)
title('Blurred and Noisy Image')
Restore the blurred image by using the deconvreg function, supplying the noise power (NP) as the
third input parameter. To illustrate how sensitive the algorithm is to the value of noise power, this
example performs three restorations.
For the first restoration, use the true NP. Note that the example outputs two parameters here. The
first return value, reg1, is the restored image. The second return value, lagra, is a scalar Lagrange
multiplier on which the regularized deconvolution has converged. This value is used later in the
example.
NP = noise_var*numel(I);
[reg1,lagra] = deconvreg(blurred_noisy,PSF,NP);
imshow(reg1)
title('Restored with True NP')
14-13
14 Image Deblurring
For the second restoration, use a slightly overestimated noise power. The restoration has poor
resolution.
reg2 = deconvreg(blurred_noisy,PSF,NP*1.3);
imshow(reg2)
title('Restored with Larger NP')
14-14
Deblur Images Using a Regularized Filter
For the third restoration, use a slightly underestimated noise power. The restoration has
overwhelming noise amplification and ringing from the image borders.
reg3 = deconvreg(blurred_noisy,PSF,NP/1.3);
imshow(reg3)
title('Restored with Smaller NP')
14-15
14 Image Deblurring
You can reduce the noise amplification and ringing along the boundary of the image by calling the
edgetaper function prior to deconvolution. The image restoration becomes less sensitive to the
noise power parameter.
Edged = edgetaper(blurred_noisy,PSF);
reg4 = deconvreg(Edged,PSF,NP/1.3);
imshow(reg4)
title('Restored with Smaller NP and Edge Tapering')
14-16
Deblur Images Using a Regularized Filter
Restore the blurred and noisy image, assuming that the optimal solution is already found and the
corresponding Lagrange multiplier is known. In this case, any value passed for noise power, NP, is
ignored.
To illustrate how sensitive the algorithm is to the Lagrange multiplier, this example performs three
restorations. The first restoration uses the lagra output from the reg1 restoration performed earlier.
reg5 = deconvreg(Edged,PSF,[],lagra);
imshow(reg5)
title('Restored with LAGRA')
14-17
14 Image Deblurring
The second restoration uses 100*lagra, which increases the significance of the constraint. The
resulting image is oversmoothed.
reg6 = deconvreg(Edged,PSF,[],lagra*100);
imshow(reg6)
title('Restored with Large LAGRA')
14-18
Deblur Images Using a Regularized Filter
The third restoration uses lagra/100, which weakens the constraint (the smoothness requirement set
for the image) and amplifies the noise. In the extreme case, when the Lagrange multiplier equals 0,
the reconstruction is pure inverse filtering.
reg7 = deconvreg(Edged,PSF,[],lagra/100);
imshow(reg7)
title('Restored with Small LAGRA')
14-19
14 Image Deblurring
Restore the blurred and noisy image using a different constraint for the regularization operator.
Instead of using the default Laplacian constraint on image smoothness, constrain the image
smoothness only in one dimension (1-D Laplacian).
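The code for this step is not shown above. A sketch using the REGOP input of deconvreg follows; the exact 1-D operator and the reuse of the earlier Lagrange multiplier are assumptions:

REGOP = [-1 2 -1];                             % 1-D Laplacian: constrain smoothness along one dimension only
reg8 = deconvreg(blurred_noisy,PSF,[],lagra,REGOP);
imshow(reg8)
title('Restored with 1-D Laplacian Regularization')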
14-20
Deblur Images Using a Regularized Filter
See Also
deconvreg | fspecial | imfilter | imnoise
More About
• “Image Deblurring” on page 14-2
14-21
14 Image Deblurring
In this section...
“Reduce the Effect of Noise Amplification” on page 14-22
“Account for Nonuniform Image Quality” on page 14-22
“Handle Camera Read-Out Noise” on page 14-23
“Handling Undersampled Images” on page 14-23
“Refine the Result” on page 14-23
Use the deconvlucy function to deblur an image using the accelerated, damped, Lucy-Richardson
algorithm. The algorithm maximizes the likelihood that the resulting image, when convolved with the
PSF, is an instance of the blurred image, assuming Poisson noise statistics. This function can be
effective when you know the PSF but know little about the additive noise in the image.
The deconvlucy function implements several adaptations to the original Lucy-Richardson maximum
likelihood algorithm that address complex image restoration tasks.
To control noise amplification, the deconvlucy function uses a damping parameter, DAMPAR. This
parameter specifies the threshold level for the deviation of the resulting image from the original
image, below which damping occurs. For pixels that deviate in the vicinity of their original values,
iterations are suppressed.
Damping is also used to reduce ringing, the appearance of high-frequency structures in a restored
image. Ringing is not necessarily the result of noise amplification. See “Avoid Ringing in Deblurred
Images” on page 14-55 for more information.
The algorithm converges on predicted values for the bad pixels based on the information from
neighborhood pixels. The variation in the detector response from pixel to pixel (the so-called flat-field
correction) can also be accommodated by the WEIGHT array. Instead of assigning a weight of 1.0 to
the good pixels, you can specify fractional values and weight the pixels according to the amount of
the flat-field correction.
14-22
Adapt the Lucy-Richardson Deconvolution for Various Image Distortions
The Lucy-Richardson iterations intrinsically account for the first type of noise. You must account for
the second type of noise; otherwise, it can cause pixels with low levels of incident photons to have
negative values.
The deconvlucy function uses the READOUT input argument to handle camera read-out noise. The
value of this parameter is typically the sum of the read-out noise variance and the background noise,
such as the number of counts from the background radiation. The value of the READOUT argument
specifies an offset that ensures that all values are positive.
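The following sketch shows where these parameters appear in the deconvlucy argument list. The image, PSF, and noise values are illustrative assumptions, not values taken from this guide:

I = im2double(imread('cameraman.tif'));
PSF = fspecial('gaussian',7,10);
V = 0.001;                                              % assumed noise variance
BlurredNoisy = imnoise(imfilter(I,PSF,'symmetric','conv'),'gaussian',0,V);
DAMPAR = 3*sqrt(V);                                     % damping threshold, same class as the image (double here)
WEIGHT = ones(size(I));                                 % set defective (bad) pixels to 0, or use flat-field fractions
READOUT = 0;                                            % read-out noise variance plus background counts
J = deconvlucy(BlurredNoisy,PSF,20,DAMPAR,WEIGHT,READOUT);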
If the undersampled data is the result of camera pixel binning during image acquisition, the PSF
observed at each pixel rate can serve as a finer grid PSF. Otherwise, the PSF can be obtained via
observations taken at subpixel offsets or via optical modeling techniques. This method is especially
effective for images of stars (high signal-to-noise ratio), because the stars are effectively forced to be
in the center of a pixel. If a star is centered between pixels, it is restored as a combination of the
neighboring pixels. A finer grid redirects the consequent spreading of the star flux back to the center
of the star's image.
When you call deconvlucy with the input image supplied as a cell array, the function returns the
output as a 1-by-4 cell array with these elements:
output{1} -- Original input image
output{2} -- Image produced by the last iteration
output{3} -- Image produced by the next to last iteration
output{4} -- Internal information used by deconvlucy to know where to restart the process
See Also
deconvblind | deconvlucy
14-23
14 Image Deblurring
Related Examples
• “Deblurring Images Using the Lucy-Richardson Algorithm” on page 14-25
More About
• “Image Deblurring” on page 14-2
14-24
Deblurring Images Using the Lucy-Richardson Algorithm
The example reads in an RGB image and crops it to be 256-by-256-by-3. The deconvlucy function
can handle arrays of any dimension.
I = imread('board.tif');
I = I(50+(1:256),2+(1:256),:);
figure;
imshow(I);
title('Original Image');
text(size(I,2),size(I,1)+15, ...
'Image courtesy of Alexander V. Panasyuk, Ph.D.', ...
'FontSize',7,'HorizontalAlignment','right');
text(size(I,2),size(I,1)+25, ...
'Harvard-Smithsonian Center for Astrophysics', ...
'FontSize',7,'HorizontalAlignment','right');
Simulate a real-life image that could be blurred due to camera motion or lack of focus. The image
could also be noisy due to random disturbances. The example simulates the blur by convolving a
Gaussian filter with the true image (using imfilter). The Gaussian filter then represents a point-
spread function, PSF.
14-25
14 Image Deblurring
PSF = fspecial('gaussian',5,5);
Blurred = imfilter(I,PSF,'symmetric','conv');
figure;
imshow(Blurred);
title('Blurred');
The example simulates the noise by adding a Gaussian noise of variance V to the blurred image (using
imnoise). The noise variance V is used later to define a damping parameter of the algorithm.
V = .002;
BlurredNoisy = imnoise(Blurred,'gaussian',0,V);
figure;
imshow(BlurredNoisy);
title('Blurred & Noisy');
14-26
Deblurring Images Using the Lucy-Richardson Algorithm
Restore the blurred and noisy image providing the PSF and using only 5 iterations (default is 10). The
output is an array of the same type as the input image.
luc1 = deconvlucy(BlurredNoisy,PSF,5);
figure;
imshow(luc1);
title('Restored Image, NUMIT = 5');
14-27
14 Image Deblurring
The resulting image changes with each iteration. To investigate the evolution of the image
restoration, you can do the deconvolution in steps: do a set of iterations, see the result, and then
resume the iterations from where they were stopped. To do so, the input image has to be passed as a
part of a cell array. For example, start the first set of iterations by passing in {BlurredNoisy}
instead of BlurredNoisy as input image parameter.
luc1_cell = deconvlucy({BlurredNoisy},PSF,5);
In that case the output, luc1_cell, becomes a cell array. The cell output consists of four numeric
arrays, where the first is the BlurredNoisy image, the second is the restored image of class double,
the third array is the result of the one-before-last iteration, and the fourth array is an internal
parameter of the iterated set. The second numeric array of the output cell-array, image
luc1_cell{2}, is identical to the output array of the Step 3, image luc1, with a possible exception
of their class (the cell output always gives the restored image of class double).
To resume the iterations, take the output from the previous function call, the cell-array luc1_cell,
and pass it into the deconvlucy function. Use the default number of iterations (NUMIT = 10). The
restored image is the result of a total of 15 iterations.
luc2_cell = deconvlucy(luc1_cell,PSF);
luc2 = im2uint8(luc2_cell{2});
figure;
imshow(luc2);
title('Restored Image, NUMIT = 15');
14-28
Deblurring Images Using the Lucy-Richardson Algorithm
The latest image, luc2, is the result of 15 iterations. Although it is sharper than the earlier result
from 5 iterations, the image develops a "speckled" appearance. The speckles do not correspond to
any real structures (compare it to the true image), but instead are the result of fitting the noise in the
data too closely.
To control the noise amplification, use the damping option by specifying the DAMPAR parameter.
DAMPAR has to be of the same class as the input image. The algorithm dampens changes in the model
in regions where the differences are small compared with the noise. The DAMPAR used here equals 3
standard deviations of the noise. Notice that the image is smoother.
DAMPAR = im2uint8(3*sqrt(V));
luc3 = deconvlucy(BlurredNoisy,PSF,15,DAMPAR);
figure;
imshow(luc3);
title('Restored Image with Damping, NUMIT = 15');
14-29
14 Image Deblurring
The next part of this example explores the WEIGHT and SUBSMPL input parameters of the deconvlucy
function, using a simulated star image (for simplicity & speed).
I = zeros(32);
I(5,5) = 1;
I(10,3) = 1;
I(27,26) = 1;
I(29,25) = 1;
figure;
imshow(1-I,[],'InitialMagnification','fit');
ax = gca;
ax.Visible = 'on';
ax.XTickLabel = [];
ax.YTickLabel = [];
ax.XTick = [7 24];
ax.XGrid = 'on';
ax.YTick = [5 28];
ax.YGrid = 'on';
title('Data');
14-30
Deblurring Images Using the Lucy-Richardson Algorithm
The example simulates a blur of the image of the stars by creating a Gaussian filter, PSF, and
convolving it with the true image.
PSF = fspecial('gaussian',15,3);
Blurred = imfilter(I,PSF,'conv','sym');
Now simulate a camera that can only observe part of the stars' images (only the blur is seen). Create
a weighting function array, WEIGHT, that consists of ones in the central part of the Blurred image
("good" pixels, located within the dashed lines) and zeros at the edges ("bad" pixels - those that do
not receive the signal).
WT = zeros(32);
WT(6:27,8:23) = 1;
CutImage = Blurred.*WT;
To reduce the ringing associated with borders, apply the edgetaper function with the given PSF.
CutEdged = edgetaper(CutImage,PSF);
figure;
imshow(1-CutEdged,[],'InitialMagnification','fit');
ax = gca;
ax.Visible = 'on';
ax.XTickLabel = [];
ax.YTickLabel = [];
ax.XTick = [7 24];
14-31
14 Image Deblurring
ax.XGrid = 'on';
ax.YTick = [5 28];
ax.YGrid = 'on';
title('Observed');
The algorithm weights each pixel value according to the WEIGHT array while restoring the image. In
our example, only the values of the central pixels are used (where WEIGHT = 1), while the "bad"
pixel values are excluded from the optimization. However, the algorithm can place the signal power
into the location of these "bad" pixels, beyond the edge of the camera's view. Notice the accuracy of
the resolved star positions.
luc4 = deconvlucy(CutEdged,PSF,300,0,WT);
figure;
imshow(1-luc4,[],'InitialMagnification','fit');
ax = gca;
ax.Visible = 'on';
ax.XTickLabel = [];
ax.YTickLabel = [];
ax.XTick = [7 24];
ax.XGrid = 'on';
ax.YTick = [5 28];
ax.YGrid = 'on';
title('Restored');
14-32
Deblurring Images Using the Lucy-Richardson Algorithm
deconvlucy can restore an undersampled image given a finer sampled PSF (finer by SUBSMPL times). To
simulate the poorly resolved image and PSF, the example bins the Blurred image and the original
PSF, two pixels in one, in each dimension.
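The binning code itself is not shown above. One possible sketch, which sums each 2-by-2 block of pixels and first crops the 15-by-15 PSF to an even size, is:

bin2 = @(A) A(1:2:end,1:2:end) + A(2:2:end,1:2:end) + ...
            A(1:2:end,2:2:end) + A(2:2:end,2:2:end);
BinnedImage = bin2(Blurred);                  % 32-by-32 image binned to 16-by-16
BinnedPSF = bin2(PSF(1:end-1,1:end-1));       % 15-by-15 PSF cropped to 14-by-14, then binned to 7-by-7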
14-33
14 Image Deblurring
Restore the undersampled image, BinnedImage, using the undersampled PSF, BinnedPSF. Notice
that the luc5 image distinguishes only 3 stars.
luc5 = deconvlucy(BinnedImage,BinnedPSF,100);
figure;
imshow(1-luc5,[],'InitialMagnification','fit');
ax = gca;
ax.Visible = 'on';
ax.XTick = [];
ax.YTick = [];
title('Poor PSF');
14-34
Deblurring Images Using the Lucy-Richardson Algorithm
The next example restores the undersampled image (BinnedImage), this time using the finer PSF
(defined on a SUBSMPL-times finer grid). The reconstructed image (luc6) resolves the position of
the stars more accurately. Note how it distributes power between the two stars in the lower right
corner of the image. This hints at the existence of two bright objects, instead of one, as in the
previous restoration.
luc6 = deconvlucy(BinnedImage,PSF,100,[],[],[],2);
figure;
imshow(1-luc6,[],'InitialMagnification','fit');
ax = gca;
ax.Visible = 'on';
ax.XTick = [];
ax.YTick = [];
title('Fine PSF');
14-35
14 Image Deblurring
See Also
deconvblind | deconvlucy | deconvreg | deconvwnr
More About
• “Image Deblurring” on page 14-2
• “Adapt the Lucy-Richardson Deconvolution for Various Image Distortions” on page 14-22
14-36
Adapt Blind Deconvolution for Various Image Distortions
The deconvblind function, just like the deconvlucy function, implements several adaptations to
the original Lucy-Richardson maximum likelihood algorithm that address complex image restoration
tasks. Using these adaptations, you can:
• Reduce the effect of noise amplification by specifying a damping threshold (DAMPAR)
• Account for nonuniform image quality, such as bad pixels, by specifying a weight array (WEIGHT)
• Handle camera read-out noise by specifying a read-out offset (READOUT)
For more information about these adaptations, see “Adapt the Lucy-Richardson Deconvolution for
Various Image Distortions” on page 14-22. The deconvblind function also supports PSF constraints
that you can provide through a user-specified function.
I = imread('cameraman.tif');
figure
imshow(I)
title('Original Image')
14-37
14 Image Deblurring
Create a point spread function (PSF). A PSF describes the degree to which an optical system blurs
(spreads) a point of light.
PSF = fspecial('motion',13,45);
figure
imshow(PSF,[],'InitialMagnification','fit')
title('Original PSF')
14-38
Adapt Blind Deconvolution for Various Image Distortions
Create a simulated blur in the image, using the PSF, and display the blurred image.
Blurred = imfilter(I,PSF,'circ','conv');
figure
imshow(Blurred)
title('Blurred Image')
14-39
14 Image Deblurring
Deblur the image using the deconvblind function. You must make an initial guess at the PSF. To
determine the size of the PSF, examine the blurred image and measure the width of a blur (in pixels)
around an obviously sharp object. Because the size of the PSF is more important than the values it
contains, you can typically specify an array of 1's as the initial PSF.
In this initial restoration, deconvblind was able to deblur the image to a great extent. Note,
however, the ringing around the sharp intensity contrast areas in the restored image. (The example
eliminated edge-related ringing by using the 'circular' option with imfilter when creating the
simulated blurred image.) To achieve a more satisfactory result, rerun the operation, experimenting
with PSFs of different sizes. The restored PSF returned by each deconvolution can also provide
valuable hints at the optimal PSF size.
INITPSF = ones(size(PSF));
[J P] = deconvblind(Blurred,INITPSF,30);
figure
imshow(J)
title('Restored Image')
14-40
Adapt Blind Deconvolution for Various Image Distortions
figure
imshow(P,[],'InitialMagnification','fit')
title('Restored PSF')
14-41
14 Image Deblurring
One way to improve the result is to create a weight array to exclude areas of high contrast from the
deblurring operation. This can reduce contrast-related ringing in the result.
To create a weight array, create an array the same size as the image, and assign the value 0 to the
pixels in the array that correspond to pixels in the original image that you want to exclude from
processing. The example uses a combination of edge detection and morphological processing to
detect high-contrast areas in the image. Because the blur in the image is linear, the example dilates
the image twice. To exclude the image boundary pixels (a high-contrast area) from processing, the
example uses padarray to assign the value 0 to all border pixels.
WEIGHT = edge(I,'sobel',.28);
se1 = strel('disk',1);
se2 = strel('line',13,45);
WEIGHT = ~imdilate(WEIGHT,[se1 se2]);
WEIGHT = padarray(WEIGHT(2:end-1,2:end-1),[1 1]);
figure
imshow(WEIGHT)
title('Weight Array')
14-42
Adapt Blind Deconvolution for Various Image Distortions
Refine the guess at the PSF. The reconstructed PSF returned by the first pass at deconvolution, P,
shows a clear linearity. For this second pass, the example uses a new PSF, which is the same as the
returned PSF but with the small-amplitude pixels set to 0.
P1 = P;
P1(find(P1 < 0.01))= 0;
Run the deconvolution again, this time specifying the weight array and the modified PSF. Note how
the restored image has much less ringing around the sharp intensity areas than the result of the first
pass.
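The code for this second pass is not shown above. A sketch follows; the iteration count is an assumption, and the weight array is cast to double for use as the WEIGHT argument:

[J2,P2] = deconvblind(Blurred,P1,30,[],double(WEIGHT));
figure
imshow(J2)
title('Newly Deblurred Image')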
14-43
14 Image Deblurring
figure, imshow(P2,[],'InitialMagnification','fit')
title('Newly Reconstructed PSF')
14-44
Adapt Blind Deconvolution for Various Image Distortions
The deconvblind function returns the output image and the restored PSF as cell arrays. The output
image cell array contains these four elements:
output{1} -- Original input image
output{2} -- Image produced by the last iteration
output{3} -- Image produced by the next to last iteration
output{4} -- Internal information used by deconvblind to know where to restart the process
See Also
deconvblind | deconvlucy
Related Examples
• “Deblurring Images Using the Blind Deconvolution Algorithm” on page 14-46
More About
• “Image Deblurring” on page 14-2
14-45
14 Image Deblurring
Read a grayscale image into the workspace. The deconvblind function can handle arrays of any
dimension.
I = imread('cameraman.tif');
figure;imshow(I);title('Original Image');
text(size(I,2),size(I,1)+15, ...
'Image courtesy of Massachusetts Institute of Technology', ...
'FontSize',7,'HorizontalAlignment','right');
Simulate a real-life image that could be blurred (e.g., due to camera motion or lack of focus). The
example simulates the blur by convolving a Gaussian filter with the true image (using imfilter).
The Gaussian filter then represents a point-spread function, PSF.
PSF = fspecial('gaussian',7,10);
Blurred = imfilter(I,PSF,'symmetric','conv');
imshow(Blurred)
title('Blurred Image')
14-46
Deblurring Images Using the Blind Deconvolution Algorithm
To illustrate the importance of knowing the size of the true PSF, this example performs three
restorations. Each time the PSF reconstruction starts from a uniform array (an array of ones).
The first restoration, J1 and P1, uses an undersized array, UNDERPSF, for an initial guess of the PSF.
The size of the UNDERPSF array is 4 pixels shorter in each dimension than the true PSF.
UNDERPSF = ones(size(PSF)-4);
[J1,P1] = deconvblind(Blurred,UNDERPSF);
imshow(J1)
title('Deblurring with Undersized PSF')
14-47
14 Image Deblurring
The second restoration, J2 and P2, uses an array of ones, OVERPSF, for an initial PSF that is 4 pixels
longer in each dimension than the true PSF.
OVERPSF = padarray(UNDERPSF,[4 4],'replicate','both');
[J2,P2] = deconvblind(Blurred,OVERPSF);
imshow(J2)
title('Deblurring with Oversized PSF')
14-48
Deblurring Images Using the Blind Deconvolution Algorithm
The third restoration, J3 and P3, uses an array of ones, INITPSF, for an initial PSF that is exactly of
the same size as the true PSF.
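The code for this third restoration is not shown above. A sketch follows; padding the undersized PSF by 2 pixels on each side is an assumption consistent with the sizes described:

INITPSF = padarray(UNDERPSF,[2 2],'replicate','both');   % grow the 3-by-3 guess to 7-by-7, the true PSF size
[J3,P3] = deconvblind(Blurred,INITPSF);
imshow(J3)
title('Deblurring with INITPSF')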
All three restorations also produce a PSF. The following pictures show how the analysis of the
reconstructed PSF might help in guessing the right size for the initial PSF. In the true PSF, a Gaussian
filter, the maximum values are at the center (white) and diminish at the borders (black).
figure;
subplot(2,2,1)
imshow(PSF,[],'InitialMagnification','fit')
title('True PSF')
subplot(2,2,2)
imshow(P1,[],'InitialMagnification','fit')
title('Reconstructed Undersized PSF')
subplot(2,2,3)
imshow(P2,[],'InitialMagnification','fit')
title('Reconstructed Oversized PSF')
subplot(2,2,4)
imshow(P3,[],'InitialMagnification','fit')
title('Reconstructed true PSF')
14-49
14 Image Deblurring
The PSF reconstructed in the first restoration, P1, obviously does not fit into the constrained size. It
has a strong signal variation at the borders. The corresponding image, J1, does not show any
improved clarity vs. the blurred image, Blurred.
The PSF reconstructed in the second restoration, P2, becomes very smooth at the edges. This implies
that the restoration can handle a PSF of a smaller size. The corresponding image, J2, shows some
deblurring but it is strongly corrupted by the ringing.
Finally, the PSF reconstructed in the third restoration, P3, is somewhat intermediate between P1 and
P2. The array, P3, resembles the true PSF very well. The corresponding image, J3, shows significant
improvement; however it is still corrupted by the ringing.
The ringing in the restored image, J3, occurs along the areas of sharp intensity contrast in the image
and along the image borders. This example shows how to reduce the ringing effect by specifying a
weighting function. The algorithm weights each pixel according to the WEIGHT array while restoring
the image and the PSF. In our example, we start by finding the "sharp" pixels using the edge function.
By trial and error, we determine that a desirable threshold level is 0.08.
WEIGHT = edge(Blurred,'sobel',.08);
To widen the area, we use imdilate and pass in a structuring element, se.
se = strel('disk',2);
WEIGHT = 1-double(imdilate(WEIGHT,se));
14-50
Deblurring Images Using the Blind Deconvolution Algorithm
The pixels close to the borders are also assigned the value 0.
WEIGHT([1:3 end-(0:2)],:) = 0;
WEIGHT(:,[1:3 end-(0:2)]) = 0;
figure
imshow(WEIGHT)
title('Weight Array')
The image is restored by calling deconvblind with the WEIGHT array and an increased number of
iterations (30). Almost all the ringing is suppressed.
[J,P] = deconvblind(Blurred,INITPSF,30,[],WEIGHT);
imshow(J)
title('Deblurred Image')
14-51
14 Image Deblurring
The example shows how you can specify additional constraints on the PSF. The function, FUN, below
returns a modified PSF array which deconvblind uses for the next iteration.
In this example, FUN modifies the PSF by cropping it by P1 and P2 number of pixels in each
dimension, and then padding the array back to its original size with zeros. This operation does not
change the values in the center of the PSF, but effectively reduces the PSF size by 2*P1 and 2*P2
pixels.
P1 = 2;
P2 = 2;
FUN = @(PSF) padarray(PSF(P1+1:end-P1,P2+1:end-P2),[P1 P2]);
The anonymous function, FUN, is passed into deconvblind last. See the section Parameterizing
Functions, in the MATLAB Mathematics documentation, for information about providing additional
parameters to the function FUN.
In this example, the size of the initial PSF, OVERPSF, is 4 pixels larger than the true PSF. Setting P1 =
2 and P2 = 2 as parameters in FUN effectively makes the valuable space in OVERPSF the same size as
the true PSF. Therefore, the outcome, JF and PF, is similar to the result of deconvolution with the
right sized PSF and no FUN call, J and P, from step 4.
[JF,PF] = deconvblind(Blurred,OVERPSF,30,[],WEIGHT,FUN);
imshow(JF)
title('Deblurred Image')
14-52
Deblurring Images Using the Blind Deconvolution Algorithm
If we had used the oversized initial PSF, OVERPSF, without the constraining function, FUN, the
resulting image would be similar to the unsatisfactory result, J2, achieved in Step 3.
Note that any unspecified parameters before FUN can be omitted, such as DAMPAR and READOUT in
this example, without requiring a placeholder ([]).
See Also
deconvblind | deconvlucy
More About
• “Image Deblurring” on page 14-2
• “Adapt Blind Deconvolution for Various Image Distortions” on page 14-37
14-53
14 Image Deblurring
To aid this conversion between PSFs and OTFs, use the padding function padarray.
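As a minimal sketch of that conversion (the PSF and output size here are assumptions for illustration):

PSF = fspecial('gaussian',7,10);
OTF = psf2otf(PSF,[256 256]);     % pad the PSF, then convert it to an OTF of the target size
PSF2 = otf2psf(OTF,size(PSF));    % convert back, recovering a PSF of the original size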
See Also
deconvblind
More About
• “Image Deblurring” on page 14-2
14-54
Avoid Ringing in Deblurred Images
This high-frequency drop-off can create an effect called boundary related ringing in deblurred
images. In this figure, note the horizontal and vertical patterns in the image.
To avoid ringing, use the edgetaper function to preprocess your images before passing them to the
deblurring functions. The edgetaper function removes the high-frequency drop-off at the edge of an
image by blurring the entire image and then replacing the center pixels of the blurred image with the
original image. In this way, the edges of the image taper off to a lower frequency.
See Also
deconvblind | deconvlucy | deconvreg | deconvwnr
More About
• “Image Deblurring” on page 14-2
• “Fourier Transform” on page 9-2
14-55
15
Color
This chapter describes the toolbox functions that help you work with color image data. Note that
"color" includes shades of gray; therefore much of the discussion in this chapter applies to grayscale
images as well as color images.
Display Colors
The number of bits per screen pixel determines the display's screen bit depth. The screen bit depth
determines the screen color resolution, which is how many distinct colors the display can produce.
Most computer displays use 8, 16, or 24 bits per screen pixel. Depending on your system, you might
be able to choose the screen bit depth you want to use. In general, 24-bit display mode produces the
best results. If you need to use a lower screen bit depth, 16-bit is generally preferable to 8-bit.
However, keep in mind that a 16-bit display has certain limitations, such as
• An image might have finer gradations of color than a 16-bit display can represent. If a color is
unavailable, MATLAB uses the closest approximation.
• There are only 32 shades of gray available. If you are working primarily with grayscale images,
you might get better display results using 8-bit display mode, which provides up to 256 shades of
gray.
To determine the bit depth of your system's screen, enter this command at the MATLAB prompt.
get(0,'ScreenDepth')
ans =
32
The integer MATLAB returns represents the number of bits per screen pixel.
Regardless of the number of colors your system can display, MATLAB can store and process images
with very high bit depths: 2^24 colors for uint8 RGB images, 2^48 colors for uint16 RGB images, and
2^159 for double RGB images. These images are displayed best on systems with 24-bit color, but
usually look fine on 16-bit systems as well. For information about reducing the number of colors used
by an image, see “Reduce the Number of Colors in an Image” on page 15-3.
Reduce the Number of Colors in an Image
Truecolor images can use millions of colors. Indexed images, however, might cause problems if they
have a large number of colors. In general,
you should limit indexed images to 256 colors for the following reasons:
• On systems with 8-bit display, indexed images with more than 256 colors will need to be dithered
or mapped and, therefore, might not display well.
• On some platforms, color maps cannot exceed 256 entries.
• If an indexed image has more than 256 colors, MATLAB cannot store the image data in a uint8
array, but generally uses an array of class double instead, making the storage size of the image
much larger (each pixel uses 64 bits).
• Most image file formats limit indexed images to 256 colors. If you write an indexed image with
more than 256 colors (using imwrite) to a format that does not support more than 256 colors,
you will receive an error.
In this section...
“Reduce Colors of Truecolor Image Using Color Approximation” on page 15-3
“Reduce Colors of Indexed Image Using imapprox” on page 15-7
“Reduce Colors Using Dithering” on page 15-7
The rgb2ind function provides the following methods for approximating the colors in the original
image:
• Uniform quantization
• Minimum variance quantization
• Color map mapping (described in “Color map Mapping” on page 15-6)
The quality of the resulting image depends on the approximation method you use, the range of colors
in the input image, and whether or not you use dithering. Note that different methods work better for
different images. See “Reduce Colors Using Dithering” on page 15-7 for a description of dithering
and how to enable or disable it.
Quantization
Reducing the number of colors in an image involves quantization. The function rgb2ind uses
quantization as part of its color reduction algorithm. rgb2ind supports two quantization methods:
uniform quantization and minimum variance quantization.
An important term in discussions of image quantization is RGB color cube. The RGB color cube is a
three-dimensional array of all of the colors that are defined for a particular data type. Since RGB
images in MATLAB can be of type uint8, uint16, or double, three possible color cube definitions
exist. For example, if an RGB image is of class uint8, 256 values are defined for each color plane
(red, blue, and green), and, in total, there will be 2^24 (or 16,777,216) colors defined by the color cube.
This color cube is the same for all uint8 RGB images, regardless of which colors they actually use.
The uint8, uint16, and double color cubes all have the same range of colors. In other words, the
brightest red in a uint8 RGB image appears the same as the brightest red in a double RGB image.
The difference is that the double RGB color cube has many more shades of red (and many more
shades of all colors). The following figure shows an RGB color cube for a uint8 image.
Quantization involves dividing the RGB color cube into a number of smaller boxes, and then mapping
all colors that fall within each box to the color value at the center of that box.
Uniform quantization and minimum variance quantization differ in the approach used to divide up the
RGB color cube. With uniform quantization, the color cube is cut up into equal-sized boxes (smaller
cubes). With minimum variance quantization, the color cube is cut up into boxes (not necessarily
cubes) of different sizes; the sizes of the boxes depend on how the colors are distributed in the image.
Uniform Quantization
To perform uniform quantization, call rgb2ind and specify a tolerance. The tolerance determines the
size of the cube-shaped boxes into which the RGB color cube is divided. The allowable range for a
tolerance setting is [0,1]. For example, if you specify a tolerance of 0.1, then the edges of the boxes
are one-tenth the length of the RGB color cube and the maximum total number of boxes is
n = (floor(1/tol)+1)^3
These commands perform uniform quantization with a tolerance of 0.1.
RGB = imread('peppers.png');
[x,map] = rgb2ind(RGB, 0.1);
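As a quick check of this bound, you can compare it with the length of the color map produced by the
preceding commands; the value 1331 follows directly from the formula with tol = 0.1, and the actual
color map is typically shorter.
tol = 0.1;
maxColors = (floor(1/tol)+1)^3   % upper bound on the color map length (1331)
actualColors = size(map,1)       % length of the color map returned by rgb2ind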
The following figure illustrates uniform quantization of a uint8 image. For clarity, the figure shows a
two-dimensional slice (or color plane) from the color cube where red=0 and green and blue range
from 0 to 255. The actual pixel values are denoted by the centers of the x's.
After the color cube has been divided, all empty boxes are thrown out. Therefore, only the boxes that
enclose colors from the input image are used to produce colors for the color map. As shown earlier, the maximum length of a color
map created by uniform quantization can be predicted, but the color map can be smaller than the
prediction because rgb2ind removes any colors that do not appear in the input image.
Minimum Variance Quantization
To perform minimum variance quantization, call rgb2ind and specify the maximum number of colors
in the output image's color map. The number you specify determines the number of boxes into which
the RGB color cube is divided. These commands use minimum variance quantization to create an
indexed image with 185 colors.
RGB = imread('peppers.png');
[X,map] = rgb2ind(RGB,185);
Minimum variance quantization works by associating pixels into groups based on the variance
between their pixel values. For example, a set of blue pixels might be grouped together because they
have a small variance from the center pixel of the group.
In minimum variance quantization, the boxes that divide the color cube vary in size, and do not
necessarily fill the color cube. If some areas of the color cube do not have pixels, there are no boxes
in these areas.
While you set the number of boxes, n, to be used by rgb2ind, the placement is determined by the
algorithm as it analyzes the color data in your image. Once the image is divided into n optimally
located boxes, the pixels within each box are mapped to the pixel value at the center of the box, as in
uniform quantization.
The resulting color map usually has the number of entries you specify. This is because the color cube
is divided so that each region contains at least one color that appears in the input image. If the input
image uses fewer colors than the number you specify, the output color map will have fewer than n
colors, and the output image will contain all of the colors of the input image.
The following figure shows the same two-dimensional slice of the color cube as shown in the
preceding figure (demonstrating uniform quantization). Eleven boxes have been created using
minimum variance quantization.
For a given number of colors, minimum variance quantization produces better results than uniform
quantization, because it takes into account the actual data. Minimum variance quantization allocates
more of the color map entries to colors that appear frequently in the input image. It allocates fewer
entries to colors that appear infrequently. As a result, the accuracy of the colors is higher than with
uniform quantization. For example, if the input image has many shades of green and few shades of
red, there will be more greens than reds in the output color map. Note that the computation for
minimum variance quantization takes longer than that for uniform quantization.
Color map Mapping
If you specify an actual color map to use, rgb2ind uses color map mapping (instead of quantization)
to find the colors in the specified color map that best match the colors in the RGB image. This method
is useful if you need to create images that use a fixed color map. For example, if you want to display
multiple indexed images on an 8-bit display, you can avoid color problems by mapping them all to the
same color map. Color map mapping produces a good approximation if the specified color map has
similar colors to those in the RGB image. If the color map does not have similar colors to those in the
RGB image, this method produces poor results.
This example illustrates mapping two images to the same color map. The color map used for the two
images is created on the fly using the MATLAB function colorcube, which creates an RGB color map
containing the number of colors that you specify. (colorcube always creates the same color map for
a given number of colors.) Because the color map includes colors all throughout the RGB color cube,
the output images can reasonably approximate the input images.
RGB1 = imread('autumn.tif');
RGB2 = imread('peppers.png');
X1 = rgb2ind(RGB1,colorcube(128));
X2 = rgb2ind(RGB2,colorcube(128));
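One straightforward way to display both indexed images with the shared color map is shown below.
figure, imshow(X1,colorcube(128))
figure, imshow(X2,colorcube(128))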
Note The function imshow is also helpful for displaying multiple indexed images. For more
information, see “Display Images Individually in the Same Figure” on page 4-9 or the reference page
for imshow.
Reduce Colors of Indexed Image Using imapprox
Use imapprox when you need to reduce the number of colors in an indexed image. For example, these
commands create a version of the trees image with 64 colors, rather than the
original 128.
load trees
[Y,newmap] = imapprox(X,map,64);
imshow(Y,newmap)
The quality of the resulting image depends on the approximation method you use, the range of colors
in the input image, and whether or not you use dithering. Note that different methods work better for
different images. See “Reduce Colors Using Dithering” on page 15-7 for a description of dithering
and how to enable or disable it.
Reduce Colors Using Dithering
For an example of how dithering works, consider an image that contains a number of dark orange
pixels for which there is no exact match in the colormap. To create the appearance of this shade of
orange, dithering selects a combination of colors from the colormap that, taken together as a six-
pixel group, approximate the desired shade of orange. From a distance, the pixels appear to be the
correct shade, but if you look at the image up close, you can see a blend of other shades. To illustrate
dithering, the following example loads a 24-bit truecolor image, and then uses rgb2ind to create an
indexed image with just eight colors. The first example does not use dithering; the second does.
rgb = imread('onion.png');
imshow(rgb)
Create an indexed image using eight colors without dithering.
[X_no_dither,map] = rgb2ind(rgb,8,'nodither');
imshow(X_no_dither,map)
Create an indexed image using eight colors with dithering. Notice that the dithered image has a
larger number of apparent colors but is somewhat fuzzy-looking. The image produced without
dithering has fewer apparent colors, but an improved spatial resolution when compared to the
dithered image. One risk in doing color reduction without dithering is that the new image can contain
false contours.
[X_dither,map] = rgb2ind(rgb,8,'dither');
imshow(X_dither,map)
Profile-Based Color Space Conversions
Because different devices reproduce color differently, the International Color Consortium (ICC) has defined a Color Management System
(CMS) that provides a means for communicating color information among input, output, and display
devices. The CMS uses device profiles that contain color information specific to a particular device.
Vendors that support CMS provide profiles that characterize the color reproduction of their devices,
and methods, called Color Management Modules (CMM), that interpret the contents of each profile
and perform the necessary image processing.
Device profiles contain the information that color management systems need to translate color data
between devices. Any conversion between color spaces is a mathematical transformation from some
domain space to a range space. With profile-based conversions, the domain space is often called the
source space and the range space is called the destination space. In the ICC color management
model, profiles are used to represent the source and destination spaces.
For more information about color management systems, go to the International Color Consortium
website, http://www.color.org.
To read an ICC profile into the MATLAB workspace, use the iccread function. This example reads the
profile for the sRGB color space.
P = iccread('sRGB.icm');
You can use the iccfind function to find ICC color profiles on your system, or to find a particular
ICC color profile whose description contains a certain text string. To get the name of the directory
that is the default system repository for ICC profiles, use iccroot.
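For example, a minimal sketch that lists the profiles in the default repository whose descriptions
mention sRGB:
[profiles,descriptions] = iccfind(iccroot,'srgb');
descriptions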
iccread returns the contents of the profile in the structure P. All profiles contain a header, a tag
table, and a series of tagged elements. The header contains general information about the profile,
such as the device class, the device color space, and the file size. The tagged elements, or tags, are
the data constructs that contain the information used by the CMM. For more information about the
contents of this structure, see the iccread function reference page.
Using iccread, you can read both Version 2 (ICC.1:2001-04) and Version 4 (ICC.1:2001-12) ICC profile
formats. For detailed information about these specifications and their differences, visit the ICC
website, http://www.color.org.
To export ICC profile data from the workspace to a file, use the iccwrite function. This example reads
a profile into the workspace and then writes the profile data to a new file.
P = iccread('sRGB.icm');
P_new = iccwrite(P,'my_profile.icm');
iccwrite returns the profile it writes to the file in P_new because it can be different than the input
profile P. For example, iccwrite updates the Filename field in P to match the name of the file
specified as the second argument.
When it creates the output file, iccwrite checks the validity of the input profile structure. If any
required fields are missing, iccwrite returns an error message. For more information about
writing ICC profile data to a file, see the iccwrite function reference page. To determine if a
structure is a valid ICC profile, use the isicc function.
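For example, a quick validity check on a profile read from a file:
P = iccread('sRGB.icm');
tf = isicc(P)   % returns true for a valid ICC profile structure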
Using iccwrite, you can export profile information in both Version 2 (ICC.1:2001-04) and Version 4
(ICC.1:2001-12) ICC profile formats. The value of the Version field in the file profile header
determines the format version. For detailed information about these specifications and their
differences, visit the ICC website, http://www.color.org.
Import RGB color space data. This example imports an RGB color image into the workspace.
I_rgb = imread('peppers.png');
Read ICC profiles. Read the source and destination profiles into the workspace. This example uses
the sRGB profile as the source profile. The sRGB profile is an industry-standard color space that
describes a color monitor.
inprof = iccread('sRGB.icm');
For the destination profile, the example uses a profile that describes a particular color printer. The
printer vendor supplies this profile. (The following profile and several other useful profiles can be
obtained as downloads from www.adobe.com.)
outprof = iccread('USSheetfedCoated.icc');
Create a color transformation structure. You must create a color transformation structure to define
the conversion between the color spaces in the profiles. You use the makecform function to create
the structure, specifying a transformation type string as an argument. This example creates a color
transformation structure that defines a conversion from RGB color data to CMYK color data. The
color space conversion might involve an intermediate conversion into a device-independent color
space, called the Profile Connection Space (PCS), but this is transparent to the user.
C = makecform('icc',inprof,outprof);
Perform the conversion. You use the applycform function to perform the conversion, specifying as
arguments the color data you want to convert and the color transformation structure that defines the
conversion. The function returns the converted data.
I_cmyk = applycform(I_rgb,C);
Write the converted data to a file. To export the CMYK data, use the imwrite function, specifying the
format as TIFF. If the format is TIFF and the data is an m-by-n-by-4 array, imwrite writes CMYK data
to the file.
imwrite(I_cmyk,'pep_cmyk.tif','tif')
To verify that the CMYK data was written to the file, use imfinfo to get information about the file
and look at the PhotometricInterpretation field.
info = imfinfo('pep_cmyk.tif');
info.PhotometricInterpretation
ans =
'CMYK'
When you create a profile-based color transformation structure, you can specify the rendering intent
for the source as well as the destination profiles. For more information, see the makecform reference
information.
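Continuing the earlier example, a minimal sketch of specifying rendering intents when creating the
transformation structure; the intent values chosen here are illustrative.
C = makecform('icc',inprof,outprof, ...
    'SourceRenderingIntent','Perceptual', ...
    'DestRenderingIntent','RelativeColorimetric');
I_cmyk2 = applycform(I_rgb,C);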
Device-Independent Color Spaces
In 1931, the International Commission on Illumination, known by the acronym CIE, for Commission
Internationale de l'Éclairage, studied human color perception and developed a standard, called the
CIE XYZ. This standard defined a three-dimensional space where three values, called tristimulus
values, define a color. This standard is still widely used today.
In the decades since that initial specification, the CIE has developed several additional color space
specifications that attempt to provide alternative color representations that are better suited to some
purposes than XYZ. For example, in 1976, in an effort to get a perceptually uniform color space that
could be correlated with the visual appearance of colors, the CIE created the L*a*b* color space.
This table lists all the device-independent color spaces that the toolbox supports.
As the table indicates, certain color spaces have data type limitations. For example, the XYZ color
space does not define a uint8 encoding. If you convert 8-bit CIE LAB data into the XYZ color space,
the data is returned in uint16 format. To change the encoding of XYZ data, use these functions:
• xyz2double
• xyz2uint16
• lab2double
• lab2uint8
• lab2uint16
• im2double
• im2uint8
• im2uint16
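For example, a minimal sketch of changing the encoding of an L*a*b* color; the color value itself is
arbitrary.
lab_double = [70 5 -10];            % an L*a*b* color stored as double values
lab_uint8 = lab2uint8(lab_double)   % convert the encoding to uint8
lab_back = lab2double(lab_uint8)    % convert the encoding back to double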
Understanding Color Spaces and Color Space Conversion
The various color spaces exist because they present color information in ways that make certain
calculations more convenient or because they provide a way to identify colors that is more intuitive.
For example, the RGB color space defines a color as the percentages of red, green, and blue hues
mixed together. Other color models describe colors by their hue (shade of color), saturation (amount
of gray or pure color), and luminance (intensity, or overall brightness).
The toolbox enables converting color data from one color space to another through mathematical
transformations.
RGB
The RGB color space represents images as an m-by-n-by-3 numeric array whose elements specify the
intensity values of the red, green, and blue color channels. The range of numeric values depends on
the data type of the image.
• For single or double arrays, RGB values range from [0, 1].
• For uint8 arrays, RGB values range from [0, 255].
• For uint16 arrays, RGB values range from [0, 65535].
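For example, the im2double and im2uint16 functions rescale values between these ranges; the sample
values below are arbitrary.
rgb8 = uint8([0 128 255]);   % RGB component values stored as uint8
im2double(rgb8)              % rescaled to the range [0, 1]
im2uint16(rgb8)              % rescaled to the range [0, 65535]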
The toolbox also supports gamma-corrected (nonlinear) encodings of linear RGB values, including sRGB
and Adobe RGB (1998).
sRGB values apply gamma correction to linear RGB values u using a piecewise function f(u):
f(u) = c·u, 0 ≤ u < d
f(u) = a·u^γ + b, u ≥ d
with
a = 1.055
b = -0.055
c = 12.92
d = 0.0031308
γ = 1/2.4
Adobe RGB (1998) values apply gamma correction to linear RGB values using a simple power function:
v = u^γ, u ≥ 0
v = -(-u)^γ, u < 0
with
γ = 1/2.19921875
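Rather than applying these formulas directly, you can use the lin2rgb function, which applies these
gamma corrections; a minimal sketch:
u = linspace(0,1,5)                                  % linear RGB values
v_srgb = lin2rgb(u)                                  % sRGB-encoded values
v_adobe = lin2rgb(u,'ColorSpace','adobe-rgb-1998')   % Adobe RGB (1998)-encoded values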
HSV
The HSV (Hue, Saturation, Value) color space corresponds better to how people experience color
than the RGB color space does. For example, this color space is often used by people who are
selecting colors, such as paint or ink color, from a color wheel or palette.
• H: Hue, which corresponds to the color's position on a color wheel. H is in the range [0, 1]. As H
increases, colors transition from red to orange, yellow, green, cyan, blue, magenta, and finally back to
red. Both 0 and 1 indicate red.
• S: Saturation, which is the amount of hue or departure from neutral. S is in the range [0, 1]. As S
increases, colors vary from unsaturated (shades of gray) to fully saturated (no white component).
• V: Value, which is the maximum value among the red, green, and blue components of a specific
color. V is in the range [0, 1]. As V increases, the corresponding colors become increasingly brighter.
Note MATLAB and the Image Processing Toolbox software do not support the HSI color space (Hue,
Saturation, Intensity). However, if you want to work with color data in terms of hue, saturation, and
intensity, the HSV color space is very similar. Another option is to use the LCH color space
(Luminosity, Chroma, and Hue), which is a polar transformation of the CIE L*a*b* color space — see
“Device-Independent Color Spaces” on page 15-13.
Use the rgb2hsv and hsv2rgb functions to convert between the RGB and HSV color spaces.
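For example, a minimal round trip with an arbitrary color:
rgbColor = [1 0.5 0];           % an orange color
hsvColor = rgb2hsv(rgbColor)    % approximately [0.0833 1 1]
rgbBack = hsv2rgb(hsvColor)     % recovers the original RGB triplet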
XYZ and L*a*b*
The XYZ color space is the original model developed by the CIE. The Y channel represents the
luminance of a color. The Z channel approximately relates to the amount of blue in an image, but the
value of Z in the XYZ color space is not identical to the value of B in the RGB color space. The X
channel does not have a clear color analogy. However, if you consider the XYZ color space as a 3-D
coordinate system, then X values lie along the axis that is orthogonal to the Y (luminance) axis and
the Z axis.
The L*a*b* color space provides a more perceptually uniform color space than the XYZ model. Colors
in the L*a*b* color space can exist outside the RGB gamut (the valid set of RGB colors). For example,
when you convert the L*a*b* value [100, 100, 100] to the RGB color space, the returned value is
[1.7682, 0.5746, 0.1940], which is not a valid RGB color. For more information, see “Determine If
L*a*b* Value Is in RGB Gamut” on page 15-24.
• L*: Luminance or brightness of the image. Values are in the range [0, 100], where 0 specifies black
and 100 specifies white. As L* increases, colors become brighter.
• a*: Amount of red or green tones in the image. A large positive a* value corresponds to red/magenta.
A large negative a* value corresponds to green. Although there is no single range for a*, values
commonly fall in the range [-100, 100] or [-128, 127).
• b*: Amount of yellow or blue tones in the image. A large positive b* value corresponds to yellow. A
large negative b* value corresponds to blue. Although there is no single range for b*, values commonly
fall in the range [-100, 100] or [-128, 127).
Device-independent color spaces include the effect of the illumination source, called the reference
white point. The source imparts a color hue to the raw image data according to the color temperature
of the illuminant. For example, sunlight during sunrise or sunset imparts a yellow hue to an image,
whereas sunlight around noontime imparts a blue hue.
Use the rgb2xyz and xyz2rgb functions to convert between the RGB and XYZ color spaces. Use the
rgb2lab and lab2rgb functions to convert between the RGB and L*a*b* color spaces.
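For example, a minimal sketch converting an arbitrary RGB color; the white point value shown is one
common choice.
rgbColor = [0.2 0.6 0.4];
xyzColor = rgb2xyz(rgbColor)                      % CIE XYZ tristimulus values
labColor = rgb2lab(rgbColor,'WhitePoint','d65')   % L*a*b* values using a D65 white point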
The toolbox supports several related color space specifications that are better suited to some
purposes than XYZ. For more information see “Device-Independent Color Spaces” on page 15-13.
YCbCr
The YCbCr color space is widely used for digital video. In this format, luminance information is stored
as a single component (Y) and chrominance information is stored as two color-difference components
(Cb and Cr). Cb and Cr represent the difference between a reference value and the blue or red
component, respectively. (YUV, another color space widely used for digital video, is very similar to
YCbCr but not identical.)
• Y: Luminance or brightness of the image. Colors increase in brightness as Y increases.
• Cb: Chrominance value that indicates the difference between the blue component and a reference
value.
• Cr: Chrominance value that indicates the difference between the red component and a reference
value.
The range of numeric values depends on the data type of the image. YCbCr does not use the full
range of the image data type so that the video stream can include additional (non-image) information.
• For single or double arrays, Y is in the range [16/255, 235/255] and Cb and Cr are in the range
[16/255, 240/255].
• For uint8 arrays, Y is in the range [16, 235] and Cb and Cr are in the range [16, 240].
• For uint16, Y is in the range [4112, 60395] and Cb and Cr are in the range [4112, 61680].
Use the rgb2ycbcr and ycbcr2rgb functions to convert between the RGB and YCbCr color spaces.
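For example, a minimal sketch that extracts the luminance component of an image:
RGB = imread('peppers.png');
YCBCR = rgb2ycbcr(RGB);
Y = YCBCR(:,:,1);    % luminance component, in the range [16, 235] for uint8 data
imshow(Y)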
YIQ
The National Television Systems Committee (NTSC) defines a color space known as YIQ. This color
space is used in televisions in the United States. This color space separates grayscale information
from color data, so the same signal can be used for both color and black and white television sets.
• Y: Luma, or brightness of the image. Values are in the range [0, 1], where 0 specifies black and 1
specifies white. Colors increase in brightness as Y increases.
• I: In-phase, which is approximately the amount of blue or orange tones in the image. I is in the range
[-0.5959, 0.5959], where negative numbers indicate blue tones and positive numbers indicate orange
tones. As the magnitude of I increases, the saturation of the color increases.
• Q: Quadrature, which is approximately the amount of green or purple tones in the image. Q is in the
range [-0.5229, 0.5229], where negative numbers indicate green tones and positive numbers indicate
purple tones. As the magnitude of Q increases, the saturation of the color increases.
Use the rgb2ntsc and ntsc2rgb functions to convert between the RGB and YIQ color spaces.
Because luminance is one of the components of the NTSC format, the RGB to NTSC conversion is also
useful for isolating the gray level information in an image. In fact, the toolbox functions rgb2gray
and ind2gray use the rgb2ntsc function to extract the grayscale information from a color image.
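For example, a minimal sketch that extracts the luma component of an image:
RGB = imread('peppers.png');
YIQ = rgb2ntsc(RGB);
lumaImage = YIQ(:,:,1);   % luma component, comparable to the output of rgb2gray
imshow(lumaImage)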
See Also
Related Examples
• “Convert Between RGB and HSV Color Spaces” on page 15-20
• “Determine If L*a*b* Value Is in RGB Gamut” on page 15-24
Convert Between RGB and HSV Color Spaces
Process the HSV image. This example increases the saturation of the image by multiplying the S
channel by a scale factor.
[h,s,v] = imsplit(HSV);
saturationFactor = 2;
s_sat = s*saturationFactor;
HSV_sat = cat(3,h,s_sat,v);
Convert the processed HSV image back to the RGB color space. Display the new RGB image. Colors
in the processed image are more vibrant.
RGB_sat = hsv2rgb(HSV_sat);
imshow(RGB_sat)
For closer inspection of the HSV color space, create a synthetic RGB image.
RGB = reshape(ones(64,1)*reshape(jet(64),1,192),[64,64,3]);
HSV = rgb2hsv(RGB);
Split the HSV version of the synthetic image into its component planes: hue, saturation, and value.
[h,s,v] = imsplit(HSV);
Display the individual HSV color planes with the original image.
montage({h,s,v,RGB},"BorderSize",10,"BackgroundColor",'w');
As the hue plane image in the preceding figure illustrates, hue values make a linear transition from
high to low. If you compare the hue plane image against the original image, you can see that shades
of deep blue have the highest values, and shades of deep red have the lowest values. (As stated
previously, there are values of red on both ends of the hue scale. To avoid confusion, the sample
image uses only the red values from the beginning of the hue range.)
Saturation can be thought of as the purity of a color. As the saturation plane image shows, the colors
with the highest saturation have the highest values and are represented as white. In the center of the
saturation image, notice the various shades of gray. These correspond to a mixture of colors; the
cyans, greens, and yellow shades are mixtures of true colors. Value is roughly equivalent to
brightness, and you will notice that the brightest areas of the value plane correspond to the brightest
colors in the original image.
Determine If L*a*b* Value Is in RGB Gamut
Convert an L*a*b* value to RGB. The negative values returned demonstrate that the L*a*b* color [80
-130 85] is not in the gamut of the sRGB color space, which is the default RGB color space used by
lab2rgb. An RGB color is out of gamut when any of its component values are less than 0 or greater
than 1.
lab = [80 -130 85];
lab2rgb(lab)
ans = 1×3
Convert the L*a*b* value to RGB, this time specifying a different RGB color space, the Adobe RGB
(1998) color space. The Adobe RGB (1998) color space has a larger gamut than sRGB. Use the 'ColorSpace'
name-value pair argument. Because the output values are between 0.0 and 1.0 (inclusive), you can conclude
that the L*a*b* color [80 -130 85] is inside the Adobe RGB (1998) gamut.
lab2rgb(lab,'ColorSpace','adobe-rgb-1998')
ans = 1×3
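As a quick programmatic check of gamut membership using the same L*a*b* value:
rgbAdobe = lab2rgb([80 -130 85],'ColorSpace','adobe-rgb-1998');
inGamut = all(rgbAdobe(:) >= 0 & rgbAdobe(:) <= 1)   % true, consistent with the result above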
Comparison of Auto White Balance Algorithms
The human eye is very good at judging what is white under different lighting conditions. Digital cameras,
however, without some kind of adjustment, can easily capture unrealistic images with a strong color
cast. Automatic white balance (AWB) algorithms try to correct for the ambient light with minimal
input from the user, so that the resulting image looks like what our eyes would see.
Several different algorithms exist to estimate the scene illuminant. The performance of each algorithm
depends on the scene, lighting, and imaging conditions. This example judges the quality of three
algorithms for illuminant estimation for one specific image by comparing them to the ground truth
scene illuminant:
• White Patch Retinex
• Gray World
• Cheng's principal component analysis (PCA) method
AWB involves two steps: estimating the scene illuminant, and then correcting the colors in the image.
After the ambient light is known, correcting the colors is a straightforward, fixed process.
AWB algorithms are usually applied on the raw image data after a minimal amount of preprocessing,
before the image is compressed and saved to the memory card.
Read a 16-bit raw image into the workspace. foosballraw.tiff is an image file that contains raw
sensor data after correcting the black level and scaling the intensities to 16 bits per pixel. This image
is free of the white balancing done by the camera, as well as other preprocessing operations such as
demosaicing, denoising, chromatic aberration compensation, tone adjustments, and gamma
correction.
A = imread('foosballraw.tiff');
Digital cameras use a color filter array superimposed on the imaging sensor to simulate color vision,
so that each pixel is sensitive to either red, green or blue. To recover the missing color information at
every pixel, interpolate using the demosaic function. The Bayer pattern used by the camera with
which the photo was captured (Canon EOS 30D) is RGGB.
A = demosaic(A,'rggb');
The image A contains linear RGB values. Linear RGB values are appropriate for estimating scene
illuminant and correcting the color balance of an image. However, if you try to display the linear RGB
image, it will appear very dim, because of the nonlinear characteristic of display devices. Therefore,
for display purposes, gamma-correct the image to the sRGB color space using the lin2rgb function.
A_sRGB = lin2rgb(A);
montage({A,A_sRGB})
title('Original Image Before and After Gamma Correction')
Calculate the ground truth illuminant using the X-Rite ColorChecker chart that is included in the
scene. This chart consists of 24 neutral and color patches with known spectral reflectances.
Detect the chart in the gamma-corrected image by using the colorChecker function. The linear
RGB image is too dark for colorChecker to detect the chart automatically.
chart_sRGB = colorChecker(A_sRGB);
displayChart(chart_sRGB)
Get the coordinates of the registration points at the four corners of the chart.
registrationPoints = chart_sRGB.RegistrationPoints;
Create a new colorChecker object from the linear RGB data. Specify the location of the chart using
the coordinates of the registration points.
chart = colorChecker(A,"RegistrationPoints",registrationPoints);
Measure the ground truth illuminant of the linear RGB data using the measureIlluminant function.
illuminant_groundtruth = measureIlluminant(chart)
illuminant_groundtruth = 1×3
10^3 ×
When testing the AWB algorithms, prevent the algorithms from unfairly taking advantage of the chart
by masking out the chart.
Create a polygon ROI over the chart by using the drawpolygon function. Specify the vertices of the
polygon as the registration points.
chartROI = drawpolygon("Position",registrationPoints);
Convert the polygon ROI to a binary mask by using the createMask function.
mask_chart = createMask(chartROI);
Invert the mask. Pixels within the chart are excluded from the mask and pixels of the rest of the
scene are included in the mask.
mask_scene = ~mask_chart;
To confirm the accuracy of the mask, display the mask over the image. Pixels included in the mask
have a blue tint.
imshow(labeloverlay(A_sRGB,mask_scene));
Angular Error
You can consider an illuminant as a vector in 3-D RGB color space. The magnitude of the estimated
illuminant does not matter as much as its direction, because the direction of the illuminant is what is
used to white balance an image.
To evaluate the quality of an estimated illuminant, compute the angular error between the estimated
illuminant and the ground truth. Angular error is the angle (in degrees) formed by the two vectors.
The smaller the angular error, the better the estimation is.
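For example, a minimal sketch of the computation with two arbitrary illuminant vectors; the colorangle
function performs the same calculation.
v1 = [0.066 0.126 0.069];   % arbitrary illuminant vectors for illustration
v2 = [0.050 0.130 0.060];
errDegrees = colorangle(v1,v2)                      % angular error in degrees
errManual = acosd(dot(v1,v2)/(norm(v1)*norm(v2)))   % same value, from the definition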
To better understand the concept of angular error, consider the following visualization of an arbitrary
illuminant and the ground truth measured using the ColorChecker chart. The plotColorAngle
helper function plots a unit vector of an illuminant in 3-D RGB color space, and is defined at the end
of the example.
sample_illuminant = [0.066 0.1262 0.0691];
grid on
axis equal
White Patch Retinex
The White Patch Retinex algorithm for illuminant estimation assumes that the scene contains a bright
achromatic patch. This patch reflects the maximum light possible for each color band, which is the
color of the scene illuminant. Use the illumwhite function to estimate illumination using the White
Patch Retinex algorithm.
Estimate the illuminant using all the pixels in the scene. Exclude the ColorChecker chart from the
scene by using the 'Mask' name-value pair argument.
percentileToExclude = 0;
illuminant_wp1 = illumwhite(A,percentileToExclude,'Mask',mask_scene);
Compute the angular error for the illuminant estimated with White Patch Retinex.
err_wp1 = colorangle(illuminant_wp1,illuminant_groundtruth);
disp(['Angular error for White Patch with percentile=0: ' num2str(err_wp1)])
White balance the image using the chromadapt function. Specify the estimated illuminant and
indicate that color values are in the linear RGB color space.
B_wp1 = chromadapt(A,illuminant_wp1,'ColorSpace','linear-rgb');
B_wp1_sRGB = lin2rgb(B_wp1);
figure
imshow(B_wp1_sRGB)
title('White-Balanced Image using White Patch Retinex with percentile=0')
The White Patch Retinex algorithm does not perform well when pixels are overexposed. To improve
the performance of the algorithm, exclude the top 1% of the brightest pixels.
percentileToExclude = 1;
illuminant_wp2 = illumwhite(A,percentileToExclude,'Mask',mask_scene);
Calculate the angular error for the estimated illuminant. The error is less than when estimating the
illuminant using all pixels.
err_wp2 = colorangle(illuminant_wp2,illuminant_groundtruth);
disp(['Angular error for White Patch with percentile=1: ' num2str(err_wp2)])
White balance the image in the linear RGB color space using the estimated illuminant.
B_wp2 = chromadapt(A,illuminant_wp2,'ColorSpace','linear-rgb');
Gray World
The Gray World algorithm for illuminant estimation assumes that the average color of the world is
gray, or achromatic. Therefore, it calculates the scene illuminant as the average RGB value in the
image. Use the illumgray function to estimate illumination using the Gray World algorithm.
First, estimate the scene illuminant using all pixels of the image, excluding those corresponding to
the ColorChecker chart. The illumgray function provides a parameter to specify the percentiles of
bottom and top values (ordered by brightness) to exclude. Here, specify the percentiles as 0.
percentileToExclude = 0;
illuminant_gw1 = illumgray(A,percentileToExclude,'Mask',mask_scene);
Calculate the angular error between the estimated illuminant and the ground truth illuminant.
err_gw1 = colorangle(illuminant_gw1,illuminant_groundtruth);
disp(['Angular error for Gray World with percentiles=[0 0]: ' num2str(err_gw1)])
White balance the image in the linear RGB color space using the estimated illuminant.
B_gw1 = chromadapt(A,illuminant_gw1,'ColorSpace','linear-rgb');
B_gw1_sRGB = lin2rgb(B_gw1);
imshow(B_gw1_sRGB)
title('White-Balanced Image using Gray World with percentiles=[0 0]')
The Gray World algorithm does not perform well when pixels are underexposed or overexposed. To
improve the performance of the algorithm, exclude the top 1% of the darkest and brightest pixels.
percentileToExclude = 1;
illuminant_gw2 = illumgray(A,percentileToExclude,'Mask',mask_scene);
Calculate the angular error for the estimated illuminant. The error is less than when estimating the
illuminant using all pixels.
err_gw2 = colorangle(illuminant_gw2,illuminant_groundtruth);
disp(['Angular error for Gray World with percentiles=[1 1]: ' num2str(err_gw2)])
White balance the image in the linear RGB color space using the estimated illuminant.
B_gw2 = chromadapt(A,illuminant_gw2,'ColorSpace','linear-rgb');
B_gw2_sRGB = lin2rgb(B_gw2);
imshow(B_gw2_sRGB)
title('White-Balanced Image using Gray World with percentiles=[1 1]')
Cheng's Principal Component Analysis (PCA) Method
Cheng's illuminant estimation method draws inspiration from spatial domain methods such as Grey
Edge [4], which assumes that the gradients of an image are achromatic. They show that Grey Edge
can be improved by artificially introducing strong gradients by shuffling image blocks, and conclude
that the strongest gradients follow the direction of the illuminant. Their method consists of ordering
pixels according to the norm of their projection along the direction of the mean image color, and
retaining the bottom and top percentile. These two groups correspond to strong gradients in the
image. Finally, they perform a principal component analysis (PCA) on the retained pixels and return
the first component as the estimated illuminant. Use the illumpca function to estimate illumination
using Cheng's PCA algorithm.
First, estimate the illuminant using the default percentage value of Cheng's PCA method, excluding
those corresponding to the ColorChecker chart.
illuminant_ch2 = illumpca(A,'Mask',mask_scene);
Calculate the angular error between the estimated illuminant and the ground truth illuminant.
err_ch2 = colorangle(illuminant_ch2,illuminant_groundtruth);
disp(['Angular error for Cheng with percentage=3.5: ' num2str(err_ch2)])
White balance the image in the linear RGB color space using the estimated illuminant.
B_ch2 = chromadapt(A,illuminant_ch2,'ColorSpace','linear-rgb');
B_ch2_sRGB = lin2rgb(B_ch2);
imshow(B_ch2_sRGB)
title('White-Balanced Image using Cheng with percentage=3.5')
Now, estimate the scene illuminant using the bottom and top 5% of pixels along the direction of the
mean color. The second argument of the illumpca function specifies the percentage of bottom and top
pixels, ordered by the norm of their projection along the mean image color, to retain.
illuminant_ch1 = illumpca(A,5,'Mask',mask_scene);
Calculate the angular error between the estimated illuminant and the ground truth illuminant. The
error is less than when estimating the illuminant using the default percentage.
err_ch1 = colorangle(illuminant_ch1,illuminant_groundtruth);
disp(['Angular error for Cheng with percentage=5: ' num2str(err_ch1)])
White balance the image in the linear RGB color space using the estimated illuminant.
B_ch1 = chromadapt(A,illuminant_ch1,'ColorSpace','linear-rgb');
B_ch1_sRGB = lin2rgb(B_ch1);
imshow(B_ch1_sRGB)
title('White-Balanced Image using Cheng with percentage=5')
Find Optimal Parameters
To find the best parameter to use for each algorithm, you can sweep through a range and calculate
the angular error for each of them. The parameters of the three algorithms have different meanings,
but the similar ranges of these parameters make it easy to programmatically search for the best one
for each algorithm.
param_range = 0:0.25:5;
err = zeros(numel(param_range),3);
for k = 1:numel(param_range)
% White Patch
illuminant_wp = illumwhite(A,param_range(k),'Mask',mask_scene);
err(k,1) = colorangle(illuminant_wp,illuminant_groundtruth);
% Gray World
illuminant_gw = illumgray(A,param_range(k),'Mask',mask_scene);
err(k,2) = colorangle(illuminant_gw,illuminant_groundtruth);
% Cheng
if (param_range(k) ~= 0)
illuminant_ch = illumpca(A,param_range(k),'Mask',mask_scene);
err(k,3) = colorangle(illuminant_ch,illuminant_groundtruth);
else
% Cheng's algorithm is undefined for percentage=0.
err(k,3) = NaN;
end
end
Display a heatmap of the angular error using the heatmap function. Dark blue colors indicate a low
angular error while yellow colors indicate a high angular error. The optimal parameter has the
smallest angular error.
heatmap(err,'Title','Angular Error','Colormap',parula(length(param_range)), ...
'XData',["White Patch" "Gray World" "Cheng's PCA"], ...
'YLabel','Parameter Value','YData',string(param_range));
Find the best parameter value for each algorithm, which is the parameter that yields the smallest
angular error.
[~,idx_best] = min(err);
best_param_wp = param_range(idx_best(1));
best_param_gw = param_range(idx_best(2));
best_param_ch = param_range(idx_best(3));
fprintf('The best parameter for White Patch is %1.2f with angular error %1.2f degrees\n', ...
best_param_wp,err(idx_best(1),1));
The best parameter for White Patch is 0.25 with angular error 3.35 degrees
fprintf('The best parameter for Gray World is %1.2f with angular error %1.2f degrees\n', ...
best_param_gw,err(idx_best(2),2));
The best parameter for Gray World is 0.00 with angular error 5.04 degrees
fprintf('The best parameter for Cheng is %1.2f with angular error %1.2f degrees\n', ...
best_param_ch,err(idx_best(3),3));
The best parameter for Cheng is 0.50 with angular error 1.74 degrees
Calculate the estimated illuminant for each algorithm using the best parameter.
best_illum_wp = illumwhite(A,best_param_wp,'Mask',mask_scene);
best_illum_gw = illumgray(A,best_param_gw,'Mask',mask_scene);
best_illum_ch = illumpca(A,best_param_ch,'Mask',mask_scene);
Display the angular error of each best illuminant in the RGB color space.
p = plot3([0 1],[0 1],[0,1],'LineStyle',':','Color','k');
ax = p.Parent;
hold on
plotColorAngle(illuminant_groundtruth,ax)
plotColorAngle(best_illum_wp,ax)
plotColorAngle(best_illum_gw,ax)
plotColorAngle(best_illum_ch,ax)
title('Best Illuminants in RGB space')
view(28,36)
legend('Achromatic Line','Ground Truth','White Patch','Gray World','Cheng')
grid on
axis equal
Calculate the optimal white-balanced images for each algorithm using the best illuminant.
B_wp_best = chromadapt(A,best_illum_wp,'ColorSpace','linear-rgb');
B_wp_best_sRGB = lin2rgb(B_wp_best);
B_gw_best = chromadapt(A,best_illum_gw,'ColorSpace','linear-rgb');
B_gw_best_sRGB = lin2rgb(B_gw_best);
B_ch_best = chromadapt(A,best_illum_ch,'ColorSpace','linear-rgb');
B_ch_best_sRGB = lin2rgb(B_ch_best);
figure
montage({B_wp_best_sRGB,B_gw_best_sRGB,B_ch_best_sRGB},'Size',[1 3])
title('Montage of Best White-Balanced Images: White Point, Gray World, Cheng')
Conclusion
This comparison of two classic illuminant estimation algorithms and a more recent one shows that
Cheng's method, used with its best-performing percentage from the parameter sweep, wins for this
particular image. However, this result should be taken with a grain of salt.
First, the ground truth illuminant was measured using a ColorChecker chart and is sensitive to shot
and sensor noise. The ground truth illuminant of a scene can be better estimated using a
spectrophotometer.
Second, the ground truth illuminant is estimated as the mean color of the neutral patches. It is
common to use the median instead of the mean, which could shift the ground truth by a significant
amount. For example, for the image in this study, using the same pixels, the median color and the
mean color of the neutral patches are 0.5 degrees apart, which in some cases can be more than the
angular error of the illuminants estimated by different algorithms.
Third, a full comparison of illuminant estimation algorithms should use a variety of images taken
under different conditions. One algorithm might work better than the others for a particular image,
but might perform poorly over the entire data set.
Supporting Function
The plotColorAngle function plots a unit vector of an illuminant in 3-D RGB color space. The input
argument illum specifies the illuminant as an RGB color and the input argument ax specifies the
axes on which to plot the unit vector.
function plotColorAngle(illum,ax)
R = illum(1);
G = illum(2);
B = illum(3);
magRGB = norm(illum);
% Plot the illuminant as a unit vector from the origin of the RGB color cube
plot3([0 R/magRGB],[0 G/magRGB],[0 B/magRGB],'Marker','.','MarkerSize',10,'Parent',ax)
xlabel('R'), ylabel('G'), zlabel('B')
end
References
[1] Ebner, Marc. White Patch Retinex, Color Constancy. John Wiley & Sons, 2007. ISBN
978-0-470-05829-9.
[2] Ebner, Marc. The Gray World Assumption, Color Constancy. John Wiley & Sons, 2007. ISBN
978-0-470-05829-9.
[3] Cheng, Dongliang, Dilip K. Prasad, and Michael S. Brown. "Illuminant estimation for color
constancy: why spatial-domain methods work and the role of the color distribution." JOSA A
31.5 (2014): 1049-1058.
[4] Van De Weijer, Joost, Theo Gevers, and Arjan Gijsenij. "Edge-based color constancy." IEEE
Transactions on image processing 16.9 (2007): 2207-2214.
See Also
chromadapt | colorChecker | colorangle | illumgray | illumpca | illumwhite | lin2rgb |
measureColor | rgb2lin
More About
• “Gamma Correction” on page 11-73
Calculate CIE94 Color Difference of Colors on Test Chart
Create a colorChecker object, then display the chart with ROI annotations.
chart = colorChecker(I);
displayChart(chart)
Measure the color in each color patch ROI and return the measurements in a table, colorTable. The
color difference measurements in the Delta_E variable of the table follow the CIE76 standard.
colorTable = measureColor(chart);
On a color patch diagram, display the measured and reference colors with the corresponding CIE76
color difference superimposed on each patch.
displayColorPatch(colorTable)
Extract the reference L*a*b* and measured RGB color values into a table.
referenceLab = colorTable{:,{'Reference_L','Reference_a','Reference_b'}};
measuredRGB = colorTable{:,{'Measured_R','Measured_G','Measured_B'}};
Convert the measured RGB colors to the L*a*b* color space, specifying a D50 white point.
measuredLab = rgb2lab(measuredRGB,"WhitePoint","d50");
Calculate the color difference using the imcolordiff function, specifying that the color
measurements are in the L*a*b* color space. By default, this function calculates color differences
using the CIE94 standard.
dE = imcolordiff(measuredLab,referenceLab,"isInputLab",true);
Create a new color table using the new color difference measurements.
colorTable94 = colorTable;
colorTable94{:,"Delta_E"} = dE;
On a color patch diagram, display the measured and reference colors with the corresponding CIE94
color difference superimposed on each patch.
displayColorPatch(colorTable94)
See Also
deltaE | displayChart | displayColorPatch | imcolordiff | plotChromaticity | rgb2lab
Related Examples
• “Correct Colors Using Color Correction Matrix” on page 11-165
• “Comparison of Auto White Balance Algorithms” on page 15-25
16
Blocked Image Processing
This topic describes how to work with big image data that does not fit in memory. Big images can
have multiple resolution levels.
Set Up Spatial Referencing for Blocked Images
Blocked images work with multiresolution images, in which image data of a scene is stored as a set of
images at different resolution levels. Blocked images assume that the spatial extents of each level are
the same, in other words, that all levels cover the same physical area in the real world. The first step
in working with a large multiresolution image is to validate this assumption.
This example uses one image from the Camelyon16 data set. This data set contains 400 whole-slide
images (WSIs) of lymph nodes, stored as multiresolution TIF files that are too large to be loaded into
memory.
imageDir = fullfile('I:\','Camelyon16');
if ~exist(imageDir,'dir')
mkdir(imageDir);
end
To download the image, go to the Camelyon17 website and click the first "CAMELYON16 data set"
link. Open the "training" then "tumor" directory. Download the "tumor_091.tif" file and move the file
to the directory specified by the imageDir variable.
Create a blockedImage object with the default spatial referencing information. By default, a blocked
image sets the spatial referencing of each level to have the same world extents as the finest layer. The
finest layer is the layer that has the highest resolution and the most pixels.
fileName = fullfile(imageDir,'tumor_091.tif');
bim = blockedImage(fileName);
Display the spatial referencing information at the finest level. The image size (specified by the Size
property) matches the extents in world coordinates. Notice that the default image coordinate system
puts the center of the first pixel at (1,1). Because the pixel extents are 1 unit wide in each dimension,
the left edge of the first pixel starts at (0.5,0.5).
finestLevel = 1;
finestStart = bim.WorldStart(finestLevel,:)
finestEnd = bim.WorldEnd(finestLevel,:)
finestStart =
finestEnd =
1.0e+04 *
Display the spatial referencing information at the coarsest level. The world extents are the same as
the finest level, but the coarse image size is only 512-by-512 pixels. Effectively, each pixel in this
coarse level corresponds to a 105-by-120 block of pixels in the finest resolution.
coarsestLevel = bim.NumLevels;
coarsestStart = bim.WorldStart(coarsestLevel,:)
coarsestEnd = bim.WorldEnd(coarsestLevel,:)
coarsestStart =
coarsestEnd =
1.0e+04 *
Display the image size and aspect ratio at each level. The aspect ratio is not consistent, which
indicates that levels do not all span the same world area. Therefore, the default assumption is
incorrect for this image.
t = table((1:8)',bim.Size(:,1),bim.Size(:,2), ...
bim.Size(:,1)./bim.Size(:,2), ...
'VariableNames',["Level" "Height" "Width" "Aspect Ratio"]);
disp(t)
Display the blocked image by using the bigimageshow function. Display the coarsest resolution
level.
figure
subplot(1,2,1);
hl = bigimageshow(bim,'ResolutionLevel',coarsestLevel);
title('Coarsest Resolution Level (8)')
Display image data at the default resolution level in the same figure window. By default,
bigimageshow selects the level to display based on the screen resolution and the size of the
displayed region.
subplot(1,2,2);
hr = bigimageshow(bim);
title('Default Resolution Level')
linkaxes([hl.Parent,hr.Parent]);
Zoom in on a feature.
xlim([45000 50000]);
ylim([12000 17000]);
Change the resolution level of the image on the right side of the figure window. At level 6, the
features look aligned with the coarsest level.
hr.ResolutionLevel = 6;
title('Level 6');
snapnow
At level 1, the features are not aligned. Therefore, level 1 and level 8 do not span the same world
extents.
hr.ResolutionLevel = 1;
title('Level 1');
snapnow
Usually the original source of the data has spatial referencing information encoded in its metadata.
For the Camelyon16 data set, spatial referencing information is stored as XML content in the
ImageDescription metadata field at the finest resolution level. The XML content has a
DICOM_PIXEL_SPACING attribute for each resolution level that specifies the pixel extents.
Get the ImageDescription metadata field at the finest resolution level of the blockedImage
object.
binfo = imfinfo(bim.Source);
binfo = binfo(1).ImageDescription;
Search the content for the string "DICOM_PIXEL_SPACING". There are nine matches found. The
second instance of the attribute corresponds to the pixel spacing of the finest level. The last instance
of the attribute corresponds to the pixel spacing of the coarsest level.
indx = strfind(binfo,"DICOM_PIXEL_SPACING");
Store the pixel spacing at the finest level. To extract the value of the pixel spacing from the XML text,
visually inspect the text following the second instance of the "DICOM_PIXEL_SPACING" attribute.
disp(binfo(indx(2):indx(2)+100))
pixelSpacing_L1 = 0.000227273;
Similarly, store the pixel spacing at the coarsest level. To extract the value of the pixel spacing from
the XML text, visually inspect the text following the last instance of the "DICOM_PIXEL_SPACING"
attribute.
disp(binfo(indx(end):indx(end)+100))
pixelSpacing_L8 = 0.0290909;
pixelDims = pixelSpacing_L8/pixelSpacing_L1;
The finest level has the reference spatial extent. Calculate the corresponding extents for level 8 with
respect to the extents of level 1.
worldExtents = bim.Size(8,1:2).*pixelDims;
bim.WorldEnd(8,1:2) = worldExtents(2);
Redisplay the data to confirm alignment of the key feature. Show level 8 on the left side and level 1
on the right side.
hl.CData = bim;
hl.ResolutionLevel = 8;
snapnow
hr.CData = bim;
hr.ResolutionLevel = 1;
snapnow
See Also
bigimageshow | blockedImage
More About
• “Define World Coordinate System of Image” on page 2-6
Process Blocked Images Efficiently Using Partial Images or Lower Resolutions
Processing blocked images can be time consuming, which makes iterative development of algorithms
prohibitively expensive. There are two common ways to shorten the feedback cycle: iterate on a lower
resolution image or iterate on a partial region of the blocked image. This example demonstrates both
of these approaches for creating a segmentation mask for a blocked image.
If you have Parallel Computing Toolbox™ installed, then you can further accelerate the processing by
using multiple workers.
Create a blockedImage object using a modified version of image "tumor_091.tif" from the
CAMELYON16 data set. The original image is a training image of a lymph node containing tumor
tissue. The original image has eight resolution levels, and the finest level has resolution 53760-
by-61440. The modified image has only three coarse resolution levels. The spatial referencing of the
modified image has been adjusted to enforce a consistent aspect ratio and to register features at each
level.
bim = blockedImage('tumor_091R.tif');
Many blocked images contain multiple resolution levels, including coarse lower resolution versions of
the finest high-resolution image. In general, the distribution of individual pixel values should be
roughly equal across all the levels. Leveraging this assumption, you can compute global statistics at a
coarse level and then use the statistics to process the finer levels.
Extract the image at the coarsest level, then convert the image to grayscale.
imLowRes = gather(bim);
imLowResGray = rgb2gray(imLowRes);
Threshold the image into two classes and display the result.
thresh = graythresh(imLowResGray);
imLowResQuant = imbinarize(imLowResGray,thresh);
imshow(imLowResQuant)
Validate the threshold on the image at the finest resolution level. Negate the result to obtain a mask for the stained region.
bq = apply(bim, ...
@(bs)~imbinarize(rgb2gray(bs.Data),thresh));
bigimageshow(bq,'CDataMapping','scaled');
Another approach while working with large images is to extract a smaller region with features of
interest. You can compute statistics from the ROI and then use the statistics to process the entire
high-resolution image.
xrange = xlim;
yrange = ylim;
imRegion = getRegion(bim,[900 2400 1],[1700 3300 3],'Level',1);
imshow(imRegion);
imRegionGray = rgb2gray(imRegion);
thresh = graythresh(imRegionGray);
imLowResQuant = ~imbinarize(imRegionGray,thresh);
imshow(imLowResQuant)
bq = apply(bim, ...
@(bs)~imbinarize(rgb2gray(bs.Data),thresh));
bigimageshow(bq,'CDataMapping','scaled');
If you have the Parallel Computing Toolbox™ installed, then you can distribute the processing across
multiple workers to accelerate the processing. To try processing the image in parallel, set the
runInParallel variable to true.
runInParallel = false;
if runInParallel
% Open a pool
p = gcp;
% Ensure workers are on the same folder as the file to be able to
% access it using just the relative path
sourceDir = fileparts(which('tumor_091R.tif'));
spmd
cd(sourceDir)
end
% Run in parallel
bq = apply(bim, ...
@(bs)~imbinarize(rgb2gray(bs.Data),thresh),'UseParallel',true);
end
See Also
apply | bigimageshow | blockedImage
Process Blocked Images Efficiently Using Mask
Some sources of large images have meaningful data in only a small portion of the image. You can
improve the total processing time by limiting processing to the regions of interest (ROI) containing
meaningful data. Use a mask to define ROIs. A mask is a logical image in which true pixels represent
the ROI.
In the blocked image workflow, the mask represents the same spatial region as the image data, but it
does not need to have the same size as the image. To further improve the efficiency of the workflow,
create a mask from a coarse image, especially one that fits in memory. Then, use the coarse mask to
process the finer images.
Create a blocked image using a modified version of image "tumor_091.tif" from the CAMELYON16
data set. The original image is a training image of a lymph node containing tumor tissue. The original
image has eight resolution levels, and the finest level has resolution 53760-by-61440. The modified
image has only three coarse resolution levels. The spatial referencing of the modified image has been
adjusted to enforce a consistent aspect ratio and to register features at each level.
bim = blockedImage('tumor_091R.tif');
bigimageshow(bim);
Create Mask
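The command that produced the following output is not shown here. A minimal sketch that queries the size of the coarsest resolution level from the Size property (variable names are assumptions consistent with the output below):
coarseLevel = bim.NumLevels;
coarseLevelSize = bim.Size(coarseLevel,:)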
coarseLevelSize = 1×3
625 670 3
imLowRes = gather(bim);
You can generate a mask from the coarse level using the Image Segmenter app. The app expects a
grayscale input image, so get the lightness channel from the coarse image.
imLowResL = rgb2lightness(imLowRes);
To run the Image Segmenter app, enter this command in the Command Window:
imageSegmenter(imLowResL). After you define the mask, export the mask, BW, or the code that the app uses to create the mask. This section of the example uses code exported from the app. Run this code to create a mask from the coarse input image.
%----------------------------------------------------
% Normalize input data to range in [0,1].
Xmin = min(imLowResL(:));
Xmax = max(imLowResL(:));
if isequal(Xmax,Xmin)
    imLowResL = 0*imLowResL;
else
    imLowResL = (imLowResL - Xmin) ./ (Xmax - Xmin);
end
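% The thresholding step that creates the mask BW is not shown here. A
% minimal sketch, assuming a global threshold on the normalized lightness
% image:
BW = imbinarize(imLowResL);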
% Invert mask
BW = imcomplement(BW);
imshow(BW)
Create a blocked image from the mask with the same spatial referencing as the coarse level of the input image.
bmask = blockedImage(BW,'WorldEnd',bim.WorldEnd(3,1:2));
h = bigimageshow(bim);
h.Parent.Color = 'g';
h.Parent.Alphamap = [1 .5];
h.AlphaData = bmask;
h.AlphaDataMapping = 'direct';
The apply function processes blocked images one block at a time. You can use the
'InclusionThreshold' property with the mask to specify which blocks the apply function uses.
The inclusion threshold specifies the percentage of mask pixels that must be true for apply to
process the block.
Highlight the blocks that apply will process using the default inclusion threshold, 0.5. Only center
blocks, highlighted in green, will be processed.
h = bigimageshow(bim);
showmask(h,bmask,1);
title('Mask with Default Inclusion Threshold')
Specify a lower inclusion threshold to include more blocks.
showmask(h,bmask,1,'InclusionThreshold',0.4);
title('InclusionThreshold == 0.4')
In the extreme case, process all blocks that have at least a single true pixel in the mask. To specify this option, set the 'InclusionThreshold' property to 0. Even at this setting, blocks that contain no true mask pixels are excluded, so not all blocks of the image are included.
showmask(h,bmask,1,'InclusionThreshold',0);
title('InclusionThreshold == 0')
Using the mask with any value of 'InclusionThreshold' will decrease the total execution time because apply processes fewer blocks than the full image. The benefit of using a mask becomes more significant as the image increases in resolution and as the processing pipeline increases in complexity.
Measure the execution time of filtering only the blocks within the ROI.
bls = selectBlockLocations(bim, "Mask", bmask,"InclusionThreshold", 0);
tic
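% The apply call and timing capture for this step are not shown here. A
% minimal sketch, assuming a simple placeholder per-block filter (the
% filter used in the original example is not shown in this extract):
boutMasked = apply(bim, ...
    @(bs)imgaussfilt(bs.Data,2), ...
    'BlockLocationSet',bls);
tMaskedProcessing = toc;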
bigimageshow(boutMasked)
defaultBlockSize = bim.BlockSize(1,:);
title(['Processed Image Using Mask with Default BlockSize == [' ...
num2str(defaultBlockSize) ']']);
Compare the execution time of processing the full image with the time required to process only the blocks in the ROI.
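The comparison code itself is not shown here. A sketch that follows the pattern used later in this example, assuming a variable tFullProcessing was captured when the full image was processed (that variable is an assumption and does not appear in this extract):
disp(['Speedup using mask: ' ...
    num2str(tFullProcessing/tMaskedProcessing) 'x']);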
You can decrease the block size to gain a tighter wrap around the ROI. For some block sizes, this will reduce the execution time because apply will process fewer pixels outside the ROI. However, if the block size is too small, then performance will decrease because the overhead of processing a larger number of blocks will offset the reduction in the number of pixels processed.
Highlight the blocks that apply will process using a smaller block size. To specify the block size, set
the 'BlockSize' property.
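The code for this step is not shown here. A minimal sketch, in which the specific block size value is an assumption:
blockSize = [512 512];
h = bigimageshow(bim);
showmask(h,bmask,1,'BlockSize',blockSize,'InclusionThreshold',0);
title(['Highlighted Blocks with BlockSize == [' num2str(blockSize) ']'])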
Measure the execution time of filtering all blocks within the ROI with a decreased block size.
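The corresponding code is not shown here. A minimal sketch, reusing the placeholder filter and the blockSize value assumed above:
bls = selectBlockLocations(bim,'Mask',bmask, ...
    'InclusionThreshold',0,'BlockSize',blockSize);
tic
boutMasked = apply(bim, ...
    @(bs)imgaussfilt(bs.Data,2), ...
    'BlockLocationSet',bls);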
tSmallerBlockProcessing = toc;
bigimageshow(boutMasked);
title(['Processed Image Using Mask with BlockSize == [' ...
num2str(blockSize) ']']);
Compare the execution time of processing the ROI with smaller blocks to the execution time with larger blocks.
disp(['Additional speedup using mask with decreased block size: ' ...
num2str(tMaskedProcessing/tSmallerBlockProcessing) 'x']);
See Also
apply | bigimageshow | blockedImage
Explore Blocked Image Details with Interactive ROIs
bigimageshow displays blockedImage objects. If the blockedImage object has multiple levels, then bigimageshow automatically picks the appropriate level based on the screen size and the viewport. bigimageshow always works in a single world coordinate system and displays each level based on its spatial referencing information. This allows two displays of the same blockedImage object to show image detail at different levels while sharing the same coordinate system.
Create a blockedImage using a modified version of image "tumor_091.tif" from the CAMELYON16
data set. The original image is a training image of a lymph node containing tumor tissue. The original
image has eight resolution levels, and the finest level has resolution 53760-by-61440. The modified
image has only three coarse resolution levels. The spatial referencing of the modified image has been
adjusted to enforce a consistent aspect ratio and to register features at each level.
bim = blockedImage('tumor_091R.tif');
Display the entire big image on the left side of a figure window by using the bigimageshow function.
The resolution level of the displayed overview automatically changes depending on the size of the
window and your screen size.
hf = figure;
haOView = subplot(1,2,1);
haOView.Tag = 'OverView';
hl = bigimageshow(bim,'Parent',haOView);
Fix the resolution level of the overview image to the coarsest resolution level.
coarsestLevel = bim.NumLevels;
hl.ResolutionLevel = coarsestLevel;
title('Overview');
Display a detail view of the big image on the right side of the figure window. Allow bigimageshow to
manage the level of the detail image automatically.
haDetailView = subplot(1,2,2);
haDetailView.Tag = 'DetailView';
hr = bigimageshow(bim,'Parent',haDetailView);
xlim([2800,3050])
ylim([500,750])
title('Detailed View');
In the overview image, draw a rectangle ROI. This example specifies the initial size and position of the rectangle programmatically by setting the Position property as a four-element vector of the form [xmin,ymin,width,height]. After the ROI appears on the overview, you can adjust its size and position interactively.
xrange = xlim;
yrange = ylim;
roiPosition = [xrange(1) yrange(1) xrange(2)-xrange(1) yrange(2)-yrange(1)];
hrOView = drawrectangle(haOView,'Position',roiPosition,'Color','r');
Save the handles of the rectangle to use when defining the interaction between the rectangle and the
detail view.
hrOView.UserData.haDetailView = haDetailView;
haDetailView.UserData.hrOView = hrOView;
Add listeners to the detail view. These listeners detect changes in the spatial extents of the detail
view. When the spatial extents change, the listeners call the updateOverviewROI helper function,
which updates the extents of the ROI to match the extents of the detail view. The helper function is
defined at the end of this example.
addlistener(haDetailView,'XLim','PostSet',@updateOverviewROI);
addlistener(haDetailView,'YLim','PostSet',@updateOverviewROI);
Add a listener to the rectangle ROI. This listener detects changes in the spatial extent of the rectangle. When the rectangle moves, the listener calls the updateDetailView helper function, which updates the extents of the detail view to match the extents of the ROI. The helper function is defined at the end of this example.
addlistener(hrOView,'MovingROI',@updateDetailView);
You can now change the size and position of the rectangle ROI interactively to adjust the display view.
Similarly, when you zoom and pan the detail view, the size and position of the ROI updates.
This example changes the size and position of the ROI programmatically by setting the Position
property.
hrOView.Position = [2230,1300,980,840];
evt.CurrentPosition = hrOView.Position;
updateDetailView(hrOView,evt);
function updateOverviewROI(~,hEvt)
% Update overview rectangle position whenever the right hand side
% zooms/pans.
ha = hEvt.AffectedObject;
hr = hEvt.AffectedObject.UserData.hrOView;
hr.Position = [ha.XLim(1),ha.YLim(1),diff(ha.XLim),diff(ha.YLim)];
end
function updateDetailView(hSrc,hEvt)
% Update the right side detail view anytime the overview rectangle is
% moved. bigimageshow automatically picks the appropriate image level.
ha = hSrc.UserData.haDetailView;
ha.XLim = [hEvt.CurrentPosition(1), ...
hEvt.CurrentPosition(1)+hEvt.CurrentPosition(3)];
ha.YLim = [hEvt.CurrentPosition(2), ...
hEvt.CurrentPosition(2)+hEvt.CurrentPosition(4)];
end
See Also
bigimageshow | blockedImage
Warp Blocked Image at Coarse and Fine Resolution Levels
Applying a geometric transformation to an image is a key step in many image processing applications
like image registration. You can use imwarp to warp coarse images that fit in memory. For large,
high-resolution images that do not fit in memory, use a blocked image. Set the spatial referencing of
the warped image to preserve characteristics of the image such as pixel extents.
Create a blocked image using a modified version of image "tumor_091.tif" from the CAMELYON16
data set. The original image is a training image of a lymph node containing tumor tissue. The original
image has eight resolution levels, and the finest level has resolution 53760-by-61440. The modified
image has only three coarse resolution levels. The spatial referencing of the modified image has been
adjusted to enforce a consistent aspect ratio and to register features at each level.
bim = blockedImage('tumor_091R.tif');
Create an affine2d object that stores information about an affine geometric transformation. This
transformation applies translation and shear.
tform = affine2d([
0.99 0.01 0
0.17 0.98 0
120 -30 1]);
imCoarse = gather(bim);
Warp the coarse image by using imwarp, then display the warped image.
imCoarseWarped = imwarp(imCoarse,tform);
figure
imshow(imCoarseWarped)
Before applying the geometric transformation to the image at a fine resolution level, calculate the spatial referencing of the blocked image after the warping. Use this spatial referencing when transforming blocks.
Get the pixel extent of the original image from its spatial referencing information.
inPixelExtent = (bim.WorldEnd(1,:)-bim.WorldStart(1,:))./bim.Size(1,:);
Calculate the output horizontal and vertical spatial limits when the transformation is applied.
yWorldLimits = [bim.WorldStart(1,1), bim.WorldEnd(1,1)];
xWorldLimits = [bim.WorldStart(1,2), bim.WorldEnd(1,2)];
[xout, yout] = outputLimits(tform,xWorldLimits,yWorldLimits);
Calculate the size of the output image that preserves the pixel extent. Specify the image size in the
format [numrows, numcols].
outImgSize = [ceil(diff(yout)/inPixelExtent(1)),...
ceil(diff(xout)/inPixelExtent(2))];
Store the spatial referencing information of the warped image. Set the world limits and image size of
the warped image.
outWorldStart = [yout(1),xout(1)];
outWorldEnd = [yout(2), xout(2)];
Create a writable blocked image by specifying the output spatial referencing information. Specify a
block size that is large enough to use memory efficiently.
outBlockSize = [1024 1024 3];
bwarped = blockedImage([],[outImgSize 3],outBlockSize,uint8(0),...
'Mode','w',...
'WorldStart', [yout(1), xout(1)],...
'WorldEnd', [yout(2), xout(2)]);
Loop through the output image, one block at a time, computing the data for each output block from the corresponding region of the input image.
If you have Parallel Computing Toolbox™, then you can replace the outer for statement with a parfor statement to run the loop in parallel.
inYWorldLimits = [bim.WorldStart(1,1), bim.WorldEnd(1,1)];
inXWorldLimits = [bim.WorldStart(1,2), bim.WorldEnd(1,2)];
outRegionRef.YWorldLimits = [blockStartWorld(1)-halfPixWidth(1),...
    blockEndWorld(1)+halfPixWidth(1)];
outRegionRef.XWorldLimits = [blockStartWorld(2)-halfPixWidth(2),...
    blockEndWorld(2)+halfPixWidth(2)];
end
end
bwarped.Mode = 'r';
figure
bigimageshow(bwarped)
See Also
affine2d | bigimageshow | blockedImage | getRegion | imref2d | setBlock |
transformPointsInverse
Create Labeled Blocked Image from ROIs and Masks
There are several ways to specify categorical label data for an image. This example shows two
approaches. One approach uses polygonal ROI objects that store the coordinates of the boundaries of
tumor and normal tissue. The other approach uses a mask to indicate a binary segmentation of the
image into tissue and background. The example combines the information in the polygon coordinates
and mask representations to create a single labeled blocked image.
Create a blocked image using a modified version of an image from the CAMELYON16 data set, a
training image of a lymph node containing tumor tissue. The modified image has three coarse
resolution levels. The spatial referencing has been adjusted to enforce a consistent aspect ratio and
to register features at each level.
bim = blockedImage('tumor_091R.tif');
Get the spatial referencing and pixel extent of the blocked image at the desired output level.
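The corresponding command is not shown here. A minimal sketch that computes the pixel extent at the finest level (the variable name is an assumption):
pixelExtent = (bim.WorldEnd(1,:) - bim.WorldStart(1,:))./bim.Size(1,:);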
The CAMELYON16 data set provides labels of tumor and normal regions as a set of coordinates
specifying manually annotated region boundaries with respect to the finest resolution level. When
pixels exist within the boundaries of both a normal region and a tumor region, the correct label for
those pixels is normal tissue.
Load label data for the blocked image. This example uses a modified version of labels of the
"tumor_091.tif" image from the CAMELYON16 data set. The original labels are stored in XML format.
The modified labels are resampled and saved as a MAT file.
roiPoints = load('labelledROIs.mat')
Create polygonal ROI objects that store the coordinates of the tumor boundaries and normal tissue
boundaries.
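The commands for this step are not shown here. A minimal sketch using images.roi.Polygon objects (the cancerRegions field name is an assumption; nonCancerRegions is used later in this example):
tumorPolys = cellfun(@(pos) images.roi.Polygon( ...
    'Position',pos,'Visible','on','Color','r'), ...
    roiPoints.cancerRegions);
normalPolys = cellfun(@(pos) images.roi.Polygon( ...
    'Position',pos,'Visible','on','Color','g'), ...
    roiPoints.nonCancerRegions);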
Display the image overlaid with the annotated ROIs. The ROIs have the same coordinate system as
the image, so changing the resolution levels of the displayed image still renders the ROIs accurately.
figure
h = bigimageshow(bim);
set(tumorPolys,'Parent',gca);
set(normalPolys,'Parent',gca);
title(['Resolution Level:' num2str(h.ResolutionLevel)]);
xlim([3940 4290])
ylim([2680 3010])
title(['Resolution Level:' num2str(h.ResolutionLevel)]);
Create a mask at the coarsest resolution level for the stained region, which includes both tumor and
normal tissue. The mask is 1 (true) for pixels whose grayscale value is less than 130. Fill small holes
in the mask by performing morphological closing using the bwmorph function.
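The commands for this step are not shown here. A minimal sketch that follows the description above (variable names other than tissueMask are assumptions):
imCoarse = gather(bim);                   % coarsest resolution level
stainMask = rgb2gray(imCoarse) < 130;     % stained tissue is dark
stainMask = bwmorph(stainMask,'close');   % fill small holes
tissueMask = blockedImage(stainMask,'WorldEnd',bim.WorldEnd(3,1:2));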
Combine the information in the polygon coordinates and mask representations to create a single
labeled blocked image.
To store the labeled image, create a writeable blocked image with data type categorical. Specify the
required class names and the corresponding numeric pixel label ID values. Assign the label 0 to the
'Background' class.
bLabeled =
blockedImage with properties:
Settable properties
InitialValue: <undefined>
Loop through each output blocked image, one block at a time. Determine the label of each block, then
set the pixel data of the block accordingly.
To determine the label of each block, start with the mask of all tissue. Pixel values of 0 in the mask
correspond to background, which matches the pixel label ID of the 'Background' class. Pixel values of
1 in the mask correspond to all tissue, which matches the pixel label ID of the 'Normal' class. Convert
the polygon coordinates of tumor tissue to a mask by using the poly2mask function, then replace
those pixels with the pixel label ID of the 'Tumor' class, 2.
If you have Parallel Computing Toolbox™, then you can run the loop in parallel by replacing the for
statement with a parfor statement.
% Read the corresponding region from the tissue mask. Since the
% mask is at a different level, convert coordinates to world and
% back.
blockStartEndInWorld = bLabeled.sub2world([blockStart; blockEnd]);
blockStartEndMaskSub = tissueMask.world2sub(blockStartEndInWorld);
maskBlock = getRegion(tissueMask, blockStartEndMaskSub(1,:), blockStartEndMaskSub(2,:));
% Some healthy tissue ROIs are enclosed within a tumor ROI. Find
% the pixel coordinates of healthy tissue then convert the polygon
% coordinates to a mask.
for ind = 1:numel(roiPoints.nonCancerRegions)
vertices = roiPoints.nonCancerRegions{ind};
% Transform coordinates to local block pixel locations
vertices = (vertices-xyStart);
isHealthy = poly2mask(vertices(:,1),vertices(:,2), ...
size(roiBlock,1), size(roiBlock,2));
roiBlock = roiBlock & ~isHealthy;
end
end
end
bLabeled.Mode = 'r';
Display the image data, then display the labeled image data in the same axes. The three labels
(normal, tumor, and background) appear in three different colors. Make the labels partially
transparent so that you can distinguish the image content underneath.
figure
hbim = bigimageshow(bim);
hla = axes;
hbl = bigimageshow(bLabeled,'Parent',hla);
hbl.AlphaData = 0.7;
hla.Visible = 'off';
Zoom in to a ROI. Increase the label transparency so that you can more clearly distinguish the image
content underneath.
linkaxes(findall(gcf,'Type','axes'));
xlim([3940 4290])
ylim([2680 3010])
hbl.AlphaData = 0.5;
For most data sets, you can create labels once and then reuse the labels for multiple training
sessions. The labeled blocked image, bLabeled, is backed by temporary files that do not exist across
MATLAB® sessions. To reuse the labels in a different session of MATLAB, write bLabeled to a
persistent location.
imageDir = 'Labels';
if exist(imageDir,'dir')
rmdir('Labels','s');
end
labelDir = fullfile(imageDir,'labelled');
write(bLabeled,labelDir);
In a fresh session of MATLAB, you can reload the labeled blocked image by creating a new
blockedImage object. When loading a labeled blocked image, you must also specify 'Classes',
'PixelLabelIDs', and 'UndefinedID'.
bLabeled = blockedImage(labelDir);
See Also
bigimageshow | blockedImage | blockedImageDatastore
Related Examples
• “Classify Large Multiresolution Images Using blockedImage and Deep Learning” (Deep
Learning Toolbox)
Read Whole-slide Images with Custom Blocked Image Adapter
The example first builds a C++ interface to the OpenSlide library using the MATLAB clibgen
function. The example then uses functions from the OpenSlide library to implement a custom blocked
image adapter.
addpath(pwd)
Download the latest OpenSlide library for your computer and operating system. This example
assumes a Windows computer.
Create a variable that points to where you extracted the OpenSlide Library. This folder is expected to
contain bin\, include\, and lib\ subfolders.
OpenSlideInstall = 'I:\my_example\openslide-win64-20171122';
dir(OpenSlideInstall)
Add the location of the OpenSlide shared library to the system path.
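The command for this step is not shown here. One way to do this on Windows, assuming the bin subfolder holds the shared library:
setenv('PATH',[fullfile(OpenSlideInstall,'bin') ';' getenv('PATH')]);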
Create variables that contain the names of folders containing key elements of your development
environment.
Create a variable that points to the folder where you want to store the predefined definition file for
the OpenSlide interface that you are creating.
ExampleDir = 'I:\my_example';
Create a variable to point to a test image file. Download CMU-1.zip test file from the OpenSlide test
data page, and update the variable below to point to the extracted image file.
imageLocation = 'I:\my_example\CMU-1.mrxs';
Create a variable that points to a writable folder to store the generated interface files. Create a folder
in which to write the MATLAB OpenSlide Interface file and change to that folder.
OpenSlideInterface = 'I:\my_example\interfaceFolder';
if ~isfolder(OpenSlideInterface)
mkdir(OpenSlideInterface)
end
cd(OpenSlideInterface)
Create variables to point to the OpenSlide library folder, the two OpenSlide header files, the path to
the header files, the name of the OpenSlide library, and define the name you want to assign to the
generated interface.
libPath = fullfile(OpenSlideInstall,'lib');
hppFiles = {'openslide.h', 'openslide-features.h'};
hppPath = fullfile(OpenSlideInstall, 'include', 'openslide');
libFile = 'libopenslide.lib';
myPkg = 'OpenSlideInterface';
Call the clibgen.generateLibraryDefinition function, specifying the variables you have set up. You can optionally set the 'Verbose' parameter to true to display messages produced during generation.
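The call itself is not shown here. A sketch using the variables defined above (the exact options used in the original example are assumptions):
clibgen.generateLibraryDefinition(fullfile(hppPath,hppFiles), ...
    'IncludePath',hppPath, ...
    'Libraries',fullfile(libPath,libFile), ...
    'PackageName',myPkg, ...
    'Verbose',true);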
This example provides a template interface file that contains these edits, but you still need to provide location information for certain key folders. This section performs these edits on the template file.
First, rename the interface file you generated with the clibgen.generateLibraryDefinition command and keep it as a backup file.
movefile('defineOpenSlideInterface.m','defineOpenSlideInterface_generated.m');
Delete the .mlx file created by the clibgen.generateLibraryDefinition function and then call
rehash.
delete defineOpenSlideInterface.mlx;
rehash
Edit the interface definition template file that is included with this example. The edits provide the locations of key folders in your installation. First, open the template file for read access, read its contents into a variable called interfaceContents, and then close the file.
fid = fopen(fullfile('defineOpenSlideInterface_template.m'),'rt');
interfaceContents = fread(fid, 'char=>char');
fclose(fid);
Update the placeholder variables in the template file variable, interfaceContents, with your actual folder names.
Now, write the updated interface definition template variable to a new file. Open an interface definition file for write access, write the template variable to the file, and then close the file.
fid = fopen('defineOpenSlideInterface.m','wt');
fwrite(fid, interfaceContents);
fclose(fid);
To verify that the changes to the interface file were successful, you can view the differences between
the generated interface file and the edited interface template file.
Use the build command with the OpenSlide interface definition file to create a MATLAB OpenSlideInterface shared library.
build(defineOpenSlideInterface)
addpath osInterface\OpenSlideInterface\
Be sure to click the link in the message after the build is complete to add the interface file to the path.
To view the functional capabilities of the interface library, use the summary function.
summary(defineOpenSlideInterface)
Class clib.OpenSlideInterface.openslide_t
No Constructors defined
No Methods defined
No Properties defined
Functions
clib.OpenSlideInterface.openslide_t clib.OpenSlideInterface.openslide_open(string)
int32 clib.OpenSlideInterface.openslide_get_level_count(clib.OpenSlideInterface.openslide_t)
[int64,int64] clib.OpenSlideInterface.openslide_get_level_dimensions(clib.OpenSlideInterface.open
Note: 'int64' used as MLTYPE for C++ pointer argument.
Passing nullptr is not supported with 'int64' types.
To allow nullptr as an input, set MLTYPE to clib.array.
double clib.OpenSlideInterface.openslide_get_level_downsample(clib.OpenSlideInterface.openslide_t
clib.OpenSlideInterface.openslide_read_region(clib.OpenSlideInterface.openslide_t,clib.array.Open
clib.OpenSlideInterface.openslide_close(clib.OpenSlideInterface.openslide_t)
To test the library interface, try using the functions in the library with the sample image.
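The test commands are not shown here. A minimal sketch that opens the sample image and queries the level 0 dimensions, consistent with the functions listed in the summary output above:
ob = clib.OpenSlideInterface.openslide_open(imageLocation);
[levelWidth,levelHeight] = clib.OpenSlideInterface.openslide_get_level_dimensions(ob,int32(0));
disp([levelWidth levelHeight])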
109240 220696
Read a region from level 0 using the openslide_read_region function. Set up a clibArray of type UnsignedInt with the desired width and height dimensions. Specify the top-left x-coordinate and the top-left y-coordinate in the level 0 reference frame.
rawCData = clibArray('clib.OpenSlideInterface.UnsignedInt', [1024, 1024]);
clib.OpenSlideInterface.openslide_read_region(ob,rawCData,int64(33792),int64(113664),int32(0));
Post-process the acquired region from the clibArray to convert it into a uint8 RGB image.
rawImageData = uint32(rawCData);
RGBA = typecast(rawImageData(:), 'uint8');
% Ignore the A channel
RGB(:,:,1) = reshape(RGBA(3:4:end),1024,1024);
RGB(:,:,2) = reshape(RGBA(2:4:end),1024,1024);
RGB(:,:,3) = reshape(RGBA(1:4:end),1024,1024);
figure;
imshow(RGB);
To read whole-slide images, create a custom Adapter for block-based reading and writing that uses
the capabilities of the OpenSlide Interface built above.
To create a blocked image adapter, first create a class that subclasses the blocked image adapter
interface class, images.blocked.Adapter. To learn more about blocked images and creating a
blocked image adapter, view the images.blocked.Adapter documentation.
Use the OpenSlide interface functions generated above to implement these methods.
A sample adapter is included in this example, OpenSlideAdapter.m. To view this adapter, you can
open the file in an editor.
Use the new adapter with the sample image by specifying it in the blockedImage object constructor:
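The constructor call is not shown here. A minimal sketch, assuming the adapter class takes no constructor arguments:
bim = blockedImage(imageLocation,'Adapter',OpenSlideAdapter())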
bim =
blockedImage with properties:
Settable properties
BlockSize: [10×3 double]
UserData: [1×1 struct]
disp(bim.Size)
220696 109240 3
110348 54620 3
55174 27310 3
27587 13655 3
13793 6827 3
6896 3413 3
3448 1706 3
1724 853 3
862 426 3
431 213 3
bigimageshow(bim)
See Also
bigimageshow | blockedImage
17 Neighborhood and Block Operations
This chapter discusses generic block processing functions.
Neighborhood or Block Processing: An Overview
The toolbox includes several functions that you can use to implement image processing algorithms as
a block or neighborhood operation. These functions break the input image into blocks or
neighborhoods, call the specified function to process each block or neighborhood, and then
reassemble the results into an output image. The following list summarizes these functions.
• nlfilter: Implements sliding neighborhood operations that you can use to process an input image in a pixel-wise fashion. For each pixel in the input image, the function performs the operation you specify on a block of neighboring pixels to determine the value of the corresponding pixel in the output image. For more information, see “Sliding Neighborhood Operations” on page 17-3.
• blockproc: Implements distinct block operations that you can use to process an input image a block at a time. The function divides the image into rectangular blocks, and performs the operation you specify on each individual block to determine the values of the pixels in the corresponding block of the output image. For more information, see “Distinct Block Processing” on page 17-6.
• colfilt: Implements column-wise processing operations, which provide a way of speeding up neighborhood or block operations by rearranging blocks into matrix columns. For more information, see “Use Column-wise Processing to Speed Up Sliding Neighborhood or Distinct Block Operations” on page 17-21.
Sliding Neighborhood Operations
In this section...
“Determine the Center Pixel” on page 17-3
“General Algorithm of Sliding Neighborhood Operations” on page 17-4
“Border Padding Behavior in Sliding Neighborhood Operations” on page 17-4
“Implementing Linear and Nonlinear Filtering as Sliding Neighborhood Operations” on page 17-4
A sliding neighborhood operation is an operation that is performed a pixel at a time, with the value of
any given pixel in the output image being determined by the application of an algorithm to the values
of the corresponding input pixel's neighborhood. A pixel's neighborhood is some set of pixels, defined
by their locations relative to that pixel, which is called the center pixel. The neighborhood is a
rectangular block, and as you move from one element to the next in an image matrix, the
neighborhood block slides in the same direction. (To operate on an image a block at a time, rather
than a pixel at a time, use the distinct block processing function. See “Distinct Block Processing” on
page 17-6 for more information.)
The following figure shows the neighborhood blocks for some of the elements in a 6-by-5 matrix with
2-by-3 sliding blocks. The center pixel for each neighborhood is marked with a dot. For information
about how the center pixel is determined, see “Determine the Center Pixel” on page 17-3.
The center pixel for an m-by-n neighborhood is given by
floor(([m n]+1)/2)
In the 2-by-3 block shown in the preceding figure, the center pixel is (1,2), or the pixel in the second column of the top row of the neighborhood.
For example, the function might be an averaging operation that sums the values of the neighborhood
pixels and then divides the result by the number of pixels in the neighborhood. The result of this
calculation is the value of the output pixel.
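For example, assuming a grayscale image I is in the workspace, a 3-by-3 neighborhood average can be computed with the nlfilter function, which is described below:
I2 = nlfilter(I,[3 3],@(x) mean(x(:)));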
To process these neighborhoods, sliding neighborhood operations pad the borders of the image,
usually with 0's. In other words, these functions process the border pixels by assuming that the image
is surrounded by additional rows and columns of 0's. These rows and columns do not become part of
the output image and are used only as parts of the neighborhoods of the actual pixels in the image.
In addition to convolution, there are many other filtering operations you can implement through
sliding neighborhoods. Many of these operations are nonlinear in nature. For example, you can
implement a sliding neighborhood operation where the value of an output pixel is equal to the
standard deviation of the values of the pixels in the input pixel's neighborhood.
To implement a variety of sliding neighborhood operations, use the nlfilter function. nlfilter
takes as input arguments an image, a neighborhood size, and a function that returns a scalar, and
returns an image of the same size as the input image. nlfilter calculates the value of each pixel in
the output image by passing the corresponding input pixel's neighborhood to the function.
Note Many operations that nlfilter can implement run much faster if the computations are
performed on matrix columns rather than rectangular neighborhoods. For information about this
approach, see “Use Column-wise Processing to Speed Up Sliding Neighborhood or Distinct Block
Operations” on page 17-21.
For example, this code computes each output pixel by taking the standard deviation of the values of
the input pixel's 3-by-3 neighborhood (that is, the pixel itself and its eight contiguous neighbors).
I = imread('tire.tif');
I2 = nlfilter(I,[3 3],'std2');
You can also write code to implement a specific function, and then use this function with nlfilter.
For example, this command processes the matrix I in 2-by-3 neighborhoods with a function called
myfun.m. The syntax @myfun is an example of a function handle.
I2 = nlfilter(I,[2 3],@myfun);
If you prefer not to write code to implement a specific function, you can use an anonymous function
instead. This example converts the image to class double because the square root function is not
defined for the uint8 data type.
I = im2double(imread('tire.tif'));
f = @(x) sqrt(min(x(:)));
I2 = nlfilter(I,[2 2],f);
(For more information on function handles, see “Create Function Handle”. For more information
about anonymous functions, see “Anonymous Functions”.)
The following example uses nlfilter to set each pixel to the maximum value in its 3-by-3
neighborhood.
Note This example is only intended to illustrate the use of nlfilter. For a faster way to perform
this local maximum operation, use imdilate.
I = imread('tire.tif');
f = @(x) max(x(:));
I2 = nlfilter(I,[3 3],f);
imshow(I);
figure, imshow(I2);
Distinct Block Processing
In distinct block processing, you divide an image matrix into rectangular blocks and perform image
processing operations on individual blocks. Blocks start in the upper left corner and completely cover
the image without overlap. If the blocks do not fit exactly over the image, then any incomplete blocks
are considered partial blocks. The figure shows a 15-by-30 pixel image divided into 4-by-8 pixel
blocks. The right and bottom edges have partial blocks.
You can process partial blocks as is, or you can add padding to the image so that the image size is a
multiple of the block size. For more information, see “Apply Padding” on page 17-7.
For example, the commands below process image I in 25-by-25 blocks with the function myfun. In
this case, the myfun function resizes the blocks to make a thumbnail. (For more information about
function handles, see “Create Function Handle”. For more information about anonymous functions,
see “Anonymous Functions”.)
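The commands themselves are not shown here. A sketch consistent with the description, in which the resize scale factor is an assumption:
myfun = @(block_struct) imresize(block_struct.data,0.15);
I2 = blockproc(I,[25 25],myfun);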
Note Due to block edge effects, resizing an image using blockproc does not produce the same
results as resizing the entire image at once.
The example below uses the blockproc function to set every pixel in each 32-by-32 block of an
image to the average of the elements in that block. The anonymous function computes the mean of
the block, and then multiplies the result by a matrix of ones, so that the output block is the same size
as the input block. As a result, the output image is the same size as the input image. The blockproc
function does not require that the images be the same size. If this is the result you want, make sure
that the function you specify returns blocks of the appropriate size:
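The commands are not shown here. A sketch consistent with the description, assuming a grayscale image I is in the workspace:
fun = @(block_struct) mean2(block_struct.data) * ...
    ones(size(block_struct.data));
I2 = blockproc(I,[32 32],fun);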
Note Many operations that blockproc can implement run much faster if the computations are
performed on matrix columns rather than rectangular blocks. For information about this approach,
see “Use Column-wise Processing to Speed Up Sliding Neighborhood or Distinct Block Operations”
on page 17-21.
Apply Padding
When processing an image in blocks, you may wish to add padding for two reasons:
• To address partial blocks when the image size is not a multiple of the block size.
• To create overlapping borders to each block.
By default, partial blocks are processed as is, with no additional padding. Set the
'PadPartialBlocks' argument to true to pad the right or bottom edges of the image and make
the blocks full-sized.
Use the 'BorderSize' argument to specify extra rows and columns of pixels outside the block
whose values are taken into account when processing the block. When there is a border, blockproc
passes the expanded block, including the border, to the specified function.
For example, this command processes image A in 4-by-8 pixel blocks, adding a 1-by-2 pixel border
around each block and zero-padding partial blocks to the full block size. This pixel border expands
each block by one additional pixel on the top and bottom edges and two pixels along the left and right
edges during processing. The figure depicts a sample image A and indicates in gray the pixel border
added to three sample blocks.
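The command itself is not shown here. A sketch consistent with the description, in which fun stands for a hypothetical block function handle:
B = blockproc(A,[4 8],fun,'BorderSize',[1 2],'PadPartialBlocks',true);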
Both padding of partial blocks and block borders add to the overall size of image A, as depicted in the
figure. Because partial blocks are padded, the original 15-by-30 pixel image increases in size to the
next multiple of the block size, in this case, 16-by-32 pixels. Because a 1-by-2 pixel border is added to
each block, blocks along the image edges include pixels that extend beyond the bounds of the original
image. The border pixels along the image edges increase the effective size of the input matrix to 18-
by-36 pixels. The outermost rectangle in the figure delineates the new boundaries of the image after
all padding is added.
By default, blockproc pads the image with zeros. If you need a different type of padding, use the
'PadMethod' parameter of the blockproc function.
See Also
More About
• “Neighborhood or Block Processing: An Overview” on page 17-2
• “Block Size and Performance” on page 17-9
Block Size and Performance
When selecting an appropriate block size for TIFF image processing, understanding the organization
of your TIFF image is important. To find out whether your image is organized in tiles or strips, use
the imfinfo function.
The struct returned by imfinfo for TIFF images contains the fields TileWidth and TileLength. If
these fields have valid (nonempty) values, then the image is a tiled TIFF, and these fields define the
size of each tile. If these fields contain values of empty ([]), then the TIFF is organized in strips. For
TIFFs with strip layout, refer to the struct field RowsPerStrip, which defines the size of each strip
of data.
When reading TIFF images, the minimum amount of data that can be read is a single tile or a single
strip, depending on the type of TIFF. To optimize the performance of blockproc, select block sizes
that correspond closely with how your TIFF image is organized on disk. In this way, you can avoid
rereading the same pixels multiple times.
I = imread('concordorthophoto.png','PNG');
imshow(I)
imwrite(I,'concordorthophoto.tif','TIFF');
info = imfinfo('concordorthophoto.tif');
info.RowsPerStrip
ans = 34
Get the image size from the Height and Width fields of info. This image has size 2215-by-2956
pixels.
h = info.Height
h = 2215
w = info.Width
w = 2956
Process the image using square blocks of size 500-by-500 pixels. Each time the blockproc function
accesses the disk, it reads in an entire strip and discards any part of the strip not included in the
current block. With 34 rows per strip and 500 rows per block, blockproc accesses the disk 15 times
for each block. The image is approximately 6 blocks wide (2956/500 = 5.912). blockproc reads the
same strip over and over again for each block that includes pixels contained in that strip. Since the
image is six blocks wide, blockproc reads every strip of the file six times.
blockSizeSquare = 500;
tic
im = blockproc('concordorthophoto.tif',[blockSizeSquare blockSizeSquare],@(s) s.data);
toc
Process the image using blocks that span the full height of the image. Stripped TIFF files are
organized in rows, so this block layout is exactly opposite the actual file layout on disk.
Select a block width such that the blocks have approximately the same number of pixels as the
square block.
numCols = ceil(blockSizeSquare.^2 / h)
numCols = 113
The image is over 26 blocks wide (2956/numCols = 26.1593). Every strip must be read for every
block, therefore blockproc reads the entire image from disk 26 times.
tic
im = blockproc('concordorthophoto.tif',[h numCols],@(s) s.data);
toc
Process the image using blocks that span the full width of the image. This block layout aligns with the
TIFF file layout on disk.
Select a block height such that the blocks have approximately the same number of pixels as the
square block.
numRows = ceil(blockSizeSquare.^2 / w)
numRows = 85
Each block spans the width of the image, therefore blockproc reads each strip exactly once. The
execution time is shortest when the block layout aligns with the TIFF image strips.
tic
im = blockproc('concordorthophoto.tif',[numRows w],@(s) s.data);
toc
See Also
blockproc
More About
• “Neighborhood or Block Processing: An Overview” on page 17-2
• “Distinct Block Processing” on page 17-6
Parallel Block Processing on Large Image Files
In general, using larger blocks while block processing an image results in faster performance than
completing the same task using smaller blocks. However, sometimes the task or algorithm you are
applying to your image requires a certain block size, and you must use smaller blocks. When block
processing using smaller blocks, parallel block processing is typically faster than regular (serial)
block processing, often by a large margin. If you are using larger blocks, however, you might need to
experiment to determine whether parallel block processing saves computing time.
If you meet these conditions, then you can invoke parallel processing in blockproc by specifying the
'UseParallel' argument as true. When you do so, MATLAB automatically opens a parallel pool of
workers on your local machine and uses all available workers to process the input image.
In the following example, compute a discrete cosine transform for each 8-by-8 block of an image in
parallel:
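The example code is not shown here. A sketch consistent with the description, in which the file name is a hypothetical placeholder:
dctFun = @(block_struct) dct2(block_struct.data);
result = blockproc('largeImage.tif',[8 8],dctFun,'UseParallel',true);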
Control parallel behavior with the parallel preferences, including scaling up to a cluster. See
parpool for information on configuring your parallel environment.
See Also
blockproc
More About
• “What Is Parallel Computing?” (Parallel Computing Toolbox)
• “Choose a Parallel Computing Solution” (Parallel Computing Toolbox)
• “Run Batch Parallel Jobs” (Parallel Computing Toolbox)
Perform Block Processing on Image Files in Unsupported Formats
This section demonstrates the process of writing an Image Adapter class by discussing an example
class (the LanAdapter class). The LanAdapter class is part of the toolbox. Use this simple, read-
only class to process arbitrarily large uint8 LAN files with blockproc.
The first 24 bytes of the LAN file header contain the fields read in the steps below; the remaining 104 bytes contain various other properties of the file, which this example does not use.
1 Open the file:
file_name = 'rio.lan';
fid = fopen(file_name,'r');
2 Read the first six bytes of the header:
headword = fread(fid,6,'uint8=>char')';
fprintf('Version ID: %s\n',headword);
3 Read the pack type:
pack_type = fread(fid,1,'uint16',0,'ieee-le');
fprintf('Pack Type: %d\n',pack_type);
4 Read the number of spectral bands:
num_bands = fread(fid,1,'uint16',0,'ieee-le');
fprintf('Number of Bands: %d\n',num_bands);
5 Read the image width and height:
unused_bytes = fread(fid,6,'uint8',0,'ieee-le');
width = fread(fid,1,'uint32',0,'ieee-le');
height = fread(fid,1,'uint32',0,'ieee-le');
fprintf('Image Size (w x h): %d x %d\n',width,height);
6 Close the file:
fclose(fid);
The rio.lan file is a 512-by-512, 7-band image. The pack type of 0 indicates that each sample is an
8-bit, unsigned integer (uint8 data type).
For very large LAN files, however, reading and processing the entire image in memory using
multibandread can be impractical, depending on your system capabilities. To avoid memory
limitations, use the blockproc function. With blockproc, you can process images with a file-based
workflow. You can read, process, and then write the results, one block at a time.
The blockproc function only supports reading and writing certain file formats, but it is extensible via the ImageAdapter class. To write an Image Adapter class for a particular file format, you must be able to query the size of the image on disk and read a rectangular block of data from the file.
If you meet these two conditions, you can write an Image Adapter class for LAN files. You can parse
the image header to query the file size, and you can modify the call to multibandread to read a
particular block of data. You can encapsulate the code for these two objectives in an Image Adapter
class structure, and then operate directly on large LAN files with the blockproc function. The
LanAdapter class is an Image Adapter class for LAN files, and is part of the Image Processing
Toolbox software.
Classdef
The LanAdapter class begins with the keyword classdef. The classdef section defines the class name and indicates that LanAdapter inherits from the ImageAdapter superclass. Inheriting from ImageAdapter allows the new class to be used with the blockproc function.
Properties
Following the classdef section, the LanAdapter class contains two blocks of class properties. The
first block contains properties that are publicly visible, but not publicly modifiable. The second block
contains fully public properties. The LanAdapter class stores some information from the file header
as class properties. Other classes that also inherit from ImageAdapter, but that support different file
formats, can have different properties.
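The first property block is not shown here. A minimal sketch of what it might contain (the property names are assumptions, not necessarily those of the shipping LanAdapter class):
properties(GetAccess = public, SetAccess = private)
    Filename
    NumBands
end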
properties(Access = public)
SelectedBands
end
In addition to the properties defined in LanAdapter.m, the class inherits the ImageSize property
from the ImageAdapter superclass. The new class sets the ImageSize property in the constructor.
The class constructor initializes the LanAdapter object. The LanAdapter constructor parses the
LAN file header information and sets the class properties. Implement the constructor, a class method,
inside a methods block.
The constructor contains much of the same code used to parse the LAN file header. The LanAdapter
class only supports uint8 data type files, so the constructor validates the pack type of the LAN file,
as well as the headword. The class properties store the remaining information. The method
responsible for reading pixel data uses these properties. The SelectedBands property allows you to
read a subset of the bands, with the default set to read all bands.
methods
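    % The constructor itself is not shown here. A minimal sketch of what
    % it might look like (the Filename and NumBands properties are
    % assumptions, not necessarily those of the shipping LanAdapter class):
    function obj = LanAdapter(fname)
        obj.Filename = fname;
        % Parse the LAN file header, reusing the steps shown earlier.
        fid = fopen(fname,'r');
        headword = fread(fid,6,'uint8=>char')';  % validate the headword here
        pack_type = fread(fid,1,'uint16',0,'ieee-le');
        if pack_type ~= 0
            error('LanAdapter supports only uint8 (pack type 0) LAN files.')
        end
        obj.NumBands = fread(fid,1,'uint16',0,'ieee-le');
        fread(fid,6,'uint8',0,'ieee-le');        % unused bytes
        width = fread(fid,1,'uint32',0,'ieee-le');
        height = fread(fid,1,'uint32',0,'ieee-le');
        fclose(fid);
        % Read all bands by default.
        obj.SelectedBands = 1:obj.NumBands;
        % Set the inherited ImageSize property: [rows cols bands].
        obj.ImageSize = [height width obj.NumBands];
    end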
end % LanAdapter
Methods: Required
Adapter classes have two required methods defined in the abstract superclass, ImageAdapter. All
Image Adapter classes must implement these methods. The blockproc function uses the first
method, readRegion, to read blocks of data from files on disk. The second method, close, performs
any necessary cleanup of the Image Adapter object.
function data = readRegion(obj, region_start, region_size)
% readRegion reads a rectangular block of data from the file.
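% The body of readRegion is not shown here. A minimal sketch of one
% possible implementation using multibandread, assuming a 128-byte header
% and band-interleaved-by-line ('bil') storage:
    full_size = [obj.ImageSize(1) obj.ImageSize(2) obj.NumBands];
    rows = region_start(1):(region_start(1) + region_size(1) - 1);
    cols = region_start(2):(region_start(2) + region_size(2) - 1);
    data = multibandread(obj.Filename, full_size, ...
        'uint8=>uint8', 128, 'bil', 'ieee-le', ...
        {'Row',   'Direct', rows}, ...
        {'Column','Direct', cols}, ...
        {'Band',  'Direct', obj.SelectedBands});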
end % readRegion
readRegion has two input arguments, region_start and region_size. The region_start argument, a two-element vector in the form [row col], defines the first pixel in the requested block of data. The region_size argument, a two-element vector in the form [num_rows num_cols], defines the size of the requested block of data. The readRegion method uses these input arguments to read and return the requested block of data from the image.
The readRegion method is implemented differently for different file formats, depending on what
tools are available for reading the specific files. The readRegion method for the LanAdapter class
uses the input arguments to prepare custom input for multibandread. For LAN files,
multibandread provides a convenient way to read specific subsections of an image.
The other required method is close. The close method of the LanAdapter class appears as
follows:
function close(obj)
% Close the LanAdapter object. This method is a part
% of the ImageAdapter interface and is required.
% Since the readRegion method is "atomic", there are
% no open file handles to close, so this method is empty.
end
end % LanAdapter
As the comments indicate, the close method for LanAdapter has nothing to do, so close is empty.
The multibandread function does not require maintenance of open file handles, so the close
method has no handles to clean up. Image Adapter classes for other file formats may have more
substantial close methods including closing file handles and performing other class clean-up
responsibilities.
Methods (Optional)
As written, the LanAdapter class can only read LAN files, not write them. If you want to write output
to a LAN format file, or another file with a format that blockproc does not support, implement the
optional writeRegion method. Then, you can specify your class as a 'Destination' parameter in
blockproc and write output to a file of your chosen format.
The first argument, region_start, indicates the first pixel of the block that the writeRegion
method writes. The second argument, region_data, contains the new data that the method writes
to the file.
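A minimal sketch of the writeRegion signature (the implementation depends on the output file format and is omitted here):
function [] = writeRegion(obj, region_start, region_data)
    % Write region_data to the file, starting at region_start.
end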
Classes that implement the writeRegion method can be more complex than LanAdapter. When
creating a writable Image Adapter object, classes often have the additional responsibility of creating
new files in the class constructor. This file creation requires a more complex syntax in the
constructor, where you potentially need to specify the size and data type of a new file you want to
create. Constructors that create new files can also encounter other issues, such as operating system
file permissions or potentially difficult file-creation code.
See Also
ImageAdapter | blockproc | multibandread
More About
• “Compute Statistics for Large Images” on page 17-29
Use Column-wise Processing to Speed Up Sliding Neighborhood or Distinct Block Operations
Performing sliding neighborhood and distinct block operations column-wise, when possible, can
reduce the execution time required to process an image.
For example, suppose the operation you are performing involves computing the mean of each block.
This computation is much faster if you first rearrange the blocks into columns, because you can
compute the mean of every column with a single call to the mean function, rather than calling mean
for each block individually.
1 Reshapes each sliding or distinct block of an image matrix into a column in a temporary matrix
2 Passes the temporary matrix to a function you specify
3 Rearranges the resulting matrix back into the original shape
The following figure illustrates this process. In this figure, a 6-by-5 image matrix is processed in 2-
by-3 neighborhoods. colfilt creates one column for each pixel in the image, so there are a total of
30 columns in the temporary matrix. Each pixel's column contains the value of the pixels in its
neighborhood, so there are six rows. colfilt zero-pads the input image as necessary. For example,
the neighborhood of the upper left pixel in the figure has two zero-valued neighbors, due to zero
padding.
The temporary matrix is passed to a function, which must return a single value for each column.
(Many MATLAB functions work this way, for example, mean, median, std, sum, etc.) The resulting
values are then assigned to the appropriate pixels in the output image.
colfilt can produce the same results as nlfilter with faster execution time; however, it might
use more memory. The example below sets each output pixel to the maximum value in the input
pixel's neighborhood, producing the same result as the nlfilter example shown in “Implementing
Linear and Nonlinear Filtering as Sliding Neighborhood Operations” on page 17-4.
I2 = colfilt(I,[3 3],'sliding',@max);
The following figure illustrates this process. A 6-by-16 image matrix is processed in 4-by-6 blocks.
colfilt first zero-pads the image to make the size 8-by-18 (six 4-by-6 blocks), and then rearranges
the blocks into six columns of 24 elements each.
After rearranging the image into a temporary matrix, colfilt passes this matrix to the function. The
function must return a matrix of the same size as the temporary matrix. If the block size is m-by-n,
and the image is mm-by-nn, the size of the temporary matrix is (m*n)-by-(ceil(mm/m)*ceil(nn/
n)). After the function processes the temporary matrix, the output is rearranged into the shape of
the original image matrix.
This example sets all the pixels in each 8-by-8 block of an image to the mean pixel value for the block.
I = im2double(imread('tire.tif'));
f = @(x) ones(64,1)*mean(x);
I2 = colfilt(I,[8 8],'distinct',f);
The anonymous function in the example computes the mean of the block and then multiplies the
result by a vector of ones, so that the output block is the same size as the input block. As a result, the
output image is the same size as the input image.
Restrictions
You can use colfilt to implement many of the same distinct block operations that blockproc
performs. However, colfilt has certain restrictions that blockproc does not:
• The output image must be the same size as the input image.
• The blocks cannot overlap.
Block Processing Large Images
To avoid memory limitations, you can process large images incrementally: reading, processing, and finally writing the results back to disk, one region at a time. The blockproc function helps you with this process. Using blockproc, specify an image, a block size, and a function handle. blockproc then divides the input image into blocks of the specified size, processes them using the function handle one block at a time, and then assembles the results into an output image. blockproc returns the output to memory or to a new file on disk.
First, consider the results of performing edge detection without block processing. This example uses
a small image, cameraman.tif, to illustrate the concepts, but block processing is often more useful for
large images.
file_name = 'cameraman.tif';
I = imread(file_name);
normal_edges = edge(I,'canny');
imshow(I)
title('Original Image')
imshow(normal_edges)
title('Conventional Edge Detection')
Now try the same task using block processing. The blockproc function has built-in support for TIFF
images, so you do not have to read the file completely into memory using imread. Instead, call the
function using the string filename as input. blockproc reads in one block at a time, making this
workflow ideal for very large images.
When working with large images, you will often use the 'Destination' parameter to specify a file into which blockproc will write the output image. However, in this example you will return the results to a variable in memory.
This example uses a block size of [50 50]. In general, choosing larger block sizes yields better
performance for blockproc. This is particularly true for file-to-file workflows where accessing the
disk will incur a significant performance cost. Appropriate block sizes vary based on the machine
resources available, but should likely be in the range of thousands of pixels per dimension.
% You can use an anonymous function to define the function handle. The
% function is passed a structure as input, a "block struct", with several
% fields containing the block data as well as other relevant information.
% The function should return the processed block data.
edgeFun = @(block_struct) edge(block_struct.data,'canny');
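% The blockproc call for this step is not shown here. A sketch based on
% the block size stated above:
block_size = [50 50];
block_edges = blockproc(file_name,block_size,edgeFun);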
imshow(block_edges)
title('Block Processing - Simplest Syntax')
Notice the significant artifacts from the block processing. Determining whether a pixel is an edge
pixel or not requires information from the neighboring pixels. This means that each block cannot be
processed completely separately from its surrounding pixels. To remedy this, use the blockproc
parameter 'BorderSize' to specify vertical and horizontal borders around each block. The necessary
'BorderSize' varies depending on the task being performed.
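The call itself is not shown here. A sketch based on the 10-pixel border described below:
border_size = [10 10];
block_edges = blockproc(file_name,block_size,edgeFun,'BorderSize',border_size);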
imshow(block_edges)
title('Block Processing - Block Borders')
The blocks are now being processed with an additional 10 pixels of image data on each side. This
looks better, but the result is still significantly different from the original in-memory result. The
reason for this is that the Canny edge detector uses a threshold that is computed based on the
complete image histogram. Since the blockproc function calls the edge function for each block, the
Canny algorithm is working with incomplete histograms and therefore using varying thresholds
across the image.
When block processing images, it is important to understand these types of algorithm constraints.
Some functions will not directly translate to block processing for all syntaxes. In this case, the edge
function allows you to pass in a fixed threshold as an input argument instead of computing it. Modify
your function handle to use the three-argument syntax of edge, and thus remove one of the "global"
constraints of the function. Some trial and error finds that a threshold of 0.09 gives good results.
thresh = 0.09;
edgeFun = @(block_struct) edge(block_struct.data,'canny',thresh);
block_edges = blockproc(file_name,block_size,edgeFun,'BorderSize',border_size);
imshow(block_edges)
title('Block Processing - Borders & Fixed Threshold')
The result now closely matches the original in-memory result. You can see some additional artifacts
along the boundaries. These are due to the different methods of padding used by the Canny edge
detector. Currently, blockproc only supports zero-padding along the image boundaries.
See Also
blockproc | edge
Related Examples
• “Compute Statistics for Large Images” on page 17-29
More About
• “Distinct Block Processing” on page 17-6
• “Block Size and Performance” on page 17-9
• “Create Function Handle”
Compute Statistics for Large Images
This example performs a task similar to that found in the “Enhance Multispectral Color Composite
Images” on page 11-92 example, but adapted for large images using blockproc. You will be
enhancing the visible bands of the Erdas LAN file rio.lan. These types of block processing
techniques are typically more useful for large images, but a small image will work for the purpose of
this example.
Using blockproc, read the data from rio.lan, a file containing Landsat thematic mapper imagery
in the Erdas LAN file format. blockproc has built-in support for reading TIFF and JPEG2000 files
only. To read other types of files you must write an Image Adapter class to support I/O for your
particular file format. This example uses a pre-built Image Adapter class, the LanAdapter, which
supports reading LAN files. For more information on writing Image Adapter classes, see “Perform
Block Processing on Image Files in Unsupported Formats” on page 17-15.
The Erdas LAN format contains the visible red, green, and blue spectrum in bands 3, 2, and 1,
respectively. Use blockproc to extract the visible bands into an RGB image.
input_adapter = LanAdapter('rio.lan');
input_adapter.SelectedBands = [3 2 1];
% Use an identity function handle so that blockproc assembles the selected
% bands into an RGB image one block at a time.
identityFcn = @(block_struct) block_struct.data;
truecolor = blockproc(input_adapter,[100 100],identityFcn);
imshow(truecolor)
title('Truecolor Composite (Not Enhanced)')
The resulting truecolor image is similar to that of paris.lan in the “Enhance Multispectral Color
Composite Images” on page 11-92 example. The RGB image appears dull, with little contrast.
First, try to stretch the data across the dynamic range using blockproc. This first attempt simply
defines a new function handle that calls stretchlim and imadjust on each block of data
individually.
adjustFcn = @(block_struct) imadjust(block_struct.data,...
stretchlim(block_struct.data));
truecolor_enhanced = blockproc(input_adapter,[100 100],adjustFcn);
imshow(truecolor_enhanced)
title('Truecolor Composite with Blockwise Contrast Stretch')
You can see immediately that the results are incorrect. The problem is that the stretchlim function
computes the histogram on the input image and uses this information to compute the stretch limits.
Since each block is adjusted in isolation from its neighbors, each block is computing different limits
from its local histogram.
To examine the distribution of data across the dynamic range of the image, you can compute the
histogram for each of the three visible bands.
When working with sufficiently large images, you cannot simply call imhist to create an image
histogram. One way to incrementally build the histogram is to use blockproc with a class that will
sum the histograms of each block as you move over the image.
type HistogramAccumulator

classdef HistogramAccumulator < handle
    % Incrementally accumulate a histogram from chunks of image data.
    properties
        Histogram
        Range
    end
    methods
        function addToHistogram(obj,new_data)
            if isempty(obj.Histogram)
                obj.Range = double(0:intmax(class(new_data)));
                obj.Histogram = hist(double(new_data(:)),obj.Range);
            else
                new_hist = hist(double(new_data(:)),obj.Range);
                obj.Histogram = obj.Histogram + new_hist;
            end
        end
    end
end
The class is a simple wrapper around the hist function, allowing you to add data to a histogram
incrementally. It is not specific to blockproc. Observe the following simple use of the
HistogramAccumulator class.
hist_obj = HistogramAccumulator;
full_image = imread('liftingbody.png');
top_half = full_image(1:256,:);
bottom_half = full_image(257:end,:);
addToHistogram(hist_obj,top_half);
addToHistogram(hist_obj,bottom_half);
computed_histogram = hist_obj.Histogram;
normal_histogram = imhist(full_image);
figure
subplot(1,2,1)
stem(computed_histogram,'Marker','none')
title('Incrementally Computed Histogram')
subplot(1,2,2)
stem(normal_histogram','Marker','none')
title('IMHIST Histogram')
Now use the HistogramAccumulator class with blockproc to build the histogram of the red band
of data in rio.lan. You can define a function handle for blockproc that will invoke the
addToHistogram method on each block of data. By viewing this histogram, you can see that the data
is concentrated within a small part of the available dynamic range. The other visible bands have
similar distributions. This is one reason why the original truecolor composite appears dull.
hist_obj = HistogramAccumulator;
addToHistFcn = @(block_struct) addToHistogram(hist_obj,block_struct.data);
Compute the histogram of the red channel. Notice that the addToHistFcn function handle does not generate any output. Since the function handle passed to blockproc does not return anything, blockproc will not return anything either.
input_adapter.SelectedBands = 3;
blockproc(input_adapter,[100 100],addToHistFcn);
red_hist = hist_obj.Histogram;
figure
stem(red_hist,'Marker','none')
title('Histogram of Red Band (Band 3)')
You can now perform a proper contrast stretch on the image. For conventional, in-memory workflows, you can simply use the stretchlim function to compute the arguments to imadjust (as the “Enhance Multispectral Color Composite Images” on page 11-92 example does). When working with large images, as shown above, stretchlim is not easily adapted for use with blockproc because it relies on the full image histogram.
Once you have computed the image histograms for each of the visible bands, compute the proper
arguments to imadjust by hand (similar to how stretchlim does).
First compute the histograms for the green and blue bands.
hist_obj = HistogramAccumulator;
addToHistFcn = @(block_struct) addToHistogram(hist_obj,block_struct.data);
input_adapter.SelectedBands = 2;
blockproc(input_adapter,[100 100],addToHistFcn);
green_hist = hist_obj.Histogram;
hist_obj = HistogramAccumulator;
addToHistFcn = @(block_struct) addToHistogram(hist_obj,block_struct.data);
input_adapter.SelectedBands = 1;
blockproc(input_adapter,[100 100],addToHistFcn);
blue_hist = hist_obj.Histogram;
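The stretch limits for each band come from the cumulative distribution function (CDF) of its histogram. The computeCDF, findLowerLimit, and findUpperLimit helpers used below can be written as anonymous functions; this minimal sketch assumes the default 1% saturation used by stretchlim:
computeCDF = @(histogram) cumsum(histogram) / sum(histogram);
findLowerLimit = @(cdf) find(cdf > 0.01, 1, 'first');
findUpperLimit = @(cdf) find(cdf >= 0.99, 1, 'first');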
red_cdf = computeCDF(red_hist);
red_limits(1) = findLowerLimit(red_cdf);
red_limits(2) = findUpperLimit(red_cdf);
green_cdf = computeCDF(green_hist);
green_limits(1) = findLowerLimit(green_cdf);
green_limits(2) = findUpperLimit(green_cdf);
blue_cdf = computeCDF(blue_hist);
blue_limits(1) = findLowerLimit(blue_cdf);
blue_limits(2) = findUpperLimit(blue_cdf);
Create a new adjustFcn that applies the global stretch limits and use blockproc to adjust the
truecolor image.
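A sketch of adjustFcn, assuming the LAN bands are uint8 so that the accumulated histogram bins correspond to intensities 0 through 255:
% Convert the 1-based histogram bin indices to intensities, scaled to [0, 1].
rgb_limits = ([red_limits' green_limits' blue_limits'] - 1) / 255;
adjustFcn = @(block_struct) imadjust(block_struct.data,rgb_limits);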
input_adapter.SelectedBands = [3 2 1];
truecolor_enhanced = blockproc(input_adapter,[100 100],adjustFcn);
Display the result. The resulting image is much improved, with the data covering more of the dynamic
range, and by using blockproc you avoid loading the whole image into memory.
imshow(truecolor_enhanced)
title('Truecolor Composite with Corrected Contrast Stretch')
See Also
Classes
ImageAdapter
Functions
blockproc | imadjust | imhist | stretchlim
Related Examples
• “Enhance Multispectral Color Composite Images” on page 11-92
• “Block Processing Large Images” on page 17-24
More About
• “Distinct Block Processing” on page 17-6
• “Perform Block Processing on Image Files in Unsupported Formats” on page 17-15
• “Create Function Handle”
18
Deep Learning
This chapter describes functions that enable image denoising using convolutional neural networks, and provides examples of other image processing applications that use deep learning techniques.
Train and Apply Denoising Neural Networks
You can remove Gaussian noise from a grayscale image by using the pretrained DnCNN network. The pretrained network has these limitations:
• Noise removal works only with 2-D single-channel images. If you have multiple color channels, or
if you are working with 3-D images, remove noise by treating each channel or plane separately.
For an example, see “Remove Noise from Color Image Using Pretrained Neural Network” on page
18-12.
• The network recognizes only Gaussian noise, with a limited range of standard deviation.
To load the pretrained DnCNN network, use the denoisingNetwork function. Then, pass the DnCNN network and a noisy 2-D single-channel image to denoiseImage, as in the sketch below.
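A minimal sketch of this pretrained workflow, using a grayscale image shipped with the toolbox and an illustrative noise level:
net = denoisingNetwork('dncnn');
I = im2double(imread('cameraman.tif'));
noisyI = imnoise(I,'gaussian',0,0.01);      % add zero-mean Gaussian white noise
denoisedI = denoiseImage(noisyI,net);
montage({noisyI,denoisedI})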
To train a denoising network using the built-in DnCNN layers and a denoising image datastore:
• Create a denoisingImageDatastore that generates noisy training patches from a collection of pristine images. Specify the range of Gaussian noise standard deviations by setting the GaussianNoiseLevel property. You must use the default value of PatchSize (50) and ChannelFormat ('grayscale') so that the size of the training data matches the input size of the network.
• Get the predefined denoising layers using the dnCNNLayers function.
• Define training options using the trainingOptions function.
• Train the network, specifying the denoising image datastore as the data source for
trainNetwork. For each iteration of training, the denoising image datastore generates one mini-
batch of training data by randomly cropping pristine images from the ImageDatastore, then
adding randomly generated zero-mean Gaussian white noise to each image patch. The standard
deviation of the added noise is unique for each image patch, and has a value within the range
specified by the GaussianNoiseLevel property of the denoising image datastore.
After you have trained the network, pass the network and a noisy grayscale image to denoiseImage, as in the sketch below.
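A compact sketch of this training-and-denoising workflow; the folder name, noise range, and training hyperparameters are illustrative:
imds = imageDatastore('pristineImages');                 % folder of pristine grayscale images (illustrative path)
dnds = denoisingImageDatastore(imds, ...
    'GaussianNoiseLevel',[0.01 0.1]);                    % range of noise standard deviations
layers = dnCNNLayers;                                    % predefined DnCNN layers
options = trainingOptions('sgdm','MaxEpochs',30, ...
    'InitialLearnRate',0.1,'Verbose',false);
net = trainNetwork(dnds,layers,options);
denoisedI = denoiseImage(noisyGrayImage,net);            % noisyGrayImage is any noisy 2-D grayscale image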
• Train a network that detects a larger variety of noise, such as non-Gaussian noise distributions, in
single-channel images. You can define the network architecture by using the layers returned by
the dnCNNLayers function. To generate training images compatible with this network, use the
transform and combine functions to generate batches of noisy images and the corresponding noise
signal. For more information, see “Preprocess Images for Deep Learning” (Deep Learning
Toolbox).
After you train a denoising network using the DnCNN network architecture, you can use the
denoiseImage function to remove image noise.
Tip The DnCNN network can also detect high-frequency image artifacts caused by other types of
distortion. For example, you can train the DnCNN network to increase image resolution or remove
JPEG compression artifacts. The “JPEG Image Deblocking Using Deep Learning” on page 18-32
example shows how to train a DnCNN network to remove JPEG compression artifacts.
• Train a network that detects a range of Gaussian noise distributions for color images. To generate
training images for this network, you can use a denoisingImageDatastore and set the
ChannelFormat property to 'rgb'. You must define a custom convolutional neural network
architecture that supports RGB input images.
After you train a denoising network using a custom network architecture, you can use the
activations function to isolate the noise or high-frequency artifacts in a distorted image. Then,
subtract the noise from the distorted image to obtain a denoised image.
See Also
activations | combine | denoiseImage | denoisingImageDatastore | denoisingNetwork |
dnCNNLayers | trainNetwork | trainingOptions | transform
Related Examples
• “Remove Noise from Color Image Using Pretrained Neural Network” on page 18-12
• “JPEG Image Deblocking Using Deep Learning” on page 18-32
• “Prepare Datastore for Image-to-Image Regression” (Deep Learning Toolbox)
More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “Deep Learning in MATLAB” (Deep Learning Toolbox)
Get Started with GANs for Image-to-Image Translation
Image-to-image translation is the task of transferring styles and characteristics from one image
domain to another. The source domain is the domain of the starting image. The target domain is the
desired domain after translation. Example applications of domain translation include generating photorealistic street scenes from semantic segmentation maps and translating daytime images to dusk.
Select a GAN
You can perform image-to-image translation using deep learning generative adversarial networks (GANs). A GAN consists of a generator network and one or more discriminator networks that are trained simultaneously against each other. The objective of the generator network is to generate realistic images in the target domain that the discriminator networks cannot distinguish from real images in that domain. The objective of the discriminator networks is to correctly classify real training data as real and generator-synthesized images as fake.
• Supervised GANs have a one-to-one mapping between images in the source and target domains.
For an example, see “Generate Image from Segmentation Map Using Deep Learning” (Computer
Vision Toolbox). In this example, the source domain consists of images of street scenes.
The target domain consists of categorical images representing the semantic segmentation maps.
The data set provides a ground truth segmentation map for every input training image.
• Unsupervised GANs do not have a one-to-one mapping between images in the source and target
domains. For an example, see “Unsupervised Day-To-Dusk Image Translation Using UNIT” on
page 18-144. In this example, the source and target domains consist of images captured in
daytime and dusk conditions, respectively. However, the scene content of the daytime and dusk
images differs, so the daytime images do not have a corresponding dusk image with identical
scene content.
Image Processing Toolbox provides functions that create common GAN generator and discriminator networks, such as cycleGANGenerator, patchGANDiscriminator, pix2pixHDGlobalGenerator, and unitGenerator. Some networks require additional modification beyond the options available in these network creation functions. For example, you may want to replace the addition layers with depth concatenation layers,
or you may want the initial leaky ReLU layer of a UNIT network to have a scale factor other than 0.2.
To refine an existing GAN network, you can use Deep Network Designer. For more information, see
“Build Networks with Deep Network Designer” (Deep Learning Toolbox).
If you need a network that is not available through the built-in creation functions, then you can create
custom GAN networks from modular components. First, create the encoder and decoder modules,
then combine the modules using the encoderDecoderNetwork function. You can optionally include
a bridge connection, skip connections, or additional layers at the end of the network. For more
information, see “Create Modular Neural Networks” on page 18-10.
Train the GAN by defining loss functions and a custom training loop. Loss functions commonly used for image-to-image translation include:
• Adversarial loss is commonly used by generator and discriminator networks. This loss relies on
the pixelwise or patchwise difference between the correct classification and the predicted
classification by the discriminator.
• Cycle consistency loss is commonly used by unsupervised generator networks. This loss is
based on the principle that an image translated from one domain to another, then back to the
original domain, should be identical to the original image.
• Specify training options such as the solver type and the number of epochs. For more information,
see “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox).
• Create the custom training loop that loops over mini-batches in every epoch. The loop reads each
mini-batch of data, evaluates the model gradients using the dlfeval function, and updates the
network parameters.
Optionally, include display functions such as plots of scores or batches of generated images that
enable you to monitor the training progress. For more information, see “Monitor GAN Training
Progress and Identify Common Failure Modes” (Deep Learning Toolbox).
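As an illustration, the body of one iteration of such a loop might look like the following sketch. Here modelGradients, generator, discriminator, and mbq are placeholders for your own model gradients function, dlnetwork objects, and minibatchqueue, iteration is the loop counter, and the trailing-average variables start as [] before the loop:
[X,T] = next(mbq);                                           % read one mini-batch of data
[gradG,gradD] = dlfeval(@modelGradients, ...                 % evaluate gradients of both networks
    generator,discriminator,X,T);
[generator,avgG,avgSqG] = adamupdate(generator, ...          % update the generator parameters
    gradG,avgG,avgSqG,iteration);
[discriminator,avgD,avgSqD] = adamupdate(discriminator, ...  % update the discriminator parameters
    gradD,avgD,avgSqD,iteration);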
References
[1] Wang, Ting-Chun, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. "High-
Resolution Image Synthesis and Semantic Manipulation with Conditional GANs." In 2018
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8798–8807. Salt Lake
City, UT, USA: IEEE, 2018. https://doi.org/10.1109/CVPR.2018.00917.
[2] Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired Image-to-Image
Translation Using Cycle-Consistent Adversarial Networks." In 2017 IEEE International
Conference on Computer Vision (ICCV), 2242–2251. Venice: IEEE, 2017. https://
ieeexplore.ieee.org/document/8237506.
[3] Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-Image Translation with
Conditional Adversarial Networks." In 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 5967–76. Honolulu, HI: IEEE, 2017. https://arxiv.org/abs/1611.07004.
[4] Liu, Ming-Yu, Thomas Breuel, and Jan Kautz. "Unsupervised Image-to-Image Translation
Networks." Advances in Neural Information Processing Systems 30 (NIPS 2017). Long Beach,
CA: 2017. https://arxiv.org/abs/1703.00848.
See Also
blockedNetwork | cycleGANGenerator | encoderDecoderNetwork | patchGANDiscriminator
| pix2pixHDGlobalGenerator | pretrainedEncoderNetwork | unitGenerator
Related Examples
• “Unsupervised Day-To-Dusk Image Translation Using UNIT” on page 18-144
• “Generate Image from Segmentation Map Using Deep Learning” (Computer Vision Toolbox)
More About
• “Create Modular Neural Networks” on page 18-10
• “Train Generative Adversarial Network (GAN)” (Deep Learning Toolbox)
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
• “Define Model Gradients Function for Custom Training Loop” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
• “Train Network Using Custom Training Loop” (Deep Learning Toolbox)
Create Modular Neural Networks
• Create an encoder network from a pretrained network, such as SqueezeNet, using the
pretrainedEncoderNetwork function. The function prunes the pretrained network such that
the encoder includes the number of downsampling operations that you specify.
• Create encoder and decoder modules from building blocks of layers that follow a repeating
pattern. To create a module, define a function that specifies the pattern, then assemble blocks into
a module using the blockedNetwork function.
An encoder module consists of an initial block of layers, downsampling blocks, and residual blocks. A
decoder module consists of upsampling blocks and a final block that provides the network output. The
table describes the blocks of layers that commonly comprise encoder and decoder modules.
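For example, one way to assemble a simple encoder-decoder network from these pieces. The block definition, depth, and input size in this sketch are illustrative, and using SqueezeNet requires its Deep Learning Toolbox model support package:
% Encoder: prune a pretrained SqueezeNet after three downsampling operations.
depth = 3;
encoder = pretrainedEncoderNetwork('squeezenet',depth);
% Decoder: a repeating upsampling block assembled with blockedNetwork.
upBlock = @(blockIndex) [ ...
    transposedConv2dLayer(2,64,'Stride',2); ...
    convolution2dLayer(3,64,'Padding','same'); ...
    reluLayer];
decoder = blockedNetwork(upBlock,depth);
% Combine the encoder and decoder modules into a single network.
net = encoderDecoderNetwork([256 256 3],encoder,decoder);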
You can also create popular GAN generator and discriminator networks directly by using functions
available in Image Processing Toolbox. These networks include CycleGAN, PatchGAN, pix2pixHD, and
UNIT. For more information, see “Get Started with GANs for Image-to-Image Translation” on page 18-
5.
See Also
blockedNetwork | encoderDecoderNetwork | pretrainedEncoderNetwork
More About
• “Get Started with GANs for Image-to-Image Translation” on page 18-5
• “List of Deep Learning Layers” (Deep Learning Toolbox)
Remove Noise from Color Image Using Pretrained Neural Network
Read a color image into the workspace and convert the data to double. Display the pristine color
image.
pristineRGB = imread('lighthouse.png');
pristineRGB = im2double(pristineRGB);
imshow(pristineRGB)
title('Pristine Image')
Add zero-mean Gaussian white noise with a variance of 0.01 to the image. imnoise adds noise to
each color channel independently. Display the noisy color image.
noisyRGB = imnoise(pristineRGB,'gaussian',0,0.01);
imshow(noisyRGB)
title('Noisy Image')
Split the noisy RGB image into its individual color channels.
[noisyR,noisyG,noisyB] = imsplit(noisyRGB);
Load the pretrained DnCNN network.
net = denoisingNetwork('dncnn');
Use the DnCNN network to remove noise from each color channel.
denoisedR = denoiseImage(noisyR,net);
denoisedG = denoiseImage(noisyG,net);
denoisedB = denoiseImage(noisyB,net);
Recombine the denoised color channels to form the denoised RGB image. Display the denoised color
image.
denoisedRGB = cat(3,denoisedR,denoisedG,denoisedB);
imshow(denoisedRGB)
title('Denoised Image')
Calculate the peak signal-to-noise ratio (PSNR) for the noisy and denoised images. A larger PSNR value indicates that the noise is smaller relative to the signal, which corresponds to higher image quality.
noisyPSNR = psnr(noisyRGB,pristineRGB);
fprintf('\n The PSNR value of the noisy image is %0.4f.',noisyPSNR);
denoisedPSNR = psnr(denoisedRGB,pristineRGB);
fprintf('\n The PSNR value of the denoised image is %0.4f.',denoisedPSNR);
Calculate the structural similarity (SSIM) index for the noisy and denoised images. An SSIM index
close to 1 indicates good agreement with the reference image, and higher image quality.
noisySSIM = ssim(noisyRGB,pristineRGB);
fprintf('\n The SSIM value of the noisy image is %0.4f.',noisySSIM);
denoisedSSIM = ssim(denoisedRGB,pristineRGB);
fprintf('\n The SSIM value of the denoised image is %0.4f.',denoisedSSIM);
In practice, image color channels frequently have correlated noise. To remove correlated image noise,
first convert the RGB image to a color space with a luminance channel, such as the L*a*b* color
space. Remove noise on the luminance channel only, then convert the denoised image back to the
RGB color space.
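For example, a sketch of this approach using the L*a*b* color space; scaling the L* channel to the [0, 1] range expected by the network is an assumption of this sketch:
lab = rgb2lab(noisyRGB);                   % convert the noisy RGB image to L*a*b*
L = lab(:,:,1)/100;                        % scale L* from [0, 100] to [0, 1]
lab(:,:,1) = 100*denoiseImage(L,net);      % denoise the luminance channel only
denoisedFromLab = lab2rgb(lab);            % convert back to the RGB color space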
See Also
denoiseImage | denoisingNetwork | imnoise | lab2rgb | psnr | rgb2lab | ssim
More About
• “Train and Apply Denoising Neural Networks” on page 18-2
Single Image Super-Resolution Using Deep Learning
Super-resolution is the process of creating high-resolution images from low-resolution images. This
example considers single image super-resolution (SISR), where the goal is to recover one high-
resolution image from one low-resolution image. SISR is challenging because high-frequency image
content typically cannot be recovered from the low-resolution image. Without high-frequency
information, the quality of the high-resolution image is limited. Further, SISR is an ill-posed problem
because one low-resolution image can yield several possible high-resolution images.
Several techniques, including deep learning algorithms, have been proposed to perform SISR. This
example explores one deep learning algorithm for SISR, called very-deep super-resolution (VDSR) [1].
VDSR is a convolutional neural network architecture designed to perform single image super-resolution [1]. The VDSR network learns the mapping between low- and high-
resolution images. This mapping is possible because low-resolution and high-resolution images have
similar image content and differ primarily in high-frequency details.
VDSR employs a residual learning strategy, meaning that the network learns to estimate a residual
image. In the context of super-resolution, a residual image is the difference between a high-resolution
reference image and a low-resolution image that has been upscaled using bicubic interpolation to
match the size of the reference image. A residual image contains information about the high-
frequency details of an image.
The VDSR network detects the residual image from the luminance of a color image. The luminance
channel of an image, Y, represents the brightness of each pixel through a linear combination of the
red, green, and blue pixel values. In contrast, the two chrominance channels of an image, Cb and Cr,
are different linear combinations of the red, green, and blue pixel values that represent color-
difference information. VDSR is trained using only the luminance channel because human perception
is more sensitive to changes in brightness than to changes in color.
If Y_highres is the luminance of the high-resolution image and Y_lowres is the luminance of a low-resolution image that has been upscaled using bicubic interpolation, then the input to the VDSR network is Y_lowres and the network learns to predict Y_residual = Y_highres − Y_lowres from the training data.
After the VDSR network learns to estimate the residual image, you can reconstruct high-resolution
images by adding the estimated residual image to the upsampled low-resolution image, then
converting the image back to the RGB color space.
A scale factor relates the size of the reference image to the size of the low-resolution image. As the
scale factor increases, SISR becomes more ill-posed because the low-resolution image loses more
information about the high-frequency image content. VDSR solves this problem by using a large
receptive field. This example trains a VDSR network with multiple scale factors using scale
augmentation. Scale augmentation improves the results at larger scale factors because the network
can take advantage of the image context from smaller scale factors. Additionally, the VDSR network
can generalize to accept images with noninteger scale factors.
Download the IAPR TC-12 Benchmark, which consists of 20,000 still natural images [2]. The data set includes photos of people, animals, cities, and more. The size of the data file is ~1.8 GB. If you do not want to download the training data set, then you can load the pretrained VDSR network by typing load('trainedVDSR-Epoch-100-ScaleFactors-234.mat'); at the command line. Then, go directly to the Perform Single Image Super-Resolution Using VDSR Network section in this example.
Use the helper function, downloadIAPRTC12Data, to download the data. This function is attached to
the example as a supporting file.
imagesDir = tempdir;
url = 'http://www-i6.informatik.rwth-aachen.de/imageclef/resources/iaprtc12.tgz';
downloadIAPRTC12Data(url,imagesDir);
This example will train the network with a small subset of the IAPR TC-12 Benchmark data. Load the
imageCLEF training data. All images are 32-bit JPEG color images.
trainImagesDir = fullfile(imagesDir,'iaprtc12','images','02');
exts = {'.jpg','.bmp','.png'};
pristineImages = imageDatastore(trainImagesDir,'FileExtensions',exts);
numel(pristineImages.Files)
ans = 616
To create a training data set, generate pairs of images consisting of upsampled images and the
corresponding residual images.
The upsampled images are stored on disk as MAT files in the directory upsampledDirName. The
computed residual images representing the network responses are stored on disk as MAT files in the
directory residualDirName. The MAT files are stored as data type double for greater precision
when training the network.
Use the helper function createVDSRTrainingSet to preprocess the training data. This function is
attached to the example as a supporting file.
For each pristine image in pristineImages, the helper function generates the bicubic-upsampled images and the corresponding residual images at each scale factor.
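Define the directories that hold the preprocessed training data. The folder locations in this sketch are illustrative:
upsampledDirName = fullfile(imagesDir,'iaprtc12','upsampledImages');   % illustrative location
residualDirName = fullfile(imagesDir,'iaprtc12','residualImages');     % illustrative location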
scaleFactors = [2 3 4];
createVDSRTrainingSet(pristineImages,scaleFactors,upsampledDirName,residualDirName);
In this example, the network inputs are low-resolution images that have been upsampled using
bicubic interpolation. The desired network responses are the residual images. Create an image
datastore called upsampledImages from the collection of input image files. Create an image
datastore called residualImages from the collection of computed residual image files. Both
datastores require a helper function, matRead, to read the image data from the image files. This
function is attached to the example as a supporting file.
upsampledImages = imageDatastore(upsampledDirName,'FileExtensions','.mat','ReadFcn',@matRead);
residualImages = imageDatastore(residualDirName,'FileExtensions','.mat','ReadFcn',@matRead);
Create an imageDataAugmenter (Deep Learning Toolbox) that specifies the parameters of data
augmentation. Use data augmentation during training to vary the training data, which effectively
increases the amount of available training data. Here, the augmenter specifies random rotation by 90
degrees and random reflections in the x-direction.
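A sketch of the augmenter and the random patch extraction datastore, assuming 41-by-41 patches (matching the network receptive field discussed below) and 64 patches per image:
augmenter = imageDataAugmenter( ...
    'RandRotation',@() randi([0,1],1)*90, ...   % rotate by 0 or 90 degrees
    'RandXReflection',true);                    % random reflection in the x-direction
patchSize = [41 41];
patchesPerImage = 64;
dsTrain = randomPatchExtractionDatastore(upsampledImages,residualImages,patchSize, ...
    'PatchesPerImage',patchesPerImage, ...
    'DataAugmentation',augmenter);
dsTrain.MiniBatchSize = patchesPerImage;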
The resulting datastore, dsTrain, provides mini-batches of data to the network at each iteration of
the epoch. Preview the result of reading from the datastore.
inputBatch = preview(dsTrain);
disp(inputBatch)
InputImage ResponseImage
______________ ______________
This example defines the VDSR network using 41 individual layers from Deep Learning Toolbox™, including an image input layer, 2-D convolutional layers, ReLU layers, and a regression output layer.
The first layer, imageInputLayer, operates on image patches. The patch size is based on the
network receptive field, which is the spatial image region that affects the response of the top-most
layer in the network. Ideally, the network receptive field is the same as the image size so that the field
can see all the high-level features in the image. In this case, for a network with D convolutional
layers, the receptive field is (2D+1)-by-(2D+1).
VDSR has 20 convolutional layers so the receptive field and the image patch size are 41-by-41. The
image input layer accepts images with one channel because VDSR is trained using only the luminance
channel.
networkDepth = 20;
firstLayer = imageInputLayer([41 41 1],'Name','InputLayer','Normalization','none');
The image input layer is followed by a 2-D convolutional layer that contains 64 filters of size 3-by-3. Zero-pad the inputs to each convolutional layer so that the feature maps remain the same size as the input after each convolution. He's method [3] initializes the weights to random values so that there is asymmetry in neuron learning. Each convolutional layer is followed by a ReLU layer, which introduces nonlinearity in the network.
convLayer = convolution2dLayer(3,64,'Padding',1, ...
'WeightsInitializer','he','BiasInitializer','zeros','Name','Conv1');
The middle layers contain 18 alternating convolutional and rectified linear unit layers. Every
convolutional layer contains 64 filters of size 3-by-3-by-64, where a filter operates on a 3-by-3 spatial
region across 64 channels. As before, a ReLU layer follows every convolutional layer.
relLayer = reluLayer('Name','ReLU1');
middleLayers = [convLayer relLayer];
for layerNumber = 2:networkDepth-1
    convLayer = convolution2dLayer(3,64,'Padding',[1 1], ...
        'WeightsInitializer','he','BiasInitializer','zeros', ...
        'Name',['Conv' num2str(layerNumber)]);
    relLayer = reluLayer('Name',['ReLU' num2str(layerNumber)]);
    middleLayers = [middleLayers convLayer relLayer];
end
The penultimate layer is a convolutional layer with a single filter of size 3-by-3-by-64 that
reconstructs the image.
convLayer = convolution2dLayer(3,1,'Padding',[1 1], ...
'WeightsInitializer','he','BiasInitializer','zeros', ...
'NumChannels',64,'Name',['Conv' num2str(networkDepth)]);
The last layer is a regression layer instead of a ReLU layer. The regression layer computes the mean-
squared error between the residual image and network prediction.
finalLayers = [convLayer regressionLayer('Name','FinalRegressionLayer')];
layers = [firstLayer middleLayers finalLayers];
Alternatively, you can use the vdsrLayers helper function to create VDSR layers. This function is
attached to the example as a supporting file.
layers = vdsrLayers;
Train the network using stochastic gradient descent with momentum (SGDM) optimization. Specify
the hyperparameter settings for SGDM by using the trainingOptions (Deep Learning Toolbox)
function. The learning rate is initially 0.1 and decreased by a factor of 10 every 10 epochs. Train for
100 epochs.
Training a deep network is time-consuming. Accelerate the training by specifying a high learning
rate. However, this can cause the gradients of the network to explode or grow uncontrollably,
preventing the network from training successfully. To keep the gradients in a meaningful range,
enable gradient clipping by specifying 'GradientThreshold' as 0.01, and specify
'GradientThresholdMethod' to use the L2-norm of the gradients.
maxEpochs = 100;
epochIntervals = 1;
initLearningRate = 0.1;
learningRateFactor = 0.1;
l2reg = 0.0001;
miniBatchSize = 64;
options = trainingOptions('sgdm', ...
'Momentum',0.9, ...
'InitialLearnRate',initLearningRate, ...
'LearnRateSchedule','piecewise', ...
'LearnRateDropPeriod',10, ...
'LearnRateDropFactor',learningRateFactor, ...
'L2Regularization',l2reg, ...
'MaxEpochs',maxEpochs, ...
'MiniBatchSize',miniBatchSize, ...
'GradientThresholdMethod','l2norm', ...
'GradientThreshold',0.01, ...
'Plots','training-progress', ...
'Verbose',false);
By default, the example loads a pretrained version of the VDSR network that has been trained to
super-resolve images for scale factors 2, 3 and 4. The pretrained network enables you to perform
super-resolution of test images without waiting for training to complete.
To train the VDSR network, set the doTraining variable in the following code to true. Train the
network using the trainNetwork (Deep Learning Toolbox) function.
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox). Training takes about 6 hours on an NVIDIA Titan X.
doTraining = false;
if doTraining
net = trainNetwork(dsTrain,layers,options);
modelDateTime = string(datetime('now','Format',"yyyy-MM-dd-HH-mm-ss"));
save(strcat("trainedVDSR-",modelDateTime,"-Epoch-",num2str(maxEpochs),"-ScaleFactors-234.mat"),'net');
else
load('trainedVDSR-Epoch-100-ScaleFactors-234.mat');
end
To perform single image super-resolution (SISR) using the VDSR network, follow the remaining steps
of this example. The remainder of the example shows how to:
Create a low-resolution image that will be used to compare the results of super-resolution using deep learning to the results obtained using traditional image processing techniques such as bicubic interpolation.
The test data set, testImages, contains 21 undistorted images shipped in Image Processing
Toolbox™. Load the images into an imageDatastore.
exts = {'.jpg','.png'};
fileNames = {'sherlock.jpg','car2.jpg','fabric.png','greens.jpg','hands1.jpg','kobi.png', ...
'lighthouse.png','micromarket.jpg','office_4.jpg','onion.png','pears.png','yellowlily.jpg', ...
'indiancorn.jpg','flamingos.jpg','sevilla.jpg','llama.jpg','parkavenue.jpg', ...
'peacock.jpg','car1.jpg','strawberries.jpg','wagon.jpg'};
filePath = [fullfile(matlabroot,'toolbox','images','imdata') filesep];
filePathNames = strcat(filePath,fileNames);
testImages = imageDatastore(filePathNames,'FileExtensions',exts);
montage(testImages)
Select one of the images to use as the reference image for super-resolution. You can optionally use
your own high-resolution image as the reference image.
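For example, read one of the test images as the reference. The index in this sketch is illustrative:
indx = 1;                                            % index into the test image datastore
Ireference = im2double(readimage(testImages,indx));
imshow(Ireference)
title('High-Resolution Reference Image')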
Create a low-resolution version of the high-resolution reference image by using imresize with a
scaling factor of 0.25. The high-frequency components of the image are lost during the downscaling.
scaleFactor = 0.25;
Ilowres = imresize(Ireference,scaleFactor,'bicubic');
imshow(Ilowres)
title('Low-Resolution Image')
A standard way to increase image resolution without deep learning is to use bicubic interpolation.
Upscale the low-resolution image using bicubic interpolation so that the resulting high-resolution
image is the same size as the reference image.
[nrows,ncols,np] = size(Ireference);
Ibicubic = imresize(Ilowres,[nrows ncols],'bicubic');
imshow(Ibicubic)
title('High-Resolution Image Obtained Using Bicubic Interpolation')
Recall that VDSR is trained using only the luminance channel of an image because human perception
is more sensitive to changes in brightness than to changes in color.
Convert the low-resolution image from the RGB color space to luminance (Iy) and chrominance (Icb
and Icr) channels by using the rgb2ycbcr function.
Iycbcr = rgb2ycbcr(Ilowres);
Iy = Iycbcr(:,:,1);
Icb = Iycbcr(:,:,2);
Icr = Iycbcr(:,:,3);
Upscale the luminance and two chrominance channels using bicubic interpolation. The upsampled
chrominance channels, Icb_bicubic and Icr_bicubic, require no further processing.
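A sketch of this step, reusing the reference image size computed above:
Iy_bicubic = imresize(Iy,[nrows ncols],'bicubic');
Icb_bicubic = imresize(Icb,[nrows ncols],'bicubic');
Icr_bicubic = imresize(Icr,[nrows ncols],'bicubic');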
Pass the upscaled luminance component, Iy_bicubic, through the trained VDSR network. Observe
the activations (Deep Learning Toolbox) from the final layer (a regression layer). The output of the
network is the desired residual image.
Iresidual = activations(net,Iy_bicubic,41);
Iresidual = double(Iresidual);
imshow(Iresidual,[])
title('Residual Image from VDSR')
Add the residual image to the upscaled luminance component to get the high-resolution VDSR
luminance component.
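This step is the element-wise sum of the two luminance images:
Isr = Iy_bicubic + Iresidual;     % high-resolution VDSR luminance component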
Concatenate the high-resolution VDSR luminance component with the upscaled color components.
Convert the image to the RGB color space by using the ycbcr2rgb function. The result is the final
high-resolution color image using VDSR.
Ivdsr = ycbcr2rgb(cat(3,Isr,Icb_bicubic,Icr_bicubic));
imshow(Ivdsr)
title('High-Resolution Image Obtained Using VDSR')
To get a better visual understanding of the high-resolution images, examine a small region inside
each image. Specify a region of interest (ROI) using vector roi in the format [x y width height]. The
elements define the x- and y-coordinate of the top left corner, and the width and height of the ROI.
Crop the high-resolution images to this ROI, and display the result as a montage. The VDSR image
has clearer details and sharper edges than the high-resolution image created using bicubic
interpolation.
montage({imcrop(Ibicubic,roi),imcrop(Ivdsr,roi)})
title('High-Resolution Results Using Bicubic Interpolation (Left) vs. VDSR (Right)');
Use image quality metrics to quantitatively compare the high-resolution image using bicubic
interpolation to the VDSR image. The reference image is the original high-resolution image,
Ireference, before preparing the sample low-resolution image.
Measure the peak signal-to-noise ratio (PSNR) of each image against the reference image. Larger
PSNR values generally indicate better image quality. See psnr for more information about this
metric.
bicubicPSNR = psnr(Ibicubic,Ireference)
bicubicPSNR = 38.4747
vdsrPSNR = psnr(Ivdsr,Ireference)
vdsrPSNR = 39.2346
Measure the structural similarity index (SSIM) of each image. SSIM assesses the visual impact of
three characteristics of an image: luminance, contrast and structure, against a reference image. The
closer the SSIM value is to 1, the better the test image agrees with the reference image. See ssim for
more information about this metric.
bicubicSSIM = ssim(Ibicubic,Ireference)
bicubicSSIM = 0.9861
vdsrSSIM = ssim(Ivdsr,Ireference)
vdsrSSIM = 0.9874
Measure perceptual image quality using the Naturalness Image Quality Evaluator (NIQE). Smaller
NIQE scores indicate better perceptual quality. See niqe for more information about this metric.
bicubicNIQE = niqe(Ibicubic)
bicubicNIQE = 5.1721
vdsrNIQE = niqe(Ivdsr)
vdsrNIQE = 4.7611
Calculate the average PSNR and SSIM of the entire set of test images for the scale factors 2, 3, and
4. For simplicity, you can use the helper function, superResolutionMetrics, to compute the
average metrics. This function is attached to the example as a supporting file.
scaleFactors = [2 3 4];
superResolutionMetrics(net,testImages,scaleFactors);
VDSR has better metric scores than bicubic interpolation for each scale factor.
References
[1] Kim, J., J. K. Lee, and K. M. Lee. "Accurate Image Super-Resolution Using Very Deep Convolutional
Networks." Proceedings of the IEEE® Conference on Computer Vision and Pattern Recognition.
2016, pp. 1646-1654.
[2] Grubinger, M., P. Clough, H. Müller, and T. Deselaers. "The IAPR TC-12 Benchmark: A New
Evaluation Resource for Visual Information Systems." Proceedings of the OntoImage 2006 Language
Resources For Content-Based Image Retrieval. Genoa, Italy. Vol. 5, May 2006, p. 10.
[3] He, K., X. Zhang, S. Ren, and J. Sun. "Delving Deep into Rectifiers: Surpassing Human-Level
Performance on ImageNet Classification." Proceedings of the IEEE International Conference on
Computer Vision, 2015, pp. 1026-1034.
See Also
combine | randomPatchExtractionDatastore | rgb2ycbcr | trainNetwork |
trainingOptions | transform | ycbcr2rgb
More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)
JPEG Image Deblocking Using Deep Learning
Image compression is used to reduce the memory footprint of an image. One popular and powerful
compression method is employed by the JPEG image format, which uses a quality factor to specify the
amount of compression. Reducing the quality value results in higher compression and a smaller
memory footprint, at the expense of visual quality of the image.
JPEG compression is lossy, meaning that the compression process causes the image to lose
information. For JPEG images, this information loss appears as blocking artifacts in the image. As
shown in the figure, more compression results in more information loss and stronger artifacts.
Textured regions with high-frequency content, such as the grass and clouds, look blurry. Sharp edges,
such as the roof of the house and the guardrails atop the lighthouse, exhibit ringing.
JPEG deblocking is the process of reducing the effects of compression artifacts in JPEG images.
Several JPEG deblocking methods exist, including more effective methods that use deep learning.
This example implements one such deep learning-based method that attempts to minimize the effect
of JPEG compression artifacts.
This example uses a built-in deep feed-forward convolutional neural network, called DnCNN. The
network was primarily designed to remove noise from images. However, the DnCNN architecture can
also be trained to remove JPEG compression artifacts or increase image resolution.
The reference paper [1] employs a residual learning strategy, meaning that the
DnCNN network learns to estimate the residual image. A residual image is the difference between a
pristine image and a distorted copy of the image. The residual image contains information about the
image distortion. For this example, distortion appears as JPEG blocking artifacts.
The DnCNN network is trained to detect the residual image from the luminance of a color image. The
luminance channel of an image, Y, represents the brightness of each pixel through a linear
combination of the red, green, and blue pixel values. In contrast, the two chrominance channels of an
image, Cb and Cr, are different linear combinations of the red, green, and blue pixel values that
represent color-difference information. DnCNN is trained using only the luminance channel because
human perception is more sensitive to changes in brightness than changes in color.
If Y_Original is the luminance of the pristine image and Y_Compressed is the luminance of the image containing JPEG compression artifacts, then the input to the DnCNN network is Y_Compressed and the network learns to predict Y_Residual = Y_Compressed − Y_Original from the training data.
Once the DnCNN network learns how to estimate a residual image, it can reconstruct an undistorted
version of a compressed JPEG image by adding the residual image to the compressed luminance
channel, then converting the image back to the RGB color space.
Download the IAPR TC-12 Benchmark, which consists of 20,000 still natural images [2]. The data set includes photos of people, animals, cities, and more. The size of the data file is ~1.8 GB. If you do not want to download the training data set needed to train the network, then you can load the pretrained DnCNN network by typing load('pretrainedJPEGDnCNN.mat') at the command line. Then, go directly to the Perform JPEG Deblocking Using DnCNN Network section in this example.
Use the helper function, downloadIAPRTC12Data, to download the data. This function is attached to
the example as a supporting file.
imagesDir = tempdir;
url = "http://www-i6.informatik.rwth-aachen.de/imageclef/resources/iaprtc12.tgz";
downloadIAPRTC12Data(url,imagesDir);
This example will train the network with a small subset of the IAPR TC-12 Benchmark data. Load the
imageCLEF training data. All images are 32-bit JPEG color images.
trainImagesDir = fullfile(imagesDir,'iaprtc12','images','00');
exts = {'.jpg','.bmp','.png'};
imdsPristine = imageDatastore(trainImagesDir,'FileExtensions',exts);
numel(imdsPristine.Files)
ans = 251
To create a training data set, read in pristine images and write out images in the JPEG file format
with various levels of compression.
Specify the JPEG image quality values used to render image compression artifacts. Quality values
must be in the range [0, 100]. Small quality values result in more compression and stronger
compression artifacts. Use a denser sampling of small quality values so the training data has a broad
range of compression artifacts.
JPEGQuality = [5:5:40 50 60 70 80];
The compressed images are stored on disk as MAT files in the directory compressedImagesDir. The
computed residual images are stored on disk as MAT files in the directory residualImagesDir. The
MAT files are stored as data type double for greater precision when training the network.
compressedImagesDir = fullfile(imagesDir,'iaprtc12','JPEGDeblockingData','compressedImages');
residualImagesDir = fullfile(imagesDir,'iaprtc12','JPEGDeblockingData','residualImages');
Use the helper function createJPEGDeblockingTrainingSet to preprocess the training data. This
function is attached to the example as a supporting file.
For each pristine training image, the helper function writes a copy of the image with quality factor
100 to use as a reference image and copies of the image with each quality factor to use as the
network inputs. The function computes the luminance (Y) channel of the reference and compressed
images in data type double for greater precision when calculating the residual images. The
compressed images are stored on disk as .MAT files in the directory compressedDirName. The
computed residual images are stored on disk as .MAT files in the directory residualDirName.
[compressedDirName,residualDirName] = createJPEGDeblockingTrainingSet(imdsPristine,JPEGQuality);
Use a random patch extraction datastore to feed the training data to the network. This datastore
extracts random corresponding patches from two image datastores that contain the network inputs
and desired network responses.
In this example, the network inputs are the compressed images. The desired network responses are
the residual images. Create an image datastore called imdsCompressed from the collection of
compressed image files. Create an image datastore called imdsResidual from the collection of
computed residual image files. Both datastores require a helper function, matRead, to read the image
data from the image files. This function is attached to the example as a supporting file.
imdsCompressed = imageDatastore(compressedDirName,'FileExtensions','.mat','ReadFcn',@matRead);
imdsResidual = imageDatastore(residualDirName,'FileExtensions','.mat','ReadFcn',@matRead);
Create an imageDataAugmenter (Deep Learning Toolbox) that specifies the parameters of data
augmentation. Use data augmentation during training to vary the training data, which effectively
increases the amount of available training data. Here, the augmenter specifies random rotation by 90
degrees and random reflections in the x-direction.
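A sketch of this augmenter; the rotation specification here is one way to express a random 0-or-90-degree rotation:
augmenter = imageDataAugmenter( ...
    'RandRotation',@() randi([0,1],1)*90, ...   % rotate by 0 or 90 degrees
    'RandXReflection',true);                    % random reflection in the x-direction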
Create the randomPatchExtractionDatastore from the two image datastores. Specify a patch
size of 50-by-50 pixels. Each image generates 128 random patches of size 50-by-50 pixels. Specify a
mini-batch size of 128.
patchSize = 50;
patchesPerImage = 128;
dsTrain = randomPatchExtractionDatastore(imdsCompressed,imdsResidual,patchSize, ...
'PatchesPerImage',patchesPerImage, ...
'DataAugmentation',augmenter);
dsTrain.MiniBatchSize = patchesPerImage;
The random patch extraction datastore dsTrain provides mini-batches of data to the network at each iteration of the epoch. Preview the result of reading from the datastore.
inputBatch = preview(dsTrain);
disp(inputBatch)
InputImage ResponseImage
______________ ______________
Create the layers of the built-in DnCNN network by using the dnCNNLayers function. By default, the
network depth (the number of convolution layers) is 20.
layers = dnCNNLayers
layers =
1x59 Layer array with layers:
Train the network using stochastic gradient descent with momentum (SGDM) optimization. Specify
the hyperparameter settings for SGDM by using the trainingOptions (Deep Learning Toolbox)
function.
Training a deep network is time-consuming. Accelerate the training by specifying a high learning
rate. However, this can cause the gradients of the network to explode or grow uncontrollably,
preventing the network from training successfully. To keep the gradients in a meaningful range, enable gradient clipping by specifying 'GradientThresholdMethod' to use the L2-norm of the gradients.
maxEpochs = 30;
initLearningRate = 0.1;
l2reg = 0.0001;
batchSize = 64;
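The trainingOptions call mirrors the VDSR example earlier in this chapter; the learning-rate schedule and gradient threshold values in this sketch are assumptions:
options = trainingOptions('sgdm', ...
    'Momentum',0.9, ...
    'InitialLearnRate',initLearningRate, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',10, ...              % assumed drop period
    'LearnRateDropFactor',0.1, ...             % assumed drop factor
    'L2Regularization',l2reg, ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',batchSize, ...
    'GradientThresholdMethod','l2norm', ...
    'GradientThreshold',0.01, ...              % assumed threshold value
    'Plots','training-progress', ...
    'Verbose',false);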
By default, the example loads a pretrained DnCNN network. The pretrained network enables you to
perform JPEG deblocking without waiting for training to complete.
To train the network, set the doTraining variable in the following code to true. Train the DnCNN
network using the trainNetwork (Deep Learning Toolbox) function.
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox). Training takes about 40 hours on an NVIDIA™ Titan X.
doTraining = false;
if doTraining
modelDateTime = string(datetime('now','Format',"yyyy-MM-dd-HH-mm-ss"));
[net,info] = trainNetwork(dsTrain,layers,options);
save(strcat("trainedJPEGDnCNN-",modelDateTime,"-Epoch-",num2str(maxEpochs),".mat"),'net');
else
load('pretrainedJPEGDnCNN.mat');
end
You can now use the DnCNN network to remove JPEG compression artifacts from images.
To perform JPEG deblocking using DnCNN, follow the remaining steps of this example. The
remainder of the example shows how to:
• Create sample test images with JPEG compression artifacts at three different quality levels.
• Remove the compression artifacts using the DnCNN network.
• Visually compare the images before and after deblocking.
• Evaluate the quality of the compressed and deblocked images by quantifying their similarity to the
undistorted reference image.
Create sample images to evaluate the result of JPEG image deblocking using the DnCNN network.
The test data set, testImages, contains 21 undistorted images shipped in Image Processing
Toolbox™. Load the images into an imageDatastore.
exts = {'.jpg','.png'};
fileNames = {'sherlock.jpg','car2.jpg','fabric.png','greens.jpg','hands1.jpg','kobi.png',...
'lighthouse.png','micromarket.jpg','office_4.jpg','onion.png','pears.png','yellowlily.jpg', ...
'indiancorn.jpg','flamingos.jpg','sevilla.jpg','llama.jpg','parkavenue.jpg',...
'peacock.jpg','car1.jpg','strawberries.jpg','wagon.jpg'};
filePath = [fullfile(matlabroot,'toolbox','images','imdata') filesep];
filePathNames = strcat(filePath,fileNames);
testImages = imageDatastore(filePathNames,'FileExtensions',exts);
montage(testImages)
Select one of the images to use as the reference image for JPEG deblocking. You can optionally use
your own uncompressed image as the reference image.
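For example, read one of the test images as the reference. The index in this sketch is illustrative (lighthouse.png is the seventh file in the datastore):
indx = 7;                                   % index into the test image datastore
Ireference = readimage(testImages,indx);
imshow(Ireference)
title('Uncompressed Reference Image')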
Create three compressed test images with the JPEG Quality values of 10, 20, and 50.
imwrite(Ireference,fullfile(tempdir,'testQuality10.jpg'),'Quality',10);
imwrite(Ireference,fullfile(tempdir,'testQuality20.jpg'),'Quality',20);
imwrite(Ireference,fullfile(tempdir,'testQuality50.jpg'),'Quality',50);
I10 = imread(fullfile(tempdir,'testQuality10.jpg'));
I20 = imread(fullfile(tempdir,'testQuality20.jpg'));
I50 = imread(fullfile(tempdir,'testQuality50.jpg'));
montage({I50,I20,I10},'Size',[1 3])
title('JPEG-Compressed Images with Quality Factor: 50, 20 and 10 (left to right)')
Recall that DnCNN is trained using only the luminance channel of an image because human
perception is more sensitive to changes in brightness than changes in color. Convert the JPEG-
compressed images from the RGB color space to the YCbCr color space using the rgb2ycbcr
function.
I10ycbcr = rgb2ycbcr(I10);
I20ycbcr = rgb2ycbcr(I20);
I50ycbcr = rgb2ycbcr(I50);
To perform the forward pass of the network, use the denoiseImage function. Deblocking uses exactly the same procedure as denoising: you can think of the JPEG compression artifacts as a type of image noise.
I10y_predicted = denoiseImage(I10ycbcr(:,:,1),net);
I20y_predicted = denoiseImage(I20ycbcr(:,:,1),net);
I50y_predicted = denoiseImage(I50ycbcr(:,:,1),net);
The chrominance channels do not need processing. Concatenate the deblocked luminance channel
with the original chrominance channels to obtain the deblocked image in the YCbCr color space.
I10ycbcr_predicted = cat(3,I10y_predicted,I10ycbcr(:,:,2:3));
I20ycbcr_predicted = cat(3,I20y_predicted,I20ycbcr(:,:,2:3));
I50ycbcr_predicted = cat(3,I50y_predicted,I50ycbcr(:,:,2:3));
Convert the deblocked YCbCr image to the RGB color space by using the ycbcr2rgb function.
I10_predicted = ycbcr2rgb(I10ycbcr_predicted);
I20_predicted = ycbcr2rgb(I20ycbcr_predicted);
I50_predicted = ycbcr2rgb(I50ycbcr_predicted);
montage({I50_predicted,I20_predicted,I10_predicted},'Size',[1 3])
title('Deblocked Images with Quality Factor 50, 20 and 10 (Left to Right)')
To get a better visual understanding of the improvements, examine a smaller region inside each
image. Specify a region of interest (ROI) using vector roi in the format [x y width height]. The
elements define the x- and y-coordinate of the top left corner, and the width and height of the ROI.
Crop the compressed images to this ROI, and display the result as a montage.
i10 = imcrop(I10,roi);
i20 = imcrop(I20,roi);
i50 = imcrop(I50,roi);
montage({i50 i20 i10},'Size',[1 3])
title('Patches from JPEG-Compressed Images with Quality Factor 50, 20 and 10 (Left to Right)')
Crop the deblocked images to this ROI, and display the result as a montage.
i10predicted = imcrop(I10_predicted,roi);
i20predicted = imcrop(I20_predicted,roi);
i50predicted = imcrop(I50_predicted,roi);
montage({i50predicted,i20predicted,i10predicted},'Size',[1 3])
title('Patches from Deblocked Images with Quality Factor 50, 20 and 10 (Left to Right)')
Quantitative Comparison
Quantify the quality of the deblocked images through four metrics. You can use the
displayJPEGResults helper function to compute these metrics for compressed and deblocked
images at the quality factors 10, 20, and 50. This function is attached to the example as a supporting
file.
• Structural Similarity Index (SSIM). SSIM assesses the visual impact of three characteristics of an
image: luminance, contrast and structure, against a reference image. The closer the SSIM value is
to 1, the better the test image agrees with the reference image. Here, the reference image is the
undistorted original image, Ireference, before JPEG compression. See ssim for more
information about this metric.
• Peak signal-to-noise ratio (PSNR). The larger the PSNR value, the stronger the signal compared to
the distortion. See psnr for more information about this metric.
• Naturalness Image Quality Evaluator (NIQE). NIQE measures perceptual image quality using a model trained from natural scenes. Smaller NIQE scores indicate better perceptual quality. See niqe for more information about this metric.
• Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). BRISQUE measures perceptual image quality using a model trained on images with known distortions. Smaller BRISQUE scores indicate better perceptual quality. See brisque for more information about this metric.
displayJPEGResults(Ireference,I10,I20,I50,I10_predicted,I20_predicted,I50_predicted)
------------------------------------------
SSIM Comparison
===============
I10: 0.90624 I10_predicted: 0.91286
I20: 0.94904 I20_predicted: 0.95444
I50: 0.97238 I50_predicted: 0.97482
------------------------------------------
PSNR Comparison
===============
I10: 26.6046 I10_predicted: 27.0793
I20: 28.8015 I20_predicted: 29.3378
I50: 31.4512 I50_predicted: 31.8584
------------------------------------------
NIQE Comparison
===============
I10: 7.2194 I10_predicted: 3.9478
I20: 4.5158 I20_predicted: 3.0685
I50: 2.8874 I50_predicted: 2.4106
NOTE: Smaller NIQE score signifies better perceptual quality
------------------------------------------
BRISQUE Comparison
==================
I10: 52.372 I10_predicted: 38.9271
I20: 45.3772 I20_predicted: 30.8991
I50: 27.7093 I50_predicted: 24.3845
NOTE: Smaller BRISQUE score signifies better perceptual quality
References
[1] Zhang, K., W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian Denoiser: Residual
Learning of Deep CNN for Image Denoising." IEEE® Transactions on Image Processing. Feb 2017.
[2] Grubinger, M., P. Clough, H. Müller, and T. Deselaers. "The IAPR TC-12 Benchmark: A New
Evaluation Resource for Visual Information Systems." Proceedings of the OntoImage 2006 Language
Resources For Content-Based Image Retrieval. Genoa, Italy. Vol. 5, May 2006, p. 10.
See Also
denoiseImage | dnCNNLayers | randomPatchExtractionDatastore | rgb2ycbcr |
trainNetwork | trainingOptions | ycbcr2rgb
More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)
Image Processing Operator Approximation Using Deep Learning
Operator approximation finds alternative ways to process images such that the result resembles the
output from a conventional image processing operation or pipeline. The goal of operator
approximation is often to reduce the time required to process an image.
Several classical and deep learning techniques have been proposed to perform operator
approximation. Some classical techniques improve the efficiency of a single algorithm but cannot be
generalized to other operations. Another common technique approximates a wide range of operations
by applying the operator to a low resolution copy of an image, but the loss of high-frequency content
limits the accuracy of the approximation.
Deep learning solutions enable the approximation of more general and complex operations. For
example, the multiscale context aggregation network (CAN) presented by Q. Chen [1]
can approximate multiscale tone mapping, photographic style transfer, nonlocal dehazing, and pencil
drawing. Multiscale CAN trains on full-resolution images for greater accuracy in processing high-
frequency details. After the network is trained, the network can bypass the conventional processing
operation and process images directly.
This example explores how to train a multiscale CAN to approximate a bilateral image filtering
operation, which reduces image noise while preserving edge sharpness. The example presents the
complete training and inference workflow, which includes the process of creating a training
datastore, selecting training options, training the network, and using the network to process test
images.
The multiscale CAN is trained to minimize the l2 loss between the conventional output of an image
processing operation and the network response after processing the input image using multiscale
context aggregation. Multiscale context aggregation looks for information about each pixel from
across the entire image, rather than limiting the search to a small neighborhood surrounding the
pixel.
To help the network learn global image properties, the multiscale CAN architecture has a large
receptive field. The first and last layers have the same size because the operator should not change
the size of the image. Successive intermediate layers are dilated by exponentially increasing scale
factors (hence the "multiscale" nature of the CAN). Dilation enables the network to look for spatially
separated features at various spatial frequencies, without reducing the resolution of the image. After
each convolution layer, the network uses adaptive normalization to balance the impact of batch
normalization and the identity mapping on the approximated operator.
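Conceptually, the adaptive normalization described in [1] combines the identity mapping and the batch-normalized response using two learned scale factors, which the custom adaptiveNormalizationLambda and adaptiveNormalizationMu layers used later in this example implement. A sketch of the per-layer combination (the function handle below is illustrative only):

% lambda scales the identity branch and mu scales the batch-normalized branch.
adaptiveNorm = @(x,bnOfX,lambda,mu) lambda.*x + mu.*bnOfX;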
Download the IAPR TC-12 Benchmark, which consists of 20,000 still natural images [2]. The data set includes photos of people, animals, cities, and more. The size of the data file is
~1.8 GB. If you do not want to download the training data set needed to train the network, then you
can load the pretrained CAN by typing load('trainedOperatorLearning-Epoch-181.mat');
at the command line. Then, go directly to the Perform Bilateral Filtering Approximation Using
Multiscale CAN section in this example.
imagesDir = tempdir;
url_1 = 'http://www-i6.informatik.rwth-aachen.de/imageclef/resources/iaprtc12.tgz';
downloadIAPRTC12Data(url_1,imagesDir);
This example trains the network with a small subset of the IAPR TC-12 Benchmark data.
trainImagesDir = fullfile(imagesDir,'iaprtc12','images','39');
exts = {'.jpg','.bmp','.png'};
pristineImages = imageDatastore(trainImagesDir,'FileExtensions',exts);
numel(pristineImages.Files)
ans = 916
To create a training data set, read in pristine images and write out images that have been bilateral
filtered. The filtered images are stored on disk in the directory specified by preprocessDataDir.
Use the helper function bilateralFilterDataset to preprocess the training data. This function is
attached to the example as a supporting file.
The helper function performs these operations for each pristine image in inputImages:
• Calculate the degree of smoothing for bilateral filtering. Smoothing the filtered image reduces
image noise.
• Perform bilateral filtering using imbilatfilt.
• Save the filtered image to disk using imwrite.
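Note that preprocessDataDir must be defined before calling the helper. A minimal sketch of this setup (the exact path is an assumption):

% Illustrative output directory for the bilateral-filtered training images.
preprocessDataDir = fullfile(imagesDir,'iaprtc12','preprocessedDataset');
if ~exist(preprocessDataDir,'dir')
    mkdir(preprocessDataDir);
end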
bilateralFilterDataset(pristineImages,preprocessDataDir);
Use a random patch extraction datastore to feed the training data to the network. This datastore
extracts random corresponding patches from two image datastores that contain the network inputs
and desired network responses.
In this example, the network inputs are the pristine images in pristineImages. The desired
network responses are the processed images after bilateral filtering. Create an image datastore
called bilatFilteredImages from the collection of bilateral filtered image files.
bilatFilteredImages = imageDatastore(preprocessDataDir,'FileExtensions',exts);
Create a randomPatchExtractionDatastore from the two image datastores. Specify a patch size
of 256-by-256 pixels. Specify 'PatchesPerImage' to extract one randomly-positioned patch from
each pair of images during training. Specify a mini-batch size of one.
miniBatchSize = 1;
patchSize = [256 256];
dsTrain = randomPatchExtractionDatastore(pristineImages,bilatFilteredImages,patchSize, ...
'PatchesPerImage',1);
dsTrain.MiniBatchSize = miniBatchSize;
inputBatch = read(dsTrain);
disp(inputBatch)
InputImage ResponseImage
_________________ _________________
This example defines the multiscale CAN using layers from Deep Learning Toolbox™, including image input, 2-D convolution, batch normalization, addition, leaky ReLU, and regression layers.
Two custom scale layers are added to implement an adaptive batch normalization layer. These layers
are attached as supporting files to this example.
The first layer, imageInputLayer, operates on image patches. The patch size is based on the
network receptive field, which is the spatial image region that affects the response of the top-most layer
in the network. Ideally, the network receptive field is the same as the image size so that it can see all
the high-level features in the image. For the bilateral filtering approximation, the image patch size is fixed
to 256-by-256.
networkDepth = 10;
numberOfFilters = 32;
firstLayer = imageInputLayer([256 256 3],'Name','InputLayer','Normalization','none');
The image input layer is followed by a 2-D convolution layer that contains 32 filters of size 3-by-3.
Zero-pad the inputs to each convolution layer so that feature maps remain the same size as the input
after each convolution. Initialize the weights to the identity matrix.
Wgts = zeros(3,3,3,numberOfFilters);
for ii = 1:3
Wgts(2,2,ii,ii) = 1;
end
convolutionLayer = convolution2dLayer(3,numberOfFilters,'Padding',1, ...
'Weights',Wgts,'Name','Conv1');
Each convolution layer is followed by a batch normalization layer and an adaptive normalization scale
layer that adjusts the strength of the batch-normalization branch. Later, this example will create the
corresponding adaptive normalization scale layer that adjusts the strength of the identity branch. For
now, follow the adaptiveNormalizationMu layer with an addition layer. Finally, specify a leaky
ReLU layer with a scalar multiplier of 0.2 for negative inputs.
batchNorm = batchNormalizationLayer('Name','BN1');
adaptiveMu = adaptiveNormalizationMu(numberOfFilters,'Mu1');
addLayer = additionLayer(2,'Name','add1');
leakyrelLayer = leakyReluLayer(0.2,'Name','Leaky1');
Specify the middle layers of the network following the same pattern. Successive convolution layers
have a dilation factor that scales exponentially with the network depth.
Wgts = zeros(3,3,numberOfFilters,numberOfFilters);
for ii = 1:numberOfFilters
Wgts(2,2,ii,ii) = 1;
end
batchNorm = batchNormalizationLayer('Name','AN9');
adaptiveMu = adaptiveNormalizationMu(numberOfFilters,'Mu9');
addLayer = additionLayer(2,'Name','add9');
leakyrelLayer = leakyReluLayer(0.2,'Name','Leaky9');
middleLayers = [middleLayers conv2dLayer batchNorm adaptiveMu addLayer leakyrelLayer];
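The listing above shows only the final middle block. A loop such as the following could generate all of the middle blocks; the layer names, padding values, and dilation schedule in this sketch are illustrative assumptions based on the description above, not the exact code used by the example.

% Sketch: build middle blocks 2 through 9 with exponentially increasing dilation.
middleLayers = [];
for layerNumber = 2:networkDepth-1
    dilationFactor = 2^(layerNumber-1);   % dilation grows exponentially with depth
    conv2dLayer = convolution2dLayer(3,numberOfFilters,'Padding',dilationFactor, ...
        'DilationFactor',dilationFactor,'Weights',Wgts,'Name',['Conv' num2str(layerNumber)]);
    batchNorm = batchNormalizationLayer('Name',['BN' num2str(layerNumber)]);
    adaptiveMu = adaptiveNormalizationMu(numberOfFilters,['Mu' num2str(layerNumber)]);
    addLayer = additionLayer(2,'Name',['add' num2str(layerNumber)]);
    leakyrelLayer = leakyReluLayer(0.2,'Name',['Leaky' num2str(layerNumber)]);
    middleLayers = [middleLayers conv2dLayer batchNorm adaptiveMu addLayer leakyrelLayer];
end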
The last convolution layer uses a 1-by-1-by-32-by-3 array of filter weights (three 1-by-1 filters across the 32 channels) to reconstruct the three-channel image.
Wgts = sqrt(2/(9*numberOfFilters))*randn(1,1,numberOfFilters,3);
conv2dLayer = convolution2dLayer(1,3,'NumChannels',numberOfFilters, ...
'Weights',Wgts,'Name','Conv10');
The last layer is a regression layer instead of a leaky ReLU layer. The regression layer computes the
mean-squared error between the bilateral-filtered image and the network prediction.
finalLayers = [conv2dLayer
regressionLayer('Name','FinalRegressionLayer')
];
Create skip connections, which act as the identity branch for the adaptive normalization equation.
Connect the skip connections to the addition layers.
skipConv1 = adaptiveNormalizationLambda(numberOfFilters,'Lambda1');
skipConv2 = adaptiveNormalizationLambda(numberOfFilters,'Lambda2');
skipConv3 = adaptiveNormalizationLambda(numberOfFilters,'Lambda3');
skipConv4 = adaptiveNormalizationLambda(numberOfFilters,'Lambda4');
skipConv5 = adaptiveNormalizationLambda(numberOfFilters,'Lambda5');
skipConv6 = adaptiveNormalizationLambda(numberOfFilters,'Lambda6');
skipConv7 = adaptiveNormalizationLambda(numberOfFilters,'Lambda7');
skipConv8 = adaptiveNormalizationLambda(numberOfFilters,'Lambda8');
skipConv9 = adaptiveNormalizationLambda(numberOfFilters,'Lambda9');
lgraph = addLayers(lgraph,skipConv1);
lgraph = connectLayers(lgraph,'Conv1','Lambda1');
lgraph = connectLayers(lgraph,'Lambda1','add1/in2');
lgraph = addLayers(lgraph,skipConv2);
lgraph = connectLayers(lgraph,'Conv2','Lambda2');
lgraph = connectLayers(lgraph,'Lambda2','add2/in2');
lgraph = addLayers(lgraph,skipConv3);
lgraph = connectLayers(lgraph,'Conv3','Lambda3');
lgraph = connectLayers(lgraph,'Lambda3','add3/in2');
lgraph = addLayers(lgraph,skipConv4);
lgraph = connectLayers(lgraph,'Conv4','Lambda4');
lgraph = connectLayers(lgraph,'Lambda4','add4/in2');
lgraph = addLayers(lgraph,skipConv5);
lgraph = connectLayers(lgraph,'Conv5','Lambda5');
lgraph = connectLayers(lgraph,'Lambda5','add5/in2');
lgraph = addLayers(lgraph,skipConv6);
lgraph = connectLayers(lgraph,'Conv6','Lambda6');
lgraph = connectLayers(lgraph,'Lambda6','add6/in2');
lgraph = addLayers(lgraph,skipConv7);
lgraph = connectLayers(lgraph,'Conv7','Lambda7');
lgraph = connectLayers(lgraph,'Lambda7','add7/in2');
lgraph = addLayers(lgraph,skipConv8);
lgraph = connectLayers(lgraph,'Conv8','Lambda8');
lgraph = connectLayers(lgraph,'Lambda8','add8/in2');
lgraph = addLayers(lgraph,skipConv9);
lgraph = connectLayers(lgraph,'Conv9','Lambda9');
lgraph = connectLayers(lgraph,'Lambda9','add9/in2');
plot(lgraph)
Train the network using the Adam optimizer. Specify the hyperparameter settings by using the
trainingOptions (Deep Learning Toolbox) function. Use the default values of 0.9 for 'Momentum'
and 0.0001 for 'L2Regularization' (weight decay). Specify a constant learning rate of 0.0001.
Train for 181 epochs.
maxEpochs = 181;
initLearningRate = 0.0001;
miniBatchSize = 1;
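The corresponding trainingOptions call is abbreviated in this listing; a minimal sketch consistent with the settings above (the exact options used by the example, such as plotting and checkpoint settings, may differ):

% Sketch: Adam training options matching the hyperparameters defined above.
options = trainingOptions('adam', ...
    'InitialLearnRate',initLearningRate, ...
    'L2Regularization',0.0001, ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'Plots','training-progress');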
By default, the example loads a pretrained multiscale CAN that approximates a bilateral filter. The
pretrained network enables you to perform an approximation of bilateral filtering without waiting for
training to complete.
To train the network, set the doTraining variable in the following code to true. Train the multiscale
CAN using the trainNetwork (Deep Learning Toolbox) function.
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox). Training takes about 15 hours on an NVIDIA™ Titan X.
doTraining = false;
if doTraining
modelDateTime = string(datetime('now','Format',"yyyy-MM-dd-HH-mm-ss"));
net = trainNetwork(dsTrain,lgraph,options);
save(strcat("trainedOperatorLearning-",modelDateTime,"-Epoch-",num2str(maxEpochs),".mat"),'net');
else
load('trainedOperatorLearning-Epoch-181.mat');
end
To process an image using a trained multiscale CAN network that approximates a bilateral filter,
follow the remaining steps of this example: create a sample noisy image, reduce the noise using conventional bilateral filtering with imbilatfilt, process the noisy image using the trained CAN, and compare the two techniques visually and quantitatively.
Create a sample noisy image that will be used to compare the results of operator approximation to
conventional bilateral filtering. The test data set, testImages, contains 21 pristine images shipped
in Image Processing Toolbox™. Load the images into an imageDatastore.
exts = {'.jpg','.png'};
fileNames = {'sherlock.jpg','car2.jpg','fabric.png','greens.jpg','hands1.jpg','kobi.png',...
'lighthouse.png','micromarket.jpg','office_4.jpg','onion.png','pears.png','yellowlily.jpg', ...
'indiancorn.jpg','flamingos.jpg','sevilla.jpg','llama.jpg','parkavenue.jpg',...
'peacock.jpg','car1.jpg','strawberries.jpg','wagon.jpg'};
filePath = [fullfile(matlabroot,'toolbox','images','imdata') filesep];
filePathNames = strcat(filePath,fileNames);
testImages = imageDatastore(filePathNames,'FileExtensions',exts);
montage(testImages)
Select one of the images to use as the reference image for bilateral filtering. Convert the image to
data type uint8.
indx = 3; % Index of image to read from the test image datastore
Ireference = readimage(testImages,indx);
Ireference = im2uint8(Ireference);
You can optionally use your own image as the reference image. Note that the size of the test image
must be at least 256-by-256. If the test image is smaller than 256-by-256, then increase the image
size by using the imresize function. The network also requires an RGB test image. If the test image
is grayscale, then convert the image to RGB by using the cat function to concatenate three copies of
the original image along the third dimension.
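For example, for a hypothetical grayscale test image Igray, the preparation might look like this:

% Illustrative only: enlarge a small grayscale image and replicate it to RGB.
if size(Igray,1) < 256 || size(Igray,2) < 256
    Igray = imresize(Igray,[256 256]);
end
Ireference = cat(3,Igray,Igray,Igray);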
imshow(Ireference)
title('Pristine Reference Image')
Use the imnoise function to add zero-mean Gaussian white noise with a variance of 0.00001 to the
reference image.
Inoisy = imnoise(Ireference,'gaussian',0.00001);
imshow(Inoisy)
title('Noisy Image')
Conventional bilateral filtering is a standard way to reduce image noise while preserving edge
sharpness. Use the imbilatfilt function to apply a bilateral filter to the noisy image. Specify a
degree of smoothing equal to the variance of pixel values.
degreeOfSmoothing = var(double(Inoisy(:)));
Ibilat = imbilatfilt(Inoisy,degreeOfSmoothing);
imshow(Ibilat)
title('Denoised Image Obtained Using Bilateral Filtering')
Pass the normalized input image through the trained network and observe the activations (Deep
Learning Toolbox) from the final layer (a regression layer). The output of the network is the desired
denoised image.
Iapprox = activations(net,Inoisy,'FinalRegressionLayer');
Image Processing Toolbox™ requires floating point images to have pixel values in the range [0, 1].
Use the rescale function to scale the pixel values to this range, then convert the image to uint8.
Iapprox = rescale(Iapprox);
Iapprox = im2uint8(Iapprox);
imshow(Iapprox)
title('Denoised Image Obtained Using Multiscale CAN')
To get a better visual understanding of the denoised images, examine a small region inside each
image. Specify a region of interest (ROI) using vector roi in the format [x y width height]. The
elements define the x- and y-coordinate of the top left corner, and the width and height of the ROI.
Crop the images to this ROI, and display the result as a montage.
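As before, roi must be defined first; an illustrative choice:

roi = [300 30 250 250]; % illustrative [x y width height] values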
montage({imcrop(Ireference,roi),imcrop(Inoisy,roi), ...
imcrop(Ibilat,roi),imcrop(Iapprox,roi)}, ...
'Size',[1 4]);
title('Reference Image | Noisy Image | Bilateral-Filtered Image | CAN Prediction');
The CAN removes more noise than conventional bilateral filtering. Both techniques preserve edge
sharpness.
Use image quality metrics to quantitatively compare the noisy input image, the bilateral-filtered
image, and the operator-approximated image. The reference image is the original reference image,
Ireference, before adding noise.
Measure the peak signal-to-noise ratio (PSNR) of each image against the reference image. Larger
PSNR values generally indicate better image quality. See psnr for more information about this
metric.
noisyPSNR = psnr(Inoisy,Ireference);
bilatPSNR = psnr(Ibilat,Ireference);
approxPSNR = psnr(Iapprox,Ireference);
disp(['PSNR of: Noisy Image / Bilateral-Filtered Image / Operator Approximated Image = ', ...
num2str([noisyPSNR bilatPSNR approxPSNR])])
PSNR of: Noisy Image / Bilateral-Filtered Image / Operator Approximated Image = 20.2857 25.7
Measure the structural similarity index (SSIM) of each image. SSIM assesses the visual impact of
three characteristics of an image: luminance, contrast and structure, against a reference image. The
closer the SSIM value is to 1, the better the test image agrees with the reference image. See ssim for
more information about this metric.
noisySSIM = ssim(Inoisy,Ireference);
bilatSSIM = ssim(Ibilat,Ireference);
approxSSIM = ssim(Iapprox,Ireference);
disp(['SSIM of: Noisy Image / Bilateral-Filtered Image / Operator Approximated Image = ', ...
num2str([noisySSIM bilatSSIM approxSSIM])])
SSIM of: Noisy Image / Bilateral-Filtered Image / Operator Approximated Image = 0.76251 0.915
Measure perceptual image quality using the Naturalness Image Quality Evaluator (NIQE). Smaller
NIQE scores indicate better perceptual quality. See niqe for more information about this metric.
noisyNIQE = niqe(Inoisy);
bilatNIQE = niqe(Ibilat);
approxNIQE = niqe(Iapprox);
disp(['NIQE score of: Noisy Image / Bilateral-Filtered Image / Operator Approximated Image = ', ...
num2str([noisyNIQE bilatNIQE approxNIQE])])
NIQE score of: Noisy Image / Bilateral-Filtered Image / Operator Approximated Image = 12.1865
Compared to conventional bilateral filtering, the operator approximation produces better metric
scores.
References
[1] Chen, Q., J. Xu, and V. Koltun. "Fast Image Processing with Fully-Convolutional Networks." In
Proceedings of the 2017 IEEE Conference on Computer Vision. Venice, Italy, Oct. 2017, pp.
2516-2525.
[2] Grubinger, M., P. Clough, H. Müller, and T. Deselaers. "The IAPR TC-12 Benchmark: A New
Evaluation Resource for Visual Information Systems." Proceedings of the OntoImage 2006 Language
Resources For Content-Based Image Retrieval. Genoa, Italy. Vol. 5, May 2006, p. 10.
See Also
activations | imbilatfilt | layerGraph | randomPatchExtractionDatastore |
trainNetwork | trainingOptions
More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)
Develop Raw Camera Processing Pipeline Using Deep Learning
DSLRs and many modern phone cameras offer the ability to save data collected directly from the
camera sensor as a RAW file. Each pixel of RAW data corresponds directly to the amount of light
captured by a corresponding camera photosensor. The data depends on fixed characteristics of the
camera hardware, such as the sensitivity of each photosensor to a particular range of wavelengths of
the electromagnetic spectrum. The data also depends on camera acquisition settings such as
exposure time, and factors of the scene such as the light source.
Demosaicing is the only required operation to convert single-channel RAW data to a three-channel
RGB image. However, without additional image processing operations, the resulting RGB image has
subjectively poor visual quality.
Deep learning techniques enable direct RAW to RGB conversion without the necessity of developing a
traditional processing pipeline. For instance, one technique compensates for underexposure when
converting RAW images to RGB [2]. This example shows how to convert RAW images from a lower-end phone camera to RGB images that approximate the quality of a higher-end DSLR camera [3].
This example uses the Zurich RAW to RGB data set [3]. The size of the data set is 22 GB. The data set
contains 48,043 spatially registered pairs of RAW and RGB training image patches of size 448-by-448.
The data set contains two separate test sets. One test set consists of 1,204 spatially registered pairs
of RAW and RGB image patches of size 448-by-448. The other test set consists of unregistered full-
resolution RAW and RGB images.
imageDir = fullfile(tempdir,'ZurichRAWToRGB');
if ~exist(imageDir,'dir')
mkdir(imageDir);
end
To download the data set, request access using the Zurich RAW to RGB dataset form. Extract the data
into the directory specified by the imageDir variable. When extracted successfully, imageDir
contains three directories named full_resolution, test, and train.
Create an imageDatastore that reads the target RGB training image patches acquired using a high-
end Canon DSLR.
trainImageDir = fullfile(imageDir,'train');
dsTrainRGB = imageDatastore(fullfile(trainImageDir,'canon'),'ReadSize',16);
groundTruthPatch = preview(dsTrainRGB);
imshow(groundTruthPatch)
Create an imageDatastore that reads the input RAW training image patches acquired using a
Huawei phone camera. The RAW images are captured with 10-bit precision and are represented as
both 8-bit and 16-bit PNG files. The 8-bit files provide a compact representation of patches with data
in the range [0, 255]. No scaling has been done on any of the RAW data.
dsTrainRAW = imageDatastore(fullfile(trainImageDir,'huawei_raw'),'ReadSize',16);
Preview an input RAW training image patch. The datastore reads this patch as an 8-bit uint8 image
because the sensor counts are in the range [0, 255]. To simulate the 10-bit dynamic range of the
training data, divide the image intensity values by 4. If you zoom in on the image, then you can see
the RGGB Bayer pattern.
inputPatch = preview(dsTrainRAW);
inputPatchRAW = inputPatch/4;
imshow(inputPatchRAW)
To simulate the minimal traditional processing pipeline, demosaic the RGGB Bayer pattern of the
RAW data using the demosaic function. Display the processed image and brighten the display.
Compared to the target RGB image, the minimally-processed RGB image is dark and has imbalanced
colors and noticeable artifacts. A trained RAW-to-RGB network performs preprocessing operations so
that the output RGB image resembles the target image.
inputPatchRGB = demosaic(inputPatch,'rggb');
imshow(rescale(inputPatchRGB))
The test data contains RAW and RGB image patches and full-sized images. This example partitions
the test image patches into a validation set and test set. The example uses the full-sized test images
for qualitative testing only. See Evaluate Trained Image Processing Pipeline on Full-Sized Images.
Create image datastores that read the RAW and RGB test image patches.
testImageDir = fullfile(imageDir,'test');
dsTestRAW = imageDatastore(fullfile(testImageDir,'huawei_raw'),'ReadSize',16);
dsTestRGB = imageDatastore(fullfile(testImageDir,'canon'),'ReadSize',16);
Randomly split the test data into two sets for validation and testing. The validation data set contains
200 images. The test set contains the remaining images.
numTestImages = dsTestRAW.numpartitions;
numValImages = 200;
testIdx = randperm(numTestImages);
validationIdx = testIdx(1:numValImages);
testIdx = testIdx(numValImages+1:numTestImages);
dsValRAW = subset(dsTestRAW,validationIdx);
dsValRGB = subset(dsTestRGB,validationIdx);
dsTestRAW = subset(dsTestRAW,testIdx);
dsTestRGB = subset(dsTestRGB,testIdx);
The sensor acquires color data in a repeating Bayer pattern that includes one red, two green, and one
blue photosensor. Preprocess the data into the four-channel image form expected by the network using the
transform function. The transform function processes the data using the operations specified in
the preprocessRAWDataForRAWToRGB helper function. The helper function is attached to the
example as a supporting file.
The function also casts the data to data type single scaled to the range [0, 1].
dsTrainRAW = transform(dsTrainRAW,@preprocessRAWDataForRAWToRGB);
dsValRAW = transform(dsValRAW,@preprocessRAWDataForRAWToRGB);
dsTestRAW = transform(dsTestRAW,@preprocessRAWDataForRAWToRGB);
The target RGB images are stored on disk as unsigned 8-bit data. To make the computation of metrics
and the network design more convenient, preprocess the target RGB training images using the
transform function and the preprocessRGBDataForRAWToRGB helper function. The helper
function is attached to the example as a supporting file.
The preprocessRGBDataForRAWToRGB helper function casts images to data type single scaled to
the range [0, 1].
dsTrainRGB = transform(dsTrainRGB,@preprocessRGBDataForRAWToRGB);
dsValRGB = transform(dsValRGB,@preprocessRGBDataForRAWToRGB);
Combine the input RAW and target RGB data for the training, validation, and test image sets by using
the combine function.
dsTrain = combine(dsTrainRAW,dsTrainRGB);
dsVal = combine(dsValRAW,dsValRGB);
dsTest = combine(dsTestRAW,dsTestRGB);
Randomly augment the training data using the transform function and the
augmentDataForRAWToRGB helper function. The helper function is attached to the example as a
supporting file.
The augmentDataForRAWToRGB helper function randomly applies 90 degree rotation and horizontal
reflection to pairs of input RAW and target RGB training images.
dsTrainAug = transform(dsTrain,@augmentDataForRAWToRGB);
exampleAug = preview(dsTrainAug)
Display the network input and target image in a montage. The network input has four channels, so
display the first channel rescaled to the range [0, 1]. The input RAW and target RGB images have
identical augmentation.
exampleInput = exampleAug{1,1};
exampleOutput = exampleAug{1,2};
montage({rescale(exampleInput(:,:,1)),exampleOutput})
This example uses a custom training loop. The minibatchqueue (Deep Learning Toolbox) object is
useful for managing the mini-batching of observations in custom training loops. The
minibatchqueue object also casts data to a dlarray (Deep Learning Toolbox) object that enables
auto differentiation in deep learning applications.
miniBatchSize = 12;
valBatchSize = 10;
trainingQueue = minibatchqueue(dsTrainAug,'MiniBatchSize',miniBatchSize,'PartialMiniBatch','discard','MiniBatchFormat','SSCB');
validationQueue = minibatchqueue(dsVal,'MiniBatchSize',valBatchSize,'MiniBatchFormat','SSCB');
The next (Deep Learning Toolbox) function of minibatchqueue yields the next mini-batch of data.
Preview the outputs from one call to the next function. The outputs have data type dlarray. The
data is already cast to gpuArray on the GPU (when one is available) and is ready for training.
[inputRAW,targetRGB] = next(trainingQueue);
whos inputRAW
whos targetRGB
This example uses a variation of the U-Net network. In U-Net, the initial series of convolutional layers
are interspersed with max pooling layers, successively decreasing the resolution of the input image.
These layers are followed by a series of convolutional layers interspersed with upsampling operators,
successively increasing the resolution of the input image. The name U-Net comes from the fact that
the network can be drawn with a symmetric shape like the letter U.
This example uses a simple U-Net architecture with two modifications. First, the network replaces the
final transposed convolution operation with a custom pixel shuffle upsampling (also known as a
depth-to-space) operation. Second, the network uses a custom hyperbolic tangent activation layer as
the final layer in the network.
Convolution followed by pixel shuffle upsampling can define subpixel convolution for super resolution
applications. Subpixel convolution prevents the checkerboard artifacts that can arise from transposed convolution [6]. Because the model needs to map H/2-by-W/2-by-4 RAW inputs to H-by-W-by-3 RGB outputs, the final upsampling stage of the model can be thought of like super resolution, where
the number of spatial samples grows from the input to the output.
The figure shows how pixel shuffle upsampling works for a 2-by-2-by-4 input. The first two dimensions are spatial dimensions and the third dimension is a channel dimension. In general, pixel shuffle upsampling by a factor of S takes an H-by-W-by-C input and yields an S*H-by-S*W-by-C/S^2 output.
The pixel shuffle function grows the spatial dimensions of the output by mapping information from
channel dimensions at a given spatial location into S-by-S spatial blocks in the output in which each
channel contributes to a consistent spatial position relative to its neighbors during upsampling.
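A small numeric illustration of this depth-to-space rearrangement for S = 2 is shown below; the channel-to-block ordering here is one possible convention and may differ from the custom pixelShuffleLayer used later in the example.

% Rearrange a 2-by-2-by-4 array into a 4-by-4-by-1 array (depth to space, S = 2).
X = reshape(1:16,[2 2 4]);            % H-by-W-by-C input with C = S^2
S = 2;
[H,W,C] = size(X);
Y = reshape(X,[H W S S C/S^2]);       % split the channel dimension into an S-by-S block
Y = permute(Y,[3 1 4 2 5]);           % interleave the block with the spatial dimensions
Y = reshape(Y,[S*H S*W C/S^2]);       % S*H-by-S*W-by-C/S^2 output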
A hyperbolic tangent activation layer applies the tanh function on the layer inputs. This example uses a scaled and shifted version of the tanh function, which encourages but does not strictly enforce that the RGB network outputs are in the range [0, 1] [6]:

f(x) = 0.58 * tanh(x) + 0.5
Use a tall array to compute the per-channel mean across the training data set. The input layer of the
network performs mean centering of inputs during training and testing using the mean statistics.
dsIn = copy(dsTrainRAW);
dsIn.UnderlyingDatastore.ReadSize = 1;
t = tall(dsIn);
perChannelMean = gather(mean(t,[1 2]));
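The mean statistics can then be supplied to the network image input layer. A minimal sketch of such an input layer follows; the input size and layer name are assumptions (the packed RAW training patches are 224-by-224-by-4).

% Sketch: the exact input size and name used by the example may differ.
initialLayer = imageInputLayer([224 224 4],'Normalization','zerocenter', ...
    'Mean',perChannelMean,'Name','ImageInputLayer');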
Add layers of the first encoding subnetwork. The first encoder has 32 convolutional filters.
numEncoderStages = 4;
numFiltersFirstEncoder = 32;
encoderNamePrefix = "Encoder-Stage-";
encoderLayers = [
    convolution2dLayer([3 3],numFiltersFirstEncoder,"Padding","same","WeightsInitializer","narrow-normal", ...
        "Name",strcat(encoderNamePrefix,"1-Conv-1"))
    leakyReluLayer(0.2,"Name",strcat(encoderNamePrefix,"1-ReLU-1"))
    convolution2dLayer([3 3],numFiltersFirstEncoder,"Padding","same","WeightsInitializer","narrow-normal", ...
        "Name",strcat(encoderNamePrefix,"1-Conv-2"))
    leakyReluLayer(0.2,"Name",strcat(encoderNamePrefix,"1-ReLU-2"))
    maxPooling2dLayer([2 2],"Stride",[2 2],"Name",strcat(encoderNamePrefix,"1-MaxPool"))
    ];
Add layers of additional encoding subnetworks. These subnetworks add channel-wise instance
normalization after each convolutional layer using a groupNormalizationLayer. Each encoder
subnetwork has twice the number of filters as the previous encoder subnetwork.
cnIdx = 1;
for stage = 2:numEncoderStages
numFilters = numFiltersFirstEncoder*2^(stage-1);
layerNamePrefix = strcat(encoderNamePrefix,num2str(stage));
encoderLayers = [
encoderLayers
convolution2dLayer([3 3],numFilters,"Padding","same","WeightsInitializer","narrow-normal", ...
"Name",strcat(layerNamePrefix,"-Conv-1"))
groupNormalizationLayer("channel-wise","Name",strcat("cn",num2str(cnIdx)))
leakyReluLayer(0.2,"Name",strcat(layerNamePrefix,"-ReLU-1"))
convolution2dLayer([3 3],numFilters,"Padding","same","WeightsInitializer","narrow-normal", ...
"Name",strcat(layerNamePrefix,"-Conv-2"))
groupNormalizationLayer("channel-wise","Name",strcat("cn",num2str(cnIdx+1)))
leakyReluLayer(0.2,"Name",strcat(layerNamePrefix,"-ReLU-2"))
maxPooling2dLayer([2 2],"Stride",[2 2],"Name",strcat(layerNamePrefix,"-MaxPool"))
];
cnIdx = cnIdx + 2;
end
Add bridge layers. The bridge subnetwork has twice the number of filters as the final encoder
subnetwork and first decoder subnetwork.
numFilters = numFiltersFirstEncoder*2^numEncoderStages;
bridgeLayers = [
convolution2dLayer([3 3],numFilters,"Padding","same","WeightsInitializer","narrow-normal", ...
"Name","Bridge-Conv-1")
groupNormalizationLayer("channel-wise","Name","cn7")
leakyReluLayer(0.2,"Name","Bridge-ReLU-1")
convolution2dLayer([3 3],numFilters,"Padding","same","WeightsInitializer","narrow-normal", ...
"Name","Bridge-Conv-2")
groupNormalizationLayer("channel-wise","Name","cn8")
leakyReluLayer(0.2,"Name","Bridge-ReLU-2")];
numDecoderStages = 4;
cnIdx = 9;
decoderNamePrefix = "Decoder-Stage-";
decoderLayers = [];
for stage = 1:numDecoderStages-1
numFilters = numFiltersFirstEncoder*2^(numDecoderStages-stage);
layerNamePrefix = strcat(decoderNamePrefix,num2str(stage));
decoderLayers = [
decoderLayers
cnIdx = cnIdx + 2;
end
Add layers of the last decoder subnetwork. This subnetwork excludes the channel-wise instance
normalization performed by the other decoder subnetworks. Each decoder subnetwork has half the
number of filters as the previous subnetwork.
numFilters = numFiltersFirstEncoder;
layerNamePrefix = strcat(decoderNamePrefix,num2str(stage+1));
decoderLayers = [
decoderLayers
transposedConv2dLayer([3 3],numFilters,"Stride",[2 2],"Cropping","same","WeightsInitializer","narrow-normal", ...
"Name",strcat(layerNamePrefix,"-UpConv"))
leakyReluLayer(0.2,"Name",strcat(layerNamePrefix,"-UpReLU"))
depthConcatenationLayer(2,"Name",strcat(layerNamePrefix,"-DepthConcatenation"))
convolution2dLayer([3 3],numFilters,"Padding","same","WeightsInitializer","narrow-normal", ...
"Name",strcat(layerNamePrefix,"-Conv-1"))
leakyReluLayer(0.2,"Name",strcat(layerNamePrefix,"-ReLU-1"))
convolution2dLayer([3 3],numFilters,"Padding","same","WeightsInitializer","narrow-normal", ...
"Name",strcat(layerNamePrefix,"-Conv-2"))
leakyReluLayer(0.2,"Name",strcat(layerNamePrefix,"-ReLU-2"))];
Add the final layers of the U-Net. The pixel shuffle layer converts the H/2-by-W/2-by-12 activations from the final convolution into H-by-W-by-3 activations using pixel shuffle upsampling. The final layer encourages outputs toward the desired range [0, 1] using a hyperbolic tangent
function.
finalLayers = [
convolution2dLayer([3 3],12,"Padding","same","WeightsInitializer","narrow-normal", ...
"Name","Decoder-Stage-4-Conv-3")
pixelShuffleLayer("pixelShuffle",2)
tanhScaledAndShiftedLayer("tanhActivation")];
layers = [initialLayer;encoderLayers;bridgeLayers;decoderLayers;finalLayers];
lgraph = layerGraph(layers);
Visualize the network architecture using the Deep Network Designer (Deep Learning Toolbox) app.
deepNetworkDesigner(lgraph)
This example modifies a pretrained VGG-16 deep neural network to extract image features at various
layers. These multilayer features are used to compute content loss.
To get a pretrained VGG-16 network, install vgg16 (Deep Learning Toolbox). If you do not have the
required support package installed, then the software provides a download link.
vggNet = vgg16;
To make the VGG-16 network suitable for feature extraction, use the layers up to 'relu5_3'.
vggNet = vggNet.Layers(1:31);
vggNet = dlnetwork(layerGraph(vggNet));
The helper function modelGradients calculates the gradients and overall loss for batches of
training data. This function is defined in the Supporting Functions section of this
example.
The overall loss is a weighted sum of two losses: mean of absolute error (MAE) loss and content loss.
The content loss is weighted such that the MAE loss and content loss contribute approximately equally to the overall loss:

lossOverall = lossMAE + weightContent * lossContent
The MAE loss penalizes the L1 distance between samples of the network predictions and samples of
the target image. L1 is often a better choice than L2 for image processing applications because it can
help reduce blurring artifacts [4]. This loss is implemented using the maeLoss helper function
defined in the Supporting Functions section of this example.
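The helper is attached to the example as a supporting file; a minimal sketch of an MAE loss over dlarray inputs could look like this (the implementation details are an assumption):

function loss = maeLoss(Y,T)
    % Mean of the absolute error between prediction Y and target T.
    loss = mean(abs(Y-T),'all');
end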
The content loss helps the network learn both high-level structural content and low-level edge and
color information. The loss function calculates a weighted sum of the mean square error (MSE)
between predictions and targets for each activation layer. This loss is implemented using the
contentLoss helper function defined in the Supporting Functions section of this
example.
The modelGradients helper function requires the content loss weight factor as an input argument.
Calculate the weight factor for a sample batch of training data such that the MAE loss is equal to the
weighted content loss.
Preview a batch of training data, which consists of pairs of RAW network inputs and RGB target
outputs.
trainingBatch = preview(dsTrainAug);
networkInput = dlarray((trainingBatch{1,1}),'SSC');
targetOutput = dlarray((trainingBatch{1,2}),'SSC');
Predict the response of the untrained U-Net network using the forward (Deep Learning Toolbox)
function.
predictedOutput = forward(net,networkInput);
Calculate the MAE and content losses between the predicted and target RGB images.
sampleMAELoss = maeLoss(predictedOutput,targetOutput);
sampleContentLoss = contentLoss(vggNet,predictedOutput,targetOutput);
weightContent = sampleMAELoss/sampleContentLoss;
Define the training options that are used within the custom training loop to control aspects of Adam
optimization. Train for 20 epochs.
learnRate = 5e-5;
numEpochs = 20;
Train Network
By default, the example downloads a pretrained version of the RAW-to-RGB network by using the
helper function downloadTrainedRAWToRGBNet. The helper function is attached to the example as
a supporting file. The pretrained network enables you to run the entire example without waiting for
training to complete.
To train the network, set the doTraining variable in the following code to true. Train the model in
a custom training loop. For each iteration:
• Read the data for current mini-batch using the next (Deep Learning Toolbox) function.
• Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function and the
modelGradients helper function.
• Update the network parameters using the adamupdate (Deep Learning Toolbox) function and the
gradient information.
• Update the training progress plot for every iteration and display various computed losses.
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox). Training takes about 88 hours on an NVIDIA™ Titan RTX and can take even longer
depending on your GPU hardware.
doTraining = false;
if doTraining
start = tic;
for epoch = 1:numEpochs
reset(trainingQueue);
shuffle(trainingQueue);
while hasdata(trainingQueue)
[inputRAW,targetRGB] = next(trainingQueue);
[grad,loss] = dlfeval(@modelGradients,net,vggNet,inputRAW,targetRGB,weightContent);
iteration = iteration + 1;
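% Sketch (assumption; the printed listing omits this step): update the network
% learnable parameters using adamupdate. trailingAvg and trailingAvgSq hold the
% Adam moment estimates and are initialized to [] before the loop.
[net,trailingAvg,trailingAvgSq] = adamupdate(net,grad, ...
    trailingAvg,trailingAvgSq,iteration,learnRate);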
updateTrainingPlotRAWPipeline(batchLine,validationLine,iteration,loss,start,epoch,...
validationQueue,valSetSize,valBatchSize,net,vggNet,weightContent);
end
% Save checkpoint of network state
save(checkpointDir + "epoch" + epoch,'net');
end
% Save the final network state
save(checkpointDir + "trainedRAWToRGBNet.mat",'net');
else
trainedRAWToRGBNet_url = 'https://ssd.mathworks.com/supportfiles/vision/data/trainedRAWToRGBN
downloadTrainedRAWToRGBNet(trainedRAWToRGBNet_url,imageDir);
load(fullfile(imageDir,'trainedRAWToRGBNet.mat'));
end
Reference-based quality metrics such as MSSIM or PSNR enable a quantitative measure of image
quality. You can calculate the MSSIM and PSNR of the patched test images because they are spatially
registered and the same size.
Iterate through the test set of patched images using a minibatchqueue object.
patchTestSet = combine(dsTestRAW,dsTestRGB);
testPatchQueue = minibatchqueue(patchTestSet,'MiniBatchSize',16,'MiniBatchFormat','SSCB');
Iterate through the test set and calculate the MSSIM and PSNR for each test image using the
multissim and psnr functions. Although the functions accept RGB images, the metrics are not well-
defined for RGB images. Therefore, approximate the MSSIM and PSNR by calculating the metrics of
the color channels separately. You can use the calculateRAWToRGBQualityMetrics helper
function to calculate channel-wise metrics. This function is attached to the example as a supporting
file.
totalMSSIM = 0;
totalPSNR = 0;
while hasdata(testPatchQueue)
[inputRAW,targetRGB] = next(testPatchQueue);
outputRGB = forward(net,inputRAW);
[mssimOut,psnrOut] = calculateRAWToRGBQualityMetrics(outputRGB,targetRGB);
totalMSSIM = totalMSSIM + mssimOut;
totalPSNR = totalPSNR + psnrOut;
end
Calculate the mean MSSIM and mean PSNR over the test set. This result is consistent with the
similar U-Net approach from [3] for mean MSSIM and competitive with the PyNet approach in [3] in
mean PSNR. The differences in loss functions and use of pixel shuffle upsampling compared to [3]
likely account for these differences.
numObservations = dsTestRGB.numpartitions;
meanMSSIM = totalMSSIM / numObservations
meanMSSIM = single
0.8534
meanPSNR = totalPSNR / numObservations

meanPSNR = 21.2956
Because of sensor differences between the phone camera and DSLR used to acquire the full-
resolution test images, the scenes are not registered and are not the same size. Reference-based
comparison of the full-resolution images from the network and the DSLR ISP is difficult. However, a
qualitative comparison of the images is useful because a goal of image processing is to create an
aesthetically pleasing image.
Create an image datastore that contains full-sized RAW images acquired by a phone camera.
testImageDir = fullfile(imageDir,'test');
testImageDirRAW = "huawei_full_resolution";
dsTestFullRAW = imageDatastore(fullfile(testImageDir,testImageDirRAW));
Get the names of the image files in the full-sized RAW test set.
targetFilesToInclude = extractAfter(string(dsTestFullRAW.Files),fullfile(testImageDirRAW,filesep));
targetFilesToInclude = extractBefore(targetFilesToInclude,".png");
Preprocess the RAW data by converting the data to the form expected by the network using the
transform function. The transform function processes the data using the operations specified in
the preprocessRAWDataForRAWToRGB helper function. The helper function is attached to the
example as a supporting file.
dsTestFullRAW = transform(dsTestFullRAW,@preprocessRAWDataForRAWToRGB);
Create an image datastore that contains full-sized RGB test images captured from the high-end DSLR.
The Zurich RAW to RGB data set contains more full-sized RGB images than RAW images, so include
only the RGB images with a corresponding RAW image.
dsTestFullRGB = imageDatastore(fullfile(imageDir,'full_resolution','canon'));
dsTestFullRGB.Files = dsTestFullRGB.Files(contains(dsTestFullRGB.Files,targetFilesToInclude));
Read in the target RGB images. Get a sense of the overall output by looking at a montage view.
targetRGB = readall(dsTestFullRGB);
montage(targetRGB,"Size",[5 2],"Interpolation","bilinear")
Iterate through the test set of full-sized images using a minibatchqueue object. If you have a GPU
device with sufficient memory to process full-resolution images, then you can run prediction on a
GPU by specifying the output environment as "gpu".
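The construction of testQueue is abbreviated in this listing; a minimal sketch (the mini-batch size and output environment here are assumptions):

% Sketch: process the full-sized RAW test images one at a time.
testQueue = minibatchqueue(dsTestFullRAW,'MiniBatchSize',1, ...
    'MiniBatchFormat','SSCB','OutputEnvironment','auto');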
For each full-sized RAW test image, predict the output RGB image by calling forward (Deep
Learning Toolbox) on the network.
idx = 1;
while hasdata(testQueue)
inputRAW = next(testQueue);
rgbOut = forward(net,inputRAW);
rgbOut = gather(extractdata(rgbOut));
outputImages(:,:,:,idx) = im2uint8(rgbOut);
idx = idx+1;
end
Get a sense of the overall output by looking at a montage view. The network produces images that are
aesthetically pleasing, with similar characteristics.
montage(outputImages,"Size",[5 2],"Interpolation","bilinear")
Compare one target RGB image with the corresponding image predicted by the network. The network
produces colors that are more saturated than the target DSLR images. Although the colors from the simple U-Net architecture are not the same as the DSLR targets, the images are still qualitatively pleasing in many cases.
imgIdx = 1;
imTarget = targetRGB{imgIdx};
imPredicted = outputImages(:,:,:,imgIdx);
figure
montage({imTarget,imPredicted},"Interpolation","bilinear")
To improve the performance of the RAW-to-RGB network, a network architecture could learn detailed localized spatial features at multiple scales in addition to global features that describe color and contrast [3].
Supporting Functions
The modelGradients helper function calculates the gradients and overall loss. The gradient
information is returned as a table that includes the layer, parameter name, and value for each
learnable parameter in the model.
The helper function maeLoss computes the mean absolute error between network predictions, Y, and
target images, T.
The helper function contentLoss calculates a weighted sum of the MSE between network
predictions, Y, and target images, T, for each activation layer. The contentLoss helper function
calculates the MSE for each activation layer using the mseLoss helper function. Weights are selected
such that the loss from each activation layer contributes roughly equally to the overall content loss.
layers = ["relu1_1","relu1_2","relu2_1","relu2_2","relu3_1","relu3_2","relu3_3","relu4_1"];
[T1,T2,T3,T4,T5,T6,T7,T8] = forward(net,T,'Outputs',layers);
[X1,X2,X3,X4,X5,X6,X7,X8] = forward(net,Y,'Outputs',layers);
l1 = mseLoss(X1,T1);
l2 = mseLoss(X2,T2);
l3 = mseLoss(X3,T3);
l4 = mseLoss(X4,T4);
l5 = mseLoss(X5,T5);
l6 = mseLoss(X6,T6);
l7 = mseLoss(X7,T7);
l8 = mseLoss(X8,T8);
The helper function mseLoss computes the MSE between network predictions, Y, and target images,
T.
References
1) Sumner, Rob. "Processing RAW Images in MATLAB". May 19, 2014. https://rcsumner.net/raw_guide/RAWguide.pdf
2) Chen, Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. "Learning to See in the Dark." ArXiv:1805.01934 [Cs], May 4, 2018. http://arxiv.org/abs/1805.01934.
3) Ignatov, Andrey, Luc Van Gool, and Radu Timofte. "Replacing Mobile Camera ISP with a Single Deep Learning Model." ArXiv:2002.05509 [Cs, Eess], February 13, 2020. http://arxiv.org/abs/2002.05509. Project Website.
4) Zhao, Hang, Orazio Gallo, Iuri Frosio, and Jan Kautz. "Loss Functions for Neural Networks for Image Processing." ArXiv:1511.08861 [Cs], April 20, 2018. http://arxiv.org/abs/1511.08861.
5) Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. "Perceptual Losses for Real-Time Style Transfer and Super-Resolution." ArXiv:1603.08155 [Cs], March 26, 2016. http://arxiv.org/abs/1603.08155.
6) Shi, Wenzhe, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network." ArXiv:1609.05158 [Cs, Stat], September 23, 2016. http://arxiv.org/abs/1609.05158.
See Also
combine | imageDatastore | trainNetwork | trainingOptions | transform
More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
Semantic Segmentation of Multispectral Images Using Deep Learning
Semantic segmentation involves labeling each pixel in an image with a class. One application of
semantic segmentation is tracking deforestation, which is the change in forest cover over time.
Environmental agencies track deforestation to assess and quantify the environmental and ecological
health of a region.
Deep learning based semantic segmentation can yield a precise measurement of vegetation cover
from high-resolution aerial photographs. One challenge is differentiating classes with similar visual
characteristics, such as trying to classify a green pixel as grass, shrubbery, or tree. To increase
classification accuracy, some data sets contain multispectral images that provide additional
information about each pixel. For example, the Hamlin Beach State Park data set supplements the
color images with three near-infrared channels that provide a clearer separation of the classes.
This example shows how to use deep-learning-based semantic segmentation techniques to calculate
the percentage vegetation cover in a region from a set of multispectral images.
Download Data
This example uses a high-resolution multispectral data set to train the network [1].
The image set was captured using a drone over the Hamlin Beach State Park, NY. The data contains
labeled training, validation, and test sets, with 18 object class labels. The size of the data file is ~3.0
GB.
Download the MAT-file version of the data set using the downloadHamlinBeachMSIData helper
function. This function is attached to the example as a supporting file.
imageDir = tempdir;
url = 'http://www.cis.rit.edu/~rmk6217/rit18_data.mat';
downloadHamlinBeachMSIData(url,imageDir);
load(fullfile(imageDir,'rit18_data','rit18_data.mat'));
Use the switchChannelsToThirdPlane helper function to rearrange the data so that the spectral channels form the third dimension, as MATLAB expects for multichannel images.

train_data = switchChannelsToThirdPlane(train_data);
val_data = switchChannelsToThirdPlane(val_data);
test_data = switchChannelsToThirdPlane(test_data);
The RGB color channels are the 3rd, 2nd, and 1st image channels. Display the color component of the
training, validation, and test images as a montage. To make the images appear brighter on the
screen, equalize their histograms by using the histeq function.
figure
montage(...
{histeq(train_data(:,:,[3 2 1])), ...
histeq(val_data(:,:,[3 2 1])), ...
histeq(test_data(:,:,[3 2 1]))}, ...
'BorderSize',10,'BackgroundColor','white')
title('RGB Component of Training Image (Left), Validation Image (Center), and Test Image (Right)')
Display the last three histogram-equalized channels of the training data as a montage. These
channels correspond to the near-infrared bands and highlight different components of the image
based on their heat signatures. For example, the trees near the center of the second channel image
show more detail than the trees in the other two channels.
figure
montage(...
{histeq(train_data(:,:,4)), ...
histeq(train_data(:,:,5)), ...
histeq(train_data(:,:,6))}, ...
'BorderSize',10,'BackgroundColor','white')
title('IR Channels 1 (Left), 2 (Center), and 3 (Right) of Training Image')
Channel 7 is a mask that indicates the valid segmentation region. Display the mask for the training,
validation, and test images.
figure
montage(...
{train_data(:,:,7), ...
val_data(:,:,7), ...
test_data(:,:,7)}, ...
'BorderSize',10,'BackgroundColor','white')
title('Mask of Training Image (Left), Validation Image (Center), and Test Image (Right)')
The labeled images contain the ground truth data for the segmentation, with each pixel assigned to
one of the 18 classes. Get a list of the classes with their corresponding IDs.
disp(classes)
"LowLevelVegetation","Grass_Lawn","Sand_Beach",...
"Water_Lake","Water_Pond","Asphalt"];
Overlay the labels on the histogram-equalized RGB training image. Add a colorbar to the image.
cmap = jet(numel(classNames));
B = labeloverlay(histeq(train_data(:,:,4:6)),train_labels,'Transparency',0.8,'Colormap',cmap);
figure
title('Training Labels')
imshow(B)
N = numel(classNames);
ticks = 1/(N*2):1/N:1;
colorbar('TickLabels',cellstr(classNames),'Ticks',ticks,'TickLength',0,'TickLabelInterpreter','none');
colormap(cmap)
Save the training data as a MAT file and the training labels as a PNG file.
save('train_data.mat','train_data');
imwrite(train_labels,'train_labels.png');
Use a random patch extraction datastore to feed the training data to the network. This datastore
extracts multiple corresponding random patches from an image datastore and pixel label datastore
that contain ground truth images and pixel label data. Patching is a common technique to prevent
running out of memory for large images and to effectively increase the amount of available training
data.
Begin by storing the training images from 'train_data.mat' in an imageDatastore. Because the
MAT file format is a nonstandard image format, you must use a MAT file reader to enable reading the
image data. You can use the helper MAT file reader, matReader, that extracts the first six channels
from the training data and omits the last channel containing the mask. This function is attached to
the example as a supporting file.
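The supporting-file implementation is not shown here; a minimal sketch of such a reader (the field handling is an assumption):

function data = matReader(filename)
    % Load the MAT file and keep the first six channels, omitting the mask.
    s = load(filename);
    f = fieldnames(s);
    data = s.(f{1})(:,:,1:6);
end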
imds = imageDatastore('train_data.mat','FileExtensions','.mat','ReadFcn',@matReader);
Create a pixelLabelDatastore (Computer Vision Toolbox) to store the label patches containing
the 18 labeled regions.
pixelLabelIds = 1:18;
pxds = pixelLabelDatastore('train_labels.png',classNames,pixelLabelIds);
Create a randomPatchExtractionDatastore from the image datastore and the pixel label
datastore. Each mini-batch contains 16 patches of size 256-by-256 pixels. One thousand mini-batches
are extracted at each iteration of the epoch.
dsTrain = randomPatchExtractionDatastore(imds,pxds,[256,256],'PatchesPerImage',16000);
The random patch extraction datastore dsTrain provides mini-batches of data to the network at
each iteration of the epoch. Preview the datastore to explore the data.
inputBatch = preview(dsTrain);
disp(inputBatch)
InputImage ResponsePixelLabelImage
__________________ _______________________
This example uses a variation of the U-Net network. In U-Net, the initial series of convolutional layers
are interspersed with max pooling layers, successively decreasing the resolution of the input image.
These layers are followed by a series of convolutional layers interspersed with upsampling operators,
successively increasing the resolution of the input image [2]. The name U-Net comes
from the fact that the network can be drawn with a symmetric shape like the letter U.
This example modifies the U-Net to use zero-padding in the convolutions, so that the input and the
output to the convolutions have the same size. Use the helper function, createUnet, to create a U-
Net with a few preselected hyperparameters. This function is attached to the example as a supporting
file.
inputTileSize = [256,256,6];
lgraph = createUnet(inputTileSize);
disp(lgraph.Layers)
Train the network using stochastic gradient descent with momentum (SGDM) optimization. Specify
the hyperparameter settings for SGDM by using the trainingOptions (Deep Learning Toolbox)
function.
Training a deep network is time-consuming. Accelerate the training by specifying a high learning
rate. However, this can cause the gradients of the network to explode or grow uncontrollably,
preventing the network from training successfully. To keep the gradients in a meaningful range,
enable gradient clipping by specifying 'GradientThreshold' as 0.05, and specify
'GradientThresholdMethod' to use the L2-norm of the gradients.
initialLearningRate = 0.05;
maxEpochs = 150;
minibatchSize = 16;
l2reg = 0.0001;
options = trainingOptions('sgdm',...
'InitialLearnRate',initialLearningRate, ...
'Momentum',0.9,...
'L2Regularization',l2reg,...
'MaxEpochs',maxEpochs,...
'MiniBatchSize',minibatchSize,...
'LearnRateSchedule','piecewise',...
'Shuffle','every-epoch',...
'GradientThresholdMethod','l2norm',...
'GradientThreshold',0.05, ...
'Plots','training-progress', ...
'VerboseFrequency',20);
By default, the example downloads a pretrained version of U-Net for this dataset using the
downloadTrainedUnet helper function. This function is attached to the example as a supporting
file. The pretrained network enables you to run the entire example without having to wait for training
to complete.
To train the network, set the doTraining variable in the following code to true. Train the model by
using the trainNetwork (Deep Learning Toolbox) function.
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox). Training takes about 20 hours on an NVIDIA Titan X.
doTraining = false;
if doTraining
[net,info] = trainNetwork(dsTrain,lgraph,options);
modelDateTime = string(datetime('now','Format',"yyyy-MM-dd-HH-mm-ss"));
save(strcat("multispectralUnet-",modelDateTime,"-Epoch-",num2str(maxEpochs),".mat"),'net');
else
trainedUnet_url = 'https://www.mathworks.com/supportfiles/vision/data/multispectralUnet.mat';
downloadTrainedUnet(trainedUnet_url,imageDir);
load(fullfile(imageDir,'trainedUnet','multispectralUnet.mat'));
end
You can now use the U-Net to semantically segment the multispectral image.
To perform the forward pass on the trained network, use the helper function, segmentImage, with
the validation data set. This function is attached to the example as a supporting file. segmentImage
performs segmentation on image patches using the semanticseg (Computer Vision Toolbox)
function.
To extract only the valid portion of the segmentation, multiply the segmented image by the mask
channel of the validation data.
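The segmentation call itself does not appear in this excerpt. The following is a minimal sketch, assuming that the helper segmentImage accepts the validation image, the trained network, and a patch size, and that the mask occupies the seventh channel of val_data; the patch size value is illustrative.
segmentedImage = segmentImage(val_data,net,[1024,1024]);       % patch size is an assumed value
segmentedImage = uint8(val_data(:,:,7)~=0) .* segmentedImage;  % keep only the valid (masked) region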
figure
imshow(segmentedImage,[])
title('Segmented Image')
The output of semantic segmentation is noisy. Perform postprocessing to remove noise and
stray pixels. Use the medfilt2 function to remove salt-and-pepper noise from the segmentation.
Visualize the segmented image with the noise removed.
segmentedImage = medfilt2(segmentedImage,[7,7]);
imshow(segmentedImage,[]);
title('Segmented Image with Noise Removed')
B = labeloverlay(histeq(val_data(:,:,[3 2 1])),segmentedImage,'Transparency',0.8,'Colormap',cmap);
figure
imshow(B)
title('Labeled Validation Image')
colorbar('TickLabels',cellstr(classNames),'Ticks',ticks,'TickLength',0,'TickLabelInterpreter','none');
colormap(cmap)
Save the segmented image and ground truth labels as PNG files. These will be used to compute
accuracy metrics.
imwrite(segmentedImage,'results.png');
imwrite(val_labels,'gtruth.png');
Create a pixelLabelDatastore (Computer Vision Toolbox) for the segmentation results and the
ground truth labels.
pxdsResults = pixelLabelDatastore('results.png',classNames,pixelLabelIds);
pxdsTruth = pixelLabelDatastore('gtruth.png',classNames,pixelLabelIds);
ssm = evaluateSemanticSegmentation(pxdsResults,pxdsTruth,'Metrics','global-accuracy');
GlobalAccuracy
______________
0.90698
The global accuracy score indicates that just over 90% of the pixels are classified correctly.
The final goal of this example is to calculate the extent of vegetation cover in the multispectral image.
Find the number of pixels labeled vegetation. The label IDs 2 ("Trees"), 13 ("LowLevelVegetation"),
and 14 ("Grass_Lawn") are the vegetation classes. Also find the total number of valid pixels by
summing the pixels in the ROI of the mask image.
vegetationClassIds = uint8([2,13,14]);
vegetationPixels = ismember(segmentedImage(:),vegetationClassIds);
validPixels = (segmentedImage~=0);
numVegetationPixels = sum(vegetationPixels(:));
numValidPixels = sum(validPixels(:));
Calculate the percentage of vegetation cover by dividing the number of vegetation pixels by the
number of valid pixels.
percentVegetationCover = (numVegetationPixels/numValidPixels)*100;
fprintf('The percentage of vegetation cover is %3.2f%%.',percentVegetationCover);
References
[1] Kemker, R., C. Salvaggio, and C. Kanan. "High-Resolution Multispectral Dataset for Semantic
Segmentation." CoRR, abs/1703.01918. 2017.
[2] Ronneberger, O., P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image
Segmentation." CoRR, abs/1505.04597. 2015.
See Also
evaluateSemanticSegmentation | histeq | imageDatastore | pixelLabelDatastore |
randomPatchExtractionDatastore | semanticseg | trainNetwork | trainingOptions |
unetLayers
More About
• “Getting Started with Semantic Segmentation Using Deep Learning” (Computer Vision Toolbox)
• “Semantic Segmentation Using Deep Learning” (Computer Vision Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)
External Websites
• https://github.com/rmkemker/RIT-18
3-D Brain Tumor Segmentation Using Deep Learning
Semantic segmentation involves labeling each pixel in an image or voxel of a 3-D volume with a class.
This example illustrates the use of deep learning methods to perform binary semantic segmentation
of brain tumors in magnetic resonance imaging (MRI) scans. In this binary segmentation, each pixel
is labeled as tumor or background.
This example performs brain tumor segmentation using a 3-D U-Net architecture [1].
U-Net is a fast, efficient and simple network that has become popular in the semantic segmentation
domain.
One challenge of medical image segmentation is the amount of memory needed to store and process
3-D volumes. Training a network on the full input volume is impractical due to GPU resource
constraints. This example solves the problem by training the network on image patches. The example
uses an overlap-tile strategy to stitch test patches into a complete segmented test volume. The
example avoids border artifacts by using the valid part of the convolution in the neural network [5].
A second challenge of medical image segmentation is class imbalance in the data that hampers
training when using conventional cross entropy loss. This example solves the problem by using a
weighted multiclass Dice loss function [4]. Weighting the classes helps to counter the
influence of larger regions on the Dice score, making it easier for the network to learn how to
segment smaller regions.
This example uses the BraTS data set [2]. The BraTS data set contains MRI scans of
brain tumors, namely gliomas, which are the most common primary brain malignancies. The size of
the data file is ~7 GB. If you do not want to download the BraTS data set, then go directly to the
Download Pretrained Network and Sample Test Set section in this example.
imageDir = fullfile(tempdir,'BraTS');
if ~exist(imageDir,'dir')
mkdir(imageDir);
end
To download the BraTS data, go to the Medical Segmentation Decathlon website and click the
"Download Data" link. Download the "Task01_BrainTumour.tar" file [3 on page 18-0 ]. Unzip the
TAR file into the directory specified by the imageDir variable. When unzipped successfully,
imageDir will contain a directory named Task01_BrainTumour that has three subdirectories:
imagesTr, imagesTs, and labelsTr.
The data set contains 750 4-D volumes, each representing a stack of 3-D images. Each 4-D volume
has size 240-by-240-by-155-by-4, where the first three dimensions correspond to height, width, and
depth of a 3-D volumetric image. The fourth dimension corresponds to different scan modalities. The
data set is divided into 484 training volumes with voxel labels and 266 test volumes. The test volumes do not have labels, so this example does not use the test data. Instead, the example splits the 484
training volumes into three independent sets that are used for training, validation, and testing.
To train the 3-D U-Net network more efficiently, preprocess the MRI data using the helper function
preprocessBraTSdataset. This function is attached to the example as a supporting file.
• Crop the data to a region containing primarily the brain and tumor. Cropping the data reduces the
size of data while retaining the most critical part of each MRI volume and its corresponding labels.
• Normalize each modality of each volume independently by subtracting the mean and dividing by
the standard deviation of the cropped brain region.
• Split the 484 training volumes into 400 training, 29 validation, and 55 test sets.
Use a random patch extraction datastore to feed the training data to the network and to validate the
training progress. This datastore extracts random patches from ground truth images and
corresponding pixel label data. Patching is a common technique to prevent running out of memory
when training with arbitrarily large volumes.
Create an imageDatastore to store the 3-D image data. Because the MAT-file format is a
nonstandard image format, you must use a MAT-file reader to enable reading the image data. You can
use the helper MAT-file reader, matRead. This function is attached to the example as a supporting
file.
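The datastore creation itself is not shown in this excerpt. A minimal sketch, assuming the preprocessed training volumes are stored in an imagesTr subfolder of preprocessDataLoc:
volLoc = fullfile(preprocessDataLoc,'imagesTr');   % assumed subfolder name
volReader = @(x) matRead(x);
volds = imageDatastore(volLoc, ...
    'FileExtensions','.mat','ReadFcn',volReader);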
lblLoc = fullfile(preprocessDataLoc,'labelsTr');
classNames = ["background","tumor"];
pixelLabelID = [0 1];
pxds = pixelLabelDatastore(lblLoc,classNames,pixelLabelID, ...
'FileExtensions','.mat','ReadFcn',volReader);
Preview one image volume and label. Display the labeled volume using the labelvolshow function.
Make the background fully transparent by setting the visibility of the background label (1) to 0.
volume = preview(volds);
label = preview(pxds);
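The display call does not appear in this excerpt. A minimal sketch, assuming the first modality of the 4-D volume is shown and that the display is placed in a uipanel (the panel title is illustrative):
viewPnl = uipanel(figure,'Title','Labeled Training Volume');
hTrainLabel = labelvolshow(label,volume(:,:,:,1),'Parent',viewPnl);
hTrainLabel.LabelVisibility(1) = 0;   % make the background label fully transparent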
Create a randomPatchExtractionDatastore that contains the training image and pixel label
data. Specify a patch size of 132-by-132-by-132 voxels. Specify 'PatchesPerImage' to extract 16
randomly positioned patches from each pair of volumes and labels during training. Specify a mini-
batch size of 8.
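The datastore creation is not shown in this excerpt. A minimal sketch that follows the patch size, patches per image, and mini-batch size stated above:
patchSize = [132 132 132];
patchPerImage = 16;
miniBatchSize = 8;
patchds = randomPatchExtractionDatastore(volds,pxds,patchSize, ...
    'PatchesPerImage',patchPerImage);
patchds.MiniBatchSize = miniBatchSize;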
Follow the same steps to create a randomPatchExtractionDatastore that contains the validation
image and pixel label data. You can use validation data to evaluate whether the network is
continuously learning, underfitting, or overfitting as time progresses.
volLocVal = fullfile(preprocessDataLoc,'imagesVal');
voldsVal = imageDatastore(volLocVal, ...
'FileExtensions','.mat','ReadFcn',volReader);
lblLocVal = fullfile(preprocessDataLoc,'labelsVal');
pxdsVal = pixelLabelDatastore(lblLocVal,classNames,pixelLabelID, ...
'FileExtensions','.mat','ReadFcn',volReader);
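The validation patch datastore itself is not shown in this excerpt; a sketch that mirrors the training datastore:
dsVal = randomPatchExtractionDatastore(voldsVal,pxdsVal,patchSize, ...
    'PatchesPerImage',patchPerImage);
dsVal.MiniBatchSize = miniBatchSize;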
Augment the training and validation data by using the transform function with custom
preprocessing operations specified by the helper function augmentAndCrop3dPatch. This function
is attached to the example as a supporting file.
dataSource = 'Validation';
dsVal = transform(dsVal,@(patchIn)augmentAndCrop3dPatch(patchIn,dataSource));
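The corresponding transform of the training datastore does not appear in this excerpt. A minimal sketch, assuming the same helper also performs the training-time augmentation:
dataSource = 'Training';
dsTrain = transform(patchds,@(patchIn)augmentAndCrop3dPatch(patchIn,dataSource));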
This example uses the 3-D U-Net network [1]. In U-Net, the initial series of
convolutional layers are interspersed with max pooling layers, successively decreasing the resolution
of the input image. These layers are followed by a series of convolutional layers interspersed with
upsampling operators, successively increasing the resolution of the input image. A batch
normalization layer is introduced before each ReLU layer. The name U-Net comes from the fact that
the network can be drawn with a symmetric shape like the letter U.
Create a default 3-D U-Net network by using the unetLayers (Computer Vision Toolbox) function.
Specify two class segmentation. Also specify valid convolution padding to avoid border artifacts when
using the overlap-tile strategy for prediction of the test volumes.
inputPatchSize = [132 132 132 4];
numClasses = 2;
[lgraph,outPatchSize] = unet3dLayers(inputPatchSize,numClasses,'ConvolutionPadding','valid');
To better segment smaller tumor regions and reduce the influence of larger background regions, this
example uses a dicePixelClassificationLayer (Computer Vision Toolbox). Replace the pixel
classification layer with the Dice pixel classification layer.
outputLayer = dicePixelClassificationLayer('Name','Output');
lgraph = replaceLayer(lgraph,'Segmentation-Layer',outputLayer);
The data has already been normalized in the Preprocess Training and Validation Data
section of this example. Data normalization in the image3dInputLayer (Deep Learning Toolbox) is
unnecessary, so replace the input layer with an input layer that does not have data normalization.
inputLayer = image3dInputLayer(inputPatchSize,'Normalization','none','Name','ImageInputLayer');
lgraph = replaceLayer(lgraph,'ImageInputLayer',inputLayer);
Alternatively, you can modify the 3-D U-Net network by using Deep Network Designer App from Deep
Learning Toolbox™.
Train the network using the adam optimization solver. Specify the hyperparameter settings using the
trainingOptions (Deep Learning Toolbox) function. The initial learning rate is set to 5e-4 and
gradually decreases over the span of training. You can experiment with the MiniBatchSize property
based on your GPU memory. To maximize GPU memory utilization, favor large input patches over a
large batch size. Note that batch normalization layers are less effective for smaller values of
MiniBatchSize. Tune the initial learning rate based on the MiniBatchSize.
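The trainingOptions call itself is not shown in this excerpt. A minimal sketch consistent with the description above (adam solver, initial learning rate of 5e-4, gradually decreasing schedule); the epoch count, drop schedule, and validation frequency are assumed values:
options = trainingOptions('adam', ...
    'MaxEpochs',50, ...                  % assumed value
    'InitialLearnRate',5e-4, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',5, ...         % assumed value
    'LearnRateDropFactor',0.95, ...      % assumed value
    'ValidationData',dsVal, ...
    'ValidationFrequency',400, ...       % assumed value
    'MiniBatchSize',miniBatchSize, ...
    'Plots','training-progress', ...
    'Verbose',false);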
Download a pretrained version of 3-D U-Net and five sample test volumes and their corresponding
labels from the BraTS data set [3]. The pretrained model and sample data enable you
to perform segmentation on test data without downloading the full data set or waiting for the network
to train.
trained3DUnet_url = 'https://www.mathworks.com/supportfiles/vision/data/brainTumor3DUNetValid.mat';
sampleData_url = 'https://www.mathworks.com/supportfiles/vision/data/sampleBraTSTestSetValid.tar.
imageDir = fullfile(tempdir,'BraTS');
if ~exist(imageDir,'dir')
mkdir(imageDir);
end
downloadTrained3DUnetSampleData(trained3DUnet_url,sampleData_url,imageDir);
Train Network
By default, the example loads a pretrained 3-D U-Net network. The pretrained network enables you to
run the entire example without waiting for training to complete.
To train the network, set the doTraining variable in the following code to true. Train the model
using the trainNetwork (Deep Learning Toolbox) function.
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox). Training takes about 30 hours on a multi-GPU system with 4 NVIDIA™ Titan Xp GPUs and
can take even longer depending on your GPU hardware.
doTraining = false;
if doTraining
modelDateTime = string(datetime('now','Format',"yyyy-MM-dd-HH-mm-ss"));
[net,info] = trainNetwork(dsTrain,lgraph,options);
save(strcat("trained3DUNet-",modelDateTime,"-Epoch-",num2str(options.MaxEpochs),".mat"),'net'
else
inputPatchSize = [132 132 132 4];
outPatchSize = [44 44 44 2];
load(fullfile(imageDir,'trained3DUNet','brainTumor3DUNetValid.mat'));
end
A GPU is highly recommended for performing semantic segmentation of the image volumes (requires
Parallel Computing Toolbox™).
Select the source of test data that contains ground truth volumes and labels for testing. If you keep
the useFullTestSet variable in the following code as false, then the example uses five volumes
for testing. If you set the useFullTestSet variable to true, then the example uses 55 test images
selected from the full data set.
useFullTestSet = false;
if useFullTestSet
volLocTest = fullfile(preprocessDataLoc,'imagesTest');
lblLocTest = fullfile(preprocessDataLoc,'labelsTest');
else
volLocTest = fullfile(imageDir,'sampleBraTSTestSetValid','imagesTest');
lblLocTest = fullfile(imageDir,'sampleBraTSTestSetValid','labelsTest');
classNames = ["background","tumor"];
pixelLabelID = [0 1];
end
The voldsTest variable stores the ground truth test images. The pxdsTest variable stores the
ground truth labels.
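The creation of these datastores is not shown in this excerpt; a sketch that mirrors the training and validation datastores:
voldsTest = imageDatastore(volLocTest, ...
    'FileExtensions','.mat','ReadFcn',volReader);
pxdsTest = pixelLabelDatastore(lblLocTest,classNames,pixelLabelID, ...
    'FileExtensions','.mat','ReadFcn',volReader);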
Use the overlap-tile strategy to predict the labels for each test volume. Each test volume is padded to
make the input size a multiple of the output size of the network and to compensate for the effects of
valid convolution. The overlap-tile algorithm selects overlapping patches, predicts the labels for each
patch by using the semanticseg (Computer Vision Toolbox) function, and then recombines the
patches.
id = 1;
while hasdata(voldsTest)
disp(['Processing test volume ' num2str(id)]);
tempGroundTruth = read(pxdsTest);
groundTruthLabels{id} = tempGroundTruth{1};
vol{id} = read(voldsTest);
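% The padding and size computations are not shown in this excerpt. A
% minimal sketch: pad each test volume symmetrically so that its spatial
% size becomes a multiple of the network output size, compensating for
% the valid convolutions.
[height,width,depth,~] = size(vol{id});
padSizePre  = (inputPatchSize(1:3)-outPatchSize(1:3))/2;
padSizePost = (inputPatchSize(1:3)-outPatchSize(1:3))/2 + ...
    (outPatchSize(1:3)-mod([height,width,depth],outPatchSize(1:3)));
volPadded = padarray(vol{id},padSizePre,'symmetric','pre');
volPadded = padarray(volPadded,padSizePost,'symmetric','post');
[heightPad,widthPad,depthPad,~] = size(volPadded);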
tempSeg = categorical(zeros([height,width,depth],'uint8'),[0;1],classNames);
for k = 1:outPatchSize(3):depthPad-inputPatchSize(3)+1
for j = 1:outPatchSize(2):widthPad-inputPatchSize(2)+1
for i = 1:outPatchSize(1):heightPad-inputPatchSize(1)+1
patch = volPadded( i:i+inputPatchSize(1)-1,...
j:j+inputPatchSize(2)-1,...
k:k+inputPatchSize(3)-1,:);
patchSeg = semanticseg(patch,net);
tempSeg(i:i+outPatchSize(1)-1, ...
j:j+outPatchSize(2)-1, ...
k:k+outPatchSize(3)-1) = patchSeg;
end
end
end
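% The remainder of the loop is not shown in this excerpt. A sketch of the
% missing steps: crop the segmentation back to the original volume size,
% store the prediction, and advance to the next test volume.
tempSeg = tempSeg(1:height,1:width,1:depth);
predictedLabels{id} = tempSeg;
id = id + 1;
end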
Select one of the test images to evaluate the accuracy of the semantic segmentation. Extract the first
modality from the 4-D volumetric data and store this 3-D volume in the variable vol3d.
volId = 1;
vol3d = vol{volId}(:,:,:,1);
Display in a montage the center slice of the ground truth and predicted labels along the depth
direction.
zID = size(vol3d,3)/2;
zSliceGT = labeloverlay(vol3d(:,:,zID),groundTruthLabels{volId}(:,:,zID));
zSlicePred = labeloverlay(vol3d(:,:,zID),predictedLabels{volId}(:,:,zID));
figure
montage({zSliceGT,zSlicePred},'Size',[1 2],'BorderSize',5)
title('Labeled Ground Truth (Left) vs. Network Prediction (Right)')
Display the ground-truth labeled volume using the labelvolshow function. Make the background
fully transparent by setting the visibility of the background label (1) to 0. Because the tumor is inside
the brain tissue, make some of the brain voxels transparent, so that the tumor is visible. To make
some brain voxels transparent, specify the volume threshold as a number in the range [0, 1]. All
normalized volume intensities below this threshold value are fully transparent. This example sets the
volume threshold as less than 1 so that some brain pixels remain visible, to give context to the spatial
location of the tumor inside the brain.
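The display calls do not appear in this excerpt. A minimal sketch for both the ground-truth and predicted labels, assuming a volume threshold of 0.68 and a red label color (both illustrative choices); the visibility line that follows hides the background of the predicted display.
viewPnlTruth = uipanel(figure,'Title','Ground-Truth Labeled Volume');
hTruth = labelvolshow(groundTruthLabels{volId},vol3d,'Parent',viewPnlTruth, ...
    'LabelColor',[0 0 0; 1 0 0],'VolumeThreshold',0.68);
hTruth.LabelVisibility(1) = 0;
viewPnlPred = uipanel(figure,'Title','Predicted Labeled Volume');
hPred = labelvolshow(predictedLabels{volId},vol3d,'Parent',viewPnlPred, ...
    'LabelColor',[0 0 0; 1 0 0],'VolumeThreshold',0.68);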
hPred.LabelVisibility(1) = 0;
This image shows the result of displaying slices sequentially along the depth of one of the volumes. The labeled ground truth is on the left and the network prediction is on the right.
Measure the segmentation accuracy using the dice function. This function computes the Dice
similarity coefficient between the predicted and ground truth segmentations.
diceResult = zeros(length(voldsTest.Files),2);
for j = 1:length(vol)
diceResult(j,:) = dice(groundTruthLabels{j},predictedLabels{j});
end
Calculate the average Dice score across the set of test volumes.
meanDiceBackground = mean(diceResult(:,1));
disp(['Average Dice score of background across ',num2str(j), ...
' test volumes = ',num2str(meanDiceBackground)])
meanDiceTumor = mean(diceResult(:,2));
disp(['Average Dice score of tumor across ',num2str(j), ...
' test volumes = ',num2str(meanDiceTumor)])
The figure shows a boxplot (Statistics and Machine Learning Toolbox) that visualizes statistics
about the Dice scores across the set of five sample test volumes. The red lines in the plot show the
median Dice value for the classes. The upper and lower bounds of the blue box indicate the 25th and
75th percentiles, respectively. Black whiskers extend to the most extreme data points not considered
outliers.
If you have Statistics and Machine Learning Toolbox™, then you can use the boxplot function to
visualize statistics about the Dice scores across all your test volumes. To create a boxplot, set the
createBoxplot variable in the following code to true.
createBoxplot = false;
if createBoxplot
figure
boxplot(diceResult)
title('Test Set Dice Accuracy')
xticklabels(classNames)
ylabel('Dice Coefficient')
end
References
[1] Çiçek, Ö., A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. "3D U-Net: Learning Dense
Volumetric Segmentation from Sparse Annotation." In Proceedings of the International Conference on
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016. Athens, Greece, Oct.
2016, pp. 424-432.
[2] Isensee, F., P. Kickingereder, W. Wick, M. Bendszus, and K. H. Maier-Hein. "Brain Tumor
Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge." In
Proceedings of BrainLes: International MICCAI Brainlesion Workshop. Quebec City, Canada, Sept.
2017, pp. 287-297.
[3] The BraTS data set is provided by Medical Segmentation Decathlon under the CC-BY-SA 4.0 license.
All warranties and representations are disclaimed; see the license for details. MathWorks® has
modified the data set linked in the Download Pretrained Network and Sample Test Set section of this example. The modified sample data set has been cropped to a region containing
primarily the brain and tumor and each channel has been normalized independently by subtracting
the mean and dividing by the standard deviation of the cropped brain region.
[4] Sudre, C. H., W. Li, T. Vercauteren, S. Ourselin, and M. J. Cardoso. "Generalised Dice Overlap as a
Deep Learning Loss Function for Highly Unbalanced Segmentations." Deep Learning in Medical
Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop.
Quebec City, Canada, Sept. 2017, pp. 240-248.
[5] Ronneberger, O., P. Fischer, and T. Brox. "U-Net:Convolutional Networks for Biomedical Image
Segmentation." In Proceedings of the International Conference on Medical Image Computing and
Computer-Assisted Intervention - MICCAI 2015. Munich, Germany, Oct. 2015, pp. 234-241. Available
at arXiv:1505.04597.
See Also
dicePixelClassificationLayer | imageDatastore | pixelLabelDatastore |
randomPatchExtractionDatastore | semanticseg | trainNetwork | trainingOptions |
transform
More About
• “Preprocess Volumes for Deep Learning” (Deep Learning Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)
External Websites
• 3-D Brain Tumor Segmentation Using Deep Learning Video Tutorial
Classify Large Multiresolution Images Using blockedImage and Deep Learning
The only definitive way to diagnose breast cancer is by examining tissue samples collected from
biopsy or surgery. The samples are commonly prepared with hematoxylin and eosin (H&E) staining to
increase the contrast of structures in the tissue. Traditionally, pathologists examine the tissue on
glass slides under a microscope to detect tumor tissue. Diagnosis takes time as pathologists must
thoroughly inspect an entire slide at close magnification. Further, pathologists may not notice small
tumors. Deep learning methods aim to automate the detection of tumor tissue, saving time and
improving the detection rate of small tumors.
Deep learning methods for tumor classification rely on digital pathology, in which whole tissue slides
are imaged and digitized. The resulting WSIs have extremely high resolution. WSIs are frequently
stored in a multiresolution file to facilitate the display, navigation, and processing of the images.
Reading WSIs is a challenge because the images cannot be loaded as a whole into memory and
therefore require out-of-core image processing techniques. You can store and process this type of
large multiresolution image by using blockedImage objects. You can extract batches of data from
blockedImage objects using blockedImageDatastore objects.
This example shows how to train a deep learning network to classify tumors in very large
multiresolution images using blockedImage and blockedImageDatastore. The example presents
classification results as heatmaps that depict the probability that local tissue is tumorous. The
localization of tumor regions enables medical pathologists to investigate specific regions and quickly
identify tumors of any size in an image.
This example uses WSIs from the Camelyon16 challenge [1]. The data from this
challenge contains a total of 400 WSIs of lymph nodes from two independent sources, separated into
270 training images and 130 test images. The WSIs are stored as TIF files in a striped format with
an 11-level pyramid structure.
The training data set consists of 159 WSIs of normal lymph nodes and 111 WSIs of lymph nodes with
tumor and healthy tissue. Usually, the tumor tissue is a small fraction of the healthy tissue. Ground
truth coordinates of the lesion boundaries accompany the tumor images.
The size of each training file is approximately 2 GB. If you do not want to download the training data
set or train the network, then go directly to the Train or Download the Network
section in this example.
trainNormalDataDir = fullfile(trainingImageDir,'normal');
trainTumorDataDir = fullfile(trainingImageDir,'tumor');
trainTumorAnnotationDir = fullfile(trainingImageDir,'lesion_annotations');
To download the training data, go to the Camelyon17 website and click the first "CAMELYON16 data
set" link. Open the "training" directory, then follow these steps.
• Download the "lesion_annotations.zip" file. Extract the files to the directory specified by the
trainTumorAnnotationDir variable.
• Open the "normal" directory. Download the images to the directory specified by the
trainNormalDataDir variable.
• Open the "tumor" directory. Download the images to the directory specified by the
trainTumorDataDir variable.
Specify the number of training images. Note that one of the training images of normal tissue,
'normal_144.tif', has metadata that cannot be read by the blockedImage object. This example uses
the remaining 158 training files.
numNormalFiles = 158;
numTumorFiles = 111;
To get a better understanding of the training data, display one training image. Because loading the
entire image into memory at the finest resolution is not possible, you cannot use traditional image
display functions such as imshow. To display and process the image data, use a blockedImage
object.
tumorFileName = fullfile(trainTumorDataDir,'tumor_001.tif');
tumorImage = blockedImage(tumorFileName);
Inspect the dimensions of the blockedImage at each resolution level. Level 1 has the most pixels
and is the finest resolution level. Level 10 has the fewest pixels and is the coarsest resolution level.
The aspect ratio is not consistent, which indicates that levels do not all span the same world area.
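The code that builds this summary is not shown in this excerpt. A minimal sketch using the Size property of the blockedImage object:
levelSizeInfo = table((1:size(tumorImage.Size,1))', ...
    tumorImage.Size(:,2),tumorImage.Size(:,1), ...
    tumorImage.Size(:,2)./tumorImage.Size(:,1), ...
    'VariableNames',["Resolution Level","Image Width","Image Height","Aspect Ratio"]);
disp(levelSizeInfo)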
levelSizeInfo = 11×4 table
    Resolution Level    Image Width    Image Height    Aspect Ratio
    ________________    ___________    ____________    ____________
           10              1024            512              2
           11               512            512              1
Display the blockedImage at a coarse resolution level by using the bigimageshow function. Return
a handle to the bigimageshow object. You can use the handle to adjust the display. The image
contains a lot of empty white space. The tissue occupies only a small portion of the image.
h = bigimageshow(tumorImage,'ResolutionLevel',8);
Zoom in on one part of the image by setting the horizontal and vertical spatial extents with respect to
the finest resolution level. The image looks blurry because this resolution level is very coarse.
xlim([29471,29763]);
ylim([117450,118110]);
h.ResolutionLevel = 1;
Create Masks
You can reduce the amount of computation by processing only regions of interest (ROIs). Use a mask
to define ROIs. A mask is a logical image in which true pixels represent the ROI.
To further reduce the amount of computation, create masks at a coarse resolution level that can be
processed entirely in memory instead of on a block-by-block basis. If the spatial referencing of the
coarse resolution level matches the spatial referencing of finer resolution levels, then locations at the
coarse level correspond to locations in finer levels. In this case, you can use the coarse mask to select
which blocks to process at the finer levels. For more information, see “Set Up Spatial Referencing for
Blocked Images” on page 16-2 and “Process Blocked Images Efficiently Using Mask” on page 16-22.
Specify a resolution level to use for creating the mask. This example uses resolution level 7, which is
coarse and fits in memory. Note that the blockedImage object automatically sorts the levels in a
multiresolution image from finest to coarsest based on number of pixels in each level. Several
Camelyon16 image files contain a mask of intermediate resolution. This example ignores the mask
when determining and reading the seventh level of image data.
resolutionLevel = 7;
In normal images, the ROI consists of healthy tissue. The color of healthy tissue is distinct from the
color of the background, so use color thresholding to segment the image and create an ROI. The
L*a*b* color space provides the best color separation for segmentation. Convert the image to the
L*a*b* color space, then threshold the a* channel to create the tissue mask.
You can use the helper function createMaskForNormalTissue to create masks using color
thresholding. This helper function is attached to the example as a supporting file.
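The call to the helper is not shown in this excerpt. A minimal sketch, assuming the masks are written to a level-specific subfolder (the folder name is illustrative):
trainNormalMaskDir = fullfile(trainNormalDataDir, ...
    ['normal_mask_level' num2str(resolutionLevel)]);
createMaskForNormalTissue(trainNormalDataDir,trainNormalMaskDir,resolutionLevel);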
The helper function reads each training image of normal tissue at a coarse resolution level, segments the tissue by color thresholding, and writes the resulting mask image to disk.
Now that both normal images and masks are on disk, create blockedImage objects to manage the
data by using the helper function createBlockedImageAndMaskArrays. This function creates an
array of blockedImage objects from the normal images and a corresponding array of
blockedImage objects from the normal mask images. The helper function is attached to the example
as a supporting file.
[normalImages,normalMasks] = createBlockedImageAndMaskArrays(trainNormalDataDir,trainNormalMaskDir);
Select a sample normal image and mask. Confirm that the spatial world extents of the mask match
the extents of the image at the finest resolution level. The spatial world extents are specified by the
WorldStart and WorldEnd properties.
idx = 2;
[normalImages(idx).WorldStart normalImages(idx).WorldEnd]
ans = 11×6
10^5 ×
[normalMasks(idx).WorldStart normalMasks(idx).WorldEnd]
ans = 1×4
0 0 221184 97792
Verify that the mask contains the correct ROIs and spatial referencing. Display the sample image by
using the bigimageshow function. Get the axes containing the display.
figure
hNormal = bigimageshow(normalImages(idx));
hNormalAxes = hNormal.Parent;
Create a new axes on top of the displayed blockedImage. In the new axes, display the
corresponding mask image with partial transparency. The mask highlights the regions containing
normal tissue.
hMaskAxes = axes;
hMask = bigimageshow(normalMasks(idx),'Parent',hMaskAxes, ...
'Interpolation','nearest','AlphaData',0.5);
hMaskAxes.Visible = 'off';
Link the axes of the image with the axes of the mask. When you zoom and pan, both axes are updated
identically.
linkaxes([hNormalAxes,hMaskAxes]);
Zoom in on one part of the image by setting the horizontal and vertical spatial extents. The mask
overlaps the normal tissue correctly.
xlim([45000 80000]);
ylim([130000 165000]);
In tumor images, the ROI consists of tumor tissue. The color of tumor tissue is similar to the color of
healthy tissue, so you cannot use color segmentation techniques. Instead, create ROIs by using the
ground truth coordinates of the lesion boundaries that accompany the tumor images.
You can use the helper function createMaskForTumorTissue to create a mask using ROIs. This
helper function is attached to the example as a supporting file.
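The call to the helper is not shown in this excerpt. A minimal sketch, assuming the masks are written to a level-specific subfolder (the folder name is illustrative):
trainTumorMaskDir = fullfile(trainTumorDataDir, ...
    ['tumor_mask_level' num2str(resolutionLevel)]);
createMaskForTumorTissue(trainTumorDataDir,trainTumorAnnotationDir, ...
    trainTumorMaskDir,resolutionLevel);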
The helper function performs these operations on each training image of tumor tissue:
• Create an output logical mask blockedImage object at a coarser resolution level. Write the mask
image on a block-by-block basis by using the setBlock function.
• Write the mask blockedImage object to a directory in memory. Only the blockedImage object is
in memory. The individual image blocks corresponding to the logical mask image are in a
temporary directory. Writing to a directory preserves the custom spatial referencing, which
ensures that the tumor images and their corresponding mask images have the same spatial
referencing.
Now that both tumor images and masks are on disk, create blockedImage objects to manage the
data by using the helper function createBlockedImageAndMaskArrays. This function creates an
array of blockedImage objects from the tumor images and a corresponding array of blockedImage
objects from the tumor mask images. The helper function is attached to the example as a supporting
file.
[tumorImages,tumorMasks] = createBlockedImageAndMaskArrays(trainTumorDataDir,trainTumorMaskDir);
Select a sample tumor image and mask. Confirm that the spatial world extents of the mask match the
extents of the image at the finest resolution level. The spatial world extents are specified by the
WorldStart and WorldEnd properties.
idx = 5;
[tumorImages(idx).WorldStart tumorImages(idx).WorldEnd]
ans = 11×6
10^5 ×
[tumorMasks(idx).WorldStart tumorMasks(idx).WorldEnd]
ans = 1×4
0 0 219648 97792
Verify that the mask contains the correct ROIs and spatial referencing. Display the sample image by
using the bigimageshow function. Get the axes containing the display.
figure
hTumor = bigimageshow(tumorImages(idx));
hTumorAxes = hTumor.Parent;
Create a new axes on top of the displayed blockedImage. In the new axes, display the
corresponding mask image with partial transparency. The mask highlights the regions containing
tumor tissue.
hMaskAxes = axes;
hMask = bigimageshow(tumorMasks(idx),'Parent',hMaskAxes, ...
'Interpolation','nearest','AlphaData',0.5);
hMaskAxes.Visible = 'off';
Link the axes of the image with the axes of the mask. When you zoom and pan, both axes are updated
identically.
linkaxes([hTumorAxes,hMaskAxes]);
Zoom in on one part of the image by setting the horizontal and vertical spatial extents. The mask
overlaps the tumor tissue correctly.
xlim([45000 65000]);
ylim([130000 150000]);
Color imbalance and class imbalance in the raw training patches can potentially bias the network.
Color imbalance results from nonuniform color staining of the tissue. Class imbalance results from an
unequal amount of tumor and normal tissue in the data. To correct these imbalances, you can
preprocess and augment the datastore.
This example shows how to create a blockedImageDatastore that extracts tumor and normal
patches for training the network. The example also shows how to preprocess and augment datastores
to avoid biasing the network.
Randomly split the normal images and corresponding masks into two sets. The validation set contains
two randomly selected images and corresponding masks. The training set contains the remaining
images and masks.
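The split itself is not shown in this excerpt. A minimal sketch that holds out two normal images for validation (the validation index name is illustrative; normalTrainIdx is used below):
normalValIdx = randperm(numNormalFiles,2);
normalTrainIdx = setdiff(1:numNormalFiles,normalValIdx);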
The patch size is small compared to the size of features in the image. By default, a
blockedImageDatastore extracts patches with no overlap and no gap, which generates a huge
quantity of training patches. You can reduce the amount of training data by specifying a subset of
patches. Specify the coordinates of patches using the selectBlockLocations function. Add a gap
between the sampled training patches using the BlockOffsets name-value argument. Specify an
offset that is larger than the patch size. Increase the inclusion threshold from the default value of 0.5
so that the network trains on relatively homogenous patches.
patchSize = [299,299,3];
normalStrideFactor = 10;
blsNormalData = selectBlockLocations(normalImages(normalTrainIdx), ...
"BlockSize",patchSize(1:2),"BlockOffsets",patchSize(1:2)*normalStrideFactor, ...
"Masks",normalMasks(normalTrainIdx),"InclusionThreshold",0.75,"ExcludeIncompleteBlocks",true)
Select the location of validation patches to read. Because there are fewer validation images, you do
not need to add a gap between patches.
Create datastores dsNormalData and dsNormalDataVal that read image patches from normal
images at the finest resolution level for training and validation, respectively. Specify the coordinates
of patches using the BlockLocationSet name-value pair argument.
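The validation block selection and the datastore creation are not shown in this excerpt; a sketch under the assumptions above:
blsNormalDataVal = selectBlockLocations(normalImages(normalValIdx), ...
    "BlockSize",patchSize(1:2), ...
    "Masks",normalMasks(normalValIdx),"InclusionThreshold",0.75, ...
    "ExcludeIncompleteBlocks",true);
dsNormalData = blockedImageDatastore(normalImages(normalTrainIdx), ...
    "BlockLocationSet",blsNormalData);
dsNormalDataVal = blockedImageDatastore(normalImages(normalValIdx), ...
    "BlockLocationSet",blsNormalDataVal);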
Randomly split the tumor images and corresponding masks into two sets. The validation set contains
two randomly selected images and corresponding masks. The training set contains the remaining
images and masks.
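The split itself is not shown in this excerpt. A minimal sketch that mirrors the split of the normal images:
tumorValIdx = randperm(numTumorFiles,2);
tumorTrainIdx = setdiff(1:numTumorFiles,tumorValIdx);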
Specify the coordinates of patches to read using the selectBlockLocations function. Tumor tissue
is more sparse than normal tissue, so increase the sampling density by specifying a smaller block
offset than for normal tissue. Note that if you want to train using fewer training images, then you
might need to increase the size of the training set by decreasing the block offset even further.
tumorStrideFactor = 3;
blsTumorData = selectBlockLocations(tumorImages(tumorTrainIdx), ...
"BlockSize",patchSize(1:2),"BlockOffsets",patchSize(1:2)*tumorStrideFactor, ...
"Masks",tumorMasks(tumorTrainIdx),"InclusionThreshold",0.75,"ExcludeIncompleteBlocks",true);
Select the location of validation patches to read. Because there are fewer validation images, you do
not need to add a gap between patches.
Create a blockedImageDatastore from the training tumor images and masks. The datastores
dsTumorData and dsTumorDataVal read image patches from tumor images at the finest resolution
level for training and validation, respectively.
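The validation block selection and the datastore creation are not shown in this excerpt; a sketch that mirrors the normal-tissue datastores:
blsTumorDataVal = selectBlockLocations(tumorImages(tumorValIdx), ...
    "BlockSize",patchSize(1:2), ...
    "Masks",tumorMasks(tumorValIdx),"InclusionThreshold",0.75, ...
    "ExcludeIncompleteBlocks",true);
dsTumorData = blockedImageDatastore(tumorImages(tumorTrainIdx), ...
    "BlockLocationSet",blsTumorData);
dsTumorDataVal = blockedImageDatastore(tumorImages(tumorValIdx), ...
    "BlockLocationSet",blsTumorDataVal);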
The training images have different color distributions because the data set came from different
sources and color staining the tissue does not result in identically stained images. Additional
preprocessing is necessary to avoid biasing the network.
To prevent color variability, this example preprocesses the data with standard stain normalization
techniques. Apply stain normalization and augmentation by using the transform function with
custom preprocessing operations specified by the helper function augmentAndLabelCamelyon16.
This function is attached to the example as a supporting file.
• Normalize staining by using the normalizeStaining.m function [4]. Stain
normalization is performed using Macenko's method, which separates H&E color channels by
color deconvolution using a fixed matrix and then recreates the normalized images with individual
corrected mixing. The function returns the normalized image as well as the H&E images.
• Add color jitter by using the jitterColorHSV function. Color jitter varies the color of each patch
by perturbing the image contrast, hue, saturation, and brightness. Color jitter is performed in the
HSV color space to avoid unwanted color artifacts in the RGB image.
• Apply random combinations of 90 degree rotations and vertical and horizontal reflection.
Randomized affine transformations make the network agnostic to the orientation of the input
image data.
Each image patch generates five augmented and labeled patches: the stain-normalized patch, the
stain-normalized patch with color jitter, the stain-normalized patch with color jitter and random affine
transformation, the hematoxylin image with random affine transformation, and the eosin image with
random affine transformation.
Create datastores that transform the normal training and validation images and label the generated
patches as 'normal'.
Create datastores that transform the tumor training and validation images and label the generated
patches as 'tumor'.
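The transform calls themselves are not shown in this excerpt. A minimal sketch, assuming that the helper augmentAndLabelCamelyon16 accepts the block data, the block info, and a class label, and that it returns the augmented patches together with updated info:
dsLabeledNormalData = transform(dsNormalData, ...
    @(x,info)augmentAndLabelCamelyon16(x,info,"normal"),'IncludeInfo',true);
dsLabeledNormalDataVal = transform(dsNormalDataVal, ...
    @(x,info)augmentAndLabelCamelyon16(x,info,"normal"),'IncludeInfo',true);
dsLabeledTumorData = transform(dsTumorData, ...
    @(x,info)augmentAndLabelCamelyon16(x,info,"tumor"),'IncludeInfo',true);
dsLabeledTumorDataVal = transform(dsTumorDataVal, ...
    @(x,info)augmentAndLabelCamelyon16(x,info,"tumor"),'IncludeInfo',true);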
The amount of cancer tissue in the tumor images is very small compared to the amount of normal
tissue. Additional preprocessing is necessary to avoid training the network on class-imbalanced data
containing a large amount of normal tissue and a very small amount of tumor tissue.
Create the custom randomSamplingDatastore from the normal and tumor training datastores. The
random sampling datastore dsTrain provides mini-batches of training data to the network at each
iteration of the epoch.
dsTrain = randomSamplingDatastore(dsLabeledTumorData,dsLabeledNormalData);
To limit the number of patches used during validation, this example defines a custom datastore called
a validationDatastore that returns five validation patches from each class. The script to define
this custom datastore is attached to the example as a supporting file.
Create the custom validationDatastore from the normal and tumor validation datastores.
numValidationPatchesPerClass = 5;
dsVal = validationDatastore(dsLabeledTumorDataVal, ...
dsLabeledNormalDataVal,numValidationPatchesPerClass);
This example uses the Inception-v3 network, a convolutional neural network that is trained on more
than a million images from the ImageNet database [3]. The network is 48 layers deep
and can classify images into 1,000 object categories, such as keyboard, mouse, pencil, and many
animals. The network expects an image input size of 299-by-299 with 3 channels.
The inceptionv3 (Deep Learning Toolbox) function returns a pretrained Inception-v3 network.
Inception-v3 requires the Deep Learning Toolbox™ Model for Inception-v3 Network support package.
If this support package is not installed, then the function provides a download link.
net = inceptionv3;
The convolutional layers of the network extract image features that the last learnable layer and the
final classification layer use to classify the input image. These two layers contain information on how
to combine the features that the network extracts into class probabilities, a loss value, and predicted
labels. To retrain a pretrained network to classify new images, replace these two layers with new
layers adapted to the new data set. For more information, see “Train Deep Learning Network to
Classify New Images” (Deep Learning Toolbox).
lgraph = layerGraph(net);
Find the names of the two layers to replace by using the supporting function
findLayersToReplace. This function is attached to the example as a supporting file. In Inception-
v3, these two layers are named 'predictions' and 'ClassificationLayer_predictions'.
[learnableLayer,classLayer] = findLayersToReplace(lgraph)
learnableLayer =
FullyConnectedLayer with properties:
Name: 'predictions'
Hyperparameters
InputSize: 2048
OutputSize: 1000
Learnable Parameters
Weights: [1000×2048 single]
Bias: [1000×1 single]
classLayer =
ClassificationOutputLayer with properties:
Name: 'ClassificationLayer_predictions'
Classes: [1000×1 categorical]
ClassWeights: 'none'
OutputSize: 1000
Hyperparameters
LossFunction: 'crossentropyex'
The goal of this example is to perform binary segmentation between two classes, tumor and
nontumor regions. Create a new fully connected layer for two classes. Replace the original final fully
connected layer with the new layer.
numClasses = 2;
newLearnableLayer = fullyConnectedLayer(numClasses,'Name','predictions');
lgraph = replaceLayer(lgraph,learnableLayer.Name,newLearnableLayer);
Create a new classification layer for two classes. Replace the original final classification layer with
the new layer.
newClassLayer = classificationLayer('Name','ClassificationLayer_predictions');
lgraph = replaceLayer(lgraph,classLayer.Name,newClassLayer);
Train the network using the rmsprop optimization solver. This solver automatically adjusts the
learning rate and momentum for faster convergence. Specify other hyperparameter settings by using
the trainingOptions (Deep Learning Toolbox) function. Reduce MaxEpochs to a small number
because the large amount of training data enables the network to reach convergence sooner.
checkpointsDir = fullfile(trainingImageDir,'checkpoints');
if ~exist(checkpointsDir,'dir')
mkdir(checkpointsDir);
end
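The trainingOptions call itself is not shown in this excerpt. A minimal sketch using the rmsprop solver described above; the numeric values are assumptions:
options = trainingOptions('rmsprop', ...
    'InitialLearnRate',1e-5, ...           % assumed value
    'SquaredGradientDecayFactor',0.9, ...  % assumed value
    'MaxEpochs',3, ...                     % assumed small value
    'MiniBatchSize',32, ...                % assumed value
    'Plots','training-progress', ...
    'ValidationData',dsVal, ...
    'ValidationFrequency',250, ...         % assumed value
    'CheckpointPath',checkpointsDir, ...
    'Verbose',false);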
By default, the example downloads a pretrained version of the trained Inception-v3 network by using
the helper function downloadTrainedCamelyonNet. The helper function is attached to the example
as a supporting file. The pretrained network enables you to run the entire example without waiting
for training to complete.
To train the network, set the doTraining variable in the following code to true. Train the network
using the trainNetwork (Deep Learning Toolbox) function.
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox). Training takes about 20 hours on an NVIDIA Titan X.
doTraining = false;
if doTraining
trainedNet = trainNetwork(dsTrain,lgraph,options);
modelDateTime = string(datetime('now','Format',"yyyy-MM-dd-HH-mm-ss"));
save(strcat("trainedCamelyonNet-",modelDateTime,".mat"),'trainedNet');
else
trainedCamelyonNet_url = 'https://www.mathworks.com/supportfiles/vision/data/trainedCamelyonNet.mat';
netDir = fullfile(tempdir,'Camelyon16');
downloadTrainedCamelyonNet(trainedCamelyonNet_url,netDir);
load(fullfile(netDir,'trainedCamelyonNet.mat'));
end
The Camelyon16 test data set consists of 130 WSIs. These images have both normal and tumor tissue.
This example uses two test images from the Camelyon16 test data. The size of each file is
approximately 2 GB.
testDataDir = fullfile(testingImageDir,'images');
testTumorAnnotationDir = fullfile(testingImageDir,'lesion_annotations');
To download the test data, go to the Camelyon17 website and click the first "CAMELYON16 data set"
link. Open the "testing" directory, then follow these steps.
• Download the "lesion_annotations.zip" file. Extract all files to the directory specified by the
testTumorAnnotationDir variable.
• Open the "images" directory. Download the first two files, "test_001.tif" and "test_002.tif". Move
the files to the directory specified by the testDataDir variable.
The test images contain a mix of normal and tumor images. To reduce the amount of computation
during classification, define the ROIs by creating masks.
Specify a resolution level to use for creating the mask. This example uses resolution level 7, which is
coarse and fits in memory.
resolutionLevel = 7;
Create masks for regions containing tissue. You can use the helper function
createMaskForNormalTissue to create masks using color thresholding. This helper function is
attached to the example as a supporting file. For more information about this helper function, see
Create Masks for Normal Images.
testTissueMaskDir = fullfile(testDataDir,['test_tissuemask_level' num2str(resolutionLevel)]);
createMaskForNormalTissue(testDataDir,testTissueMaskDir,resolutionLevel);
Create masks for images that contain tumor tissue. Skip images that do not contain tumor tissue. You
can use the helper function createMaskForTumorTissue to create masks using ROI objects. This
helper function is attached to the example as a supporting file. For more information about this
helper function, see Create Masks for Tumor Images.
testTumorMaskDir = fullfile(testDataDir,['test_tumormask_level' num2str(resolutionLevel)]);
createMaskForTumorTissue(testDataDir,testTumorAnnotationDir,testTumorMaskDir,resolutionLevel);
Each test image has two masks, one indicating normal tissue and one indicating tumor tissue. Create
blockedImage objects to manage the test data and masks by using the helper function
createBlockedImageAndMaskArrays. The helper function is attached to the example as a
supporting file.
[testImages,testTissueMasks] = createBlockedImageAndMaskArrays(testDataDir,testTissueMaskDir);
[~,testTumorMasks] = createBlockedImageAndMaskArrays(testDataDir,testTumorMaskDir);
Use the trained Inception-v3 network to identify tumor patches in the test images, testImages.
Classify the test images on a block-by-block basis by using the apply function with a custom
processing pipeline specified by the helper function tumorProbabilityHeatMap. This helper
function is attached to the example as a supporting file. To reduce the amount of computation
required, specify the tissue mask testTissueMask so that the apply function processes only
patches that contain tissue. Specify the 'UseParallel' name-value pair argument as the logical value returned by canUseGPU. If a supported GPU is available for computation, then the apply function evaluates blocks in parallel.
• Calculate the tumor probability score by using the predict (Deep Learning Toolbox) function.
• Create a heatmap image patch with pixel values equal to the tumor probability score.
The apply function stitches the heatmap of each block into a single heatmap for the test image. The
heatmap shows where the network detects regions containing tumors.
To visualize the heatmap, overlay the heatmap on the original image and set the transparency
'AlphaData' property as the tissue mask. The overlay shows how well the tumor is localized in the
image. Regions with a high probability of being tumors are displayed with red pixels. Regions with a
low probability of being tumors are displayed as blue pixels.
outputHeatmapsDir = fullfile(testingImageDir,'heatmaps');
if ~exist(outputHeatmapsDir,'dir')
mkdir(outputHeatmapsDir);
end
patchSize = [299,299,3];
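% The loop that generates the heatmaps is not shown in this excerpt. A
% minimal sketch, assuming that the helper tumorProbabilityHeatMap accepts
% a block structure and the trained network and returns a patch of tumor
% probabilities; the display code that follows runs inside this loop.
for idx = 1:numel(testImages)
    testHeatMaps(idx) = apply(testImages(idx), ...
        @(block)tumorProbabilityHeatMap(block,trainedNet), ...
        'BlockSize',patchSize(1:2), ...
        'Masks',testTissueMasks(idx),'InclusionThreshold',0, ...
        'UseParallel',canUseGPU);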
figure
hTest = bigimageshow(testImages(idx));
hTestAxes = hTest.Parent;
hTestAxes.Visible = 'off';
hMaskAxes = axes;
hMask = bigimageshow(testHeatMaps(idx),'Parent',hMaskAxes, ...
"Interpolation","nearest","AlphaData",testTissueMasks(idx));
colormap(jet(255));
hMaskAxes.Visible = 'off';
linkaxes([hTestAxes,hMaskAxes]);
title(['Tumor Heatmap of Test Image ',num2str(idx)]);
end
References
[1] Ehteshami B. B., et al. "Diagnostic Assessment of Deep Learning Algorithms for Detection of
Lymph Node Metastases in Women With Breast Cancer." Journal of the American Medical Association.
Vol. 318, No. 22, 2017, pp. 2199–2210. doi:10.1001/jama.2017.14585
[2] Szegedy, C., V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. "Rethinking the Inception Architecture
for Computer Vision." In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2818–2826. Las Vegas, NV: IEEE, 2016.
[4] Macenko, M., et al. "A Method for Normalizing Histology Slides for Quantitative Analysis." In 2009
IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 1107–1110. Boston,
MA: IEEE, 2009.
See Also
bigimageshow | blockLocationSet | blockedImage | blockedImageDatastore |
selectBlockLocations | trainNetwork | trainingOptions | transform
More About
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)
Neural Style Transfer Using Deep Learning
Load Data
Load the style image and content image. This example uses the distinctive Van Gogh painting "Starry
Night" as the style image and a photograph of a lighthouse as the content image.
styleImage = im2double(imread('starryNight.jpg'));
contentImage = imread('lighthouse.png');
imshow(imtile({styleImage,contentImage},'BackgroundColor','w'));
In this example, you use a modified pretrained VGG-19 deep neural network to extract the features of
the content and style image at various layers. These multilayer features are used to compute
respective content and style losses. The network generates the stylized transfer image using the
combined loss.
To get a pretrained VGG-19 network, install vgg19 (Deep Learning Toolbox). If you do not have the
required support packages installed, then the software provides a download link.
net = vgg19;
To make the VGG-19 network suitable for feature extraction, remove all of the fully connected layers
from the network.
lastFeatureLayerIdx = 38;
layers = net.Layers;
layers = layers(1:lastFeatureLayerIdx);
The max pooling layers of the VGG-19 network cause a fading effect. To decrease the fading effect
and increase the gradient flow, replace all max pooling layers with average pooling layers [1].
for l = 1:lastFeatureLayerIdx
layer = layers(l);
if isa(layer,'nnet.cnn.layer.MaxPooling2DLayer')
layers(l) = averagePooling2dLayer(layer.PoolSize,'Stride',layer.Stride,'Name',layer.Name);
end
end
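The reassembly of the modified layers into a layer graph is not shown in this excerpt; presumably a step along these lines precedes the conversion:
lgraph = layerGraph(layers);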
To train the network with a custom training loop and enable automatic differentiation, convert the
layer graph to a dlnetwork object.
dlnet = dlnetwork(lgraph);
Preprocess Data
Resize the style image and content image to a smaller size for faster processing.
imageSize = [384,512];
styleImg = imresize(styleImage,imageSize);
contentImg = imresize(contentImage,imageSize);
The pretrained VGG-19 network performs classification on a channel-wise mean subtracted image.
Get the channel-wise mean from the image input layer, which is the first layer in the network.
imgInputLayer = lgraph.Layers(1);
meanVggNet = imgInputLayer.Mean(1,1,:);
The values of the channel-wise mean are appropriate for images of floating point data type with pixel
values in the range [0, 255]. Convert the style image and content image to data type single with
range [0, 255]. Then, subtract the channel-wise mean from the style image and content image.
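The conversion and mean subtraction do not appear in this excerpt; a minimal sketch using rescale and implicit expansion:
styleImg = rescale(single(styleImg),0,255) - meanVggNet;
contentImg = rescale(single(contentImg),0,255) - meanVggNet;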
The transfer image is the output image as a result of style transfer. You can initialize the transfer
image with a style image, content image, or any random image. Initialization with a style image or
content image biases the style transfer process and produces a transfer image more similar to the
input image. In contrast, initialization with white noise removes the bias but takes longer to converge
on the stylized image. For better stylization and faster convergence, this example initializes the
output transfer image as a weighted combination of the content image and a white noise image.
noiseRatio = 0.7;
randImage = randi([-20,20],[imageSize 3]);
transferImage = noiseRatio.*randImage + (1-noiseRatio).*contentImg;
Content Loss
The objective of content loss is to make the features of the transfer image match the features of the
content image. The content loss is computed as the mean squared difference between content image
features and transfer image features for each content feature layer [1]. \hat{Y}^{l} is the predicted feature map for the transfer image and Y^{l} is the predicted feature map for the content image at the l-th content layer. W_{c}^{l} is the content layer weight for the l-th layer, and H, W, and C are the height, width, and number of channels of the feature maps, respectively.

L_{content} = \sum_{l} W_{c}^{l} \times \frac{1}{HWC} \sum_{i,j} \left( \hat{Y}_{i,j}^{l} - Y_{i,j}^{l} \right)^{2}
Specify the content feature extraction layer names. The features extracted from these layers are used
to compute the content loss. In the VGG-19 network, training is more effective using features from
deeper layers rather than features from shallow layers. Therefore, specify the content feature
extraction layer as the fourth convolutional layer.
styleTransferOptions.contentFeatureLayerNames = {'conv4_2'};
styleTransferOptions.contentFeatureLayerWeights = 1;
Style Loss
The objective of style loss is to make the texture of the transfer image match the texture of the style
image. The style representation of an image is represented as a Gram matrix. Therefore, the style
loss is computed as the mean squared difference between the Gram matrix of the style image and the
Gram matrix of the transfer image [1]. Z^{l} and \hat{Z}^{l} are the predicted feature maps for the style image and the transfer image, respectively. G_{Z}^{l} and G_{\hat{Z}}^{l} are the Gram matrices of the style features and the transfer features, respectively, and W_{s}^{l} is the style layer weight for the l-th style layer.

G_{Z}^{l} = \sum_{i,j} Z_{i,j}^{l} \times Z_{j,i}^{l}

G_{\hat{Z}}^{l} = \sum_{i,j} \hat{Z}_{i,j}^{l} \times \hat{Z}_{j,i}^{l}

L_{style} = \sum_{l} W_{s}^{l} \times \frac{1}{(2HWC)^{2}} \sum \left( G_{\hat{Z}}^{l} - G_{Z}^{l} \right)^{2}
Specify the names of the style feature extraction layers. The features extracted from these layers are
used to compute style loss.
styleTransferOptions.styleFeatureLayerNames = {'conv1_1','conv2_1','conv3_1','conv4_1','conv5_1'};
Specify the weights of the style feature extraction layers. Specify small weights for simple style
images and increase the weights for complex style images.
styleTransferOptions.styleFeatureLayerWeights = [0.5,1.0,1.5,3.0,4.0];
Total Loss
The total loss is a weighted combination of the content loss and style loss. α and β are the weight factors for content loss and style loss, respectively:

$$L_{total} = \alpha \times L_{content} + \beta \times L_{style}$$

Specify the weight factors alpha and beta for content loss and style loss. The ratio of alpha to beta should be around 1e-3 or 1e-4 [1].
styleTransferOptions.alpha = 1;
styleTransferOptions.beta = 1e3;
Train for 2500 iterations.
numIterations = 2500;
Specify options for Adam optimization. Set the learning rate to 2 for faster convergence. You can
experiment with the learning rate by observing your output image and losses. Initialize the trailing average gradient and trailing average squared gradient values with [].
learningRate = 2;
trailingAvg = [];
trailingAvgSq = [];
Convert the style image, content image, and transfer image to dlarray (Deep Learning Toolbox)
objects with underlying type single and dimension labels 'SSC'.
dlStyle = dlarray(styleImg,'SSC');
dlContent = dlarray(contentImg,'SSC');
dlTransfer = dlarray(transferImage,'SSC');
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox). For GPU training, convert the data into a gpuArray.
if canUseGPU
dlContent = gpuArray(dlContent);
dlStyle = gpuArray(dlStyle);
dlTransfer = gpuArray(dlTransfer);
end
Extract the features of the content image from the content feature extraction layer, and the features of the style image from the style feature extraction layers, by calling the forward function on the network.
numContentFeatureLayers = numel(styleTransferOptions.contentFeatureLayerNames);
contentFeatures = cell(1,numContentFeatureLayers);
[contentFeatures{:}] = forward(dlnet,dlContent,'Outputs',styleTransferOptions.contentFeatureLayerNames);
numStyleFeatureLayers = numel(styleTransferOptions.styleFeatureLayerNames);
styleFeatures = cell(1,numStyleFeatureLayers);
[styleFeatures{:}] = forward(dlnet,dlStyle,'Outputs',styleTransferOptions.styleFeatureLayerNames);
Train the model using a custom training loop. For each iteration:
• Calculate the content loss and style loss using the features of the content image, style image, and
transfer image. To calculate the loss and gradients, use the helper function imageGradients
(defined in the Supporting Functions section of this example).
• Update the transfer image using the adamupdate (Deep Learning Toolbox) function.
• Select the best style transfer image as the final output image.
figure
minimumLoss = inf;
for iteration = 1:numIterations
    % Evaluate the transfer image gradients and losses using dlfeval and the
    % imageGradients helper function, update the transfer image using
    % adamupdate, and keep the transfer image with the minimum total loss,
    % as described in the steps above.

    % Display the transfer image on the first iteration and after every 50
    % iterations. The postprocessing steps are described in the "Postprocess
    % Transfer Image for Display" section of this example.
    if mod(iteration,50) == 0 || (iteration == 1)
        transferImage = gather(extractdata(dlTransfer));
        transferImage = transferImage + meanVggNet;
        transferImage = uint8(transferImage);
        transferImage = imresize(transferImage,size(contentImage,[1 2]));
        image(transferImage)
        title(['Transfer Image After Iteration ',num2str(iteration)])
    end
end
Some pixel values can exceed the original range [0, 255] of the content and style images. You can clip
the values to the range [0, 255] by converting the data type to uint8.
transferImage = uint8(transferImage);
Resize the transfer image to the original size of the content image.
transferImage = imresize(transferImage,size(contentImage,[1 2]));
Display the content image, transfer image, and style image in a montage.
imshow(imtile({contentImage,transferImage,styleImage}, ...
'GridSize',[1 3],'BackgroundColor','w'));
Supporting Functions
The imageGradients helper function returns the loss and gradients using features of the content
image, style image, and transfer image.
transferContentFeatures = cell(1,numContentFeatureLayers);
transferStyleFeatures = cell(1,numStyleFeatureLayers);
losses.totalLoss = gather(extractdata(loss));
losses.contentLoss = gather(extractdata(cLoss));
losses.styleLoss = gather(extractdata(sLoss));
end
The contentLoss helper function computes the weighted mean squared difference between the
content image features and the transfer image features.
loss = 0;
for i=1:numel(contentFeatures)
temp = 0.5 .* mean((transferContentFeatures{1,i} - contentFeatures{1,i}).^2,'all');
loss = loss + (contentWeights(i)*temp);
end
end
The styleLoss helper function computes the weighted mean squared difference between the Gram
matrix of the style image features and the Gram matrix of the transfer image features.
loss = 0;
for i=1:numel(styleFeatures)
tsf = transferStyleFeatures{1,i};
sf = styleFeatures{1,i};
[h,w,c] = size(sf);
gramStyle = computeGramMatrix(sf);
gramTransfer = computeGramMatrix(tsf);
sLoss = mean((gramTransfer - gramStyle).^2,'all') / ((h*w*c)^2);
The computeGramMatrix helper function is used by the styleLoss helper function to compute the
Gram matrix of a feature map.
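A minimal sketch of such a Gram matrix computation, assuming the input is an H-by-W-by-C feature map (the implementation in the full example may differ):
function gramMatrix = computeGramMatrix(featureMap)
    % Reshape the H-by-W-by-C feature map into an (H*W)-by-C matrix and form
    % the C-by-C matrix of channel correlations used by the style loss.
    [h,w,c] = size(featureMap);
    reshapedFeatures = reshape(featureMap,h*w,c);
    gramMatrix = reshapedFeatures' * reshapedFeatures;
end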
References
[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. "A Neural Algorithm of Artistic Style."
Preprint, submitted September 2, 2015. https://arxiv.org/abs/1508.06576
See Also
dlarray | trainNetwork | trainingOptions | vgg19
More About
• “Get Started with GANs for Image-to-Image Translation” on page 18-5
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
• “Train Network Using Custom Training Loop” (Deep Learning Toolbox)
• “List of Functions with dlarray Support” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)
Unsupervised Day-To-Dusk Image Translation Using UNIT
Domain translation is the task of transferring styles and characteristics from one image domain to
another. This technique can be extended to other image-to-image learning operations such as image
enhancement, image colorization, defect generation, and medical image analysis.
UNIT [1] is a type of generative adversarial network (GAN) that consists of one
generator network and two discriminator networks that you train simultaneously to maximize the
overall performance. For more information about UNIT, see “Get Started with GANs for Image-to-
Image Translation” on page 18-5.
Download Dataset
This example uses the CamVid data set [2] from the University of Cambridge for
training. This data set is a collection of 701 images containing street-level views obtained while
driving.
Download the CamVid data set from this URL. The download time depends on your internet
connection.
imageURL = 'http://web4.cs.ucl.ac.uk/staff/g.brostow/MotionSegRecData/files/701_StillsRaw_full.zip';
dataDir = fullfile(tempdir,'CamVid');
downloadCamVidImageData(dataDir,imageURL);
imgDir = fullfile(dataDir,"images","701_StillsRaw_full");
The CamVid image data set includes 497 images acquired in daytime and 124 images acquired at
dusk. The number of CamVid training images is relatively small, which limits the performance of the trained UNIT network. Further,
some images belong to an image sequence and therefore are correlated with other images in the data
set. To minimize the impact of these limitations, this example manually partitions the data into
training and test data sets in a way that maximizes the variability of the training data.
Get the file names of the day and dusk images for training and testing by loading the file
'camvidDayDuskDatasetFileNames.mat'. The training data sets consist of 263 day images and 107
dusk images. The test data sets consist of 234 day images and 17 dusk images.
load('camvidDayDuskDatasetFileNames.mat');
Create imageDatastore objects that manage the day and dusk images for training and testing.
imdsDayTrain = imageDatastore(fullfile(imgDir,trainDayNames));
imdsDuskTrain = imageDatastore(fullfile(imgDir,trainDuskNames));
imdsDayTest = imageDatastore(fullfile(imgDir,testDayNames));
imdsDuskTest = imageDatastore(fullfile(imgDir,testDuskNames));
Preview a training image from the day and dusk training data sets.
day = preview(imdsDayTrain);
dusk = preview(imdsDuskTrain);
montage({day,dusk})
Specify the image input size for the source and target images.
inputSize = [256,256,3];
Augment and preprocess the training data by using the transform function with custom
preprocessing operations specified by the helper function augmentDataForDayToDusk. This
function is attached to the example as a supporting file. The function performs these operations on each training image (a code sketch follows the list):
1 Resize the image to the specified input size using bicubic interpolation.
2 Randomly flip the image in the horizontal direction.
3 Scale the image to the range [-1, 1]. This range matches the range of the final tanhLayer (Deep
Learning Toolbox) used in the generator.
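A sketch of this step, assuming the augmentDataForDayToDusk helper accepts an input image and the target input size:
imdsDayTrain = transform(imdsDayTrain,@(x)augmentDataForDayToDusk(x,inputSize));
imdsDuskTrain = transform(imdsDuskTrain,@(x)augmentDataForDayToDusk(x,inputSize));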
Create a UNIT generator network using the unitGenerator function. The source and target encoder sections of the generator each consist of two downsampling blocks and five residual blocks, and the encoder sections share two of the five residual blocks. Likewise, the source and target decoder sections of the generator each consist of five residual blocks and two upsampling blocks, and the decoder sections share two of the five residual blocks.
gen = unitGenerator(inputSize,'NumResidualBlocks',5,'NumSharedBlocks',2);
analyzeNetwork(gen)
There are two discriminator networks, one for each of the image domains (day and dusk). Create the
discriminators for the source and target domains using the patchGANDiscriminator function.
discDay = patchGANDiscriminator(inputSize,"NumDownsamplingBlocks",4,"FilterSize",3, ...
"ConvolutionWeightsInitializer","narrow-normal","NormalizationLayer","none");
discDusk = patchGANDiscriminator(inputSize,"NumDownsamplingBlocks",4,"FilterSize",3, ...
"ConvolutionWeightsInitializer","narrow-normal","NormalizationLayer","none");
The modelGradientDisc and modelGradientGen helper functions calculate the gradients and losses for the discriminators and generator, respectively. These functions are defined in the Supporting Functions section of this example.
The objective of each discriminator is to correctly distinguish between real images (1) and translated
images (0) for images in its domain. Each discriminator has a single loss function.
The objective of the generator is to generate translated images that the discriminators classify as
real. The generator loss is a weighted sum of five types of losses: self-reconstruction loss, cycle
consistency loss, hidden KL loss, cycle hidden KL loss, and adversarial loss.
Specify the options for Adam optimization. Train the network for 35 epochs. Specify identical options
for the generator and discriminator networks.
learnRate = 0.0001;
gradDecay = 0.5;
sqGradDecay = 0.999;
weightDecay = 0.0001;
genAvgGradient = [];
genAvgGradientSq = [];
discDayAvgGradient = [];
discDayAvgGradientSq = [];
discDuskAvgGradient = [];
discDuskAvgGradientSq = [];
miniBatchSize = 1;
numEpochs = 35;
Create a minibatchqueue (Deep Learning Toolbox) object that manages the mini-batching of
observations in a custom training loop. The minibatchqueue object also casts data to a dlarray
(Deep Learning Toolbox) object that enables automatic differentiation in deep learning applications. Specify the mini-batch data extraction format as SSCB (spatial, spatial, channel, batch). Set the DispatchInBackground name-value argument to the boolean returned by canUseGPU. If a
supported GPU is available for computation, then the minibatchqueue object preprocesses mini-
batches in the background in a parallel pool during training.
mbqDayTrain = minibatchqueue(imdsDayTrain,"MiniBatchSize",miniBatchSize, ...
"MiniBatchFormat","SSCB","DispatchInBackground",canUseGPU);
mbqDuskTrain = minibatchqueue(imdsDuskTrain,"MiniBatchSize",miniBatchSize, ...
"MiniBatchFormat","SSCB","DispatchInBackground",canUseGPU);
By default, the example downloads a pretrained version of the UNIT generator for the CamVid
dataset by using the helper function downloadTrainedDayDuskGeneratorNet. The helper
function is attached to the example as a supporting file. The pretrained network enables you to run
the entire example without waiting for training to complete.
To train the network, set the doTraining variable in the following code to true. Train the model in
a custom training loop. For each iteration:
• Read the data for current mini-batch using the next (Deep Learning Toolbox) function.
• Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function and the
modelGradientDisc and modelGradientGen helper functions.
• Update the network parameters using the adamupdate (Deep Learning Toolbox) function.
• Display input and translated images for both the source and target domains after each epoch.
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox). Training takes about 88 hours on an NVIDIA Titan RTX.
doTraining = false;
if doTraining
% Create a figure to show the results
figure("Units","Normalized");
for iPlot = 1:4
ax(iPlot) = subplot(2,2,iPlot);
end
iteration = 0;
    % Loop over epochs
    for epoch = 1:numEpochs
        % Run the loop until all the images in the mini-batch queue mbqDayTrain are processed
        while hasdata(mbqDayTrain)
            iteration = iteration + 1;

            % Read the day and dusk mini-batches using next, evaluate the model
            % gradients using dlfeval with the modelGradientDisc and
            % modelGradientGen helper functions, and update the network
            % parameters using adamupdate, as described in the steps above.
        end
    end
else
net_url = 'https://ssd.mathworks.com/supportfiles/vision/data/trainedDayDuskUNITGeneratorNet.mat';
downloadTrainedDayDuskGeneratorNet(net_url,dataDir);
load(fullfile(dataDir,'trainedDayDuskUNITGeneratorNet.mat'));
end
Source-to-target image translation uses the UNIT generator to generate an image in the target (dusk)
domain from an image in the source (day) domain.
idxToTest = 1;
dayTestImage = readimage(imdsDayTest,idxToTest);
Convert the image to data type single and normalize the image to the range [-1, 1].
dayTestImage = im2single(dayTestImage);
dayTestImage = (dayTestImage-0.5)/0.5;
Create a dlarray object that inputs data to the generator. If a supported GPU is available for
computation, then perform inference on a GPU by converting the data to a gpuArray object.
dlDayImage = dlarray(dayTestImage,'SSCB');
if canUseGPU
dlDayImage = gpuArray(dlDayImage);
end
Translate the input day image to the dusk domain using the unitPredict function.
dlDayToDuskImage = unitPredict(gen,dlDayImage);
dayToDuskImage = extractdata(gather(dlDayToDuskImage));
The final layer of the generator network produces activations in the range [-1, 1]. For display, rescale
the activations to the range [0, 1]. Also, rescale the input day image before display.
dayToDuskImage = rescale(dayToDuskImage);
dayTestImage = rescale(dayTestImage);
Display the input day image and its translated dusk version in a montage.
figure
montage({dayTestImage dayToDuskImage})
title(['Day Test Image ',num2str(idxToTest),' with Translated Dusk Image'])
Target-to-source image translation uses the UNIT generator to generate an image in the source (day)
domain from an image in the target (dusk) domain. Read a test image from the dusk test set.
duskTestImage = readimage(imdsDuskTest,idxToTest);
Convert the image to data type single and normalize the image to the range [-1, 1].
duskTestImage = im2single(duskTestImage);
duskTestImage = (duskTestImage-0.5)/0.5;
Create a dlarray object that inputs data to the generator. If a supported GPU is available for
computation, then perform inference on a GPU by converting the data to a gpuArray object.
dlDuskImage = dlarray(duskTestImage,'SSCB');
if canUseGPU
dlDuskImage = gpuArray(dlDuskImage);
end
Translate the input dusk image to the day domain using the unitPredict function.
dlDuskToDayImage = unitPredict(gen,dlDuskImage,"OutputType","TargetToSource");
duskToDayImage = extractdata(gather(dlDuskToDayImage));
For display, rescale the activations to the range [0, 1]. Also, rescale the input dusk image before
display.
duskToDayImage = rescale(duskToDayImage);
duskTestImage = rescale(duskTestImage);
Display the input dusk image and its translated day version in a montage.
montage({duskTestImage duskToDayImage})
title(['Test Dusk Image ',num2str(idxToTest),' with Translated Day Image'])
Supporting Functions
The modelGradientDisc helper function calculates the gradients and loss for the two
discriminators.
[~,fakeA,fakeB,~] = forward(gen,ImageA,ImageB);
The modelGradientGen helper function calculates the gradients and loss for the generator.
[ImageAA,ImageBA,ImageAB,ImageBB] = forward(gen,ImageA,ImageB);
hidden = forward(gen,ImageA,ImageB,'Outputs','encoderSharedBlock');
[~,ImageABA,ImageBAB,~] = forward(gen,ImageBA,ImageAB);
cycle_hidden = forward(gen,ImageBA,ImageAB,'Outputs','encoderSharedBlock');
outA = forward(discA,ImageBA);
outB = forward(discB,ImageAB);
advLoss = computeAdvLoss(outA) + computeAdvLoss(outB);
Loss Functions
The computeDiscLoss helper function calculates discriminator loss. Each discriminator loss is a
sum of two components:
• The squared difference between a vector of ones and the predictions of the discriminator on real images, $\hat{Y}_{real}$
• The squared difference between a vector of zeros and the predictions of the discriminator on generated images, $\hat{Y}_{translated}$

$$\text{discriminatorLoss} = \left(1 - \hat{Y}_{real}\right)^2 + \left(0 - \hat{Y}_{translated}\right)^2$$
The computeAdvLoss helper function calculates adversarial loss for the generator. Adversarial loss
is the squared difference between a vector of ones and the discriminator predictions on the translated
image.
$$\text{adversarialLoss} = \left(1 - \hat{Y}_{translated}\right)^2$$
The computeKLLoss helper function calculates hidden KL loss and cycle-hidden KL loss for the
generator. Hidden KL loss is the squared difference between a vector of zeros and the
'encoderSharedBlock' activation for the self-reconstruction stream. Cycle-hidden KL loss is the
squared difference between a vector of zeros and the 'encoderSharedBlock' activation for the
cycle-reconstruction stream.
$$\text{hiddenKLLoss} = \left(0 - Y_{encoderSharedBlockActivation}\right)^2$$

$$\text{cycleHiddenKLLoss} = \left(0 - Y_{encoderSharedBlockActivation}\right)^2$$
References
[1] Liu, Ming-Yu, Thomas Breuel, and Jan Kautz. "Unsupervised Image-to-Image Translation Networks." In Advances in Neural Information Processing Systems, 2017. https://arxiv.org/abs/1703.00848.
[2] Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic Object Classes in Video: A High-Definition Ground Truth Database." Pattern Recognition Letters. Vol. 30, Issue 2, 2009, pp. 88-97.
See Also
adamupdate | dlarray | dlfeval | minibatchqueue | patchGANDiscriminator | transform |
unitGenerator | unitPredict
More About
• “Get Started with GANs for Image-to-Image Translation” on page 18-5
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
• “Define Model Gradients Function for Custom Training Loop” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
Quantify Image Quality Using Neural Image Assessment
Image quality metrics provide an objective measure of image quality. An effective metric provides
quantitative scores that correlate well with a subjective perception of quality by a human observer.
Quality metrics enable the comparison of image processing algorithms.
NIMA [1] is a no-reference technique that predicts the quality of an image without
relying on a pristine reference image, which is frequently unavailable. NIMA uses a CNN to predict a
distribution of quality scores for each image.
Create a directory to store the LIVE In the Wild data set.
imageDir = fullfile(tempdir,"LIVEInTheWild");
if ~exist(imageDir,'dir')
mkdir(imageDir);
end
Download and load a pretrained NIMA network by using the downloadTrainedNIMANet helper function.
trainedNIMA_url = 'https://ssd.mathworks.com/supportfiles/images/data/trainedNIMA.mat';
downloadTrainedNIMANet(trainedNIMA_url,imageDir);
load(fullfile(imageDir,'trainedNIMA.mat'));
You can evaluate the effectiveness of the NIMA model by comparing the predicted scores for a high-
quality and lower quality image.
imOriginal = imread('kobi.png');
Reduce the aesthetic quality of the image by applying a Gaussian blur. Display the original image and
the blurred image in a montage. Subjectively, the aesthetic quality of the blurred image is worse than
the quality of the original image.
imBlur = imgaussfilt(imOriginal,5);
montage({imOriginal,imBlur})
Predict the NIMA quality score distribution for the two images using the predictNIMAScore helper
function. This function is attached to the example as a supporting file.
The predictNIMAScore function returns the mean and standard deviation of the NIMA score
distribution for an image. The predicted mean score is a measure of the quality of the image. The
standard deviation of scores can be considered a measure of the confidence level of the predicted
mean score.
[meanOriginal,stdOriginal] = predictNIMAScore(dlnet,imOriginal);
[meanBlur,stdBlur] = predictNIMAScore(dlnet,imBlur);
Display the images along with the mean and standard deviation of the score distributions predicted
by the NIMA model. The NIMA model correctly predicts scores for these images that agree with the
subjective visual assessment.
figure
t = tiledlayout(1,2);
displayImageAndScoresForNIMA(t,imOriginal,meanOriginal,stdOriginal,"Original Image")
displayImageAndScoresForNIMA(t,imBlur,meanBlur,stdBlur,"Blurred Image")
The rest of this example shows how to train and evaluate a NIMA model.
This example uses the LIVE In the Wild data set [2], which is a public-domain
subjective image quality challenge database. The data set contains 1162 photos captured by mobile
devices, with 7 additional images provided to train the human scorers. Each image is rated by an
average of 175 individuals on a scale of [1, 100]. The data set provides the mean and standard
deviation of the subjective scores for each image.
Download the data set by following the instructions outlined in LIVE In the Wild Image Quality
Challenge Database. Extract the data into the directory specified by the imageDir variable. When
extraction is successful, imageDir contains two directories: Data and Images.
imageData = load(fullfile(imageDir,'Data','AllImages_release.mat'));
imageData = imageData.AllImages_release;
nImg = length(imageData);
imageList(1:7) = fullfile(imageDir,'Images','trainingImages',imageData(1:7));
imageList(8:nImg) = fullfile(imageDir,'Images',imageData(8:end));
imds = imageDatastore(imageList);
Load the mean and standard deviation data corresponding to the images.
meanData = load(fullfile(imageDir,'Data','AllMOS_release.mat'));
meanData = meanData.AllMOS_release;
stdData = load(fullfile(imageDir,'Data','AllStdDev_release.mat'));
stdData = stdData.AllStdDev_release;
Optionally, display a few sample images from the data set with the corresponding mean and standard
deviation values.
figure
t = tiledlayout(1,3);
idx1 = 785;
displayImageAndScoresForNIMA(t,readimage(imds,idx1), ...
meanData(idx1),stdData(idx1),"Image "+imageData(idx1))
idx2 = 203;
displayImageAndScoresForNIMA(t,readimage(imds,idx2), ...
meanData(idx2),stdData(idx2),"Image "+imageData(idx2))
idx3 = 777;
displayImageAndScoresForNIMA(t,readimage(imds,idx3), ...
meanData(idx3),stdData(idx3),"Image "+imageData(idx3))
The NIMA model requires a distribution of human scores, but the LIVE data set provides only the
mean and standard deviation of the distribution. Approximate an underlying distribution for each
image in the LIVE data set using the createNIMAScoreDistribution helper function. This
function is attached to the example as a supporting file.
The createNIMAScoreDistribution function rescales the scores to the range [1, 10], then generates a maximum entropy distribution of scores from the mean and standard deviation values.
newMaxScore = 10;
prob = createNIMAScoreDistribution(meanData,stdData);
cumProb = cumsum(prob,2);
Create an arrayDatastore object that manages the cumulative score distributions.
probDS = arrayDatastore(cumProb','IterationDimension',2);
Combine the datastores containing the image data and score distribution data.
dsCombined = combine(imds,probDS);
Preview a sample from the combined datastore. The first element is the image and the second element is the corresponding cumulative score distribution.
sampleRead = preview(dsCombined);
figure
tiledlayout(1,2)
nexttile
imshow(sampleRead{1})
title("Sample Image from Data Set")
nexttile
plot(sampleRead{2})
title("Cumulative Score Distribution")
Partition the data into training, validation, and test sets. Allocate 70% of the data for training, 15%
for validation, and the remainder for testing.
numTrain = floor(0.70 * nImg);
numVal = floor(0.15 * nImg);
Idx = randperm(nImg);
idxTrain = Idx(1:numTrain);
idxVal = Idx(numTrain+1:numTrain+numVal);
idxTest = Idx(numTrain+numVal+1:nImg);
dsTrain = subset(dsCombined,idxTrain);
dsVal = subset(dsCombined,idxVal);
dsTest = subset(dsCombined,idxTest);
Augment the training data using the augmentDataForNIMA helper function, which is attached to the example as a supporting file. This function performs augmentation operations on each training image.
The input layer of the network performs z-score normalization of the training images. Calculate the
mean and standard deviation of the training images for use in z-score normalization. The augmented training images have the network input size of 224-by-224 pixels.
inputSize = [224,224];
meanImage = zeros([inputSize 3]);
meanImageSq = zeros([inputSize 3]);
while hasdata(dsTrain)
dat = read(dsTrain);
img = double(dat{1});
meanImage = meanImage + img;
meanImageSq = meanImageSq + img.^2;
end
meanImage = meanImage/numTrain;
meanImageSq = meanImageSq/numTrain;
varImage = meanImageSq - meanImage.^2;
stdImage = sqrt(varImage);
This example starts with a MobileNet-v2 [3] CNN trained on ImageNet [4]. The example modifies the network by replacing the last layer of the MobileNet-v2 network
with a fully connected layer with 10 neurons, each representing a discrete score from 1 through 10.
The network predicts the probability of each score for each image. The example normalizes the
outputs of the fully connected layer using a softmax activation layer.
The mobilenetv2 (Deep Learning Toolbox) function returns a pretrained MobileNet-v2 network.
This function requires the Deep Learning Toolbox™ Model for MobileNet-v2 Network support
package. If this support package is not installed, then the function provides a download link.
net = mobilenetv2;
lgraph = layerGraph(net);
The network has an image input size of 224-by-224 pixels. Replace the input layer with an image
input layer that performs z-score normalization on the image data using the mean and standard
deviation of the training images.
Replace the original final classification layer with a fully connected layer with 10 neurons. Add a
softmax layer to normalize the outputs. Set the learning rate of the fully connected layer to 10 times
the learning rate of the baseline CNN layers. Apply a dropout of 75%.
lgraph = removeLayers(lgraph,{'ClassificationLayer_Logits','Logits_softmax','Logits'});
newFinalLayers = [
dropoutLayer(0.75,'Name','drop')
fullyConnectedLayer(newMaxScore,'Name','fc','WeightLearnRateFactor',10,'BiasLearnRateFactor',
softmaxLayer('Name','prob')];
lgraph = addLayers(lgraph,newFinalLayers);
lgraph = connectLayers(lgraph,'global_average_pooling2d_1','drop');
dlnet = dlnetwork(lgraph);
Visualize the network using the Deep Network Designer (Deep Learning Toolbox) app.
deepNetworkDesigner(lgraph)
The modelGradients helper function calculates the gradients and losses for each iteration of
training the network. This function is defined in the Supporting Functions section of this example.
The objective of the NIMA network is to minimize the earth mover's distance (EMD) between the
ground truth and predicted score distributions. EMD loss considers the distance between classes
when penalizing misclassification. Therefore, EMD loss performs better than a typical softmax cross-
entropy loss used in classification tasks [5]. This example calculates the EMD loss using the earthMoverDistance helper function, which is defined in the Supporting Functions section of this example.
For the EMD loss function, use an r-norm distance with r = 2. This distance allows for easy
optimization when you work with gradient descent.
Specify the options for SGDM optimization. Train the network for 150 epochs with a mini-batch size
of 128.
numEpochs = 150;
miniBatchSize = 128;
momentum = 0.9;
initialLearnRate = 3e-3;
decay = 0.95;
Create a minibatchqueue (Deep Learning Toolbox) object that manages the mini-batching of
observations in a custom training loop. The minibatchqueue object also casts data to a dlarray
(Deep Learning Toolbox) object that enables automatic differentiation in deep learning applications.
Specify the mini-batch data extraction format as 'SSCB' (spatial, spatial, channel, batch). Set the
'DispatchInBackground' name-value argument to the boolean returned by canUseGPU. If a
supported GPU is available for computation, then the minibatchqueue object preprocesses mini-
batches in the background in a parallel pool during training.
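A sketch of this step, following the pattern used for the test data later in this example (the exact preprocessing in the full example may differ):
mbqTrain = minibatchqueue(dsTrain,'MiniBatchSize',miniBatchSize, ...
    'MiniBatchFormat',{'SSCB',''},'DispatchInBackground',canUseGPU);
mbqVal = minibatchqueue(dsVal,'MiniBatchSize',miniBatchSize, ...
    'MiniBatchFormat',{'SSCB',''},'DispatchInBackground',canUseGPU);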
Train Network
By default, the example loads a pretrained version of NIMA network. The pretrained network enables
you to run the entire example without waiting for training to complete.
To train the network, set the doTraining variable in the following code to true. Train the model in
a custom training loop. For each iteration:
• Read the data for current mini-batch using the next (Deep Learning Toolbox) function.
• Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function and the
modelGradients helper function.
• Update the network parameters using the sgdmupdate (Deep Learning Toolbox) function.
Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Support by Release” (Parallel Computing
Toolbox).
doTraining = false;
if doTraining
iteration = 0;
velocity = [];
start = tic;
[hFig,lineLossTrain,lineLossVal] = initializeTrainingPlotNIMA;
% Loop over epochs and decay the learning rate every 10 epochs
for epoch = 1:numEpochs
shuffle(mbqTrain);
learnRate = initialLearnRate/(1+decay*floor(epoch/10));
while hasdata(mbqTrain)
iteration = iteration + 1;
[dlX,cdfY] = next(mbqTrain);
[grad,loss] = dlfeval(@modelGradients,dlnet,dlX,cdfY);
[dlnet,velocity] = sgdmupdate(dlnet,grad,velocity,learnRate,momentum);
updateTrainingPlotNIMA(lineLossTrain,loss,epoch,iteration,start)
end
end
else
load(fullfile(imageDir,'trainedNIMA.mat'));
end
Evaluate the performance of the model on the test data set using three metrics: EMD, binary
classification accuracy, and correlation coefficients. The performance of the NIMA network on the
test data set is in agreement with the performance of the reference NIMA model reported by Talebi
and Milanfar [1].
Create a minibatchqueue (Deep Learning Toolbox) object that manages the mini-batching of test
data.
mbqTest = minibatchqueue(dsTest,'MiniBatchSize',miniBatchSize,'MiniBatchFormat',{'SSCB',''});
Calculate the predicted probabilities and ground truth cumulative probabilities of mini-batches of test
data using the modelPredictions function. This function is defined in the Supporting Functions section of this example.
[YPredTest,~,cdfYTest] = modelPredictions(dlnet,mbqTest);
Calculate the mean and standard deviation values of the ground truth and predicted distributions.
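A sketch of this computation, assuming YPredTest holds the predicted score probabilities and cdfYTest holds the ground truth cumulative probabilities as dlarray objects with one column per test image over the scores 1 through 10:
predPdf = extractdata(YPredTest);
origCdf = extractdata(cdfYTest);
origPdf = [origCdf(1,:); diff(origCdf,1,1)];   % recover the PMF from the CDF
scores = (1:size(origPdf,1))';

meanPred = (scores'*predPdf)';                 % predicted mean scores
meanOrig = (scores'*origPdf)';                 % ground truth mean scores
stdPred = sqrt((scores.^2)'*predPdf - (scores'*predPdf).^2)';
stdOrig = sqrt((scores.^2)'*origPdf - (scores'*origPdf).^2)';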
Calculate EMD
Calculate the EMD of the ground truth and predicted score distributions. For prediction, use an r-
norm distance with r = 1. The EMD value indicates the closeness of the predicted and ground truth
rating distributions.
EMDTest = earthMoverDistance(YPredTest,cdfYTest,1)
EMDTest =
1×1 single gpuArray dlarray
0.1158
For binary classification accuracy, convert the distributions to two classifications: high-quality and
low-quality. Classify images with a mean score greater than a threshold as high-quality.
qualityThreshold = 5;
binaryPred = meanPred > qualityThreshold;
binaryOrig = meanOrig > qualityThreshold;
Calculate the binary classification accuracy.
binaryAccuracy = 100 * sum(binaryPred == binaryOrig)/numel(binaryPred)
binaryAccuracy =
84.6591
Large correlation values indicate a large positive correlation between the ground truth and predicted
scores. Calculate the linear correlation coefficient (LCC) and Spearman’s rank correlation coefficient
(SRCC) for the mean scores.
meanLCC = corr(meanOrig,meanPred)
meanLCC =
gpuArray single
0.7265
meanSRCC = corr(meanOrig,meanPred,'type','Spearman')
meanSRCC =
gpuArray single
0.6451
Supporting Functions
The modelGradients function takes as input a dlnetwork object dlnet and a mini-batch of input
data dlX with corresponding target cumulative probabilities cdfY. The function returns the gradients
of the loss with respect to the learnable parameters in dlnet as well as the loss. To compute the
gradients automatically, use the dlgradient function.
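A minimal sketch consistent with this description; the full example may differ in details:
function [gradients,loss] = modelGradients(dlnet,dlX,cdfY)
    % Predict the score distribution, compute the EMD loss against the
    % target cumulative probabilities, and compute the gradients.
    dlYPred = forward(dlnet,dlX);
    loss = earthMoverDistance(dlYPred,cdfY,2);
    gradients = dlgradient(loss,dlnet.Learnables);
end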
Loss Function
The earthMoverDistance function calculates the EMD between the ground truth and predicted
distributions for a specified r-norm value. The earthMoverDistance uses the computeCDF helper
function to calculate the cumulative probabilities of the predicted distribution.
function loss = earthMoverDistance(YPred,cdfY,r)
N = size(cdfY,1);
cdfYPred = computeCDF(YPred);
cdfDiff = (1/N) * (abs(cdfY - cdfYPred).^r);
lossArray = sum(cdfDiff,1).^(1/r);
loss = mean(lossArray);
end
function cdfY = computeCDF(Y)
% Given a probability mass function Y, compute the cumulative probabilities
[N,miniBatchSize] = size(Y);
L = repmat(triu(ones(N)),1,1,miniBatchSize);
L3d = permute(L,[1 3 2]);
prod = Y.*L3d;
prodSum = sum(prod,1);
cdfY = reshape(prodSum(:)',miniBatchSize,N)';
end
The modelPredictions function calculates the estimated probabilities, loss, and ground truth
cumulative probabilities of mini-batches of data.
function [dlYPred,loss,cdfYOrig] = modelPredictions(dlnet,mbq)
reset(mbq);
loss = 0;
numObservations = 0;
dlYPred = [];
cdfYOrig = [];
while hasdata(mbq)
[dlX,cdfY] = next(mbq);
miniBatchSize = size(dlX,4);
dlY = predict(dlnet,dlX);
loss = loss + earthMoverDistance(dlY,cdfY,2)*miniBatchSize;
dlYPred = [dlYPred dlY];
cdfYOrig = [cdfYOrig cdfY];
% Accumulate the number of observations for the loss average
numObservations = numObservations + miniBatchSize;
end
loss = loss / numObservations;
end
References
[1] Talebi, Hossein, and Peyman Milanfar. “NIMA: Neural Image Assessment.” IEEE Transactions on
Image Processing 27, no. 8 (August 2018): 3998–4011. https://doi.org/10.1109/TIP.2018.2831899.
[2] LIVE: Laboratory for Image and Video Engineering. "LIVE In the Wild Image Quality Challenge
Database." https://live.ece.utexas.edu/research/ChallengeDB/index.html.
[3] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen.
“MobileNetV2: Inverted Residuals and Linear Bottlenecks.” In 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, 4510–20. Salt Lake City, UT: IEEE, 2018. https://doi.org/
10.1109/CVPR.2018.00474.
[5] Hou, Le, Chen-Ping Yu, and Dimitris Samaras. “Squared Earth Mover’s Distance-Based Loss for
Training Deep Neural Networks.” Preprint, submitted November 30, 2016. https://arxiv.org/abs/
1611.05916.
See Also
dlfeval | dlnetwork | layerGraph | minibatchqueue | mobilenetv2 | predict | sgdmupdate |
transform
More About
• “Image Quality Metrics” on page 11-133
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
• “Define Model Gradients Function for Custom Training Loop” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
• “Train Network Using Custom Training Loop” (Deep Learning Toolbox)
Hyperspectral Image Processing
19
Getting Started with Hyperspectral Image Processing
This topic describes functions that enable hyperspectral image analysis and provides examples for
spectral classification and anomaly detection using endmembers and abundance maps.
For hyperspectral image processing, the values read from the data file are arranged into a three-
dimensional (3-D) array of the form M-by-N-by-C, where M and N are the spatial dimensions of the acquired data, and C is the spectral dimension specifying the number of spectral wavelengths used during
acquisition. Thus, you can consider the 3-D array as a set of two-dimensional (2-D) monochromatic
images captured at varying wavelengths. This set is known as the hyperspectral data cube or data
cube.
The hypercube function constructs the data cube by reading the data file and the metadata
information in the associated header file. The hypercube function creates a hypercube object and
stores the data cube, spectral wavelengths, and the metadata to its properties. You can use the
hypercube object as input to all other functions in the Image Processing Toolbox Hyperspectral
Imaging Library.
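For example, this call (also used in the app example later in this chapter) reads an AVIRIS data file and its associated header file into a hypercube object:
hcube = hypercube('jasperRidge2_R198.img');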
To visualize and understand the object being imaged, it is useful to represent the data cube as a 2-D
image by using color schemes. The color representation of the data cube enables you to visually
inspect the data and supports decision making. You can use the colorize function to compute the
Red-Green-Blue (RGB), false-color, and color-infrared (CIR) representations of the data cube.
• The RGB color scheme uses the red, green, and blue spectral band responses to generate the 2-D
image of the hyperspectral data cube. The RGB color scheme brings a natural appearance, but
results in a significant loss of subtle information.
• The false-color scheme uses a combination of any number of bands other than the visible red,
green, and blue spectral bands. Use false-color representation to visualize the spectral responses
of bands outside the visible spectrum. The false-color scheme efficiently captures distinct
information across all spectral bands of hyperspectral data.
• The CIR color scheme uses spectral bands in the NIR range. The CIR representation of a
hyperspectral data cube is particularly useful in displaying and analyzing vegetation areas of the
data cube.
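A sketch of computing these color representations with the colorize function, assuming hcube is a hypercube object and that the 'Method' values shown here correspond to the RGB, false-color, and CIR schemes described above:
rgbImg = colorize(hcube,'Method','rgb','ContrastStretching',true);
falseColorImg = colorize(hcube,'Method','falsecolored');
cirImg = colorize(hcube,'Method','cir');
montage({rgbImg,falseColorImg,cirImg},'Size',[1 3])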
Preprocessing
Hyperspectral imaging sensors typically have high spectral resolution and low spatial resolution.
The spatial and the spectral characteristics of the acquired hyperspectral data are characterized by
its pixels. Each pixel is a vector of values that specify the intensities at a location (x,y) in z different
bands. The vector is known as the pixel spectrum, and it defines the spectral signature of the pixel
located at (x,y). The pixel spectra are important features in hyperspectral data analysis, but these pixel spectra get distorted due to factors such as sensor noise, atmospheric effects, and low
resolution.
You can use the denoiseNGMeet function to remove noise from hyperspectral data by using the non-local meets global approach.
To enhance the spatial resolution of hyperspectral data, you can use image fusion methods. The fusion approach combines information from the low-resolution hyperspectral data with high-resolution multispectral data or a panchromatic image of the same scene. This approach is also known as sharpening or pansharpening in hyperspectral image analysis. Pansharpening specifically refers to fusion between hyperspectral and panchromatic data. You can use the sharpencnmf function for sharpening hyperspectral data using the coupled nonnegative matrix factorization (CNMF) method.
To compensate for the atmospheric effects, you must first calibrate the pixel values, which are digital
numbers (DNs). You must preprocess the data by calibrating DNs using radiometric and atmospheric
correction methods. This process improves interpretation of the pixel spectra and provides better results when you analyze multiple data sets, as in a classification problem. For information about
radiometric calibration and atmospheric correction methods, see “Hyperspectral Data Correction” on
page 19-9.
The other preprocessing step that is important in all hyperspectral imaging applications is
dimensionality reduction. The large number of bands in the hyperspectral data increases the
computational complexity of processing the data cube. The contiguous nature of the band images
results in redundant information across bands. Neighboring bands in a hyperspectral image have
high correlation, which results in spectral redundancy. You can remove the redundant bands by
decorrelating the band images. Popular approaches for reducing the spectral dimensionality of a data
cube include band selection and orthogonal transforms.
• The band selection approach uses orthogonal space projections to find the spectrally distinct and
most informative bands in the data cube. Use the selectBands and removeBands functions for
the finding most informative bands and removing one or more bands, respectively.
• Orthogonal transforms such as principal component analysis (PCA) and maximum noise fraction
(MNF), decorrelate the band information and find the principal component bands.
PCA transforms the data to a lower dimensional space and finds principal component vectors with
their directions along the maximum variances of the input bands. The principal components are in
descending order of the amount of total variance explained.
MNF computes the principal components that maximize the signal-noise-ratio, rather than the
variance. MNF transform is particularly efficient at deriving principal components from noisy
band images. The principal component bands are spectrally distinct bands with low interband
correlation.
The hyperpca and hypermnf functions reduce the spectral dimensionality of the data cube by
using the PCA and MNF transforms respectively. You can use the pixel spectra derived from the
reduced data cube for hyperspectral data analysis.
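For example, a sketch of reducing the spectral dimension to the first 10 principal component bands, assuming hcube is a hypercube object and that both functions accept the number of output bands as the second argument:
pcaBands = hyperpca(hcube,10);   % PCA-based reduction to 10 bands
mnfBands = hypermnf(hcube,10);   % MNF-based reduction to 10 bands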
Spectral Unmixing
In a hyperspectral image, the intensity values recorded at each pixel specify the spectral
characteristics of the region that the pixel belongs to. The region can be a homogeneous surface or
heterogeneous surface. The pixels that belong to a homogeneous surface are known as pure pixels.
These pure pixels constitute the endmembers of the hyperspectral data.
Heterogeneous surfaces are a combination of two or more distinct homogeneous surfaces. The pixels
belonging to heterogeneous surfaces are known as mixed pixels. The spectral signature of a mixed
pixel is a combination of two or more endmember signatures. This spatial heterogeneity is mainly due
to the low spatial resolution of the hyperspectral sensor.
Spectral unmixing is the process of decomposing the spectral signatures of mixed pixels into their
constituent endmembers. The spectral unmixing process involves two steps:
1 Endmember extraction — The spectra of the endmembers are prominent features in the
hyperspectral data and can be used for efficient spectral unmixing, segmentation, and
classification of hyperspectral images. Convex geometry based approaches, such as pixel purity
index (PPI), fast iterative pixel purity index (FIPPI), and N-finder (N-FINDR) are some of the
efficient approaches for endmember extraction.
• Use the ppi function to estimate the endmembers by using the PPI approach. The PPI
approach projects the pixel spectra to an orthogonal space and identifies extrema pixels in the
projected space as endmembers. This is a non-iterative approach, and the results depend on
the random unit vectors generated for orthogonal projection. To improve results, you must
increase the random unit vectors for projection, which can be computationally expensive.
• Use the fippi function to estimate the endmembers by using the FIPPI approach. The FIPPI
approach is an iterative approach, which uses an automatic target generation process to
estimate the initial set of unit vectors for orthogonal projection. The algorithm converges
faster than the PPI approach and identifies endmembers that are distinct from one another.
• Use the nfindr function to estimate the endmembers by using the N-FINDR method. N-
FINDR is an iterative approach that constructs a simplex by using the pixel spectra. The
approach assumes that the volume of a simplex formed by the endmembers is larger than the
volume defined by any other combination of pixels. The set of pixel signatures for which the
volume of the simplex is high are the endmembers.
2 Abundance map estimation — Given the endmember signatures, it is useful to estimate the
fractional amount of each endmember present in each pixel. You can generate the abundance
maps for each endmember, which represent the distribution of endmember spectra in the image.
You can label a pixel as belonging to an endmember spectra by comparing all of the abundance
map values obtained for that pixel.
Use the estimateAbundanceLS function to estimate the abundance maps for each endmember
spectra.
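A sketch of the two unmixing steps, assuming hcube is a hypercube object and that the number of endmembers in the scene is known (the value used here is only for illustration):
numEndmembers = 5;                                      % assumed number of endmembers
endmembers = nfindr(hcube,numEndmembers);               % endmember extraction using N-FINDR
abundanceMaps = estimateAbundanceLS(hcube,endmembers);  % abundance map for each endmember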
Spectral Matching
Interpret the pixel spectra by performing spectral matching. Spectral matching identifies the class of
an endmember material by comparing its spectra with one or more reference spectra. The reference
data consists of pure spectral signatures of materials, which are available as spectral libraries.
Use the readEcostressSig function to read the reference spectra files from the ECOSTRESS
spectral library. Then, you can compute the similarity between the files in the ECOSTRESS library
spectra and an endmember spectra by using the spectralMatch function.
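A sketch of this workflow; the signature file name is a placeholder, and the call assumes that spectralMatch accepts the library signatures, an endmember spectrum, and its wavelengths:
libData = readEcostressSig("referenceSignatureFile.txt");       % placeholder file name
score = spectralMatch(libData,endmembers(:,1),hcube.Wavelength);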
The geometrical characteristics and the probability distribution values of the pixel spectra are the
important features for spectral matching. You can improve the matching efficiency by combining both
the geometrical and probabilistic characteristics. Such combination measures have higher
discrimination capabilities than the individual approaches and are more suitable for discriminating
spectrally similar targets (intra-species). The following functions are available for computing the spectral matching score:

• sam: Spectral angle mapper (SAM) matches two spectra based on their geometrical characteristics. The SAM measure computes the angle between two spectral signatures. A smaller angle represents a better match between the two spectra. This measure is insensitive to illumination changes.
• sid: Spectral information divergence (SID) matches two spectra based on their probability distributions. This method is efficient in identifying mixed pixel spectra. A low SID value implies higher similarity between two spectra.
• sidsam: Combination of SID and SAM. The SID-SAM approach has better discrimination capability than SID and SAM individually. A minimum score implies higher similarity between two spectra.
• jmsam: Combination of the Jeffries-Matusita (JM) distance and SAM. Low distance values imply higher similarity between two spectra. This method is particularly efficient in discriminating spectrally close targets.
• ns3: Normalized spectra similarity score (NS3), which combines the Euclidean distance and SAM. Low distance values imply higher similarity between two spectra. This method has high discrimination capability but requires extensive reference data for high accuracy.
Applications
Hyperspectral image processing applications include classification, target detection, anomaly
detection, and material analysis.
• Segment and classify each pixel in a hyperspectral image through unmixing and spectral
matching. For examples of classification, see “Hyperspectral Image Analysis Using Maximum
Abundance Classification” on page 19-21 and “Classify Hyperspectral Image Using Library
Signatures and SAM” on page 19-28.
• You can perform target detection by matching the known spectral signature of a target material to
the pixel spectra in hyperspectral data. For an example, see “Target Detection Using Spectral
Signature Matching” on page 19-41.
• You can also use hyperspectral image processing for anomaly detection and material analysis,
such as vegetation analysis.
See Also
Apps
Hyperspectral Viewer
Functions
anomalyRX | estimateAbundanceLS | hypercube | ndvi | ppi | spectralMatch
Related Examples
• “Classify Hyperspectral Image Using Library Signatures and SAM” on page 19-28
• “Hyperspectral Image Analysis Using Maximum Abundance Classification” on page 19-21
• “Target Detection Using Spectral Signature Matching” on page 19-41
Hyperspectral Data Correction
The incident radiation reflected by the surface is known as the surface reflectance. The reflected
radiation measured by the sensor positioned at the top of the atmosphere (TOA) is known as the TOA
radiance. Ideally, the TOA radiance is equal to the surface reflectance. But, in real conditions, the
incident and the reflected radiation are affected by atmospheric phenomena such as scattering and
absorption. As a result, the TOA radiance value is the sum of reflections from the surface, reflections
from clouds, and scattering from air molecules and aerosol particles in the atmosphere.
Along with the characteristics of the light source and the surface material, the radiation values
measured by the sensor are influenced by the sensor gain and bias (offset) at each spectral
wavelength. The raw data recorded by the hyperspectral sensors is known as the digital numbers
(DNs). To use the hyperspectral data for quantitative analysis, you must calibrate the data for TOA
radiance values, and estimate the actual surface reflectance values from the DNs.
The process of estimating TOA radiance values from the DNs is known as radiometric calibration. The
process of estimating the surface reflectance values by removing the atmospheric effects is known as
atmospheric correction.
You can perform radiometric calibration and atmospheric correction procedures as preprocessing
steps for thorough spectral analysis.
Radiometric Calibration
DN to TOA Radiance
To estimate TOA radiance values from DNs, calibrate the sensor gain and bias in each spectral band:

$$Radiance_\lambda = Gain_\lambda \times DN + Bias_\lambda$$

$Gain_\lambda$ and $Bias_\lambda$ are the gain and bias values for each spectral band $\lambda$, respectively.
You can find the TOA radiance values for uncalibrated hyperspectral data by using the dn2radiance
function. The function reads the gain and the bias (offset) values for each spectral band from the
header file associated with the hyperspectral data.
You can estimate the TOA reflectance values from TOA radiance values. TOA reflectance specifies the ratio of the TOA radiance to the radiation incident on the surface:

$$\rho_\lambda = \frac{\pi \times Radiance_\lambda \times d^2}{ESUN_\lambda \times \sin\theta_E}$$

$d$ is the Earth-sun distance in astronomical units, $ESUN_\lambda$ is the mean solar irradiance for each spectral band, and $\theta_E$ is the sun elevation angle. You can estimate the TOA reflectance values from TOA radiance values by using the radiance2Reflectance function.
DN to TOA Reflectance
You can directly compute TOA reflectance values from DNs if the reflectance gain ($RGain$) and reflectance offset ($ROffset$) parameters of each spectral band are available:

$$\rho_\lambda = RGain_\lambda \times DN + ROffset_\lambda$$
The dn2reflectance function calibrates the DNs to TOA reflectance values by using the reflectance
gain and offset parameters available in the metadata.
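A sketch of these calibration steps, assuming hcube is an uncalibrated hypercube object whose metadata contains the required gain and offset values:
radianceCube = dn2radiance(hcube);                     % DNs to TOA radiance
reflectanceCube = radiance2Reflectance(radianceCube);  % TOA radiance to TOA reflectance
% When reflectance gain and offset are available, calibrate DNs directly:
reflectanceCube = dn2reflectance(hcube);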
Atmospheric Correction
Atmospheric correction methods estimate the surface reflectance values from TOA radiance or TOA
reflectance values. The atmospheric correction methods are classified as empirical methods and
model-based methods.
• Empirical methods are scene-based approaches that estimate relative surface reflectance values. Empirical methods are computationally efficient and do not require a priori measurements.
• Model-based methods are dependent on in situ atmospheric data and are useful for accurate
estimation of surface reflectance values.
The subtractDarkPixel function implements dark pixel subtraction (also called dark object subtraction), an empirical method suitable for removing atmospheric haze from hyperspectral images. Atmospheric haze is characterized by high DN values and results in unnatural brightening of the images. The dark pixels are the minimum value pixels in each band. Dark pixels are assumed to have zero surface reflectance, and their values account for the additive effect of the atmospheric path radiance.
Explore Hyperspectral Data in the Hyperspectral Viewer
For this example, load an aerial hyperspectral data set of an area called Jasper Ridge, captured via
the airborne visible/infrared imaging spectrometer (AVIRIS). The data set contains areas of water,
land, road, and vegetation. Load the hyperspectral data set into a hypercube object in the
MATLAB® workspace.
hcube = hypercube('jasperRidge2_R198.img');
This command creates a hypercube object in the workspace called hcube. The hcube object
contains a 100-by-100-by-198 cube of hyperspectral data.
Open the Hyperspectral Viewer app. First, click the Apps tab on the MATLAB toolstrip. Then, in the
Image Processing and Computer Vision section, click the Hyperspectral Viewer button.
With the app open, load the hyperspectral data into the app. On the app toolstrip, click Import and
select Hypercube Object. In the Import from Workspace dialog box, select the hypercube object
you loaded into the workspace, hcube. (You can also specify a data set when you open the app using
the command: hyperspectralViewer(hcube).)
The app displays several views of the Jasper Ridge hyperspectral data. The Bands pane displays the
bands of the hyperspectral data as a stack of grayscale images. A second pane includes color
composite representations of the hyperspectral data, displaying the False Color tab by default. The
Plots pane displays a histogram of the band currently displayed in the Bands pane and a plot of the
spectral dimension of the data by wavelength or by band. (You can rearrange these panes by clicking
and dragging them inside the app. To return to the standard pane arrangement, click Default Layout
on the app toolstrip.)
Explore the spectral bands of the Jasper Ridge data set as a stack of grayscale images in the Bands
pane. Use the slider at the bottom of the pane to navigate through the images. Because each band
isolates a specific range of wavelengths, aspects of the scene might be clearer in some bands than
others.
To get a closer look at a band, click Zoom In or Zoom Out in the axes toolbar that appears when you
point the cursor over the image.
To improve the contrast of a band image, click Adjust Contrast on the app toolstrip. When you do,
the app overlays a contrast adjustment window on the histogram of the image, displayed in the Plot
pane. To adjust the contrast, move the window over the histogram or resize the window by clicking
and dragging the handles. The app adjusts the contrast using a technique called contrast stretching.
In this process, pixel values below a specified value are displayed as black, pixel values above a
specified value are displayed as white, and pixel values in between these two values are displayed as
shades of gray. The result is a linear mapping of a subset of pixel values to the entire range of grays,
from black to white, producing an image of higher contrast. To return to the default view, click Snap
Data Range. To remove the contrast adjustment window from the histogram, click Adjust Contrast.
Explore the Jasper Ridge hyperspectral data as a color composite image. To create these color
images, the Hyperspectral Viewer automatically chooses three of the bands in the hyperspectral
dataset to use for the red, green, and blue channels of a color image. The choice of which bands the
app uses depends on the type of color representation. The app supports three types of color
composite renditions: False Color, RGB, and Color Infrared (CIR). It can be useful to view all of the
color composite images because each one uses different bands and can highlight different spectral
details, thus increasing the interpretability of the data.
By default, the app displays a false-color representation of the data. False-color composites visualize
wavelengths that the human eye cannot see. The tab of the pane identifies the type of the color
image, False Color, and the bands that the app used to form it, (104,100,146), in red-green-blue
order. The app indicates in the Spectral Plot which bands were used. To change these band
selections, click and drag the handle of the band indicator in the Spectral Plot. If you choose a
different band, the app updates the text in the tab with the new bands and appends the word "Custom," for example, False Color-Custom.
To create the RGB color composite image, the app chooses bands in the visible part of the
electromagnetic spectrum. The resulting composite image resembles what the human eye would
observe naturally. For example, vegetation appears green and water is blue. While RGB composites
can appear natural to our eyes, it can be difficult to distinguish subtle differences in features. Natural
color images can be low in contrast.
To create the CIR color composite image, the app chooses red, green, and near-infrared wavelengths.
Near infrared wavelengths are slightly longer than red, and they are outside of the range visible to
the human eye.
After exploring the grayscale and color visualizations of the hyperspectral data, you can plot points or
small regions of the data along the spectral dimension to create spectral profiles. You can plot a
single pixel or a region up to 10-by-10 pixels square. Use the Neighborhood Size slider to specify
the region size. When you select a region, the app uses the mean of all the pixels in the region when
plotting the data. Plotting a region, rather than an individual pixel, can smooth out spectral profiles.
To create a spectral plot, click Add Spectral Plot on the app toolstrip, move the cursor over a
visualization in the app, and click to select the point or region. You can make your selections on any
of the visualizations provided by the app. Your choice of which visualization to use can depend on
which one provides the best view of the particular feature of the data you are interested in. When you
make a selection, the app puts a point icon at that position on all of the visualizations. To select
additional points, click Add Spectral Plot and repeat the process. To delete a point, right-click the
point and choose Delete Point from the pop-up menu. To delete all of the points you have selected,
click Clear All on the app toolstrip.
For example, the following figure shows four points selected in each visualization, each point
representing a particular type of data: water, vegetation, road, and land.
As you select each point, the app plots the data on the Spectral Plot, using a different color to
identify each plot. By default, the Spectral Plot also includes a legend identifying the plot for each
point. To toggle off inclusion of the legend, click Show Legend.
Hyperspectral Image Analysis Using Maximum Abundance Classification
This example uses a data sample from the Pavia University dataset as test data. The test data
contains nine endmembers that represent these ground truth classes: Asphalt, Meadows, Gravel,
Trees, Painted metal sheets, Bare soil, Bitumen, Self blocking bricks, and Shadows.
Load the .mat file containing the test data into the workspace. The .mat file contains an array
paviaU, representing the hyperspectral data cube and a matrix signatures, representing the nine
endmember signatures taken from the hyperspectral data. The data cube has 103 spectral bands with
wavelengths ranging from 430 nm to 860 nm. The geometric resolution is 1.3 meters and the spatial
resolution of each band image is 610-by-340.
load('paviaU.mat');
image = paviaU;
sig = signatures;
Compute the central wavelength for each spectral band by evenly spacing the wavelength range
across the number of spectral bands.
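The code that computes the center wavelengths is not included in the extracted text. One way to implement the computation described above, assuming the 430 nm to 860 nm range stated earlier, is:

% Evenly space the center wavelengths across the spectral bands (430 nm to 860 nm)
wavelength = linspace(430,860,size(image,3));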
Create a hypercube object using the hyperspectral data cube and the central wavelengths. Then
estimate an RGB image from the hyperspectral data. Set the ContrastStretching parameter value
to true in order to improve the contrast of the RGB output. Visualize the RGB image.
hcube = hypercube(image,wavelength);
rgbImg = colorize(hcube,'Method','RGB','ContrastStretching',true);
figure
imshow(rgbImg)
The test data contains the endmember signatures of nine ground truth classes. Each column of sig
contains the endmember signature of a ground truth class. Create a table that lists the class name for
each endmember and the corresponding column of sig.
num = 1:size(sig,2);
endmemberCol = num2str(num');
classNames = {'Asphalt';'Meadows';'Gravel';'Trees';'Painted metal sheets';'Bare soil';...
    'Bitumen';'Self blocking bricks';'Shadows'};
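The command that builds the table is not shown in the extracted text. A possible construction that produces the output below (table variable names with spaces require R2019b or later) is:

% Pair each column index of sig with its class name
table(num',classNames,'VariableNames',{'Column of sig','Endmember Class Name'})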
ans=9×2 table
Column of sig Endmember Class Name
_____________ ________________________
1 {'Asphalt' }
2 {'Meadows' }
3 {'Gravel' }
4 {'Trees' }
5 {'Painted metal sheets'}
6 {'Bare soil' }
7 {'Bitumen' }
8 {'Self blocking bricks'}
9 {'Shadows' }
figure
plot(sig)
xlabel('Band Number')
ylabel('Data Values')
ylim([400 2700])
title('Endmember Signatures')
legend(classNames,'Location','NorthWest')
Create abundance maps for the endmembers by using the estimateAbundanceLS function, specifying the method as fully constrained least squares (FCLS). The function outputs the abundance maps as a 3-D array with the same spatial dimensions as the input data. Each channel is the abundance map of the endmember from the corresponding column of signatures. In this example, the spatial dimension of the input data is 610-by-340 and the number of endmembers is 9, so the size of the output abundance map is 610-by-340-by-9.
abundanceMap = estimateAbundanceLS(hcube,sig,'Method','fcls');
Find the channel number of the largest abundance value for each pixel. The channel number returned
for each pixel corresponds to the column in sig that contains the endmember signature associated
with the maximum abundance value of that pixel. Display a color coded image of the pixels classified
by maximum abundance value.
[~,matchIdx] = max(abundanceMap,[],3);
figure
imagesc(matchIdx)
colormap(jet(numel(classNames)))
colorbar('TickLabels',classNames)
Segment the classified regions and overlay each of them on the RGB image estimated from the
hyperspectral data cube.
segmentImg = zeros(size(matchIdx));
overlayImg = zeros(size(abundanceMap,1),size(abundanceMap,2),3,size(abundanceMap,3));
for i = 1:size(abundanceMap,3)
segmentImg(matchIdx==i) = 1;
overlayImg(:,:,:,i) = imoverlay(rgbImg,segmentImg);
segmentImg = zeros(size(matchIdx));
end
Display the classified and the overlaid hyperspectral image regions along with their class names.
From the images, you can see that the asphalt, trees, bare soil, and brick regions have been
accurately classified.
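The display code is not included in the extracted text. One possible way to show the overlaid regions with their class names (the 3-by-3 layout is an assumption):

figure
for i = 1:numel(classNames)
    subplot(3,3,i)
    imshow(rescale(overlayImg(:,:,:,i)))  % rescale to [0, 1] for display
    title(classNames{i})
end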
See Also
colorize | estimateAbundanceLS | hypercube
More About
• “Getting Started with Hyperspectral Image Processing” on page 19-2
Classify Hyperspectral Image Using Library Signatures and SAM
1 Generate a score map for different regions present in the test data by computing the SAM
spectral match score between the spectrum of each test pixel and a pure spectrum. The pure
spectra are from the ECOSTRESS spectral library.
2 Classify the regions by using minimum score criteria, and assign a class label for each pixel in
the test data.
Read test data from the Jasper Ridge dataset by using the hypercube function. The function returns
a hypercube object, which stores the hyperspectral data cube and the corresponding wavelength and
metadata information read from the test data. The test data has 198 spectral bands and their
wavelengths range from 399.4 nm to 2457 nm. The spectral resolution is up to 9.9 nm and the spatial
resolution of each band image is 100-by-100.
hcube = hypercube('jasperRidge2_R198.img')
hcube =
hypercube with properties:
Estimate an RGB image from the data cube. Apply contrast stretching to enhance the contrast of the
output RGB image.
rgbImg = colorize(hcube,'Method','rgb','ContrastStretching',true);
figure
imagesc(rgbImg);
axis image off
title('RGB Image of Data Cube')
The ECOSTRESS spectral library consists of pure spectral signatures for individual surface materials. If the spectrum of a pixel matches a signature from the ECOSTRESS library, the pixel consists entirely of that single surface material. The library is a compilation of over 3400 spectral signatures for both natural and manmade materials. Because you know the endmembers latent in the test data, choose the ECOSTRESS spectral library files related to those four endmembers.
Read spectral files related to water, vegetation, soil, and concrete from the ECOSTRESS spectral
library. Use the spectral signatures of these types:
filenames = ["water.seawater.none.liquid.tir.seafoam.jhu.becknic.spectrum.txt",...
"vegetation.tree.eucalyptus.maculata.vswir.jpl087.jpl.asd.spectrum.txt",...
"soil.utisol.hapludult.none.all.87p707.jhu.becknic.spectrum.txt",...
"soil.mollisol.cryoboroll.none.all.85p4663.jhu.becknic.spectrum.txt",...
"manmade.concrete.pavingconcrete.solid.all.0092uuu_cnc.jhu.becknic.spectrum.txt"];
lib = readEcostressSig(filenames)
classNames = [lib.Class];
Plot the pure spectral signatures read from the ECOSTRESS spectral library.
figure
hold on
for idx = 1:numel(lib)
plot(lib(idx).Wavelength,lib(idx).Reflectance,'LineWidth',2)
end
axis tight
box on
title('Pure Spectral Signatures from ECOSTRESS Library')
xlabel('Wavelength (\mum)')
ylabel('Reflectance (%)')
legend(classNames,'Location','northeast')
title(legend,'Class Names')
hold off
Find the spectral match score between each pixel spectrum and the library signatures by using the
spectralMatch function. By default, the spectralMatch function computes the degree of
similarity between two spectra by using the SAM classification algorithm. The function returns an
array with the same spatial dimensions as the hyperspectral data cube and channels equal to the
number of library signatures specified. Each channel contains the score map for a single library
signature. In this example, there are five ECOSTRESS spectral library files specified for comparison,
and each band of the hyperspectral data cube has spatial dimensions of 100-by-100 pixels. The size of
the output array of score maps thus is 100-by-100-by-5.
scoreMap = spectralMatch(lib,hcube);
figure
montage(scoreMap,'Size',[1 numel(lib)],'BorderSize',10)
title('Score Map Obtained for Each Pure Spectrum','FontSize',14)
colormap(jet);
colorbar
Lower SAM values indicate higher spectral similarity. Use the minimum score criterion to classify the test pixels by finding the best match for each pixel among the library signatures. The result is a pixel-wise classification map in which the value of each pixel is the index of the library signature in lib for which that pixel exhibits the lowest SAM value. For example, if the value of a pixel in the classification map is 1, the pixel exhibits high similarity to the first library signature in lib.
[~,classMap] = min(scoreMap,[],3);
Create a class table that maps the classification map values to the ECOSTRESS library signatures
used for spectral matching.
classTable = table((min(classMap(:)):max(classMap(:)))',classNames',...
'VariableNames',{'Classification map value','Matching library signature'})
classTable=5×2 table
Classification map value Matching library signature
________________________ __________________________
1 "Sea Water"
2 "Tree"
3 "Utisol"
4 "Mollisol"
5 "Concrete"
Display the RGB image of the hyperspectral data and the classification results. Visual inspection
shows that spectral matching classifies each pixel effectively.
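The display code for the classification results is not shown in the extracted text. A minimal sketch, mirroring the colorbar labeling used earlier in this chapter (the RGB image display appears earlier in the example):

figure
imagesc(classMap)
axis image off
colormap(jet(numel(lib)))
colorbar('TickLabels',classNames)
title('Pixel-Wise Classification Map')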
References
[1] Kruse, F.A., A.B. Lefkoff, J.W. Boardman, K.B. Heidebrecht, A.T. Shapiro, P.J. Barloon, and A.F.H.
Goetz. “The Spectral Image Processing System (SIPS)—Interactive Visualization and Analysis of
Imaging Spectrometer Data.” Remote Sensing of Environment 44, no. 2–3 (May 1993): 145–63.
https://doi.org/10.1016/0034-4257(93)90013-N.
See Also
colorize | hypercube | readEcostressSig | spectralMatch
More About
• “Getting Started with Hyperspectral Image Processing” on page 19-2
Endmember Material Identification Using Spectral Library
This example uses 1) the spectral signatures in the ECOSTRESS spectral library as the reference
spectra and 2) a data sample from the Jasper Ridge dataset as the test data, for endmember material
identification.
Add the full file path containing the ECOSTRESS library files and specify the names of the files to be
read from the library.
fileroot = matlabshared.supportpkg.getSupportPackageRoot();
addpath(fullfile(fileroot,'toolbox','images','supportpackages','hyperspectral','hyperdata','ECOST
filenames = ["water.seawater.none.liquid.tir.seafoam.jhu.becknic.spectrum.txt",...
"water.tapwater.none.liquid.all.tapwater.jhu.becknic.spectrum.txt",...
"water.ice.none.solid.all.ice_dat_.jhu.becknic.spectrum.txt",...
"vegetation.tree.eucalyptus.maculata.vswir.jpl087.jpl.asd.spectrum.txt",...
"soil.utisol.hapludult.none.all.87p707.jhu.becknic.spectrum.txt",...
"soil.mollisol.cryoboroll.none.all.85p4663.jhu.becknic.spectrum.txt",...
"manmade.road.tar.solid.all.0099uuutar.jhu.becknic.spectrum.txt",...
"manmade.concrete.pavingconcrete.solid.all.0092uuu_cnc.jhu.becknic.spectrum.txt"];
lib = readEcostressSig(filenames);
Display the lib data and inspect its values. The data is a structure array containing the class, subclass, wavelength, and reflectance information.
lib
(The displayed structure array includes fields such as NumberOfXValues, AdditionalInformation, Wavelength, and Reflectance; the full listing is omitted here.)
Plot the spectral signatures read from the ECOSTRESS spectral library.
figure
hold on
for idx = 1:numel(lib)
plot(lib(idx).Wavelength,lib(idx).Reflectance,'LineWidth',2);
end
axis tight
box on
xlabel('Wavelength (\mum)');
ylabel('Reflectance (%)');
classNames = {lib.Class};
legend(classNames,'Location','northeast')
title('Reference Spectra from ECOSTRESS Library');
hold off
Read test data from the Jasper Ridge dataset by using the hypercube function. The function returns a
hypercube object that stores the data cube and the metadata information read from the test data.
The test data has 198 spectral bands and their wavelengths range from 399.4 nm to 2457 nm. The
spectral resolution is up to 9.9 nm and the spatial resolution of each band image is 100-by-100. The test data contains four latent endmembers: road, soil, water, and trees.
hcube = hypercube('jasperRidge2_R198.hdr');
To compute the total number of spectrally distinct endmembers present in the test data, use the
countEndmembersHFC function. This function finds the number of endmembers by using the
Harsanyi–Farrand–Chang (HFC) method. Set the probability of false alarm (PFA) to a low value in
order to avoid false detections.
numEndmembers = countEndmembersHFC(hcube,'PFA',10^-27);
Extract the endmembers of the test data by using the N-FINDR method.
endMembers = nfindr(hcube,numEndmembers);
Read the wavelength values from the hypercube object hcube. Plot the extracted endmember signatures. The test data comprises four endmember materials, and you can identify the class names of these materials through spectral matching.
figure
plot(hcube.Wavelength,endMembers,'LineWidth',2)
axis tight
xlabel('Wavelength (nm)')
ylabel('Data Values')
title('Endmembers Extracted using N-FINDR')
num = 1:numEndmembers;
legendName = strcat('Endmember',{' '},num2str(num'));
legend(legendName)
To identify the name of an endmember material, use the spectralMatch function. The function
computes the spectral similarity between the library files and an endmember spectrum to be
classified. Select the spectral information divergence (SID) method for computing the matching score. Typically, a low SID score means better matching between the test and the reference spectra. The test spectrum is then assigned to the class of the best-matching reference spectrum.
For example, to identify the class of the third and fourth endmember material, find the spectral
similarity between the library signatures and the respective endmember spectrum. The index of the
minimum SID score value specifies the class name in the spectral library. The third endmember
spectrum is identified as Sea Water and the fourth endmember spectrum is identified as Tree.
wavelength = hcube.Wavelength;
detection = cell(1,1);
cnt = 1;
queryEndmember = [3 4];
for num = 1:numel(queryEndmember)
spectra = endMembers(:,queryEndmember(num));
scoreValues = spectralMatch(lib,spectra,wavelength,'Method','sid');
[~, matchIdx] = min(scoreValues);
detection{cnt} = lib(matchIdx).Class;
disp(strcat('Endmember spectrum ',{' '},num2str(queryEndmember(num)),' is identified as ',{' '},detection{cnt}))
cnt=cnt+1;
end
To visually inspect the identification results, localize and segment the image regions specific to the endmember materials in the test data. Use the sid function to compute the pixel-wise spectral similarity between each pixel spectrum and the extracted endmember spectrum. Then, perform thresholding to segment the desired endmember regions in the test data and generate the segmented image. Set the threshold value to 15 to select the best-matching pixels.
For visualization, generate the RGB version of the test data by using the colorize function and
then, overlay the segmented image onto the test image.
threshold = 15;
rgbImg = colorize(hcube,'method','rgb','ContrastStretching',true);
overlayImg = rgbImg;
labelColor = {'Blue','Green'};
segmentedImg = cell(size(hcube.DataCube,1),size(hcube.DataCube,2),numel(queryEndmember));
for num = 1:numel(queryEndmember)
scoreMap = sid(hcube,endMembers(:,queryEndmember(num)));
segmentedImg{num} = scoreMap <= threshold;
overlayImg = imoverlay(overlayImg,segmentedImg{num},labelColor{num});
end
Display Results
Visually inspect the identification results by displaying the segmented images and the overlaid
image that highlights the Sea Water and Tree endmember regions in the test data.
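The display code for this step is not shown in the extracted text. A possible version, using the variables computed above:

figure('Position',[0 0 900 300])
subplot(1,3,1), imshow(segmentedImg{1}), title('Sea Water Region')
subplot(1,3,2), imshow(segmentedImg{2}), title('Tree Region')
subplot(1,3,3), imshow(overlayImg), title('Overlaid Segmented Regions')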
References
[1] Kruse, F.A., A.B. Lefkoff, J.W. Boardman, K.B. Heidebrecht, A.T. Shapiro, P.J. Barloon, and A.F.H.
Goetz. “The Spectral Image Processing System (SIPS)—Interactive Visualization and Analysis of
Imaging Spectrometer Data.” Remote Sensing of Environment 44, no. 2–3 (May 1993): 145–63.
https://doi.org/10.1016/0034-4257(93)90013-N.
See Also
colorize | countEndmembersHFC | hypercube | readEcostressSig | spectralMatch
More About
• “Getting Started with Hyperspectral Image Processing” on page 19-2
Target Detection Using Spectral Signature Matching
This example uses a data sample taken from the Pavia University dataset as the test data. The dataset contains endmember signatures for nine ground truth classes, and each signature is a vector of length 103. The ground truth classes include Asphalt, Meadows, Gravel, Trees, Painted metal sheets, Bare soil, Bitumen, Self blocking bricks, and Shadows. Of these classes, painted metal sheets typically belong to the roofing materials category, and they are the desired target to be located.
Read the test data from Pavia University dataset by using the hypercube function. The function
returns a hypercube object that stores the data cube and the metadata information read from the
test data. The test data has 103 spectral bands and their wavelengths range from 430 nm to 860 nm.
The geometric resolution is 1.3 meters and the spatial resolution of each band image is 610-by-340.
hcube = hypercube('paviaU.hdr');
Estimate an RGB color image from the data cube by using the colorize function. Set the
ContrastStretching parameter value to true in order to improve the contrast of RGB color
image. Display the RGB image.
rgbImg = colorize(hcube,'Method','rgb','ContrastStretching',true);
figure
imshow(rgbImg)
title('RGB Image')
Read the spectral information corresponding to a roofing material from the ECOSTRESS spectral
library by using the readEcostressSig function. Add the full file path containing the ECOSTRESS
spectral file and read the spectral signature for roofing material from the specified location.
fileroot = matlabshared.supportpkg.getSupportPackageRoot();
addpath(fullfile(fileroot,'toolbox','images','supportpackages','hyperspectral','hyperdata','ECOST
lib = readEcostressSig("manmade.roofingmaterial.metal.solid.all.0692uuucop.jhu.becknic.spectrum.t
Inspect the properties of the reference spectrum read from the ECOSTRESS library. The output
structure lib stores the metadata and the data values read from the ECOSTRESS library.
lib
Read the wavelength and the reflectance values stored in lib. The wavelength and the reflectance
pair comprises the reference spectrum or the reference spectral signature.
wavelength = lib.Wavelength;
reflectance = lib.Reflectance;
plot(wavelength,reflectance,'LineWidth',2)
axis tight
xlabel('Wavelength (\mum)')
ylabel('Reflectance (%)')
title('Reference Spectrum')
Find the spectral similarity between the reference spectrum and the data cube by using the
spectralMatch function. By default, the function uses the spectral angle mapper (SAM) method for
finding the spectral match. The output is a score map that signifies the matching between each pixel
spectrum and the reference spectrum. Thus, the score map is a matrix with the same spatial dimensions as the test data. In this case, the size of the score map is 610-by-340. SAM is insensitive to gain factors, so you can use it to match pixel spectra that inherently have an unknown gain factor due to topographic illumination effects.
scoreMap = spectralMatch(lib,hcube);
Typical SAM score values lie in the range [0, 3.142] radians. A lower SAM score represents better matching between the pixel spectrum and the reference spectrum. Use a thresholding method to spatially localize the target region in the input data. To determine the threshold, inspect the histogram of the score map. The minimum SAM score value with a prominent number of occurrences can be used to select the threshold for detecting the target region.
figure
imhist(scoreMap);
title('Histogram Plot of Score Map');
xlabel('Score Map Values')
ylabel('Number of occurrences');
From the histogram plot, you can infer that the minimum score value with a prominent number of occurrences is approximately 0.22. Accordingly, you can set a value around this local maximum as the threshold. For this example, select 0.25 as the threshold for detecting the target. The pixels with values less than this maximum threshold are classified as the target region.
maxthreshold = 0.25;
Perform thresholding to detect the target region with maximum spectral similarity. Overlay the
thresholded image on the RGB image of the hyperspectral data.
thresholdedImg = scoreMap <= maxthreshold;
overlaidImg = imoverlay(rgbImg,thresholdedImg,'green');
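The figure that shows the overlay is not included in the extracted text. You can display it, for example, as:

figure
imshow(overlaidImg)
title('Overlay Image of Target Region')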
You can validate the obtained target detection results by using the ground truth data taken from
Pavia University dataset.
Load the .mat file containing the ground truth data. To validate the result quantitatively, compute the mean squared error between the ground truth and the output. The error value is small when the obtained results are close to the ground truth.
load('paviauRoofingGT.mat');
err = immse(im2double(paviauRoofingGT), im2double(thresholdedImg));
fprintf('\n The mean squared error is %0.4f\n', err)
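The commands that create the comparison figure and its first axes are missing from the extracted text. A possible reconstruction, matching the axes2 position used below:

fig = figure('Position',[0 0 700 400]);
axes1 = axes('Parent',fig,'Position',[0.04 0.11 0.4 0.82]);
imagesc(overlaidImg,'Parent',axes1)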
title('Result Obtained')
axis off
axes2 = axes('Parent',fig,'Position',[0.47 0.11 0.4 0.82]);
imagesc(paviauRoofingGT,'Parent',axes2)
colormap([0 0 0;1 1 1]);
axis off
title('Ground Truth')
References
[1] Kruse, F.A., A.B. Lefkoff, J.W. Boardman, K.B. Heidebrecht, A.T. Shapiro, P.J. Barloon, and A.F.H.
Goetz. “The Spectral Image Processing System (SIPS)—Interactive Visualization and Analysis of
Imaging Spectrometer Data.” Remote Sensing of Environment 44, no. 2–3 (May 1993): 145–63.
https://doi.org/10.1016/0034-4257(93)90013-N.
[2] Chein-I Chang. “An Information-Theoretic Approach to Spectral Variability, Similarity, and
Discrimination for Hyperspectral Image Analysis.” IEEE Transactions on Information Theory 46, no. 5
(August 2000): 1927–32. https://doi.org/10.1109/18.857802.
See Also
colorize | hypercube | readEcostressSig | spectralMatch
More About
• “Getting Started with Hyperspectral Image Processing” on page 19-2
Identify Vegetation Regions Using Interactive NDVI Thresholding
NDVI = (NIR − R) / (NIR + R)
The NDVI value of a pixel is a scalar from -1 to 1. The pixels in regions with healthy or dense
vegetation reflect more NIR light, resulting in high NDVI values. The pixels in regions with unhealthy
vegetation or barren land absorb more NIR light, resulting in low or negative NDVI values. Based on
its NDVI value, you can identify vegetation in a region as dense vegetation, moderate vegetation,
sparse vegetation, or no vegetation. Each type of region corresponds to a typical range of NDVI values, which this example uses as threshold values.
You can segment the desired vegetation regions by performing thresholding using the NDVI values.
In this example, you will interactively select and change the threshold values to identify different
vegetation regions in a hyperspectral data cube based on their NDVI values.
Read hyperspectral data from an ENVI format file into the workspace. This example uses a data
sample from the Pavia dataset, which contains both vegetation and barren regions.
hcube = hypercube('paviaU.dat','paviaU.hdr');
Compute NDVI
Compute the NDVI value for each pixel in the data cube by using the ndvi function. The function
outputs a 2-D image in which the value of each pixel is the NDVI value for the corresponding pixel in
the hyperspectral data cube.
ndviImg = ndvi(hcube);
Identify different regions in the hyperspectral data using multilevel thresholding. Define a label
matrix to assign label values to pixels based on specified threshold values. You can set the thresholds
based on the computed NDVI values.
• Label value 1 - Specify the threshold value as 0.6, and find the pixels with NDVI values greater or
equal to the threshold. These are dense vegetation pixels.
• Label value 2 - Specify a lower threshold limit of 0.4 and an upper threshold limit of 0.6. Find the
pixels with NDVI values greater than or equal to 0.4 and less than 0.6. These are moderate
vegetation pixels.
• Label value 3 - Specify a lower threshold limit of 0.2 and an upper threshold limit of 0.4. Find the
pixels with NDVI values greater than or equal to 0.2 and less than 0.4. These are the sparse
vegetation pixels.
• Label value 4 - Specify the threshold value as 0.2, and find the pixels with NDVI values less than
the threshold. These are no vegetation pixels.
L = zeros(size(ndviImg));
L(ndviImg >= 0.6) = 1;
L(ndviImg >= 0.4 & ndviImg < 0.6) = 2;
L(ndviImg >= 0.2 & ndviImg < 0.4) = 3;
L(ndviImg < 0.2) = 4;
Estimate a contrast-stretched RGB image from the original data cube by using the colorize
function.
rgbImg = colorize(hcube,'Method','rgb','ContrastStretching',true);
Define a colormap to display each value in the label matrix in a different color. Overlay the label
matrix on the RGB image.
cmap = [0 1 0; 0 0 1; 1 1 0; 1 0 0];
overlayImg = labeloverlay(rgbImg,L,'Colormap',cmap);
To build an interactive interface, first create a figure window using the uifigure function. Then add
two panels to the figure window, for displaying the input image and the overlaid image side by side.
h = uifigure('Name','Interactive NDVI Thresholding','Position',[200,50,1000,700]);
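The panel and axes creation code is not included in the extracted text. A possible layout that defines the axes handle ax2 used by the slider callbacks below (the panel positions and names are assumptions):

% Left panel: original RGB image
imgPanel = uipanel(h,'Position',[30 230 460 460],'Title','RGB Image');
ax1 = axes(imgPanel);
imshow(rgbImg,'Parent',ax1)
% Right panel: label overlay that the callbacks update
overlayPanel = uipanel(h,'Position',[510 230 460 460],'Title','Vegetation Regions');
ax2 = axes(overlayPanel);
imshow(overlayImg,'Parent',ax2)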
Annotate the figure window with the color for each label and its associated vegetation density. The
colormap value for dense vegetation is green, moderate vegetation is blue, sparse vegetation is
yellow, and no vegetation is red.
annotation(h,'rectangle',[0.82 0.82 0.03 0.03],'Color',[0 1 0],'FaceColor',[0 1 0]);
annotation(h,'rectangle',[0.82 0.77 0.03 0.03],'Color',[0 0 1],'FaceColor',[0 0 1]);
annotation(h,'rectangle',[0.82 0.72 0.03 0.03],'Color',[1 1 0],'FaceColor',[1 1 0]);
annotation(h,'rectangle',[0.82 0.67 0.03 0.03],'Color',[1 0 0],'FaceColor',[1 0 0]);
annotation(h,'textbox',[0.85 0.80 0.9 0.05],'EdgeColor','None','String','Dense Vegetation');
annotation(h,'textbox',[0.85 0.75 0.9 0.05],'EdgeColor','None','String','Moderate Vegetation');
annotation(h,'textbox',[0.85 0.70 0.9 0.05],'EdgeColor','None','String','Sparse Vegetation');
annotation(h,'textbox',[0.85 0.65 0.9 0.05],'EdgeColor','None','String','No Vegetation');
Create sliders for interactively changing the thresholds. Use uislider function to add a slider for
adjusting the minimum threshold value and a slider for adjusting the maximum threshold value.
slidePanel1 = uipanel(h,'Position',[400,120,400,70],'Title','Minimum Threshold Value');
minsld = uislider(slidePanel1,'Position',[30,40,350,3],'Value',-1,'Limits',[-1 1],'MajorTicks',-1
slidePanel2 = uipanel(h,'Position',[400,30,400,70],'Title','Maximum Threshold Value');
maxsld = uislider(slidePanel2,'Position',[30,35,350,3],'Value',1,'Limits',[-1 1],'MajorTicks',-1:
Use the function ndviThreshold to change the minimum and maximum threshold limits. When you
move the slider thumb and release the mouse button, the ValueChangedFcn callback updates the
slider value and sets the slider value as the new threshold. You must call the ndviThreshold
function separately for the minimum threshold slider and maximum threshold slider. Change the
threshold limits by adjusting the sliders. This enables you to inspect the types of vegetation regions
within your specified threshold limits.
minsld.ValueChangedFcn = @(es,ed) ndviThreshold(minsld,maxsld,ndviImg,rgbImg,ax2,cmap);
maxsld.ValueChangedFcn = @(es,ed) ndviThreshold(minsld,maxsld,ndviImg,rgbImg,ax2,cmap);
The ndviThreshold function generates a new label matrix using the updated threshold values and
dynamically updates the overlaid image in the figure window.
Create the callback function that interactively changes the threshold limits and dynamically updates the
results.
function ndviThreshold(minsld,maxsld,ndviImg,rgbImg,ax2,cmap)
L = zeros(size(ndviImg));
minth = round(minsld.Value,2);
maxth = round(maxsld.Value,2);
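% The remainder of the callback is not shown in the extracted text. A possible
% completion (an assumption, not the original code) reapplies the fixed NDVI
% ranges only to pixels whose NDVI values lie within [minth, maxth], then
% refreshes the overlay displayed in ax2.
inRange = ndviImg >= minth & ndviImg <= maxth;
L(inRange & ndviImg >= 0.6) = 1;
L(inRange & ndviImg >= 0.4 & ndviImg < 0.6) = 2;
L(inRange & ndviImg >= 0.2 & ndviImg < 0.4) = 3;
L(inRange & ndviImg < 0.2) = 4;
overlayImg = labeloverlay(rgbImg,L,'Colormap',cmap);
imshow(overlayImg,'Parent',ax2)
end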
References
[1] J.W. Rouse, R.H. Hass, J.A. Schell, and D.W. Deering. “Monitoring Vegetation Systems in the Great
Plains with ERTS.” In Proceedings of the Third Earth Resources Technology Satellite- 1
Symposium, 1:309–17. Greenbelt, NASA SP-351, Washington, DC, 1973.
[2] Haboudane, D. “Hyperspectral Vegetation Indices and Novel Algorithms for Predicting Green LAI
of Crop Canopies: Modeling and Validation in the Context of Precision Agriculture.” Remote
Sensing of Environment 90, no. 3 (April 15, 2004): 337–52. https://doi.org/10.1016/
j.rse.2003.12.013.
See Also
colorize | hypercube | labeloverlay | uifigure | uipanel | uislider
More About
• “Getting Started with Hyperspectral Image Processing” on page 19-2
• “Measure Vegetation Cover in Hyperspectral Data Using NDVI Image”
Classify Hyperspectral Images Using Deep Learning
Hyperspectral imaging measures the spatial and spectral features of an object at different
wavelengths ranging from ultraviolet through long infrared, including the visible spectrum. Unlike
color imaging, which uses only three types of sensors sensitive to the red, green, and blue portions of
the visible spectrum, hyperspectral images can include dozens or hundreds of channels. Therefore,
hyperspectral images can enable the differentiation of objects that appear identical in an RGB image.
This example uses a custom spectral convolution neural network (CSCNN) that learns to classify 16 types of vegetation and terrain based on the
unique spectral signatures of each material. The example shows how to train a CSCNN and also
provides a pretrained network that you can use to perform classification.
This example uses the Indian Pines data set, included with the Image Processing Toolbox™
Hyperspectral Imaging Library. The data set consists of a single hyperspectral image of size 145-
by-145 pixels with 220 color channels. The data set also contains a ground truth label image with 16
classes, such as Alfalfa, Corn, Grass-pasture, Grass-trees, and Stone-Steel-Towers.
hcube = hypercube('indian_pines.dat');
rgbImg = colorize(hcube,'method','rgb');
imshow(rgbImg)
Load the ground truth labels and specify the number of classes.
gtLabel = load('indian_pines_gt.mat');
gtLabel = gtLabel.indian_pines_gt;
numClasses = 16;
Reduce the number of spectral bands to 30 using the hyperpca function. This function performs
principal component analysis (PCA) and selects the spectral bands with the most unique signatures.
dimReduction = 30;
imageData = hyperpca(hcube,dimReduction);
sd = std(imageData,[],3);
imageData = imageData./sd;
Split the hyperspectral image into patches of size 25-by-25 pixels with 30 channels using the
createImagePatchesFromHypercube helper function. This function is attached to the example as
a supporting file. The function also returns a single label for each patch, which is the label of the
central pixel.
windowSize = 25;
inputSize = [windowSize windowSize dimReduction];
[allPatches,allLabels] = createImagePatchesFromHypercube(imageData,gtLabel,windowSize);
Not all of the cubes in this data set have labels. However, training the network requires labeled data.
Select only the labeled cubes for training. Count how many labeled patches are available.
patchesLabeled = allPatches(allLabels>0,:,:,:);
patchLabels = allLabels(allLabels>0);
numCubes = size(patchesLabeled,1);
patchLabels = categorical(patchLabels);
Randomly divide the patches into training and test data sets.
[trainingIndex,validationIndex,testInd] = dividerand(numCubes,0.3,0.7,0);
dataInputTrain = patchesLabeled(trainingIndex,:,:,:);
dataLabelTrain = patchLabels(trainingIndex,1);
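The validation split and the dimension permutation that the datastores below rely on are missing from the extracted text. A possible reconstruction, assuming the patch arrays store observations along their first dimension:

dataInputVal = patchesLabeled(validationIndex,:,:,:);
dataLabelVal = patchLabels(validationIndex,1);
% augmentedImageDatastore expects observations along the fourth dimension
dataInputTransposeTrain = permute(dataInputTrain,[2 3 4 1]);
dataInputTransposeVal = permute(dataInputVal,[2 3 4 1]);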
imdsTrain = augmentedImageDatastore(inputSize,dataInputTransposeTrain,dataLabelTrain);
imdsTest = augmentedImageDatastore(inputSize,dataInputTransposeVal,dataLabelVal);
layers = [
image3dInputLayer(inputSize,'Name','Input','Normalization','None')
convolution3dLayer([3 3 7],8,'Name','conv3d_1')
reluLayer('Name','Relu_1')
convolution3dLayer([3 3 5],16,'Name','conv3d_2')
reluLayer('Name','Relu_2')
convolution3dLayer([3 3 3],32,'Name','conv3d_3')
reluLayer('Name','Relu_3')
convolution3dLayer([3 3 1],8,'Name','conv3d_4')
reluLayer('Name','Relu_4')
fullyConnectedLayer(256,'Name','fc1')
reluLayer('Name','Relu_5')
dropoutLayer(0.4,'Name','drop_1')
fullyConnectedLayer(128,'Name','fc2')
dropoutLayer(0.4,'Name','drop_2')
fullyConnectedLayer(numClasses,'Name','fc3')
softmaxLayer('Name','softmax')
classificationLayer('Name','output')];
lgraph = layerGraph(layers);
deepNetworkDesigner(lgraph)
Specify the required network parameters. For this example, train the network for 100 epochs with an
initial learning rate of 0.001, a batch size of 256, and Adam optimization.
numEpochs = 100;
miniBatchSize = 256;
initLearningRate = 0.001;
momentum = 0.9;
learningRateFactor = 0.01;
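The trainingOptions call that defines the options variable used by trainNetwork below is not shown in the extracted text. A possible configuration based on the parameters above (the schedule and validation settings are assumptions):

options = trainingOptions('adam', ...
    'MaxEpochs',numEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',initLearningRate, ...
    'GradientDecayFactor',momentum, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',learningRateFactor, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsTest, ...
    'Plots','training-progress', ...
    'Verbose',false);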
By default, the example downloads a pretrained classifier for the Indian Pines data set using the
downloadTrainedIndianPinesCSCNN helper function. This function is attached to the example as
a supporting file. The pretrained network enables you to classify the Indian Pines data set without
waiting for training to complete.
To train the network, set the doTraining variable in the following code to true. If you choose to
train the network, use of a CUDA capable NVIDIA™ GPU is highly recommended. Use of a GPU
requires Parallel Computing Toolbox™. For more information about supported GPU devices, see “GPU
Support by Release” (Parallel Computing Toolbox).
doTraining = false;
if doTraining
modelDateTime = string(datetime('now','Format',"yyyy-MM-dd-HH-mm-ss"));
net = trainNetwork(imdsTrain,lgraph,options);
save(strcat("IndianPinesCSCNN-",modelDateTime,"-Epoch-",num2str(numEpochs),".mat"),'net');
else
dataFolder = fullfile(tempdir,"indianPines");
trainedHyperspectralCSCNN_url = 'https://ssd.mathworks.com/supportfiles/image/data/trainedInd
downloadTrainedIndianPinesCSCNN(trainedHyperspectralCSCNN_url,dataFolder);
load(fullfile(dataFolder,'trainedIndianPinesCSCNN.mat'));
end
Calculate the accuracy of the classification for the test data set. Here, accuracy is the fraction of correctly classified pixels across all the classes.
predictionTest = classify(net,imdsTest);
accuracy = sum(predictionTest == dataLabelVal)/numel(dataLabelVal);
disp(['Accuracy of the test data = ', num2str(accuracy)])
Reconstruct the complete image by classifying all image pixels, including pixels in labeled training
patches, pixels in labeled test patches, and unlabeled pixels.
prediction = classify(net,dsAllPatches);
prediction = double(prediction);
The network is trained on labeled patches only. Therefore, the predicted classification of unlabeled
pixels is meaningless. Find the unlabeled patches and set the label to 0.
patchesUnlabeled = find(allLabels==0);
prediction(patchesUnlabeled) = 0;
Reshape the classified pixels to match the dimensions of the ground truth image.
[m,n,d] = size(imageData);
indianPinesPrediction = reshape(prediction,[n m]);
indianPinesPrediction = indianPinesPrediction';
cmap = parula(numClasses);
figure
tiledlayout(1,2,"TileSpacing","Tight")
nexttile
imshow(gtLabel,cmap)
title("Ground Truth Classification")
nexttile
imshow(indianPinesPrediction,cmap)
colorbar
title("Predicted Classification")
To highlight misclassified pixels, display a composite image of the ground truth and predicted labels.
Gray pixels indicate identical labels and colored pixels indicate different labels.
figure
imshowpair(gtLabel,indianPinesPrediction)
See Also
augmentedImageDatastore | classify | colorize | hypercube | hyperpca | imageDatastore
| trainNetwork | trainingOptions
Related Examples
• “Semantic Segmentation of Multispectral Images Using Deep Learning” on page 18-81
Code Generation for Image Processing Toolbox Functions
20
Code Generation for Image Processing
• Write your MATLAB function or application as you would normally, using functions from the Image
Processing Toolbox.
• Add the %#codegen compiler directive after the function signature. This directive instructs the MATLAB code analyzer to diagnose issues that would prohibit successful code generation, as shown in the sketch after this list.
• Open the MATLAB Coder app, create a project, and add your file to the project. In the app, you
can check the readiness of your code for code generation. For example, your code may contain
functions that are not enabled for code generation. Make any modifications required for code
generation.
• Generate code by clicking Generate on the Generate Code page of the MATLAB Coder app. You
can choose to generate a MEX file, a shared library, a dynamic library, or an executable.
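As a minimal sketch of this workflow (the function name and filter settings here are illustrative, not from the original text), an entry-point function might look like the following. Instead of using the app, you could also generate a MEX file directly with the codegen command, for example, codegen smoothImage -args {ones(256,256,'uint8')} for a 256-by-256 uint8 input.

function J = smoothImage(I) %#codegen
% Entry-point function: Gaussian smoothing, which supports code generation
J = imgaussfilt(I,2);
end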
Even if you addressed all readiness issues identified by MATLAB Coder, you might still encounter
build issues. The readiness check only looks at function dependencies. When you try to generate
code, MATLAB Coder might discover coding patterns that are not supported for code generation.
View the error report and modify your MATLAB code until you get a successful build.
For a complete list of Image Processing Toolbox functions that support code generation, see
Functions Supporting Code Generation. For an example of using code generation, see “Generate
Code for Object Detection” on page 20-8.
• Some functions generate standalone C code that can be incorporated into applications that run on
many platforms, such as ARM processors.
• Some functions generate C code that uses a platform-specific shared library. The Image
Processing Toolbox uses this shared library approach to preserve performance optimizations, but
this limits the platforms on which you can run this code to only platforms that can host MATLAB.
To view a list of host platforms, see system requirements.
• Some functions can generate either standalone C code or generate code that depends on a shared
library, depending upon which target you choose in the MATLAB Coder configuration settings.
• If you choose the generic MATLAB Host Computer option, these functions deliver code that
uses a shared library.
• If you choose any other platform option, these functions deliver C code.
The diagram illustrates the difference between generating C code and generating code that uses a
shared library.
See Also
MATLAB Coder | codegen
Related Examples
• “Generate Code for Object Detection” on page 20-8
More About
• “Code Generation Workflow” (MATLAB Coder)
• “Generate C Code by Using the MATLAB Coder App” (MATLAB Coder)
• Functions Supporting Code Generation
List of Supported Functions with Usage Notes
In generated code, each supported toolbox function has the same name, arguments, and functionality
as its Image Processing Toolbox counterpart. However, some functions have limitations. The following
table includes information about code generation limitations that might exist for each function. In the
following table, all the functions generate C code. The table identifies those functions that generate C
code that depends on a shared library, and those functions that can do both, depending on which
target platform you choose.
An asterisk (*) indicates that the reference page has usage notes and limitations for C/C++ code
generation.
See Also
MATLAB Coder | codegen
More About
• “Code Generation for Image Processing” on page 20-2
• “MATLAB Coder”
Generate Code for Object Detection
This example also demonstrates how to solve issues that you might encounter in your MATLAB code
that prevent code generation. To illustrate the process, the code used by this example includes some
readiness issues and build issues that you must overcome before you can generate code.
For more information about generating code, see the MATLAB Coder documentation.
Set Up Compiler
Specify which C/C++ compiler you want to use with MATLAB Coder to generate code by using the
mex function with the -setup option.
mex -setup
The entry-point function is a MATLAB function used as the source code for code generation. First,
prototype the image processing workflow without support for code generation. This example defines
a function called detectCells.m that performs cell detection using segmentation and morphological
techniques. This function is attached to the example as a supporting file.
I = imread('cell.tif');
Iseg = detectCells(I);
Confirm the accuracy of the segmentation by overlaying the segmented image on the original image.
imshow(labeloverlay(I,Iseg))
Because you modify this code for code generation, it is good to work with a copy of the code. This
example includes a copy of the helper function detectCells.m named detectCellsCodeGen.m.
The version of the function used for code generation includes the MATLAB Coder compilation
directive %#codegen at the end of the function signature. This directive instructs the MATLAB code
analyzer to diagnose issues that would prohibit successful code generation.
Open the MATLAB Coder app by using the coder function. (Alternatively, in MATLAB, select the Apps
tab, navigate to Code Generation and click the MATLAB Coder app.)
coder
Specify the name of your entry-point function, detectCellsCodeGen, and press Enter.
Click Next. MATLAB Coder identifies any issues that might prevent code generation. The example
code contains five unsupported function calls.
Review the readiness issues. Click Review Issues. In the report, MATLAB Coder displays your code
in an editing window with the readiness issues listed below, flagging uses of the imshow function
which does not support code generation.
Address the readiness issues. Remove the calls to imshow and related display code from your
example. The display statements are not necessary for the segmentation operation. You can edit the
example code directly in MATLAB Coder. When you have removed the code, click Save to save your
edits and rerun the readiness check. After rerunning the readiness check, MATLAB Coder displays
the No issues found message.
Every input to your code must be specified as fixed size, variable size, or a constant. There are several ways to specify the size of your input arguments, but the easiest way is to give MATLAB Coder an example of calling your function. Enter a script that calls your function in the text entry field. For this example, enter the following code and then click Autodefine Input Types.
I = imread('cell.tif');
Iseg = detectCellsCodeGen(I);
For more information about defining inputs, see the MATLAB Coder documentation. After MATLAB
Coder returns with the input type definition, click Next.
Even though you performed MATLAB Coder readiness checks, additional issues might arise during
the build process that can prevent code generation. While the readiness checks look at function
dependencies to determine readiness, the build process examines coding patterns. You can use the
same code you entered to define input types (which is preloaded into the dialog box). Click Check for
Issues.
This example contains a build issue: it passes an array of strel objects to imdilate and arrays of
objects are not supported for code generation.
Address the build issues identified. For this example, modify the call to imdilate to avoid passing an
array of strel objects. Replace the single call to imdilate with two separate calls to imdilate
where you pass one strel object with each call.
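For illustration, the change looks like the following sketch. The structuring elements and variable names here are hypothetical, not the ones in detectCells.m.

bw = imread('cell.tif') > 100;   % example binary mask from the cell image
se0  = strel('line',3,0);
se90 = strel('line',3,90);
% Not supported for code generation: imdilate(bw,[se0 se90])
% Supported: apply the structuring elements one at a time
bwDilated = imdilate(bw,se0);
bwDilated = imdilate(bwDilated,se90);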
Rerun the test build to make sure your changes fixed the issue. Click Check for Issues. MATLAB
Coder displays a message declaring that no issues were detected.
Generate Code
Choose the type of code you want to generate and select the target platform. MATLAB Coder can
generate C or C++ source code, a MEX file, a static library, a shared library, or a standalone
executable. For Production Hardware, you can select from many choices including ARM and Intel
processors.
This example uses the default options. The build type is Source Code and the language is C. For
device options, specify a generic device from a device vendor and a MATLAB Host Computer for the
device type. When you choose MATLAB Host Computer, MATLAB Coder generates code that depends
on a precompiled shared library. Image Processing Toolbox functions use a shared library to preserve
performance optimizations.
Click Generate.
Click Next to complete the process. MATLAB Coder displays information about what it generated. By
default, MATLAB Coder creates a codegen subfolder in your work folder that contains the generated
output.
See Also
MATLAB Coder | codegen
More About
• “Code Generation for Image Processing” on page 20-2
• “Code Generation Workflow” (MATLAB Coder)
• “Generate C Code by Using the MATLAB Coder App” (MATLAB Coder)
• “Input Type Specification for Code Generation” (MATLAB Coder)
• Functions Supporting Code Generation
GPU Computing with Image Processing Toolbox Functions
21
To run image processing code on a graphics processing unit (GPU), you must have the Parallel
Computing Toolbox software. To perform a supported image processing operation on a GPU, follow
these steps:
• Move the data from the CPU to the GPU. Use the gpuArray function to transfer an array from
MATLAB to the GPU. For more information, see “Create GPU Arrays from Existing Data” (Parallel
Computing Toolbox).
• Perform the image processing operation on the GPU. For a list of all the toolbox functions that
have been GPU-enabled, see Functions Supporting GPU Computing.
• Move the data back onto the CPU from the GPU. Use the gather function to retrieve an array
from the GPU and transfer the array to the MATLAB workspace as a regular MATLAB array.
If you call a function with GPU support using at least one gpuArray input argument, then the
function runs automatically on a GPU and generates a gpuArray as the result. You can mix inputs
using both gpuArray and MATLAB arrays in the same function call. In this case, the function
automatically transfers the MATLAB arrays to the GPU for execution.
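As a minimal sketch of these three steps (the image and filter choice are illustrative, not from the original text):

I = imread('peppers.png');        % sample image shipped with MATLAB
Igpu = gpuArray(I);               % 1. move the data to the GPU
Ismooth = imgaussfilt(Igpu,3);    % 2. GPU-enabled toolbox function runs on the GPU
Iresult = gather(Ismooth);        % 3. bring the result back to the CPU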
To learn about integrating custom CUDA kernels directly into MATLAB to accelerate complex
algorithms, see “Run CUDA or PTX Code on GPU” (Parallel Computing Toolbox).
See Also
gather | gpuArray
Related Examples
• “Perform Thresholding and Morphological Operations on GPU” on page 21-3
• “Perform Pixel-Based Operations on GPU” on page 21-8
More About
• “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox)
• Functions Supporting GPU Computing
Perform Thresholding and Morphological Operations on GPU
imOriginal = imread('concordaerial.png');
imshow(imOriginal)
Move the image to the GPU by creating a gpuArray (Parallel Computing Toolbox) object.
imGPUoriginal = gpuArray(imOriginal);
As a preprocessing step, change the RGB image to a grayscale image. rgb2gray performs the
conversion operation on a GPU because the input argument is a gpuArray.
imGPUgray = rgb2gray(imGPUoriginal);
View the image in the Image Viewer app and inspect the pixel values to find the value of watery
areas. To use Image Viewer, you must bring the image data back onto the CPU by using the gather
(Parallel Computing Toolbox) function. As you move the mouse over the image, you can view the value
of the pixel under the cursor at the bottom of the Image Viewer. In the image, areas of water are dark
and have pixel values less than 70.
imtool(gather(imGPUgray));
To get a new image that contains only the pixels with values less than 70, threshold the image on the
GPU.
imWaterGPU = imGPUgray<70;
Display the thresholded image. Unlike Image Viewer, the imshow function supports gpuArray input.
figure
imshow(imWaterGPU)
Remove small objects from the image while preserving the shape and size of larger objects by using
morphological opening. The imopen function performs morphological opening and supports
gpuArray input.
imWaterMask = imopen(imWaterGPU,strel('disk',5));
imshow(imWaterMask)
Create a copy of the original image that will contain the enhanced data. Convert the data type to
single.
imGPUenhanced = im2single(imGPUoriginal);
blueChannelOriginal = imGPUenhanced(:,:,3);
Enhance the saturation of the blue channel by increasing the strength of the blue channel for pixels
where the mask is 1 (true).
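The enhancement expression itself is not shown in the extracted text. A possible version adds a fixed offset wherever the water mask is true (the offset of 0.2 is an assumption):

blueChannelEnhanced = blueChannelOriginal + 0.2*single(imWaterMask);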
The maximum value of the enhanced blue channel exceeds the maximum value expected of images of
data type single. Rescale the data to the expected range [0, 1] by using the rescale function.
blueChannelEnhanced = rescale(blueChannelEnhanced);
imGPUenhanced(:,:,3) = blueChannelEnhanced;
Display the enhanced image. Pixels corresponding to water have a more saturated blue color in the
enhanced image than in the original image.
imshow(imGPUenhanced)
title('Enhanced Image')
After filtering the image on the GPU, move the data back to the CPU by using the gather function.
Write the modified image to a file.
outCPU = gather(imGPUenhanced);
imwrite(outCPU,'concordwater.png')
See Also
gather | gpuArray
Related Examples
• “Perform Pixel-Based Operations on GPU” on page 21-8
More About
• “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox)
• Functions Supporting GPU Computing
Perform Pixel-Based Operations on GPU
I = imread('concordaerial.png');
imshow(I)
Move the data from the CPU to the GPU by creating a gpuArray (Parallel Computing Toolbox) object.
Igpu = gpuArray(I);
Perform an operation on the GPU. This example defines a custom function called rgb2gray_custom
that converts an RGB image to grayscale by using a custom weighting of the red, green, and blue
color channels. This function is defined at the end of the example. Pass the handle to the custom
function and data to the GPU for evaluation by the arrayfun (Parallel Computing Toolbox) function.
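The arrayfun call itself is missing from the extracted text. A possible invocation passes the three color channels so that the custom function operates element-wise on the GPU:

Igray_gpu = arrayfun(@rgb2gray_custom,Igpu(:,:,1),Igpu(:,:,2),Igpu(:,:,3));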
Move the data back to the CPU from the GPU by using the gather (Parallel Computing Toolbox)
function.
I_gpuresult = gather(Igray_gpu);
imshow(I_gpuresult)
Supporting Function
The rgb2gray_custom helper function takes a linear combination of the three color channels and returns a
single channel output image.
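The function definition is not included in the extracted text. A possible version (the channel weights here are illustrative, not the original values):

function gray = rgb2gray_custom(r,g,b)
% Combine the channels with custom weights and scale to [0, 1] for display
gray = (0.5*single(r) + 0.25*single(g) + 0.25*single(b))/255;
end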
See Also
arrayfun | gather | gpuArray
Related Examples
• “Perform Thresholding and Morphological Operations on GPU” on page 21-3
More About
• “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox)
• Functions Supporting GPU Computing