
Image Processing Toolbox™

User's Guide

R2023b
How to Contact MathWorks

Latest news: www.mathworks.com

Sales and services: www.mathworks.com/sales_and_services

User community: www.mathworks.com/matlabcentral

Technical support: www.mathworks.com/support/contact_us

Phone: 508-647-7000

The MathWorks, Inc.
1 Apple Hill Drive
Natick, MA 01760-2098
Image Processing Toolbox™ User's Guide
© COPYRIGHT 1993–2023 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used or copied
only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form
without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through
the federal government of the United States. By accepting delivery of the Program or Documentation, the government
hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer
software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014.
Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain
to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and
Documentation by the federal government (or other entity acquiring for or through the federal government) and shall
supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is
inconsistent in any respect with federal procurement law, the government agrees to return the Program and
Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be
trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for
more information.
Revision History
August 1993 First printing Version 1
May 1997 Second printing Version 2
April 2001 Third printing Revised for Version 3.0
June 2001 Online only Revised for Version 3.1 (Release 12.1)
July 2002 Online only Revised for Version 3.2 (Release 13)
May 2003 Fourth printing Revised for Version 4.0 (Release 13.0.1)
September 2003 Online only Revised for Version 4.1 (Release 13.SP1)
June 2004 Online only Revised for Version 4.2 (Release 14)
August 2004 Online only Revised for Version 5.0 (Release 14+)
October 2004 Fifth printing Revised for Version 5.0.1 (Release 14SP1)
March 2005 Online only Revised for Version 5.0.2 (Release 14SP2)
September 2005 Online only Revised for Version 5.1 (Release 14SP3)
March 2006 Online only Revised for Version 5.2 (Release 2006a)
September 2006 Online only Revised for Version 5.3 (Release 2006b)
March 2007 Online only Revised for Version 5.4 (Release 2007a)
September 2007 Online only Revised for Version 6.0 (Release 2007b)
March 2008 Online only Revised for Version 6.1 (Release 2008a)
October 2008 Online only Revised for Version 6.2 (Release 2008b)
March 2009 Online only Revised for Version 6.3 (Release 2009a)
September 2009 Online only Revised for Version 6.4 (Release 2009b)
March 2010 Online only Revised for Version 7.0 (Release 2010a)
September 2010 Online only Revised for Version 7.1 (Release 2010b)
April 2011 Online only Revised for Version 7.2 (Release 2011a)
September 2011 Online only Revised for Version 7.3 (Release 2011b)
March 2012 Online only Revised for Version 8.0 (Release 2012a)
September 2012 Online only Revised for Version 8.1 (Release 2012b)
March 2013 Online only Revised for Version 8.2 (Release 2013a)
September 2013 Online only Revised for Version 8.3 (Release 2013b)
March 2014 Online only Revised for Version 9.0 (Release 2014a)
October 2014 Online only Revised for Version 9.1 (Release 2014b)
March 2015 Online only Revised for Version 9.2 (Release 2015a)
September 2015 Online only Revised for Version 9.3 (Release 2015b)
March 2016 Online only Revised for Version 9.4 (Release 2016a)
September 2016 Online only Revised for Version 9.5 (Release 2016b)
March 2017 Online only Revised for Version 10.0 (Release 2017a)
September 2017 Online only Revised for Version 10.1 (Release 2017b)
March 2018 Online only Revised for Version 10.2 (Release 2018a)
September 2018 Online only Revised for Version 10.3 (Release 2018b)
March 2019 Online only Revised for Version 10.4 (Release 2019a)
September 2019 Online only Revised for Version 11.0 (Release 2019b)
March 2020 Online only Revised for Version 11.1 (Release 2020a)
September 2020 Online only Revised for Version 11.2 (Release 2020b)
March 2021 Online only Revised for Version 11.3 (Release 2021a)
September 2021 Online only Revised for Version 11.4 (Release 2021b)
March 2022 Online only Revised for Version 11.5 (Release 2022a)
September 2022 Online only Revised for Version 11.6 (Release 2022b)
March 2023 Online only Revised for Version 11.7 (Release 2023a)
September 2023 Online only Revised for Version 23.2 (R2023b)
Contents

1  Getting Started
Image Processing Toolbox Product Description . . . . . . . . . . . . . . . . . . . . . 1-2
Key Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2

Compilability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3

Basic Image Import, Processing, and Export . . . . . . . . . . . . . . . . . . . . . . . 1-4

Correct Nonuniform Illumination and Analyze Foreground Objects . . . . . 1-9

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-17

2  Introduction
Images in MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2

Image Types in the Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3


Binary Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Indexed Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Grayscale Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Truecolor Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
HDR Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
Multispectral and Hyperspectral Images . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
Label Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8

Convert Between Image Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9

Display Separated Color Channels of RGB Image . . . . . . . . . . . . . . . . . . . 2-11

Convert Image Data Between Data Types . . . . . . . . . . . . . . . . . . . . . . . . . 2-14


Overview of Image Data Type Conversions . . . . . . . . . . . . . . . . . . . . . . . 2-14
Losing Information in Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14
Converting Indexed Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14

Work with Image Sequences as Multidimensional Arrays . . . . . . . . . . . . 2-15


Create Multidimensional Array Representing Image Sequence . . . . . . . . 2-15
Display Image Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
Process Image Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16

Perform an Operation on a Sequence of Images . . . . . . . . . . . . . . . . . . . 2-18

Process Folder of Images Using Image Batch Processor App . . . . . . . . . 2-20

Process Images Using Image Batch Processor App with File Metadata . . . . . . . . . . 2-27

Process Large Set of Images Using MapReduce Framework and Hadoop . . . . . . . . . . 2-35

Detecting Cars in a Video of Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-44

Image Arithmetic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-50

Image Arithmetic Clipping Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-51

Nest Calls to Image Arithmetic Functions . . . . . . . . . . . . . . . . . . . . . . . . 2-52

Find Vegetation in a Multispectral Image . . . . . . . . . . . . . . . . . . . . . . . . . 2-53

Image Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-63


Pixel Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-63
Spatial Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-63

Define World Coordinate System of Image . . . . . . . . . . . . . . . . . . . . . . . . 2-66


Define Spatial Referencing Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-66
Specify Minimum and Maximum Image Extent . . . . . . . . . . . . . . . . . . . . 2-67

Shift X- and Y-Coordinate Range of Displayed Image . . . . . . . . . . . . . . . . 2-69

3  Reading and Writing Image Data
Read Image Data into the Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2

Read Multiple Images from a Single Graphics File . . . . . . . . . . . . . . . . . . 3-4

Read and Write 1-Bit Binary Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5

Write Image Data to File in Graphics Format . . . . . . . . . . . . . . . . . . . . . . . 3-6

DICOM Support in Image Processing Toolbox . . . . . . . . . . . . . . . . . . . . . . 3-7


Read and Display DICOM Image Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
Work with DICOM Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
Write New DICOM Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
Work with DICOM-RT Contour Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
Prepare DICOM Files for Deep Learning Workflows . . . . . . . . . . . . . . . . . 3-8

Read Metadata from DICOM Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10


Private DICOM Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
Create Your Own Copy of DICOM Dictionary . . . . . . . . . . . . . . . . . . . . . . 3-11

Read Image Data from DICOM Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
View DICOM Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12

Write Image Data to DICOM Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14


Include Metadata with Image Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Specify Value Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14

Remove Confidential Information from DICOM File . . . . . . . . . . . . . . . . 3-16

Create New DICOM Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20

Create Image Datastore Containing DICOM Images . . . . . . . . . . . . . . . . 3-24

Create Image Datastore Containing Single and Multi-File DICOM Volumes . . . . . . . . . . 3-26

Add and Modify ROIs of DICOM-RT Contour Data . . . . . . . . . . . . . . . . . . 3-29

Create and Display 3-D Mask of DICOM-RT Contour Data . . . . . . . . . . . 3-34

Mayo Analyze 7.5 Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-40

Interfile Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-41

Implement Digital Camera Processing Pipeline . . . . . . . . . . . . . . . . . . . . 3-42

Work with High Dynamic Range Images . . . . . . . . . . . . . . . . . . . . . . . . . . 3-52


Read HDR Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-52
Display and Process HDR Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-52
Create High Dynamic Range Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-53
Write High Dynamic Range Image to File . . . . . . . . . . . . . . . . . . . . . . . . 3-53

Display High Dynamic Range Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-54

4  Displaying and Exploring Images
Choose Approach to Display 2-D and 3-D Images . . . . . . . . . . . . . . . . . . . . 4-2
Display 2-D Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Display 2-D Slices and Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
Display 3-D Renderings of 3-D Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8

Display an Image in Figure Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10


Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
Specifying the Initial Image Magnification . . . . . . . . . . . . . . . . . . . . . . . 4-12
Controlling the Appearance of the Figure . . . . . . . . . . . . . . . . . . . . . . . . 4-12

Display Multiple Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-14


Display Multiple Images in Separate Figure Windows . . . . . . . . . . . . . . . 4-14
Display Multiple Images in a Montage . . . . . . . . . . . . . . . . . . . . . . . . . . 4-14
Display Images Individually in the Same Figure . . . . . . . . . . . . . . . . . . . 4-16

Compare a Pair of Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-17

View and Edit Collection of Images in Folder or Datastore . . . . . . . . . . . 4-19

Get Started with Image Viewer App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25


Open Image Viewer App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-26
Navigate Image in Image Viewer App . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27
Get Information About Image Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-28
Modify Image Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-30
Save and Export Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-31

Get Pixel Information in Image Viewer App . . . . . . . . . . . . . . . . . . . . . . . 4-33


Determine Individual Pixel Values in Image Viewer . . . . . . . . . . . . . . . . . 4-33
View Pixel Values in Image Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-34

Measure Distances and Areas Using Image Viewer App . . . . . . . . . . . . . 4-37


Determine Distance Between Pixels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-37
Determine Area of Polygon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-38
Hide or Delete Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-39
Export Distance and Area Measurements . . . . . . . . . . . . . . . . . . . . . . . . 4-39

Adjust Image Contrast in Image Viewer App . . . . . . . . . . . . . . . . . . . . . . 4-41


Load Image into Image Viewer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-41
Adjust Contrast and Brightness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-42
View Imported and Adjusted Image Values . . . . . . . . . . . . . . . . . . . . . . . 4-44
Export Contrast-Adjusted Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45

Crop Image Using Image Viewer App . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-47


Define Cropping Rectangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-47
View Cropped Image in App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-48
Export Cropped Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-48

Get Started with Image Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-50


Open Image Tool and Display Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-51
Navigate Image in Image Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-52
Get Information about Image Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-53
Modify Image Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-56
Save and Export Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-58

Explore 3-D Volumetric Data with Volume Viewer App . . . . . . . . . . . . . . 4-60

Explore 3-D Labeled Volumetric Data with Volume Viewer . . . . . . . . . . . 4-74

Display Interior Labels by Clipping Volume Planes . . . . . . . . . . . . . . . . . 4-80

Display Interior Labels by Adjusting Volume Overlay Properties . . . . . . 4-88

Display Volume Using Cinematic Rendering . . . . . . . . . . . . . . . . . . . . . . . 4-97

Remove Objects from Volume Display Using 3-D Scissors . . . . . . . . . . . 4-103

Display Translucent Volume with Advanced Light Scattering . . . . . . . . 4-109

Display Large 3-D Images Using Blocked Volume Visualization . . . . . . 4-114

View Image Sequences in Video Viewer . . . . . . . . . . . . . . . . . . . . . . . . . . 4-122
Open Data in Video Viewer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-122
Explore Image Sequence Using Playback Controls . . . . . . . . . . . . . . . . 4-122
Examine Frame More Closely . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-123
Specify Frame Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-124
Specify Colormap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-124
Get Information about an Image Sequence . . . . . . . . . . . . . . . . . . . . . . 4-125
Configure Video Viewer App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-125

Convert Multiframe Image to Movie . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-127

Display Different Image Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-128


Display Indexed Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-128
Display Grayscale Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-128
Display Binary Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-130
Display Truecolor Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-131

Add Color Bar to Displayed Grayscale Image . . . . . . . . . . . . . . . . . . . . . 4-133

Print Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-135


Graphics Object Properties That Impact Printing . . . . . . . . . . . . . . . . . 4-135

Manage Display Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-136


Retrieve Toolbox Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-136
Set Toolbox Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-136
Control Image Display Using Preferences and Name-Value Arguments . 4-136

5  Building GUIs with Modular Tools
Interactive Image Viewing and Processing Tools . . . . . . . . . . . . . . . . . . . . 5-2

Interactive Tool Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6


Display Target Image in Figure Window . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Create the Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Position Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
Add Navigation Aids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
Customize Tool Interactivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8

Add Scroll Panel to Figure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-10

Get Handle to Target Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13

Create Pixel Region Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15

Build App to Display Pixel Information . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-19

Build App for Navigating Large Images . . . . . . . . . . . . . . . . . . . . . . . . . . 5-21

Build Image Comparison Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-24

Create Angle Measurement Tool Using ROI Objects . . . . . . . . . . . . . . . . 5-28

6  Geometric Transformations
Resize an Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2

Resize Image and Preserve Aspect Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7

Rotate an Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-13

Crop an Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15

Translate an Image Using imtranslate Function . . . . . . . . . . . . . . . . . . . 6-17

2-D and 3-D Geometric Transformation Process Overview . . . . . . . . . . . 6-20


Create Geometric Transformation Object . . . . . . . . . . . . . . . . . . . . . . . . 6-20

Migrate Geometric Transformations to Premultiply Convention . . . . . . 6-25


About the Premultiply and Postmultiply Conventions . . . . . . . . . . . . . . . 6-25
Create New Geometric Transformation Objects from Previous Geometric Transformation Objects . . . . . . . . . . 6-25

Matrix Representation of Geometric Transformations . . . . . . . . . . . . . . . 6-27


2-D Affine Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-27
2-D Projective Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-28
3-D Affine Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-29
3-D Projective and N-D Transformations . . . . . . . . . . . . . . . . . . . . . . . . . 6-31

Create Composite 2-D Affine Transformations . . . . . . . . . . . . . . . . . . . . . 6-32

Specify Fill Values in Geometric Transformation Output . . . . . . . . . . . . . 6-36

Perform Simple 2-D Translation Transformation . . . . . . . . . . . . . . . . . . . 6-38

N-Dimensional Spatial Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 6-41

Register Two Images Using Spatial Referencing to Enhance Display . . . 6-43

Create a Gallery of Transformed Images . . . . . . . . . . . . . . . . . . . . . . . . . . 6-48

Exploring a Conformal Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-63

Explore Slices from 3-D Image Volume with Anisotropic Voxel Spacing . . . . . . . . . . 6-75

Padding and Shearing an Image Simultaneously . . . . . . . . . . . . . . . . . . . 6-85

7  Image Registration
Choose Image Registration Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
The Registration Estimator App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
Intensity-Based Automatic Image Registration . . . . . . . . . . . . . . . . . . . . . 7-3
Control Point Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
Automated Feature Detection and Matching . . . . . . . . . . . . . . . . . . . . . . . 7-5

Register Images Using Registration Estimator App . . . . . . . . . . . . . . . . . . 7-6

Load Images, Spatial Referencing Information, and Initial Transformation . . . . . . . . . . 7-14
Load Images from File or Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-14
Provide Spatial Referencing Information . . . . . . . . . . . . . . . . . . . . . . . . . 7-15
Provide an Initial Geometric Transformation . . . . . . . . . . . . . . . . . . . . . . 7-16

Tune Registration Settings in Registration Estimator . . . . . . . . . . . . . . . 7-17


Geometric Transformations Supported by Registration Estimator . . . . . . 7-17
Feature-Based Registration Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
Intensity-Based Registration Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18
Nonrigid and Post-Processing Settings . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18

Export Results from Registration Estimator App . . . . . . . . . . . . . . . . . . . 7-20


Export Results to the Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20
Generate a Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20

Techniques Supported by Registration Estimator . . . . . . . . . . . . . . . . . . 7-22


Feature-Based Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22
Intensity-Based Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22
Nonrigid Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-23

Intensity-Based Automatic Image Registration . . . . . . . . . . . . . . . . . . . . 7-24

Create an Optimizer and Metric for Intensity-Based Image Registration . . . . . . . . . . 7-26

Use Phase Correlation as Preprocessing Step in Registration . . . . . . . . 7-27

Register Multimodal MRI Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-32

Register Multimodal 3-D Medical Images . . . . . . . . . . . . . . . . . . . . . . . . . 7-42

Registering an Image Using Normalized Cross-Correlation . . . . . . . . . . 7-51

Control Point Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-57

Geometric Transformation Types for Control Point Registration . . . . . . 7-59

Control Point Selection Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-61

Start the Control Point Selection Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-63

Find Visual Elements Common to Both Images . . . . . . . . . . . . . . . . . . . . 7-65
Use Scroll Bars to View Other Parts of an Image . . . . . . . . . . . . . . . . . . . 7-65
Use the Detail Rectangle to Change the View . . . . . . . . . . . . . . . . . . . . . 7-65
Pan the Image Displayed in the Detail Window . . . . . . . . . . . . . . . . . . . . 7-65
Zoom In and Out on an Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-65
Specify the Magnification of the Images . . . . . . . . . . . . . . . . . . . . . . . . . 7-66
Lock the Relative Magnification of the Moving and Fixed Images . . . . . . 7-67

Select Matching Control Point Pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-68


Pick Control Point Pairs Manually . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-68
Use Control Point Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-69
Move Control Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-71
Delete Control Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-71

Export Control Points to the Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . 7-73

Find Image Rotation and Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-75

Use Cross-Correlation to Improve Control Point Placement . . . . . . . . . . 7-79

Register Images with Projection Distortion Using Control Points . . . . . 7-80

8  Designing and Implementing Linear Filters for Image Data
What Is Image Filtering in the Spatial Domain? . . . . . . . . . . . . . . . . . . . . 8-2
Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3

Filter Grayscale and Truecolor (RGB) Images Using imfilter Function . . 8-5

imfilter Boundary Padding Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-9

Change Filter Strength Radially Outward . . . . . . . . . . . . . . . . . . . . . . . . . 8-12

Noise Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-18


Remove Noise by Linear Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-18
Remove Noise Using an Averaging Filter and a Median Filter . . . . . . . . . 8-18
Remove Noise By Adaptive Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-21

Apply Gaussian Smoothing Filters to Images . . . . . . . . . . . . . . . . . . . . . . 8-24

Reduce Noise in Image Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-30

What is Guided Image Filtering? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-39

Perform Flash/No-flash Denoising with Guided Filter . . . . . . . . . . . . . . . 8-40

Segment Thermographic Image After Edge-Preserving Filtering . . . . . . 8-44

Integral Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-48

Apply Multiple Filters to Integral Image . . . . . . . . . . . . . . . . . . . . . . . . . . 8-50

Filter Images Using Predefined Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-55

Generate HDL Code for Image Sharpening . . . . . . . . . . . . . . . . . . . . . . . . 8-58

Adjust Image Intensity Values to Specified Range . . . . . . . . . . . . . . . . . . 8-65

Gamma Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-67


Specify Gamma when Adjusting Contrast . . . . . . . . . . . . . . . . . . . . . . . . 8-67

Contrast Enhancement Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-69

Specify Contrast Adjustment Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-73


Specify Contrast Adjustment Limits as Range . . . . . . . . . . . . . . . . . . . . . 8-73
Set Image Intensity Adjustment Limits Automatically . . . . . . . . . . . . . . . 8-74

Adjust Image Contrast Using Histogram Equalization . . . . . . . . . . . . . . . 8-75

Adaptive Histogram Equalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-80


Adjust Contrast using Adaptive Histogram Equalization . . . . . . . . . . . . . 8-80

Enhance Color Separation Using Decorrelation Stretching . . . . . . . . . . 8-83


Simple Decorrelation Stretching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-83
Linear Contrast Stretching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-87
Decorrelation Stretch with Linear Contrast Stretch . . . . . . . . . . . . . . . . 8-87

Enhance Multispectral Color Composite Images . . . . . . . . . . . . . . . . . . . 8-90

Low-Light Image Enhancement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-100

Design Linear Filters in the Frequency Domain . . . . . . . . . . . . . . . . . . . 8-107


Two-Dimensional Finite Impulse Response (FIR) Filters . . . . . . . . . . . . 8-107
Create 2-D Filter Using Frequency Transformation of 1-D Filter . . . . . . 8-107
Create Filter Using Frequency Sampling Method . . . . . . . . . . . . . . . . . 8-109
Windowing Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-111
Creating the Desired Frequency Response Matrix . . . . . . . . . . . . . . . . 8-112
Computing the Frequency Response of a Filter . . . . . . . . . . . . . . . . . . . 8-113

9  Image Deblurring
Image Deblurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
Deblurring Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3

Deblur Images Using a Wiener Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5

Deblur Images Using Regularized Filter . . . . . . . . . . . . . . . . . . . . . . . . . . 9-12

Adapt the Lucy-Richardson Deconvolution for Various Image Distortions . . . . . . . . . . 9-22
Reduce the Effect of Noise Amplification . . . . . . . . . . . . . . . . . . . . . . . . 9-22
Account for Nonuniform Image Quality . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22
Handle Camera Read-Out Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-23
Handling Undersampled Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-23
Refine the Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-23

Deblurring Images Using the Lucy-Richardson Algorithm . . . . . . . . . . . 9-25

Adapt Blind Deconvolution for Various Image Distortions . . . . . . . . . . . 9-37


Deblur images using blind deconvolution . . . . . . . . . . . . . . . . . . . . . . . . 9-37
Refining the Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-44

Deblurring Images Using the Blind Deconvolution Algorithm . . . . . . . . 9-45

Create Your Own Deblurring Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-53

Avoid Ringing in Deblurred Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-54

10  Transforms
Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Definition of Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-5
Applications of the Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8

Discrete Cosine Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12


DCT Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12
The DCT Transform Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13
Image Compression with the Discrete Cosine Transform . . . . . . . . . . . . 10-13

Hough Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16


Detect Lines in Images Using Hough . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16

Radon Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21


Plot the Radon Transform of an Image . . . . . . . . . . . . . . . . . . . . . . . . . 10-23

Detect Lines Using Radon Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-27

The Inverse Radon Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-32


Inverse Radon Transform Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-32
Reconstruct an Image from Parallel Projection Data . . . . . . . . . . . . . . . 10-34

Fan-Beam Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-37


Image Reconstruction from Fan-Beam Projection Data . . . . . . . . . . . . . 10-39
Reconstruct Image using Inverse Fanbeam Projection . . . . . . . . . . . . . 10-40

Reconstructing an Image from Projection Data . . . . . . . . . . . . . . . . . . . 10-44

11  Morphological Operations
Types of Morphological Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Morphological Dilation and Erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Operations Based on Dilation and Erosion . . . . . . . . . . . . . . . . . . . . . . . 11-4

Structuring Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-9


Determine the Origin of a Structuring Element . . . . . . . . . . . . . . . . . . . 11-10
Structuring Element Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-11

Border Padding for Morphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-13

Morphological Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-14


Marker Image and Mask Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-15
Influence of Pixel Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-18
Applications of Morphological Reconstruction . . . . . . . . . . . . . . . . . . . 11-20

Find Image Peaks and Valleys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-21


Global and Regional Minima and Maxima . . . . . . . . . . . . . . . . . . . . . . . 11-21
Find Areas of High or Low Intensity . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-21
Suppress Minima and Maxima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-23
Impose a Minimum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-24

Pixel Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-27


Choosing a Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-28
Specifying Custom Connectivities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-28

Lookup Table Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-30


Creating a Lookup Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-30
Using a Lookup Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-30

Dilate an Image to Enlarge a Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-32

Remove Thin Lines Using Erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-36

Use Morphological Opening to Extract Large Image Features . . . . . . . 11-38

Flood-Fill Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-42


Specifying Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-42
Specifying the Starting Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-42
Filling Holes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-43

Detect Cell Using Edge Detection and Morphology . . . . . . . . . . . . . . . . 11-45

Granulometry of Snowflakes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-50

Distance Transform of a Binary Image . . . . . . . . . . . . . . . . . . . . . . . . . . 11-55

Label and Measure Connected Components in a Binary Image . . . . . . 11-57


Detect Connected Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-57
Label Connected Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-58
Select Objects in a Binary Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-59
Measure Properties of Connected Components . . . . . . . . . . . . . . . . . . . 11-59

12  Image Segmentation
Color-Based Segmentation Using the L*a*b* Color Space . . . . . . . . . . . . 12-2

Color-Based Segmentation Using K-Means Clustering . . . . . . . . . . . . . . 12-7

Plot Land Classification with Color Features and Superpixels . . . . . . . 12-13

Compute 3-D Superpixels of Input Volumetric Intensity Image . . . . . . 12-16

Segment Lungs from 3-D Chest Scan . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-19

Marker-Controlled Watershed Segmentation . . . . . . . . . . . . . . . . . . . . . 12-27

Segment Image and Create Mask Using Color Thresholder . . . . . . . . . 12-43

Acquire Live Images in Color Thresholder . . . . . . . . . . . . . . . . . . . . . . . 12-55

Getting Started with Image Segmenter . . . . . . . . . . . . . . . . . . . . . . . . . . 12-59


Open Image Segmenter App and Load Data . . . . . . . . . . . . . . . . . . . . . 12-59
Create and Add Regions to Segmented Mask . . . . . . . . . . . . . . . . . . . . 12-59
Refine Segmented Mask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-60
Export Segmentation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-61

Segment Image Using Thresholding in Image Segmenter . . . . . . . . . . 12-62

Segment Image by Drawing Regions Using Image Segmenter . . . . . . . 12-68

Segment Image Using Active Contours in Image Segmenter . . . . . . . . 12-74

Refine Segmentation Using Morphology in Image Segmenter . . . . . . . 12-80

Segment Image Using Graph Cut in Image Segmenter . . . . . . . . . . . . . 12-85

Segment Image Using Local Graph Cut (Grabcut) in Image Segmenter . . . . . . . . . . 12-94

Segment Image Using Find Circles in Image Segmenter . . . . . . . . . . 12-102

Segment Image Using Auto Cluster in Image Segmenter . . . . . . . . . . 12-109

Use Texture Filtering in Image Segmenter . . . . . . . . . . . . . . . . . . . . . . 12-115

Create Binary Mask Using Volume Segmenter . . . . . . . . . . . . . . . . . . . 12-118

Create Semantic Segmentation Using Volume Segmenter . . . . . . . . . 12-130

Work with Blocked Images Using Volume Segmenter . . . . . . . . . . . . . 12-143

Install Sample Data Using Add-On Explorer . . . . . . . . . . . . . . . . . . . . . 12-158

Texture Segmentation Using Gabor Filters . . . . . . . . . . . . . . . . . . . . . . 12-160

Texture Segmentation Using Texture Filters . . . . . . . . . . . . . . . . . . . . . 12-165

13  Analyze Images
Pixel Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Determine Values of Individual Pixels in Images . . . . . . . . . . . . . . . . . . . 13-2

Intensity Profile of Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-4


Create an Intensity Profile of an Image . . . . . . . . . . . . . . . . . . . . . . . . . . 13-4
Create Intensity Profile of an RGB Image . . . . . . . . . . . . . . . . . . . . . . . . 13-5

Create Contour Plot of Grayscale Image . . . . . . . . . . . . . . . . . . . . . . . . . . 13-7

Measuring Regions in Grayscale Images . . . . . . . . . . . . . . . . . . . . . . . . . 13-11

Find the Length of a Pendulum in Motion . . . . . . . . . . . . . . . . . . . . . . . 13-17

Create Image Histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-22

Image Mean, Standard Deviation, and Correlation Coefficient . . . . . . . 13-24

Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-25


Detect Edges in Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-25

Boundary Tracing in Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-28


Trace Boundaries of Objects in Images . . . . . . . . . . . . . . . . . . . . . . . . . 13-28
Select First Step Direction for Tracing . . . . . . . . . . . . . . . . . . . . . . . . . 13-31

Quadtree Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-33


Perform Quadtree Decomposition on an Image . . . . . . . . . . . . . . . . . . . 13-33

Detect and Measure Circular Objects in an Image . . . . . . . . . . . . . . . . . 13-36

Identifying Round Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-48

Measuring Angle of Intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-56

Measuring the Radius of a Roll of Tape . . . . . . . . . . . . . . . . . . . . . . . . . . 13-62

Calculate Statistical Measures of Texture . . . . . . . . . . . . . . . . . . . . . . . . 13-65

Texture Analysis Using the Gray-Level Co-Occurrence Matrix (GLCM) . . . . . . . . . . 13-67

Create a Gray-Level Co-Occurrence Matrix . . . . . . . . . . . . . . . . . . . . . . . 13-68

Specify Offset Used in GLCM Calculation . . . . . . . . . . . . . . . . . . . . . . . . 13-70

Derive Statistics from GLCM and Plot Correlation . . . . . . . . . . . . . . . . . 13-71

14  Image Quality Metrics
Image Quality Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
Full-Reference Quality Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
No-Reference Quality Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-3

Train and Use No-Reference Quality Assessment Model . . . . . . . . . . . . . 14-4


NIQE Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-4
BRISQUE Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-6

Compare No Reference Image Quality Metrics . . . . . . . . . . . . . . . . . . . . . 14-8

Obtain Local Structural Similarity Index . . . . . . . . . . . . . . . . . . . . . . . . 14-15

Compare Image Quality at Various Compression Levels . . . . . . . . . . . . 14-17

Anatomy of the Imatest Extended eSFR Chart . . . . . . . . . . . . . . . . . . . . 14-19


Slanted Edge Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-19
Gray Patch Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-20
Color Patch Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-21
Registration Markers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-21

Evaluate Quality Metrics on eSFR Test Chart . . . . . . . . . . . . . . . . . . . . . 14-23

Correct Colors Using Color Correction Matrix . . . . . . . . . . . . . . . . . . . . 14-35

15  ROI-Based Processing
Specify ROI as Binary Mask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
Create Mask Using Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
Create Mask Based on Position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-4
Create Mask Using Automated and Semi-Automated Segmentation Algorithms . . . . . . . . . . 15-5

Create ROI Shapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-7


Create ROI Using Creation Convenience Functions . . . . . . . . . . . . . . . . 15-10
Create ROI Using draw Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-12
Using ROIs in Apps Created with App Designer . . . . . . . . . . . . . . . . . . 15-16

ROI Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-18


ROI Object Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-18
ROI Object Function Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-18
ROI Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-20

Create Binary Mask Using an ROI Function . . . . . . . . . . . . . . . . . . . . . . 15-21

Overview of ROI Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-24

Sharpen Region of Interest in an Image . . . . . . . . . . . . . . . . . . . . . . . . . 15-25

Apply Custom Filter to Region of Interest in Image . . . . . . . . . . . . . . . . 15-28

Fill Region of Interest in an Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-31

Calculate Properties of Image Regions Using Image Region Analyzer . . . . . . . . . . 15-33

Filter Images on Properties Using Image Region Analyzer App . . . . . . 15-38

Create Image Comparison Tool Using ROIs . . . . . . . . . . . . . . . . . . . . . . 15-43

Use Freehand ROIs to Refine Segmentation Masks . . . . . . . . . . . . . . . . 15-50

Rotate Image Interactively Using Rectangle ROI . . . . . . . . . . . . . . . . . . 15-55

Subsample or Simplify a Freehand ROI . . . . . . . . . . . . . . . . . . . . . . . . . . 15-61

Measure Distances in an Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-71

Use Polyline to Create Angle Measurement Tool . . . . . . . . . . . . . . . . . . 15-78

Create Freehand ROI Editing Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-82

Use Wait Function After Drawing ROI . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-88

Interactive Image Inpainting Using Exemplar Matching . . . . . . . . . . . . 15-91

Classify Pixels That Are Partially Enclosed by ROI . . . . . . . . . . . . . . . . . 15-95

16  Color
Display Colors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-2

Reduce the Number of Colors in an Image . . . . . . . . . . . . . . . . . . . . . . . . 16-3


Reduce Colors of Truecolor Image Using Color Approximation . . . . . . . . 16-3
Reduce Colors of Indexed Image Using imapprox . . . . . . . . . . . . . . . . . . 16-7
Reduce Colors Using Dithering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-7

Profile-Based Color Space Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . 16-10


Read ICC Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-10
Write ICC Profile Information to a File . . . . . . . . . . . . . . . . . . . . . . . . . 16-10
Convert RGB to CMYK Using ICC Profiles . . . . . . . . . . . . . . . . . . . . . . . 16-11
What is Rendering Intent in Profile-Based Conversions? . . . . . . . . . . . . 16-12

Device-Independent Color Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-13


Convert Between Device-Independent Color Spaces . . . . . . . . . . . . . . . 16-13
Color Space Data Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-13

Understanding Color Spaces and Color Space Conversion . . . . . . . . . . 16-15


RGB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-15

HSV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-16
CIE 1976 XYZ and CIE 1976 L*a*b* . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-17
YCbCr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-18
YIQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-19

Convert Between RGB and HSV Color Spaces . . . . . . . . . . . . . . . . . . . . 16-20

Determine If L*a*b* Value Is in RGB Gamut . . . . . . . . . . . . . . . . . . . . . . 16-24

Comparison of Auto White Balance Algorithms . . . . . . . . . . . . . . . . . . . 16-25

Calculate CIE94 Color Difference of Colors on Test Chart . . . . . . . . . . . 16-42

17  Blocked Image Processing
Set Up Spatial Referencing for Blocked Images . . . . . . . . . . . . . . . . . . . . 17-2

Process Blocked Images Efficiently Using Partial Images or Lower Resolutions . . . . . . . . . . 17-13

Process Blocked Images Efficiently Using Mask . . . . . . . . . . . . . . . . . . 17-22

Explore Blocked Image Details with Interactive ROIs . . . . . . . . . . . . . . 17-34

Warp Blocked Image at Coarse and Fine Resolution Levels . . . . . . . . . 17-42

Create Labeled Blocked Image from ROIs and Masks . . . . . . . . . . . . . . 17-47

Convert Image Labeler Polygons to Labeled Blocked Image for Semantic Segmentation . . . . . . . . . . 17-57

Read Whole-Slide Images with Custom Blocked Image Adapter . . . . . 17-62

Detect and Count Cell Nuclei in Whole Slide Images . . . . . . . . . . . . . . 17-69

18  Neighborhood and Block Operations
Neighborhood or Block Processing: An Overview . . . . . . . . . . . . . . . . . . . 18-2

Sliding Neighborhood Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-3


Determine the Center Pixel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-3
General Algorithm of Sliding Neighborhood Operations . . . . . . . . . . . . . 18-4
Border Padding Behavior in Sliding Neighborhood Operations . . . . . . . . 18-4
Implementing Linear and Nonlinear Filtering as Sliding Neighborhood Operations . . . . . . . . . . 18-4

Distinct Block Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-6
Implement Block Processing Using the blockproc Function . . . . . . . . . . . 18-6
Apply Padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-7

Block Size and Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-9


TIFF Image Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-9
Choose Block Size to Optimize blockproc Performance . . . . . . . . . . . . . . 18-9

Parallel Block Processing on Large Image Files . . . . . . . . . . . . . . . . . . . 18-13


What is Parallel Block Processing? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-13
When to Use Parallel Block Processing . . . . . . . . . . . . . . . . . . . . . . . . . 18-13
How to Use Parallel Block Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 18-13

Perform Block Processing on Image Files in Unsupported Formats . . . 18-15


Learning More About the LAN File Format . . . . . . . . . . . . . . . . . . . . . . 18-15
Parsing the Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-15
Reading the File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-16
Examining the LanAdapter Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-17
Using the LanAdapter Class with blockproc . . . . . . . . . . . . . . . . . . . . . 18-20

Use Column-wise Processing to Speed Up Sliding Neighborhood or Distinct Block Operations . . . . . . . . . . 18-21
Using Column Processing with Sliding Neighborhood Operations . . . . . 18-21
Using Column Processing with Distinct Block Operations . . . . . . . . . . . 18-22

Block Processing Large Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-24

Compute Statistics for Large Images . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-29

19  Deep Learning
Get Started with Image Preprocessing and Augmentation for Deep Learning . . . . . . . . . . 19-2
Preprocess and Augment Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-2
Preprocess and Augment Pixel Label Images for Semantic Segmentation . . . . . . . . . . 19-4

Preprocess Images for Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-6


Resize Images Using Rescaling and Cropping . . . . . . . . . . . . . . . . . . . . . 19-6
Augment Images for Training with Random Geometric Transformations . . . . . . . . . . 19-7
Perform Additional Image Processing Operations Using Built-In Datastores . . . . . . . . . . 19-8
Apply Custom Image Processing Pipelines Using Combine and Transform . . . . . . . . . . 19-8

Preprocess Volumes for Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 19-10


Read Volumetric Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-10
Pair Image and Label Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-11
Preprocess Volumetric Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-11
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-12

Augment Images for Deep Learning Workflows . . . . . . . . . . . . . . . . . . . 19-17

Get Started with GANs for Image-to-Image Translation . . . . . . . . . . . . 19-39


Select a GAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-39
Create GAN Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-39
Train GAN Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-41

Create Modular Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-44


Create Encoder and Decoder Modules . . . . . . . . . . . . . . . . . . . . . . . . . 19-44
Create Networks from Encoder and Decoder Modules . . . . . . . . . . . . . 19-45

Train and Apply Denoising Neural Networks . . . . . . . . . . . . . . . . . . . . . 19-46


Remove Gaussian Noise Using Pretrained Network . . . . . . . . . . . . . . . 19-46
Train a Denoising Network Using Built-In Layers . . . . . . . . . . . . . . . . . 19-46
Train Fully Customized Denoising Neural Network . . . . . . . . . . . . . . . . 19-47

Remove Noise from Color Image Using Pretrained Neural Network . . 19-49

Increase Image Resolution Using Deep Learning . . . . . . . . . . . . . . . . . . 19-55

JPEG Image Deblocking Using Deep Learning . . . . . . . . . . . . . . . . . . . . 19-71

Image Processing Operator Approximation Using Deep Learning . . . . 19-84

Develop Camera Processing Pipeline Using Deep Learning . . . . . . . . . 19-98

Brighten Extremely Dark Images Using Deep Learning . . . . . . . . . . . 19-120

Semantic Segmentation of Multispectral Images Using Deep Learning
....................................................... 19-131

3-D Brain Tumor Segmentation Using Deep Learning . . . . . . . . . . . . . 19-149

Neural Style Transfer Using Deep Learning . . . . . . . . . . . . . . . . . . . . . 19-159

Unsupervised Day-to-Dusk Image Translation Using UNIT . . . . . . . . . 19-168

Quantify Image Quality Using Neural Image Assessment . . . . . . . . . . 19-179

Unsupervised Medical Image Denoising Using CycleGAN . . . . . . . . . . 19-192

Unsupervised Medical Image Denoising Using UNIT . . . . . . . . . . . . . . 19-206

Preprocess Multiresolution Images for Training Classification Network
....................................................... 19-219

Classify Tumors in Multiresolution Blocked Images . . . . . . . . . . . . . . 19-235

Detect Image Anomalies Using Explainable FCDD Network . . . . . . . . 19-247

Classify Defects on Wafer Maps Using Deep Learning . . . . . . . . . . . . . 19-260

Detect Image Anomalies Using Pretrained ResNet-18 Feature
Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-276

Hyperspectral Image Processing
20
Getting Started with Hyperspectral Image Processing . . . . . . . . . . . . . . 20-2
Representing Hyperspectral Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-2
Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-3
Spectral Unmixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-5
Spectral Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-6
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-7

Hyperspectral Data Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-9


Radiometric Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-9
Atmospheric Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-10

Spectral Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-13


Band Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-13
List of Supported Spectral Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-15

Support for Singleton Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-18

Identify Vegetation and Non-Vegetation Spectra . . . . . . . . . . . . . . . . . . 20-20

Explore Hyperspectral Data in the Hyperspectral Viewer . . . . . . . . . . . 20-22

Hyperspectral Image Analysis Using Maximum Abundance Classification
........................................................ 20-33

Classify Hyperspectral Image Using Library Signatures and SAM . . . . 20-40

Endmember Material Identification Using Spectral Library . . . . . . . . . 20-46

Target Detection Using Spectral Signature Matching . . . . . . . . . . . . . . 20-53

Identify Vegetation Regions Using Interactive NDVI Thresholding . . . 20-61

Classify Hyperspectral Images Using Deep Learning . . . . . . . . . . . . . . . 20-66

Find Regions in Spatially Referenced Multispectral Image . . . . . . . . . 20-72

Classify Hyperspectral Image Using Support Vector Machine Classifier
........................................................ 20-78

Manually Label ROIs in Multispectral Image . . . . . . . . . . . . . . . . . . . . . 20-83

Change Detection in Hyperspectral Images . . . . . . . . . . . . . . . . . . . . . . 20-88

Ship Detection from Sentinel-1 C Band SAR Data Using YOLO v2 Object
Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-93

Automate Pixel Labeling of Hyperspectral Images Using ECOSTRESS
Spectral Signatures in Image Labeler . . . . . . . . . . . . . . . . . . . . . . . . 20-106

Code Generation for Image Processing Toolbox Functions
21
Code Generation for Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-2
Types of Code Generation Support in Image Processing Toolbox . . . . . . . 21-2
Generate Code with Image Processing Functions . . . . . . . . . . . . . . . . . . 21-3

Generate Code for Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-5

Generate Code to Resize Images to Fixed Output Size . . . . . . . . . . . . . 21-22

GPU Computing with Image Processing Toolbox Functions
22
Image Processing on a GPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-2

Perform Thresholding and Morphological Operations on GPU . . . . . . . . 22-3

Perform Pixel-Based Operations on GPU . . . . . . . . . . . . . . . . . . . . . . . . . . 22-8

1

Getting Started

This topic presents two examples to get you started doing image processing using MATLAB® and the
Image Processing Toolbox software. The examples contain cross-references to other sections in the
documentation that have in-depth discussions on the concepts presented in the examples.

• “Image Processing Toolbox Product Description” on page 1-2
• “Compilability” on page 1-3
• “Basic Image Import, Processing, and Export” on page 1-4
• “Correct Nonuniform Illumination and Analyze Foreground Objects” on page 1-9
• “Acknowledgments” on page 1-17

Image Processing Toolbox Product Description


Perform image processing, visualization, and analysis

Image Processing Toolbox provides a comprehensive set of reference-standard algorithms and
workflow apps for image processing, analysis, visualization, and algorithm development. You can
perform image segmentation, image enhancement, noise reduction, geometric transformations, and
image registration using deep learning and traditional image processing techniques. The toolbox
supports processing of 2D, 3D, and arbitrarily large images.

Image Processing Toolbox apps let you automate common image processing workflows. You can
interactively segment image data, compare image registration techniques, and batch-process large
datasets. Visualization functions and apps let you explore images, 3D volumes, and videos; adjust
contrast; create histograms; and manipulate regions of interest (ROIs).

You can accelerate your algorithms by running them on multicore processors and GPUs. Many
toolbox functions support C/C++ code generation for desktop prototyping and embedded vision
system deployment.

Key Features
• Image analysis, including segmentation, morphology, statistics, and measurement
• Apps for image region analysis, image batch processing, and image registration
• 3D image processing workflows, including visualization and segmentation
• Image enhancement, filtering, geometric transformations, and deblurring algorithms
• Intensity-based and non-rigid image registration methods
• Support for CUDA enabled NVIDIA GPUs (with Parallel Computing Toolbox™)
• C-code generation support for desktop prototyping and embedded vision system deployment


Compilability
The Image Processing Toolbox software is compilable with the MATLAB Compiler™ except for the
following functions that launch GUIs:

• cpselect
• implay
• imtool
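
For example, here is a minimal sketch of a deployable function that uses only GUI-free toolbox
functions; the function and file names are hypothetical and shown only to illustrate the workflow.

function adjustAndSave(inputFile,outputFile)
% Read an image, adjust its contrast, and write the result to a file.
% The function avoids GUI functions such as imtool, so it can be compiled.
I = im2gray(imread(inputFile));   % ensure a grayscale image
J = imadjust(I);                  % stretch the intensity range
imwrite(J,outputFile);
end

Assuming MATLAB Compiler is installed, you could then build a standalone application with a command
such as mcc -m adjustAndSave.m.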


Basic Image Import, Processing, and Export

This example shows how to read an image into the workspace, adjust the contrast in the image, and
then write the adjusted image to a file.

Step 1: Read and Display an Image

Read an image into the workspace using the imread function. The example reads one of the sample
images included with the toolbox, an image of a young girl in a file named pout.tif, and stores it in
an array named I. The imread function infers from the file that the graphics file format is Tagged
Image File Format (TIFF).
I = imread("pout.tif");

Display the image using the imshow function. You can also view an image in the Image Viewer app,
which presents an integrated environment for displaying images and performing some common
image processing tasks. The Image Viewer app provides all the image display capabilities of imshow
but also provides access to several other tools for navigating and exploring images, such as scroll
bars, the Pixel Region tool, Image Information tool, and the Contrast Adjustment tool.
imshow(I)

Step 2: Check How the Image Appears in the Workspace

Check how the imread function stores the image data in the workspace using the whos function. You
can also check the variable in the Workspace browser. The imread function returns the image data in
the variable I, which is a 291-by-240 element array of uint8 data.
whos I


Name Size Bytes Class Attributes

I 291x240 69840 uint8

Step 3: Improve Image Contrast

View the distribution of image pixel intensities. The image pout.tif is a somewhat low contrast
image. To see the distribution of intensities in the image, create a histogram by calling the imhist
function. Notice how the histogram indicates that the intensity range of the image is rather narrow.
The range does not cover the potential range of [0, 255], and the image is missing the high and low
values that would result in good contrast.

imhist(I)

Improve the contrast in the image using the histeq function, then display the result. Histogram
equalization spreads the intensity values over the full range of the image. The toolbox includes
several other functions that perform contrast adjustment, including imadjust and adapthisteq,
and interactive tools such as the Adjust Contrast tool, available in the Image Viewer app.

I2 = histeq(I);
imshow(I2)


Call the imhist function again to create a histogram of the equalized image I2. If you compare the
two histograms, you can see that the histogram of I2 is more spread out over the entire range than
the histogram of I.

figure
imhist(I2)


Step 4: Write the Adjusted Image to a File

Write the newly adjusted image I2 to a file using the imwrite function. This example includes the
extension ".png" in the filename, so the imwrite function writes the image to a file in Portable
Network Graphics (PNG) format. You can specify other file formats.
imwrite(I2,"pout2.png");

Step 5: Check the Contents of the Newly Written File

View information about the image in the file, such as its format, size, width, and height, by using the
imfinfo function.
imfinfo("pout2.png")

ans = struct with fields:


Filename: 'C:\TEMP\Bdoc23b_2361005_11440\ibC44013\38\tpd3ac91eb\images-ex895050
FileModDate: '19-Aug-2023 21:14:21'
FileSize: 36938
Format: 'png'
FormatVersion: []
Width: 240
Height: 291
BitDepth: 8
ColorType: 'grayscale'
FormatSignature: [137 80 78 71 13 10 26 10]
Colormap: []
Histogram: []


InterlaceType: 'none'
Transparency: 'none'
SimpleTransparencyData: []
BackgroundColor: []
RenderingIntent: []
Chromaticities: []
Gamma: []
XResolution: []
YResolution: []
ResolutionUnit: []
XOffset: []
YOffset: []
OffsetUnit: []
SignificantBits: []
ImageModTime: '20 Aug 2023 01:14:21 +0000'
Title: []
Author: []
Description: []
Copyright: []
CreationTime: []
Software: []
Disclaimer: []
Warning: []
Source: []
Comment: []
OtherText: []

See Also
imread | imshow | imwrite | imfinfo

Related Examples
• “Read Image Data into the Workspace” on page 3-2
• “Display an Image in Figure Window” on page 4-10
• “Write Image Data to File in Graphics Format” on page 3-6


Correct Nonuniform Illumination and Analyze Foreground Objects

This example shows how to enhance an image as a preprocessing step before analysis. In this
example, you correct the nonuniform background illumination and convert the image into a binary
image to make it easy to identify foreground objects (individual grains of rice). You can then analyze
the objects, such as finding the area of each grain of rice, and you can compute statistics for all
objects in the image.

Preprocess the Image

Read an image into the workspace.


I = imread('rice.png');
imshow(I)

The background illumination is brighter in the center of the image than at the bottom. Preprocess the
image to make the background illumination more uniform.

As a first step, remove all of the foreground (rice grains) using morphological opening. The opening
operation removes small objects that cannot completely contain the structuring element. Define a
disk-shaped structuring element with a radius of 15, which fits entirely inside a single grain of rice.
se = strel('disk',15)

se =
strel is a disk shaped structuring element with properties:

Neighborhood: [29x29 logical]


Dimensionality: 2


To perform the morphological opening, use imopen with the structuring element.

background = imopen(I,se);
imshow(background)

Subtract the background approximation image, background, from the original image, I, and view
the resulting image. After subtracting the background approximation from the original image, the
resulting image has a uniform background but is now a bit dark for analysis.

I2 = I - background;
imshow(I2)


Use imadjust to increase the contrast of the processed image I2 by saturating 1% of the data at
both low and high intensities and by stretching the intensity values to fill the uint8 dynamic range.

I3 = imadjust(I2);
imshow(I3)

Note that the prior two steps could be replaced by a single step using imtophat, which first
calculates the morphological opening and then subtracts it from the original image.


I2 = imtophat(I,strel('disk',15));

Create a binary version of the processed image so you can use toolbox functions for analysis. Use the
imbinarize function to convert the grayscale image into a binary image. Remove background noise
from the image with the bwareaopen function.
bw = imbinarize(I3);
bw = bwareaopen(bw,50);
imshow(bw)

Identify Objects in the Image

Now that you have created a binary version of the original image you can perform analysis of objects
in the image.

Find all the connected components (objects) in the binary image. The accuracy of your results
depends on the size of the objects, the connectivity parameter (4, 8, or arbitrary), and whether or not
any objects are touching (in which case they could be labeled as one object). Some of the rice grains
in the binary image bw are touching.
cc = bwconncomp(bw,4)

cc = struct with fields:


Connectivity: 4
ImageSize: [256 256]
NumObjects: 95
PixelIdxList: {1x95 cell}

cc.NumObjects

ans = 95

View the rice grain that is labeled 50 in the image.


grain = false(size(bw));
grain(cc.PixelIdxList{50}) = true;
imshow(grain)

Visualize all the connected components in the image by creating a label matrix and then displaying it
as a pseudocolor indexed image.

Use labelmatrix to create a label matrix from the output of bwconncomp. Note that labelmatrix
stores the label matrix in the smallest numeric class necessary for the number of objects.

labeled = labelmatrix(cc);
whos labeled

Name Size Bytes Class Attributes

labeled 256x256 65536 uint8

Use label2rgb to choose the colormap, the background color, and how objects in the label matrix
map to colors in the colormap. In the pseudocolor image, the label identifying each object in the label
matrix maps to a different color in an associated colormap matrix.

RGB_label = label2rgb(labeled,'spring','c','shuffle');
imshow(RGB_label)


Compute Area-Based Statistics

Compute the area of each object in the image using regionprops. Each rice grain is one connected
component in the cc structure.

graindata = regionprops(cc,'basic')

graindata=95×1 struct array with fields:


Area
Centroid
BoundingBox

Create a new vector grain_areas, which holds the area measurement for each grain.

grain_areas = [graindata.Area];

Find the area of the 50th component.

grain_areas(50)

ans = 194

Find and display the grain with the smallest area.

[min_area, idx] = min(grain_areas)

min_area = 61

idx = 16

grain = false(size(bw));
grain(cc.PixelIdxList{idx}) = true;
imshow(grain)


Use the histogram command to create a histogram of rice grain areas.

histogram(grain_areas)
title('Histogram of Rice Grain Area')


See Also
imopen | bwareaopen | bwconncomp | regionprops | imadjust | imbinarize | label2rgb |
labelmatrix | imread | imshow


Acknowledgments
The following list identifies the copyright owners of the images used in the Image Processing Toolbox
documentation.

• cameraman: Copyright Massachusetts Institute of Technology. Used with permission.
• cell: Cancer cell from a rat's prostate, courtesy of Alan W. Partin, M.D., Ph.D., Johns Hopkins
  University School of Medicine.
• circuit: Micrograph of 16-bit A/D converter circuit, courtesy of Steve Decker and Shujaat Nadeem,
  MIT, 1993.
• concordaerial and westconcordaerial: Visible color aerial photographs courtesy of mPower3/Emerge.
• concordorthophoto and westconcordorthophoto: Orthoregistered photographs courtesy of Massachusetts
  Executive Office of Environmental Affairs, MassGIS.
• forest: Photograph of Carmanah Ancient Forest, British Columbia, Canada, courtesy of Susan Cohen.
• LAN files: Permission to use Landsat data sets provided by Space Imaging, LLC, Denver, Colorado.
• liftingbody: Picture of M2-F1 lifting body in tow, courtesy of NASA (Image number E-10962).
• m83: M83 spiral galaxy astronomical image courtesy of Anglo-Australian Observatory, photography by
  David Malin.
• moon: Copyright Michael Myers. Used with permission.
• saturn: Voyager 2 image, 1981-08-24, NASA catalog #PIA01364.
• solarspectra: Courtesy of Ann Walker. Used with permission.
• tissue: Courtesy of Alan W. Partin, M.D., Ph.D., Johns Hopkins University School of Medicine.
• trees: Trees with a View, watercolor and ink on paper, copyright Susan Cohen. Used with permission.
• paviaU: University of Pavia hyperspectral data set, courtesy of Paolo Gamba, Ph.D., Remote Sensing
  Group at the University of Pavia. Used with permission.

2

Introduction

This chapter introduces you to the fundamentals of image processing using MATLAB and the Image
Processing Toolbox software.

• “Images in MATLAB” on page 2-2
• “Image Types in the Toolbox” on page 2-3
• “Convert Between Image Types” on page 2-9
• “Display Separated Color Channels of RGB Image” on page 2-11
• “Convert Image Data Between Data Types” on page 2-14
• “Work with Image Sequences as Multidimensional Arrays” on page 2-15
• “Perform an Operation on a Sequence of Images” on page 2-18
• “Process Folder of Images Using Image Batch Processor App” on page 2-20
• “Process Images Using Image Batch Processor App with File Metadata” on page 2-27
• “Process Large Set of Images Using MapReduce Framework and Hadoop” on page 2-35
• “Detecting Cars in a Video of Traffic” on page 2-44
• “Image Arithmetic Functions” on page 2-50
• “Image Arithmetic Clipping Rules” on page 2-51
• “Nest Calls to Image Arithmetic Functions” on page 2-52
• “Find Vegetation in a Multispectral Image” on page 2-53
• “Image Coordinate Systems” on page 2-63
• “Define World Coordinate System of Image” on page 2-66
• “Shift X- and Y-Coordinate Range of Displayed Image” on page 2-69

Images in MATLAB
The basic data structure in MATLAB is the array, an ordered set of real or complex elements. This
object is naturally suited to the representation of images, real-valued ordered sets of color or
intensity data.

MATLAB stores most images as two-dimensional matrices, in which each element of the matrix
corresponds to a single discrete pixel in the displayed image. (Pixel is derived from picture element
and usually denotes a single dot on a computer display.) For example, an image composed of 200
rows and 300 columns of different colored dots would be stored in MATLAB as a 200-by-300 matrix.

Some images, such as truecolor images, represent images using a three-dimensional array. In
truecolor images, the first plane in the third dimension represents the red pixel intensities, the
second plane represents the green pixel intensities, and the third plane represents the blue pixel
intensities. This convention makes working with images in MATLAB similar to working with any other
type of numeric data, and makes the full power of MATLAB available for image processing
applications.
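
For example, this quick check (a sketch using sample images shipped with MATLAB) confirms the two
storage conventions:

I = imread("pout.tif");        % grayscale image, stored as a 2-D matrix
RGB = imread("peppers.png");   % truecolor image, stored as a 3-D array
size(I)                        % rows-by-columns
size(RGB)                      % rows-by-columns-by-3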

For more information on how Image Processing Toolbox assigns pixel indices and how to relate pixel
indices to continuous spatial coordinates, see “Image Coordinate Systems” on page 2-63.

See Also
imread | imshow

Related Examples
• “Basic Image Import, Processing, and Export” on page 1-4

More About
• “Image Types in the Toolbox” on page 2-3


Image Types in the Toolbox


The Image Processing Toolbox software defines several fundamental types of images, summarized in
the following list. These image types determine the way MATLAB interprets array elements as pixel intensity
values.

All images in Image Processing Toolbox are assumed to have nonsparse values. Numeric and logical
images are expected to be real-valued unless otherwise specified.

• “Binary Images” on page 2-4: Image data are stored as an m-by-n logical matrix in which values of
  0 and 1 are interpreted as black and white, respectively. Some toolbox functions can also
  interpret an m-by-n numeric matrix as a binary image, where values of 0 are black and all nonzero
  values are white.
• “Indexed Images” on page 2-4: Image data are stored as an m-by-n numeric matrix whose elements
  are direct indices into a colormap. Each row of the colormap specifies the red, green, and blue
  components of a single color. For single or double arrays, integer values range from [1, p]. For
  logical, uint8, or uint16 arrays, values range from [0, p-1]. The colormap is a c-by-3 array of
  data type double with values in the range [0, 1].
• “Grayscale Images” on page 2-5 (also known as intensity images): Image data are stored as an
  m-by-n numeric matrix whose elements specify intensity values. The smallest value indicates black
  and the largest value indicates white. For single or double arrays, values range from [0, 1]. For
  uint8 arrays, values range from [0, 255]. For uint16 arrays, values range from [0, 65535]. For
  int16 arrays, values range from [-32768, 32767].
• “Truecolor Images” on page 2-6 (commonly referred to as RGB images): Image data are stored as an
  m-by-n-by-3 numeric array whose elements specify the intensity values of one of the three color
  channels. For RGB images, the three channels represent the red, green, and blue signals of the
  image. For single or double arrays, RGB values range from [0, 1]. For uint8 arrays, RGB values
  range from [0, 255]. For uint16 arrays, RGB values range from [0, 65535]. There are other models,
  called color spaces, that describe colors using three color channels. For these color spaces, the
  range of each data type may differ from the range allowed by images in the RGB color space. For
  example, pixel values in the L*a*b* color space of data type double can be negative or greater
  than 1. For more information, see “Understanding Color Spaces and Color Space Conversion” on page
  16-15.
• High Dynamic Range (HDR) Images on page 2-6: HDR images are stored as an m-by-n numeric matrix or
  m-by-n-by-3 numeric array, similar to grayscale or RGB images, respectively. HDR images have data
  type single or double, but data values are not limited to the range [0, 1] and can contain Inf
  values. For more information, see “Work with High Dynamic Range Images” on page 3-52.
• Multispectral and Hyperspectral Images on page 2-7: Image data are stored as an m-by-n-by-c
  numeric array, where c is the number of color channels.
• Label Images on page 2-8: Image data are stored as an m-by-n categorical matrix or numeric matrix
  of nonnegative integers.

Binary Images
In a binary image, each pixel has one of only two discrete values: 1 or 0. Most functions in the toolbox
interpret pixels with value 1 as belonging to a region of interest, and pixels with value 0 as the
background. Binary images are frequently used in conjunction with other image types to indicate
which portions of the image to process.

The figure shows a binary image with a close-up view of some of the pixel values.
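
For example, a sketch that derives a binary image from a grayscale sample image by thresholding and
then uses it to mask the original (the sample file and threshold method are arbitrary choices):

I = imread("coins.png");    % grayscale sample image
BW = imbinarize(I);         % logical matrix: 1 for foreground, 0 for background
masked = I;
masked(~BW) = 0;            % set background pixels to black
imshowpair(BW,masked,"montage")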

Indexed Images
An indexed image consists of an image matrix and a colormap.

A colormap is a c-by-3 matrix of data type double with values in the range [0, 1]. Each row of the
colormap specifies the red, green, and blue components of a single color.

The pixel values in the image matrix are direct indices into the colormap. Therefore, the color of each
pixel in the indexed image is determined by mapping the pixel value in the image matrix to the
corresponding color in the colormap. The mapping depends on the data type of the image matrix:

• If the image matrix is of data type single or double, the colormap normally contains integer
values in the range [1, p], where p is the length of the colormap. The value 1 points to the first row
in the colormap, the value 2 points to the second row, and so on.
• If the image matrix is of data type logical, uint8 or uint16, the colormap normally contains
integer values in the range [0, p–1]. The value 0 points to the first row in the colormap, the value 1
points to the second row, and so on.


A colormap is often stored with an indexed image and is automatically loaded with the image when
you use the imread function. After you read the image and the colormap into the workspace as
separate variables, you must keep track of the association between the image and colormap.
However, you are not limited to using the default colormap—you can use any colormap that you
choose.

The figure illustrates an indexed image, the image matrix, and the colormap, respectively. The image
matrix is of data type double, so the value 7 points to the seventh row of the colormap.
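
For example, a sketch that reads an indexed sample image and its colormap into separate workspace
variables and displays them together:

[X,map] = imread("trees.tif");   % X is the index matrix, map is the associated colormap
imshow(X,map)
whos X map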

Grayscale Images
A grayscale image is a data matrix whose values represent intensities of one image pixel. While
grayscale images are rarely saved with a colormap, MATLAB uses a colormap to display them.

You can obtain a grayscale image directly from a camera that acquires a single signal for each pixel.
You can also convert truecolor or multispectral images to grayscale to emphasize one particular
aspect of the images. For example, you can take a linear combination of the red, green, and blue
channels of an RGB image such that the resulting grayscale image indicates the brightness,
saturation, or hue of each pixel. You can process each channel of a truecolor or multispectral image
independently by splitting the channels into separate grayscale images.

The figure depicts a grayscale image of data type double whose pixel values are in the range [0, 1].
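
For example, a sketch that converts a truecolor sample image to grayscale and also splits it into
its three color channels, each of which is itself a grayscale image:

RGB = imread("peppers.png");
gray = im2gray(RGB);        % weighted combination of the red, green, and blue channels
[R,G,B] = imsplit(RGB);     % each output is an m-by-n grayscale image
montage({gray,R,G,B})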


Truecolor Images
A truecolor image is an image in which each pixel has a color specified by three values. Graphics file
formats store truecolor images as 24-bit images, where three color channels are 8 bits each. This
yields a potential of 16 million colors. The precision with which a real-life image can be replicated has
led to the commonly used term truecolor image.

RGB images are the most common type of truecolor images. In RGB images, the three color channels
are red, green, and blue. For more information about the RGB color channels, see “Display Separated
Color Channels of RGB Image” on page 2-11.

There are other models, called color spaces, that describe colors using three different color channels.
For these color spaces, the range of each data type may differ from the range allowed by images in
the RGB color space. For example, pixel values in the L*a*b* color space of data type double can be
negative or greater than 1. For more information, see “Understanding Color Spaces and Color Space
Conversion” on page 16-15.

Truecolor images do not use a colormap. The color of each pixel is determined by the combination of
the intensities stored in each color channel at the pixel's location.

The figure depicts the red, green, and blue channels of a floating-point RGB image. Observe that pixel
values are in the range [0, 1].

To determine the color of the pixel at (row, column) coordinate (2,3), you would look at the RGB
triplet stored in the vector (2,3,:). Suppose (2,3,1) contains the value 0.5176, (2,3,2) contains
0.1608, and (2,3,3) contains 0.0627. The color for the pixel at (2,3) is
0.5176 0.1608 0.0627
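
You can read that triplet programmatically. For example, assuming the RGB image from the figure is
in the workspace as the variable RGB (a sketch):

rgbTriplet = squeeze(RGB(2,3,:))'   % red, green, and blue intensities at row 2, column 3
rgbTriplet = impixel(RGB,3,2)       % impixel uses (x,y) order, so the column index comes first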

HDR Images
Dynamic range refers to the range of brightness levels. The dynamic range of real-world scenes can
be quite high. High dynamic range (HDR) images attempt to capture the whole tonal range of real-
world scenes (called scene-referred), using 32-bit floating-point values to store each color channel.

The figure depicts the red, green, and blue channels of a tone-mapped HDR image with original pixel
values in the range [0, 3.2813]. Tone mapping is a process that reduces the dynamic range of an HDR
image to the range expected by a computer monitor or screen.
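
For example, a sketch that reads an HDR sample image, confirms that its values extend beyond
[0, 1], and tone maps it for display (assuming the sample file office.hdr that ships with the
toolbox):

hdr = hdrread("office.hdr");   % single-precision data, not limited to [0, 1]
max(hdr(:))
rgb = tonemap(hdr);            % compress the dynamic range for display
imshow(rgb)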


Multispectral and Hyperspectral Images


A multispectral image is a type of color image that stores more than three channels. For example, a
multispectral image can store three RGB color channels and three infrared channels, for a total of six
channels. The number of channels in a multispectral image is usually small. In contrast, a
hyperspectral image can store dozens or even hundreds of channels.

The figure depicts a multispectral image with six channels consisting of red, green, blue color
channels (depicted as a single RGB image) and three infrared channels.
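
For example, a sketch that assembles a six-channel multispectral array from a truecolor sample
image and three simulated infrared channels (the infrared data is random and serves only to
illustrate the storage convention):

RGB = imread("peppers.png");
[m,n,~] = size(RGB);
IR = uint8(randi(255,[m n 3]));    % placeholder for three infrared channels
multispectral = cat(3,RGB,IR);     % m-by-n-by-6 numeric array
size(multispectral)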


Label Images
A label image is an image in which each pixel specifies a class, object, or region of interest (ROI). You
can derive a label image from an image of a scene using segmentation techniques.

• A numeric label image enumerates objects or ROIs in the scene. Labels are nonnegative integers.
The background typically has the value 0. The pixels labeled 1 make up one object; the pixels
labeled 2 make up a second object; and so on.
• A categorical label image specifies the class of each pixel in the image. The background is
commonly assigned the value <undefined>.

The figure depicts a label image with three categories: petal, leaf, and dirt.
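
For example, a sketch that creates a numeric label image from a binary image and overlays the
labels on the original grayscale image:

I = imread("coins.png");
BW = imbinarize(I);
L = bwlabel(BW);             % nonnegative integers; 0 marks the background
imshow(labeloverlay(I,L))    % each labeled object appears in a different color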

See Also

More About
• “Convert Between Image Types” on page 2-9
• “Understanding Color Spaces and Color Space Conversion” on page 16-15
• “Work with Image Sequences as Multidimensional Arrays” on page 2-15


Convert Between Image Types


The toolbox includes many functions that you can use to convert an image from one type to another,
listed below. For example, if you want to filter a color image that is stored as an
indexed image, you must first convert it to truecolor format. When you apply the filter to the
truecolor image, MATLAB filters the intensity values in the image, as is appropriate. If you attempt to
filter the indexed image, MATLAB simply applies the filter to the indices in the indexed image matrix,
and the results might not be meaningful.

You can perform certain conversions just using MATLAB syntax. For example, you can convert a
grayscale image to truecolor format by concatenating three copies of the original matrix along the
third dimension.

RGB = cat(3,I,I,I);

The resulting truecolor image has identical matrices for the red, green, and blue planes, so the image
displays as shades of gray.

In addition to these image type conversion functions, there are other functions that return a different
image type as part of the operation they perform. For example, the region of interest functions return
a binary image that you can use to mask an image for filtering or for other operations.

Note When you convert an image from one format to another, the resulting image might look
different from the original. For example, if you convert a color indexed image to a grayscale image,
the resulting image displays as shades of grays, not color.

• demosaic: Convert a Bayer pattern encoded image to a truecolor (RGB) image.
• dither: Use dithering to convert a grayscale image to a binary image or to convert a truecolor
  image to an indexed image.
• gray2ind: Convert a grayscale image to an indexed image.
• grayslice: Convert a grayscale image to an indexed image by using multilevel thresholding.
• im2gray: Convert an RGB image to a grayscale image. The im2gray function also accepts
  single-channel grayscale input images and returns them unchanged. Use this function instead of
  rgb2gray when you want a preprocessing algorithm to accept both grayscale and RGB images.
• ind2gray: Convert an indexed image to a grayscale image.
• ind2rgb: Convert an indexed image to a truecolor image.
• mat2gray: Convert a data matrix to a grayscale image by scaling the data.
• rgb2gray: Convert a truecolor image to a grayscale image. Unlike the im2gray function, the
  rgb2gray function requires that the input image have three color channels.
• rgb2ind: Convert a truecolor image to an indexed image.
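
For example, a sketch that converts an indexed sample image to truecolor format before filtering
and then converts the filtered result to grayscale:

[X,map] = imread("trees.tif");
RGB = ind2rgb(X,map);                                % convert to truecolor so filtering acts on intensities
smoothed = imfilter(RGB,fspecial("gaussian",7,2));   % smooth each color channel
gray = im2gray(smoothed);
imshow(gray)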


Some images use color spaces other than the RGB color space, such as the HSV color space. To work
with these images, first convert the image to the RGB color space, process the image, and then
convert it back to the original color space. For more information about color space conversion
routines, see “Understanding Color Spaces and Color Space Conversion” on page 16-15.

See Also

More About
• “Image Types in the Toolbox” on page 2-3
• “Read Image Data into the Workspace” on page 3-2


Display Separated Color Channels of RGB Image

This example creates a simple RGB image and then separates the color channels. The example
displays each color channel as a grayscale intensity image and as a color image.

Create an RGB image with uninterrupted areas of red, green, and blue. Display the image.

imSize = 200;
RGB = reshape(ones(imSize,1)*reshape(jet(imSize),1,imSize*3),[imSize,imSize,3]);
imshow(RGB)
title('Original RGB Image')

Separate the three color channels.

[R,G,B] = imsplit(RGB);

Display a grayscale representation of each color channel. Notice that each separated color plane in
the figure contains an area of white. The white corresponds to the highest values (purest shades) of
each separate color. For example, in the red channel image, the white represents the highest
concentration of pure red values. As red becomes mixed with green or blue, gray pixels appear. The
black region in the image shows pixel values that contain no red values, in other words, when R ==
0.

figure
subplot(1,3,1)
imshow(R)
title('Red Channel')

subplot(1,3,2)
imshow(G)
title('Green Channel')

subplot(1,3,3)
imshow(B)
title('Blue Channel')


Display a color representation of each color channel. In these images, the desired color channel
maintains its original intensity values and pixel values in the other two color channels are set to 0.

Create an all-black channel.

allBlack = zeros(size(RGB,1,2),class(RGB));
justR = cat(3,R,allBlack,allBlack);
justG = cat(3,allBlack,G,allBlack);
justB = cat(3,allBlack,allBlack,B);

Display all the channels in a montage.

figure
montage({justR,justG,justB},'Size',[1 3], ...
"BackgroundColor",'w',"BorderSize",10);
title('Color Representation of the Red, Green, and Blue Color Channels');


See Also
imsplit

More About
• “Image Types in the Toolbox” on page 2-3


Convert Image Data Between Data Types

Overview of Image Data Type Conversions


You can convert uint8 and uint16 image data to double using the MATLAB double function.
However, converting between data types changes the way MATLAB and the toolbox interpret the
image data. If you want the resulting array to be interpreted properly as image data, you need to
rescale or offset the data when you convert it.

For easier conversion of data types, use one of these functions: im2uint8, im2uint16, im2int16,
im2single, or im2double. These functions automatically handle the rescaling and offsetting of the
original data of any image data type. For example, this command converts a double-precision RGB
image with data in the range [0,1] to a uint8 RGB image with data in the range [0,255].

RGB2 = im2uint8(RGB1);

Losing Information in Conversions


When you convert to a data type that uses fewer bits to represent numbers, you generally lose some
of the information in your image. For example, a uint16 grayscale image is capable of storing up to
65,536 distinct shades of gray, but a uint8 grayscale image can store only 256 distinct shades of
gray. When you convert a uint16 grayscale image to a uint8 grayscale image, im2uint8 quantizes
the gray shades in the original image. In other words, all values from 0 to 127 in the original image
become 0 in the uint8 image, values from 128 to 385 all become 1, and so on.
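
For example, a sketch that makes the quantization visible by converting a short uint16 ramp to
uint8:

ramp16 = uint16(0:400);       % 401 distinct uint16 values
ramp8 = im2uint8(ramp16);     % many original values collapse onto the same uint8 value
unique(ramp8)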

Converting Indexed Images


It is not always possible to convert an indexed image from one storage data type to another. In an
indexed image, the image matrix contains only indices into a colormap, rather than the color data
itself, so no quantization of the color data is possible during the conversion.

For example, a uint16 or double indexed image with 300 colors cannot be converted to uint8,
because uint8 arrays have only 256 distinct values. If you want to perform this conversion, you must
first reduce the number of the colors in the image using the imapprox function. This function
performs the quantization on the colors in the colormap, to reduce the number of distinct colors in
the image. See “Reduce Colors of Indexed Image Using imapprox” on page 16-7 for more
information.
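
For example, a sketch that builds a double indexed image with 300 colors, reduces it to at most 256
colors with imapprox, and then converts the indices to uint8:

map = jet(300);                     % colormap with 300 colors
X = randi(300,100,100);             % double index matrix using all 300 colors
[Y,newmap] = imapprox(X,map,256);   % quantize the colormap to at most 256 colors
Y8 = im2uint8(Y,'indexed');         % the reduced index matrix now fits in uint8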


Work with Image Sequences as Multidimensional Arrays

Create Multidimensional Array Representing Image Sequence


Multidimensional arrays are a convenient way to display and process image sequences. Create a
multidimensional array by concatenating the individual images of an image sequences. Each image
must be the same size and have the same number of color channels. If you are storing a sequence of
indexed images, each image must use the same colormap.

• If you have a sequence of 2-D grayscale, binary, or indexed images, then concatenate the images
in the third dimension to create a 3-D array of size m-by-n-by-p. Each of the p images has size m-
by-n.
• If you have a sequence of 2-D RGB images, then concatenate the images along the fourth
dimension to create a 4-D array of size m-by-n-by-3-by-p. Each of the p images has size m-by-n-
by-3.

The figure depicts 2-D images concatenated as planes of a 3-D array.

Use the cat function to concatenate individual images. For example, this code concatenates a group
of RGB images along the fourth dimension.

A = cat(4,A1,A2,A3,A4,A5)

Note Some functions work with a particular type of multidimensional array, called a multiframe array.
In a multiframe array, images are concatenated along the fourth dimension regardless of the number
of color channels that the images have. A multiframe array of grayscale, binary, or indexed images
has size m-by-n-by-1-by-p. If you need to convert a multiframe array of grayscale images to a 3-D
array for use with other toolbox functions, then you can use the squeeze function to remove the
singleton dimension.
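
For example, a sketch that removes the singleton dimension from a multiframe array of ten grayscale
frames:

multiframe = zeros(256,256,1,10,"uint8");   % m-by-n-by-1-by-p multiframe array
sequence = squeeze(multiframe);             % m-by-n-by-p array for functions that expect 3-D input
size(sequence)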

Display Image Sequences


There are several ways to display image sequences. To display one frame at a time, use the Image
Viewer app or the imshow function. To display all the frames in an image sequence simultaneously,
use the montage function.

To animate an image sequence or provide navigation within the sequence, use the Video Viewer app.
The Video Viewer app provides playback controls that you can use to navigate among the frames in
the sequence.
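
For example, assuming sequence is an m-by-n-by-p grayscale image sequence in the workspace (a
sketch):

imshow(sequence(:,:,1))                      % display a single frame
montage(reshape(sequence, ...
    [size(sequence,1) size(sequence,2) 1 size(sequence,3)]))   % all frames at once
implay(sequence)                             % open the sequence in the Video Viewer app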


Process Image Sequences


Many toolbox functions can operate on multidimensional arrays and, consequently, can operate on
image sequences. For example, if you pass a multidimensional array to the imwarp function, it
applies the same 2-D transformation to all 2-D planes along the higher dimension.

Some toolbox functions that accept multidimensional arrays, however, do not by default interpret an
m-by-n-by-p or an m-by-n-by-3-by-p array as an image sequence. To use these functions with image
sequences, you must use particular syntax and be aware of other limitations. This list gives common
toolbox functions that support image sequences, the sequence dimensions they accept, and guidelines
for using each function with an image sequence.

• bwlabeln (m-by-n-by-p only): Must use the bwlabeln(BW,conn) syntax with a 2-D connectivity.
• deconvblind (m-by-n-by-p or m-by-n-by-3-by-p): PSF argument can be either 1-D or 2-D.
• deconvlucy (m-by-n-by-p or m-by-n-by-3-by-p): PSF argument can be either 1-D or 2-D.
• edgetaper (m-by-n-by-p or m-by-n-by-3-by-p): PSF argument can be either 1-D or 2-D.
• entropyfilt (m-by-n-by-p only): nhood argument must be 2-D.
• imabsdiff (m-by-n-by-p or m-by-n-by-3-by-p): Image sequences must be the same size.
• imadd (m-by-n-by-p or m-by-n-by-3-by-p): Image sequences must be the same size. Cannot add a
  scalar to an image sequence.
• imbothat (m-by-n-by-p only): SE argument must be 2-D.
• imclose (m-by-n-by-p only): SE argument must be 2-D.
• imdilate (m-by-n-by-p only): SE argument must be 2-D.
• imdivide (m-by-n-by-p or m-by-n-by-3-by-p): Image sequences must be the same size.
• imerode (m-by-n-by-p only): SE argument must be 2-D.
• imextendedmax (m-by-n-by-p only): Must use the imextendedmax(I,h,conn) syntax with a 2-D
  connectivity.
• imextendedmin (m-by-n-by-p only): Must use the imextendedmin(I,h,conn) syntax with a 2-D
  connectivity.
• imfilter (m-by-n-by-p or m-by-n-by-3-by-p): With grayscale images, h can be 2-D. With truecolor
  (RGB) images, h can be 2-D or 3-D.
• imhmax (m-by-n-by-p only): Must use the imhmax(I,h,conn) syntax with a 2-D connectivity.
• imhmin (m-by-n-by-p only): Must use the imhmin(I,h,conn) syntax with a 2-D connectivity.
• imlincomb (m-by-n-by-p or m-by-n-by-3-by-p): Image sequences must be the same size.
• immultiply (m-by-n-by-p or m-by-n-by-3-by-p): Image sequences must be the same size.
• imopen (m-by-n-by-p only): SE argument must be 2-D.
• imregionalmax (m-by-n-by-p only): Must use the imregionalmax(I,conn) syntax with a 2-D
  connectivity.
• imregionalmin (m-by-n-by-p only): Must use the imregionalmin(I,conn) syntax with a 2-D
  connectivity.
• imsubtract (m-by-n-by-p or m-by-n-by-3-by-p): Image sequences must be the same size.
• imtophat (m-by-n-by-p only): SE argument must be 2-D.
• imwarp (m-by-n-by-p or m-by-n-by-3-by-p): tform argument must be 2-D.
• padarray (m-by-n-by-p or m-by-n-by-3-by-p): PADSIZE argument must be a two-element vector.
• rangefilt (m-by-n-by-p only): nhood argument must be 2-D.
• stdfilt (m-by-n-by-p only): nhood argument must be 2-D.
• watershed (m-by-n-by-p only): Must use the watershed(I,conn) syntax with a 2-D connectivity.
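
For example, a sketch that dilates every frame of a grayscale sequence in one call, following the
guideline that the structuring element must be 2-D (assuming sequence is an m-by-n-by-p array such
as the one built in the next example):

se = strel('disk',3);                       % 2-D structuring element
dilatedSequence = imdilate(sequence,se);    % the same dilation is applied to each m-by-n plane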

See Also

More About
• “Perform an Operation on a Sequence of Images” on page 2-18
• “View Image Sequences in Video Viewer” on page 4-122
• “Process Folder of Images Using Image Batch Processor App” on page 2-20

2-17
2 Introduction

Perform an Operation on a Sequence of Images

This example shows how to perform an operation on a sequence of images. The example creates an
array of images and passes the entire array to the stdfilt function to perform standard deviation
filtering on each image in the sequence.

Create an array of file names.

fileFolder = fullfile(matlabroot,'toolbox','images','imdata');
dirOutput = dir(fullfile(fileFolder,'AT3_1m4_*.tif'));
fileNames = {dirOutput.name}'
numFrames = numel(fileNames)

fileNames =

10x1 cell array

{'AT3_1m4_01.tif'}
{'AT3_1m4_02.tif'}
{'AT3_1m4_03.tif'}
{'AT3_1m4_04.tif'}
{'AT3_1m4_05.tif'}
{'AT3_1m4_06.tif'}
{'AT3_1m4_07.tif'}
{'AT3_1m4_08.tif'}
{'AT3_1m4_09.tif'}
{'AT3_1m4_10.tif'}

numFrames =

10

Preallocate an m -by- n -by- p array and read images into the array.

I = imread(fileNames{1});
sequence = zeros([size(I) numFrames],class(I));
sequence(:,:,1) = I;

for p = 2:numFrames
sequence(:,:,p) = imread(fileNames{p});
end

Process each image in the sequence, performing standard deviation filtering. Note that, to use
stdfilt with an image sequence, you must specify the nhood argument, passing a 2-D
neighborhood.

sequenceNew = stdfilt(sequence,ones(3));

View each input image followed by its processed image.

figure;
for k = 1:numFrames
imshow(sequence(:,:,k));
title(sprintf('Original Image # %d',k));
pause(1);
imshow(sequenceNew(:,:,k),[]);
title(sprintf('Processed Image # %d',k));
pause(1);
end


Process Folder of Images Using Image Batch Processor App

This example shows how to use the Image Batch Processor app to process a batch of images in a
folder or datastore.

Create a new folder in an area where you have write permission and copy a set of 10 images from the
Image Processing Toolbox imdata folder into the new folder.

mkdir("cellprocessing");
copyfile(fullfile(matlabroot,"toolbox","images","imdata","AT3*.tif"),"cellprocessing","f");

Load Images into Image Batch Processor App

Open the Image Batch Processor app from the MATLAB® toolstrip. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Batch Processor.

Load images into the app. In the app toolstrip, click Add. In the Load Images from Folder
dialog box, specify the folder containing the images you want to load. For this example, specify the
folder that you created in the first step, cellprocessing. By default, the app includes images in
subfolders. Then, click Load.


To load images from a folder, including subfolders, in the app toolstrip, under Add, select Folder,
include subfolders. To load images from an imageDatastore object in the MATLAB workspace,
under Add, select Image datastore from workspace.

The Image Batch Processor app creates thumbnails of the images in the folder and displays them in
a scrollable tab in the left pane. The app displays the first selected image (highlighted in blue) at a
greater resolution in the Input Image tab in the right pane.

Specify Batch Processing Function

Specify the name of the function you want to use to process the images in the folder. To specify an
existing function, type the name in the Function Name box in the Batch Function section of the app
toolstrip. The function can be a MATLAB function, such as imbinarize, or a previously created
custom batch function. If you use a MATLAB function, it must have the signature out = fcn(in).
You can also click the folder icon next to the box to browse and select the function. To create a new
batch processing function, click Create in the Batch Function section of the app toolstrip. When you
do this, the app opens the batch processing function template in the MATLAB® Editor. For this
example, click Create to create a new function.


In the batch processing function template, enter code for the new function into the space reserved in
the template file and click Save. This example uses the default name for the batch processing
function, myimfcn, but you can specify any name. For this example, the code specifies a function that
creates a mask image, calculates the total number of cells in the image, and creates a thresholded
version of the original image.
function results = myimfcn(varargin)
%Image Processing Function
%
% VARARGIN - Can contain up to two inputs:
% IM - First input is a numeric array containing the image data.
% INFO - Second input is a scalar structure containing information about
% the input image source.
%
% INFO can be used to obtain metadata about the image read.
% To apply a batch function using the INFO argument, you must select the
% Include Image Info check box in the app toolstrip.
%
% RESULTS - A scalar struct with the processing results.
%
%
%
%--------------------------------------------------------------------------
% Auto-generated by imageBatchProcessor App.
%
% When used by the App, this function will be called for each input image
% file automatically.
%
%--------------------------------------------------------------------------

% Input parsing------------------------------------------------------------
im = varargin{1};

if nargin == 2
% Obtain information about the input image source
info = varargin{2};
end

% Replace the sample below with your code----------------------------------

imstd = stdfilt(im,ones(27));
bw = imstd>30;

thresholdMask = imfuse(im, bw);


[~, n] = bwlabel(bw);

results.bw = bw;
results.thresholdMask = thresholdMask;
results.numCells = n;

%--------------------------------------------------------------------------
end

Save the file. After saving, the app displays the name of this new function in the Function Name box
on the app toolstrip.

Process Images Using Batch Processing Function

Test the new function by running the batch processor on one of your images. With one image selected
(highlighted in blue), click Process Selected to process the selected image. The app displays the
results of the processing in a new panel called Results. For this example, the app displays the binary
mask, a count of the number of objects (cells) in the image, and a thresholded version of the image.


To get a closer view of the image results, click Show for that particular result in the Results panel.
The app opens a larger resolution version of the image in a new tab in a bottom-center pane. For this
example, view the binary mask results by clicking Show for bw in the Results panel. To explore the
results, move the cursor over the image to access the pan and zoom controls. When zooming and
panning, the app links the result image to the original image—panning or zooming on one image
causes the other image to move as well. If you do not want this behavior, clear Link Axes in the app
toolstrip.


If the results of the test run on one image are successful, then execute the function on all of the
images in the folder. To process all the images at once, on the app toolstrip, click Process Selected
and select Process All. To process only a subset of the images, click Process Selected. You can
select images to process either by pressing Ctrl and clicking the desired images or by clicking one
image to start, pressing Shift, and clicking another image to select all images in between the starting
and ending images. If you have Parallel Computing Toolbox™, you can click Use Parallel on the app
toolstrip to process the images on a local parallel pool. For this example, process all of the images.

The app processes all the images in the folder or datastore. A filled-in green square next to a
thumbnail indicates the app successfully processed that image. The Results panel contains the
results of the selected image (highlighted in blue). A status bar at the bottom-right of the app reports
on the number of images processed.

Export Processed Images and Processing Pipeline

To save the results, click Export to view the options available. You can export results to the
workspace or to a file, or you can get the MATLAB® code the app used to generate the results.

Save the results in a workspace variable. On the app toolstrip, click Export and select the Export result
of all processed images to workspace option. In the dialog box that opens, select the results you
want to export. A common approach is to export the nonimage results to the workspace and save the
images that result from the processing in files. This example saves the cell count along with the name
of the input file to the workspace variable numCells.


By default, the app returns the results you select in a table named allresults. To store the results
in a structure instead of a table, select Struct Array in the dialog box. To specify another name for the
result variable, change Variable name in the dialog box. If you select Include input image file
name, the app includes the name of the image associated with the results in the structure or table.
After specifying exporting details, click OK.

To get the MATLAB® code that the app used to process your files, on the app toolstrip, click Export
and select Generate function. The app generates a function that accepts the input folder name or
imageDatastore object, as well as the output folder name, as input arguments. By default, the
generated function returns a table with the results, but you can choose a structure instead. For image
results, you can specify the file format and whether you want the function to write the image to the
specified output folder.

See Also
Image Batch Processor | imageDatastore

Related Examples
• “Process Images Using Image Batch Processor App with File Metadata” on page 2-27


Process Images Using Image Batch Processor App with File Metadata

This example shows how to access input file information while processing a batch of images in the
Image Batch Processor app.

The processing pipeline used in this example renders RGB images from RAW Bayer-pattern color
filter array (CFA) images. For an example that shows how to implement the pipeline for one image,
see “Implement Digital Camera Processing Pipeline” on page 3-42. Once you create the processing
pipeline, the next step in many image processing applications is to scale it up to operate on a batch of
images. You can batch process images using built-in or custom processing functions using the Image
Batch Processor. The custom batch processing function in this example uses the info input argument
to access file information for each image in the batch.

In this example, you load each image into the app as a raw CFA image using rawread. The batch
processing function applies this sequence of operations from a traditional camera processing
pipeline.

1 Linearize the CFA image.
2 Scale the CFA data to a suitable range.
3 Apply white-balance adjustment.
4 Demosaic the Bayer pattern.
5 Convert the demosaiced image to the sRGB color space.

If your application does not require a flexible camera processing pipeline, you can load a folder of
RAW files as RGB images directly into the app. By default, the app reads RAW files by using raw2rgb,
which automatically converts each CFA image into an RGB image. The approach used in this example
is more flexible and customized to each image. By using the info argument, the batch processing
function uses the metadata for each input file to apply custom intensity scaling and correction before
converting it to RGB.
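
As a point of comparison, a minimal sketch of that simpler, less flexible approach is shown below. It assumes the NEF files are in the current folder and builds a datastore whose read function is raw2rgb, so each file is converted to RGB on read.

% Sketch: read each RAW file directly as an RGB image (no custom pipeline)
imdsRGB = imageDatastore(pwd,FileExtensions=".nef",ReadFcn=@raw2rgb);
imageBatchProcessor(imdsRGB)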

Create Image Datastore

Create an image datastore object containing all files with the NEF file extension in the current
example folder. To read the raw CFA images, specify the ReadFcn name-value argument as
@rawread.

dataDir = pwd;
imds = imageDatastore(dataDir,FileExtensions=".nef",ReadFcn=@rawread);

Load Image Datastore into Image Batch Processor App

Load the ImageDatastore object into the Image Batch Processor app. The app opens and loads the
images from the datastore using the read function rawread, which returns the visible portion of the
CFA image in each file. The app creates thumbnails of the images in the datastore and displays them
in a scrollable tab in the left pane. The app displays the first selected image (highlighted in blue) in
larger resolution in the Input Image tab in the right pane. In this example, the datastore contains
only one NEF file, so only one thumbnail is visible.

imageBatchProcessor(imds)


Specify Batch Processing Function

Specify the name of the function you want to use to process the images. To specify an existing custom
function or a built-in MATLAB function, type the name in the Function Name box in the Batch
Function section of the app toolstrip. You can also click Open next to the box to browse and select
the function. To create a new batch processing function, click Create in the Batch Function section
of the app toolstrip. The app then opens the batch processing function template in the MATLAB®
Editor. For this example, click Create to create a new function.


In the batch processing function template, enter code for the new function into the space reserved in
the template file and click Save. This example uses the default name for the batch processing
function, myimfcn, but you can specify any name.

The function template uses varargin to allow either one or two input arguments. The first input
argument is always the image data array im. The second argument, if included, is a structure, info,
that contains these fields:

• Filename — The image source filename, including the path string, name of the file, and file
extension.
• FileSize — Total file size, in bytes.
• Label — Image label name, if present. Otherwise, the Label field contains an empty string.

The Filename field is required to read the file metadata used by the camera processing pipeline. The
myimfcn function accesses the filename using fileName = info.Filename. The rest of the function
uses the image array and metadata attributes to balance the image levels and convert the image to
the sRGB color space. The last two lines of the function assign the variables to export in the output
results: the intermediate result imDemosaicLinear and the final sRGB image imsRGB.
function results = myimfcn(varargin)
%Image Processing Function


%
% VARARGIN - Can contain up to two inputs:
% IM - First input is a numeric array containing the image data.
% INFO - Second input is a scalar structure containing information about
% the input image source.
%
% INFO can be used to obtain metadata about the image read.
% To apply a batch function using the INFO argument, you must select the
% Include Image Info check box in the app toolstrip.
%
% RESULTS - A scalar struct with the processing results.
%
%
%
%--------------------------------------------------------------------------
% Auto-generated by imageBatchProcessor App.
%
% When used by the App, this function will be called for each input image
% file automatically.
%
%--------------------------------------------------------------------------

% Input parsing------------------------------------------------------------
im = varargin{1};

if nargin == 2
% Obtain information about the input image source
info = varargin{2};
end

% Replace the sample below with your code----------------------------------

% Get filename from INFO argument


fileName = info.Filename;

% Read file metadata


cfaInfo = rawinfo(fileName);

% Perform black level correction


colorInfo = cfaInfo.ColorInfo;
blackLevel = colorInfo.BlackLevel;
blackLevel = reshape(blackLevel,[1 1 numel(blackLevel)]);
blackLevel = planar2raw(blackLevel);
repeatDims = cfaInfo.ImageSizeInfo.VisibleImageSize ./ size(blackLevel);
blackLevel = repmat(blackLevel,repeatDims);
cfaImage = im - blackLevel;

% Clamp negative pixel values to 0


cfaImage = max(0,cfaImage);

% Scale pixel values to maximum pixel value


cfaImage = double(cfaImage);
maxValue = max(cfaImage(:));
cfaImage = cfaImage ./ maxValue;

% Adjust white balance


whiteBalance = colorInfo.CameraAsTakenWhiteBalance;
gLoc = strfind(cfaInfo.CFALayout,"G");


gLoc = gLoc(1);
whiteBalance = whiteBalance/whiteBalance(gLoc);
whiteBalance = reshape(whiteBalance,[1 1 numel(whiteBalance)]);
whiteBalance = planar2raw(whiteBalance);
whiteBalance = repmat(whiteBalance,repeatDims);
cfaWB = cfaImage .* whiteBalance;
cfaWB = im2uint16(cfaWB);

% Demosaic
cfaLayout = cfaInfo.CFALayout;
imDebayered = demosaic(cfaWB,cfaLayout);

% Convert to sRGB color space


cam2srgbMat = colorInfo.CameraTosRGB;
imTransform = imapplymatrix(cam2srgbMat,imDebayered,"uint16");
srgbTransform = lin2rgb(imTransform);

% Assign results to export


results.imDemosaicLinear = imDebayered;
results.imsRGB = srgbTransform;

%--------------------------------------------------------------------------
end

After saving the file, the app displays the name of this new custom function in the Function Name
box on the app toolstrip.

Process Images Using Batch Processing Function

Test the new function by running the batch processor on one of your images. To pass the info
argument to the custom batch function, in the Batch Function section of the app toolstrip, select
Include Image Info. If the batch processing function expects the info argument, you must select this
check box; otherwise, the app returns an error when you try to process images.

With one image selected (highlighted in blue), click Process Selected Images to process the
selected image. The app displays the results of the processing in a new Results pane. For this
example, the app displays the demosaiced RGB image and the final sRGB image.


To get a closer view of the image results, in the Results pane, click Show for that particular result.
The app opens a larger resolution version of the image in a new tab in a bottom-center pane. For this
example, view the sRGB results. In the Results pane, click Show for imsRGB. To explore the results,
move the cursor over the image to access the pan and zoom controls. When zooming and panning, the
app links the result image to the original image—panning or zooming on one image pans or zooms on
the other image as well. If you do not want this behavior, clear Link Axes in the app toolstrip.


If the results of the test run on one image are successful, then you can execute the function on all of
the images in the datastore. To process all the images at once, on the app toolstrip, click Process
Selected and select Process All. You can also process only a subset of the images by selecting those
images and clicking Process Selected Images. You can select images to process either by holding
Ctrl and clicking each image you want to include or by clicking one image, holding Shift, and
clicking another image to select all images in between and including the two images. If you have
Parallel Computing Toolbox™, you can click Use Parallel on the app toolstrip to process the images
on a local parallel pool. For this example, select Process All to process all of the images.

The app processes all the images in the datastore. A filled-in green circle with a check mark next to a
thumbnail indicates the app successfully processed that image. The Results pane contains the results
of the selected image thumbnail (highlighted in blue). A status bar at the bottom of the app window
reports on the number of images processed and how many have been processed successfully.

Export Processed Images and Processing Pipeline

You can save your results by exporting them to the workspace or to a file. Alternatively, you can
generate a MATLAB function from the code the app uses to generate the results.

Save the results in a workspace variable. On the app toolstrip, click Export and select Export result
of all processed images to workspace. In the dialog box that opens, select which results you want


to export. For example, you can export the nonimage results to the workspace and save the images
returned by your batch processing function to files. For this example, save the image results and the
name of the input file to the workspace variable allresults.

By default, the app returns the results you select in a table named allresults. To store the results
in a structure instead of a table, select Struct Array. To specify another name for the results
variable, change Variable name in the dialog box. If you select Include input image file name, the
app includes the name of the image associated with the results in the structure or table. After
specifying your exporting details, click OK.

To generate a MATLAB function from the code that the app uses to process your files, on the app
toolstrip, click Export and select Generate function. The app opens the Generate function dialog
box, where you can specify options for the generated function. By default, the generated function
returns a table with the results, but you can select Struct Array to output a structure instead. For
image results, you can specify whether to write the image to a specified output folder or include it in
the results output. When outputting the images to files, you can also specify the file format for each.
Once you click OK, the app generates a function that accepts an image source, as a folder name or
imageDatastore object, and an output folder name as input arguments.

See Also
Image Batch Processor | imageDatastore

Related Examples
• “Process Folder of Images Using Image Batch Processor App” on page 2-20
• “Implement Digital Camera Processing Pipeline” on page 3-42


Process Large Set of Images Using MapReduce Framework and Hadoop

This example shows how to execute a cell counting algorithm on a large number of images using
Image Processing Toolbox™ with MATLAB® MapReduce and datastores. MapReduce is a
programming technique for analyzing data sets that do not fit in memory. The example also uses
MATLAB Parallel Server™ to run parallel MapReduce programs on Hadoop® clusters. The example
shows how to test your algorithm on a local system on a subset of the images before moving it to the
Hadoop cluster.

Download Sample Data

Download the BBBC005v1 data set from the Broad Bioimage Benchmark Collection. This data set is
an annotated biological image set designed for testing and validation. The image set provides
examples of in- and out-of-focus synthetic images, which can be used for validation of focus metrics.
The data set contains almost 20,000 files. For more information, see the introduction to the data set on the Broad Bioimage Benchmark Collection website.

At the system prompt on a Linux® system, use the wget command to download the zip file containing
the BBBC data set. Before running this command, make sure that your target location has enough
space to hold the zip file (1.8 GB) and the extracted images (2.6 GB).

wget https://data.broadinstitute.org/bbbc/BBBC005/BBBC005_v1_images.zip

At the system prompt on a Linux system, extract the files from the zip file.

unzip BBBC005_v1_images.zip

Examine the image file names in this data set. The names are constructed in a specific format to
contain useful information about each image. For example, the file name BBBC005_v1_images/
SIMCEPImages_A05_C18_F1_s16_w1.TIF indicates that the image contains 18 cells (C18) and was
filtered with a Gaussian low-pass filter with diameter 1 and a sigma of 0.25x diameter to simulate
focus blur (F1). The w1 identifies the stain used. For example, find the number of images in the data
set that use the w1 stain.
d = dir('C:\Temp\BBBCdata\BBBC005_v1_images\*w1*');
numel(d)

ans = 9600
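
As a quick illustration of the naming convention, this sketch parses the cell count and focus blur from one file name, using the same approach as the map function later in this example.

% Decode the C (cell count) and F (focus blur) fields of a BBBC005 file name
[~,name] = fileparts('SIMCEPImages_A05_C18_F1_s16_w1.TIF');
strs = strsplit(name,'_');
expCount = str2double(strs{3}(2:end))    % 18 cells
focusBlur = str2double(strs{4}(2:end))   % focus blur level 1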

Test Algorithm on Sample Image

View the files in the BBBC data set and test an algorithm on a small subset of the files using the
Image Batch Processor app. The example tests a simple algorithm that segments the cells in the
images. (The example uses a modified version of this cell segmentation algorithm to create the cell
counting algorithm used in the MapReduce implementation.)

Load Image Files into the Image Batch Processor

Open the Image Batch Processor app. From the MATLAB toolstrip, on the Apps tab, in the Image
Processing and Computer Vision section, click Image Batch Processor. You can also open the app
from the command line using the imageBatchProcessor command.

In the Image Batch Processor app, click Import Images and navigate to the folder in which you
stored the downloaded data set.


The Image Batch Processor app displays thumbnails of the images in the folder in the left pane and
a higher-resolution version of the currently selected image in the Input Image tab. View some of the
images to get familiar with the data set.


Specify Segmentation Function

Specify the name of the function that implements your cell segmentation algorithm. To specify an
existing function, type its name in the Function name field or click the folder icon to browse and
select the function. To create a new batch processing function, click Create. The app opens the batch
function template in the MATLAB® editor. For this example, create a new function containing the
following image segmentation code. Click Save to create the batch function. The app updates to
display the name of the function you created in the Batch Function section of the app toolstrip.
function imout = cellSegmenter(im) % A simple cell segmenter
% Otsu thresholding
bw = imbinarize(im);

% Show thresholding result in app


imout = imfuse(im,bw);

% Find area of blobs


stats = regionprops('table',bw,{'Area'});

% Average cell diameter is about 33 pixels (based on random inspection)


cellArea = pi*(33/2)^2;

% Estimate cell count based on area of blobs


cellsPerBlob = stats.Area/cellArea;
cellCount = sum(round(cellsPerBlob));
disp(cellCount);
end

Test Your Segmentation Function on Sample Image

Select the thumbnail of an image displayed in the app and click Process Selected to execute a test
run of your algorithm. For this example, choose only an image with the “w1” stain (identifiable in the
file name). The segmentation algorithm works best with these images.

Examine the results of running your algorithm to verify that your segmentation algorithm found the
correct number of cells in the image. The names of the images contain the cell count in the C number.
For example, the image named SIMCEPImages_A05_C18_F1_s05_w1.TIF contains 18 cells.
Compare this number to the results returned at the command line for the sample image.

Test Algorithm on MapReduce Framework Locally

After assuring that your segmentation code works as expected on one image, set up a small test
version on your local system of the large scale processing you want to perform. You should test your
processing framework before running it on thousands of files.


Load Image Files into Image Datastore

First, create an image datastore, using the imageDatastore function, containing a small subset of
your images. MapReduce uses a datastore to process data in small chunks that individually fit into
memory. Move to the folder containing the images and create an image datastore. Because the cell
segmentation algorithm implemented in cellSegmenter.m works best with the cell body stain,
select only the files with the indicator w1 in their file names.

localimds = imageDatastore(fullfile('/your_data/broad_data/BBBC005_v1_images','*w1*'));

Even limiting the selection to files with "w1" in their names, the image datastore still contains over
9000 files. Subset the list of images further, selecting every 100th file from the thousands of files in
the data set.

localimds.Files = localimds.Files(1:100:end);

Repackage the Sample Set into a Hadoop Sequence File

Once you have created the image datastore, convert the sample subset of images into Hadoop
sequence files, a format used by the Hadoop cluster. Note that this step simply changes the data from
one storage format to another without changing the data value. For more information about sequence
files, see “Getting Started with MapReduce”.

To convert the image datastore to a Hadoop sequence file, create a “map” function and a “reduce”
function which you pass to the mapreduce function. To convert the image files to Hadoop sequence
files, the map function should be a no-op function. For this example, the map function simply saves
the image data as-is, using its file name as a key.
function identityMap(data, info, intermKVStore)
add(intermKVStore, info.Filename, data);
end

Create a reduce function that converts the image files into a key-value datastore backed by sequence
files.
function identityReduce(key, intermValueIter, outKVStore)
while hasnext(intermValueIter)
add(outKVStore, key, getnext(intermValueIter));
end
end

Call mapreduce, passing your map and reduce functions. The example first calls the mapreducer
function to specify where the processing takes place. To test your set up and perform the processing
on your local system, specify 0.

mapreducer(0);

When run locally, mapreduce creates a key-value datastore backed by MAT-files.

localmatds = mapreduce(localimds,@identityMap,@identityReduce,'OutputFolder',pwd);

Test MapReduce Framework Locally

After creating the subset of image files for testing, and converting them to a key-value datastore, you
are ready to test the algorithm. Modify your original cell segmentation algorithm to return the cell


count. (The Image Batch Processor app, where this example first tested the algorithm, can only
return processed images, not values such as the cell count.)

Modify the cell segmentation function to return a cell count and remove the display of the image.

function cellCount = cellCounter(im)


% Otsu thresholding
bw = imbinarize(im);

% Find area of blobs


stats = regionprops('table',bw,{'Area'});

% Average cell diameter is about 33 pixels (based on random inspection)


cellArea = pi*(33/2)^2;

% Estimate cell count based on area of blobs


cellsPerBlob = stats.Area/cellArea;
cellCount = sum(round(cellsPerBlob));
end

Create a map function that calculates the error count for a specific image. This function gets the
actual cell count for an image from the file name coding (the C number) and compares it to the cell
count returned by the segmentation algorithm.

function mapImageToMisCountError(data, ~, intermKVStore)


% Extract the image
im = data.Value{1};
% Call the cell counting algorithm
actCount = cellCounter(im);
% The original file name is available as the key
fileName = data.Key{1};
[~, name] = fileparts(fileName);
% Extract expected cell count and focus blur from the file name
strs = strsplit(name, '_');
expCount = str2double(strs{3}(2:end));
focusBlur = str2double(strs{4}(2:end));
diffCount = abs(actCount-expCount);
% Note: focus blur is the key
add(intermKVStore, focusBlur, diffCount);
end

Create a reduce function that computes the average error in cell count for each focus value.

function reduceErrorCount(key, intermValueIter, outKVStore)


focusBlur = key;
% Compute the sum of all differences in cell count for this value of
% focus blur
count = 0;
totalDiff = 0;
while hasnext(intermValueIter)
diffCount = getnext(intermValueIter);
count = count + 1;
totalDiff = totalDiff+diffCount;
end
% Average
meanDiff = totalDiff/count;
add(outKVStore, focusBlur, meanDiff);
end


Run the mapreduce job on your local system.

focusErrords = mapreduce(localmatds,@mapImageToMisCountError,@reduceErrorCount);

Gather the results.

focusErrorTbl = readall(focusErrords);

Get the average error values.

averageErrors = cell2mat(focusErrorTbl.Value);

The simple cell counting algorithm used here relies on the average area of a cell or a group of cells.
Increasing focus blur diffuses cell boundaries, and thus the area. The expected result is for the error
to go up with increasing focus blur, as seen in this plot of the results.

bar(focusErrorTbl.Key, averageErrors);
ha = gca;
ha.XTick = sort(focusErrorTbl.Key);
ha.XLim = [min(focusErrorTbl.Key)-2 max(focusErrorTbl.Key)+2];
title('Cell counting result on a test data set');
xlabel('Focus blur');
ylabel('Average error in cell count');


Run MapReduce Framework on Hadoop Cluster

Now that you've verified the processing of your algorithm on a subset of your data, run your
algorithm on the full dataset on a Hadoop cluster.

Load Data into the Hadoop File System

Load all the image data into the Hadoop file system and run your MapReduce framework on a
Hadoop cluster, using the following shell commands. To run these commands, replace your_data with
the location of the data on your computer.

hadoop fs -mkdir /user/broad_data/

hadoop fs -copyFromLocal /your_data/broad_data/BBBC005_v1_images /user/broad_data/BBBC005_v1_images

Set Up Access to MATLAB Parallel Server Cluster

Set up access to the MATLAB Parallel Server cluster. To run this command, replace 'your/hadoop/
install' with the location on your computer.

setenv('HADOOP_HOME','/your/hadoop/install');

cluster = parallel.cluster.Hadoop;

cluster.HadoopProperties('mapred.job.tracker') = 'hadoop01glnxa64:54311';

cluster.HadoopProperties('fs.default.name') = 'hdfs://hadoop01glnxa64:54310';

disp(cluster);

Change Mapreduce Execution Environment to Remote Cluster

Change the mapreduce execution environment to point to the remote cluster.

mapreducer(cluster);

Convert All Image Data into Hadoop Sequence Files

Convert all the image data into Hadoop sequence files. This is similar to what you did on your local
system when you converted a subset of the images for prototyping. You can reuse the map and reduce
functions you used previously. Use the internal Hadoop cluster.

broadFolder = 'hdfs://hadoop01glnxa64:54310/user/broad_data/BBBC005_v1_images';

Pick only the cell body stain (w1) files for processing.

w1Files = fullfile(broadFolder,'*w1*.TIF');

Create an ImageDatastore object representing all these files.

imageDS = imageDatastore(w1Files);

Specify the output folder.


seqFolder = 'hdfs://hadoop01glnxa64:54310/user/datasets/images/broad_data/broad_sequence';

Convert the images to a key-value datastore.

seqds = mapreduce(imageDS,@identityMap,@identityReduce,'OutputFolder',seqFolder);

Run Cell Counting Algorithm on Entire Data Set

Run the cell counting algorithm on the entire data set stored in the Hadoop file system using the
MapReduce framework. The only change from running the framework on your local system is that
now the input and output locations are on the Hadoop file system.

First, specify the output location for error count.

output = 'hdfs://hadoop01glnxa64:54310/user/broad_data/BBBC005_focus_vs_errorCount';

Run your algorithm on the MapReduce framework. Use the tic and toc functions to record how long
it takes to process the set of images.

tic

focusErrords = mapreduce(seqds,@mapImageToMisCountError,@reduceErrorCount,'OutputFolder',output);

toc

Gather results.

focusErrorTbl = readall(focusErrords);

averageErrors = cell2mat(focusErrorTbl.Value);

Plot the results, as before.

bar(focusErrorTbl.Key, averageErrors);
ha = gca;
ha.XTick = sort(focusErrorTbl.Key);
ha.XLim = [min(focusErrorTbl.Key)-2 max(focusErrorTbl.Key)+2];
title('Cell counting result on the entire data set');
xlabel('Focus blur');
ylabel('Average error in cell count');


See Also
ImageDatastore | mapreduce

More About
• “Getting Started with MapReduce”
• “Work with Remote Data”


Detecting Cars in a Video of Traffic

This example shows how to use Image Processing Toolbox™ to visualize and analyze videos or image
sequences. This example uses VideoReader (MATLAB®), implay, and other Image Processing
Toolbox functions to detect light-colored cars in a video of traffic. Note that VideoReader has
platform-specific capabilities and may not be able to read the supplied Motion JPEG2000 video on
some platforms.

Step 1: Access Video with VideoReader

The VideoReader function constructs a multimedia reader object that can read video data from a
multimedia file. See VideoReader for information on which formats are supported on your platform.

Use VideoReader to access the video and get basic information about it.

trafficVid = VideoReader('traffic.mj2')

trafficVid =

VideoReader with properties:

General Properties:
Name: 'traffic.mj2'
Path: 'B:\matlab\toolbox\images\imdata'
Duration: 8
CurrentTime: 0
NumFrames: 120

Video Properties:
Width: 160
Height: 120
FrameRate: 15
BitsPerPixel: 24
VideoFormat: 'RGB24'

The get method provides more information on the video such as its duration in seconds.

get(trafficVid)

obj =

VideoReader with properties:

General Properties:
Name: 'traffic.mj2'
Path: 'B:\matlab\toolbox\images\imdata'
Duration: 8
CurrentTime: 0
NumFrames: 120

Video Properties:
Width: 160
Height: 120


FrameRate: 15
BitsPerPixel: 24
VideoFormat: 'RGB24'

Step 2: Explore Video with IMPLAY

Explore the video in implay.


implay('traffic.mj2');

Step 3: Develop Your Algorithm

When working with video data, it can be helpful to select a representative frame from the video and
develop your algorithm on that frame. Then, this algorithm can be applied to the processing of all the
frames in the video.
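
For example, a quick sketch that reads one frame to experiment with is shown below. It uses frame 71, the same frame examined later in this example.

% Read a representative frame that contains both light- and dark-colored cars
sampleFrame = read(trafficVid,71);
figure, imshow(sampleFrame)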

For this car-tagging application, examine a frame that includes both light-colored and dark-colored
cars. When an image has many structures, like the traffic video frames, it is useful to simplify the
image as much as possible before trying to detect an object of interest. One way to do this for the car
tagging application is to suppress all objects in the image that are not light-colored cars (dark-colored
cars, lanes, grass, etc.). Typically, it takes a combination of techniques to remove these extraneous
objects.

One way to remove the dark-colored cars from the video frames is to use the imextendedmax
function. This function returns a binary image that identifies regions with intensity values above a
specified threshold, called regional maxima. All other objects in the image with pixel values below


this threshold become the background. To eliminate the dark-colored cars, determine the average
pixel value for these objects in the image. (Use im2gray to convert the original video from RGB to
grayscale.) You can use the pixel region tool in implay to view pixel values. Specify the average pixel
value (or a value slightly higher) as the threshold when you call imextendedmax. For this example,
set the value to 50.
darkCarValue = 50;
darkCar = im2gray(read(trafficVid,71));
noDarkCar = imextendedmax(darkCar, darkCarValue);
imshow(darkCar)
figure, imshow(noDarkCar)

In the processed image, note how most of the dark-colored car objects are removed but many other
extraneous objects remain, particularly the lane-markings. The regional maxima processing will not
remove the lane markings because their pixel values are above the threshold. To remove these
objects, you can use the morphological function imopen. This function uses morphological processing
to remove small objects from a binary image while preserving large objects. When using
morphological processing, you must decide on the size and shape of the structuring element used in
the operation. Because the lane-markings are long and thin objects, use a disk-shaped structuring
element with radius corresponding to the width of the lane markings. You can use the pixel region
tool in implay to estimate the width of these objects. For this example, set the value to 2.
sedisk = strel('disk',2);
noSmallStructures = imopen(noDarkCar, sedisk);
imshow(noSmallStructures)


To complete the algorithm, use regionprops to find the centroid of the objects in
noSmallStructures (should just be the light-colored cars). Use this information to position the tag
on the light-colored cars in the original video.
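
A possible sketch of this final step, applied to the single test frame from the previous steps, is shown below.

% Measure the remaining blobs; the largest one should be the light-colored car
stats = regionprops(noSmallStructures,{'Centroid','Area'});
[~,idx] = max([stats.Area]);
tagPosition = stats(idx).Centroid    % [x y] location at which to place the tag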

Step 4: Apply the Algorithm to the Video

The car-tagging application processes the video one frame at a time in a loop. (Because a typical
video contains a large number of frames, it would take a lot of memory to read and process all the
frames at once.)

A small video (like the one in this example) could be processed at once, and there are many functions
that provide this capability. For more information, see “Process Image Sequences” on page 2-16.

For faster processing, preallocate the memory used to store the processed video.

nframes = trafficVid.NumFrames;
I = read(trafficVid, 1);
taggedCars = zeros([size(I,1) size(I,2) 3 nframes], class(I));

for k = 1 : nframes
singleFrame = read(trafficVid, k);

% Convert to grayscale to do morphological processing.


I = rgb2gray(singleFrame);

% Remove dark cars.


noDarkCars = imextendedmax(I, darkCarValue);

% Remove lane markings and other non-disk shaped structures.


noSmallStructures = imopen(noDarkCars, sedisk);

% Remove small structures.


noSmallStructures = bwareaopen(noSmallStructures, 150);

% Get the area and centroid of each remaining object in the frame. The
% object with the largest area is the light-colored car. Create a copy
% of the original frame and tag the car by changing the centroid pixel
% value to red.
taggedCars(:,:,:,k) = singleFrame;

stats = regionprops(noSmallStructures, {'Centroid','Area'});


if ~isempty([stats.Area])
areaArray = [stats.Area];
[junk,idx] = max(areaArray);
c = stats(idx).Centroid;
c = floor(fliplr(c));
width = 2;
row = c(1)-width:c(1)+width;
col = c(2)-width:c(2)+width;
taggedCars(row,col,1,k) = 255;
taggedCars(row,col,2,k) = 0;
taggedCars(row,col,3,k) = 0;
end
end

Step 5: Visualize Results

Get the frame rate of the original video and use it to see taggedCars in implay.

frameRate = trafficVid.FrameRate;
implay(taggedCars,frameRate);

See Also
Video Viewer | VideoReader | rgb2gray | imextendedmax | imopen | regionprops |
bwareaopen


More About
• “Work with Image Sequences as Multidimensional Arrays” on page 2-15
• “Perform an Operation on a Sequence of Images” on page 2-18


Image Arithmetic Functions


Image arithmetic is the implementation of standard arithmetic operations, such as addition,
subtraction, multiplication, and division, on images. Image arithmetic has many uses in image
processing both as a preliminary step in more complex operations and by itself. For example, image
subtraction can be used to detect differences between two or more images of the same scene or
object.

You can do image arithmetic using the MATLAB arithmetic operators. The Image Processing Toolbox
software also includes a set of functions that implement arithmetic operations for all numeric,
nonsparse data types. The toolbox arithmetic functions accept any numeric data type, including
uint8, uint16, and double, and return the result image in the same format. The functions perform
the operations in double precision, on an element-by-element basis, but do not convert images to
double-precision values in the MATLAB workspace. Overflow is handled automatically. The functions
clip return values to fit the data type.
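
For instance, this brief sketch, using a toolbox sample image, illustrates that behavior.

I = imread('rice.png');     % uint8 image
J = imadd(I,75);            % sums greater than 255 are clipped to 255
K = imsubtract(I,150);      % differences less than 0 are clipped to 0
M = immultiply(I,1.2);      % fractional results are rounded; the result stays uint8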

Note On Intel® architecture processors, the image arithmetic functions can take advantage of the
Intel Integrated Performance Primitives (Intel IPP) library, thus accelerating their execution time. The
Intel IPP library is only activated, however, when the data passed to these functions is of specific data
types. See the reference pages for the individual arithmetic functions for more information.

See Also

More About
• “Image Arithmetic Clipping Rules” on page 2-51
• “Nest Calls to Image Arithmetic Functions” on page 2-52


Image Arithmetic Clipping Rules


The results of integer arithmetic can easily overflow the data type allotted for storage. For example,
the maximum value you can store in uint8 data is 255. Arithmetic operations can also result in
fractional values, which cannot be represented using integer arrays.

MATLAB arithmetic operators and the Image Processing Toolbox arithmetic functions use these rules
for integer arithmetic:

• Values that exceed the range of the integer type are clipped, or truncated, to that range.
• Fractional values are rounded.

For example, if the data type is uint8, results greater than 255 (including Inf) are set to 255. The
table lists some additional examples.

Result    Data type    Clipped Value
300       uint8        255
-45       uint8        0
10.5      uint8        11
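
You can reproduce these table entries at the MATLAB command line using integer arithmetic:

uint8(200) + uint8(100)   % ans = 255 (300 clipped to the type maximum)
uint8(50) - uint8(95)     % ans = 0 (-45 clipped to the type minimum)
uint8(21) / uint8(2)      % ans = 11 (10.5 rounded)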

See Also

More About
• “Nest Calls to Image Arithmetic Functions” on page 2-52


Nest Calls to Image Arithmetic Functions


You can use the image arithmetic functions in combination to perform a series of operations. For
example, to calculate the average of two images,

C = (A + B)/2

You could enter

I = imread('rice.png');
I2 = imread('cameraman.tif');
K = imdivide(imadd(I,I2),2); % not recommended

When used with uint8 or uint16 data, each arithmetic function rounds and clips its result before
passing it on to the next operation. This can significantly reduce the precision of the calculation.

A better way to perform this calculation is to use the imlincomb function. imlincomb performs all
the arithmetic operations in the linear combination in double precision and only rounds and clips the
final result.

K = imlincomb(.5,I,.5,I2); % recommended

See Also
imlincomb

More About
• “Image Arithmetic Clipping Rules” on page 2-51


Find Vegetation in a Multispectral Image

This example shows how to use MATLAB® array arithmetic to process images and plot image data. In
particular, this example works with a three-dimensional image array where the three planes
represent the image signal from different parts of the electromagnetic spectrum, including the visible
red and near-infrared (NIR) channels.

Image data differences can be used to distinguish different surface features of an image, which have
varying reflectivity across different spectral channels. By finding differences between the visible red
and NIR channels, the example identifies areas containing significant vegetation.

Step 1: Import Color-Infrared Channels from a Multispectral Image File

This example finds vegetation in a LANDSAT Thematic Mapper image covering part of Paris, France,
made available courtesy of Space Imaging, LLC. Seven spectral channels (bands) are stored in one
file in the Erdas LAN format. The LAN file, paris.lan, contains a 7-channel 512-by-512 Landsat
image. A 128-byte header is followed by the pixel values, which are band interleaved by line (BIL) in
order of increasing band number. Pixel values are stored as unsigned 8-bit integers, in little-endian
byte order.

The first step is to read bands 4, 3, and 2 from the LAN file using the MATLAB® function
multibandread.

Channels 4, 3, and 2 cover the near infrared (NIR), the visible red, and the visible green parts of the
electromagnetic spectrum. When they are mapped to the red, green, and blue planes, respectively, of
an RGB image, the result is a standard color-infrared (CIR) composite. The final input argument to
multibandread specifies which bands to read, and in which order, so that you can construct a
composite in a single step.

CIR = multibandread('paris.lan',[512, 512, 7],'uint8=>uint8',...
    128,'bil','ieee-le',{'Band','Direct',[4 3 2]});

Variable CIR is a 512-by-512-by-3 array of class uint8. It is an RGB image, but with false colors.
When the image is displayed, red pixel values signify the NIR channel, green values signify the visible
red channel, and blue values signify the visible green channel.

In the CIR image, water features are very dark (the Seine River) and green vegetation appears red
(parks and shade trees). Much of the image appearance is due to the fact that healthy, chlorophyll-
rich vegetation has a high reflectance in the near infrared. Because the NIR channel is mapped to the
red channel in the composite image, any area with a high vegetation density appears red in the
display. A noticeable example is the area of bright red on the left edge, a large park (the Bois de
Boulogne) located west of central Paris within a bend of the Seine River.

imshow(CIR)
title('CIR Composite')
text(size(CIR,2),size(CIR,1) + 15,...
'Image courtesy of Space Imaging, LLC',...
'FontSize',7,'HorizontalAlignment','right')


By analyzing differences between the NIR and red channels, you can quantify this contrast in spectral
content between vegetated areas and other surfaces such as pavement, bare soil, buildings, or water.

Step 2: Construct an NIR-Red Spectral Scatter Plot

A scatter plot is a natural place to start when comparing the NIR channel (displayed as red pixel
values) with the visible red channel (displayed as green pixel values). It's convenient to extract these
channels from the original CIR composite into individual variables. It's also helpful to convert from
class uint8 to class single so that the same variables can be used in the NDVI computation below,
as well as in the scatter plot.
NIR = im2single(CIR(:,:,1));
R = im2single(CIR(:,:,2));

Viewing the two channels together as grayscale images, you can see how different they look.


imshow(R)
title('Visible Red Band')

imshow(NIR)
title('Near Infrared Band')


With one simple call to the plot command in MATLAB, you can create a scatter plot displaying one
point per pixel (as a blue cross, in this case), with its x-coordinate determined by its value in the red
channel and its y-coordinate by its value in the NIR channel.

plot(R,NIR,'+b')
ax = gca;
ax.XLim = [0 1];
ax.XTick = 0:0.2:1;
ax.YLim = [0 1];
ax.YTick = 0:0.2:1;
axis square
xlabel('red level')
ylabel('NIR level')
title('NIR vs. Red Scatter Plot')


The appearance of the scatter plot of the Paris scene is characteristic of a temperate urban area with
trees in summer foliage. There's a set of pixels near the diagonal for which the NIR and red values
are nearly equal. This "gray edge" includes features such as road surfaces and many rooftops. Above
and to the left is another set of pixels for which the NIR value is often well above the red value. This
zone encompasses essentially all of the green vegetation.

Step 3: Compute Vegetation Index via MATLAB® Array Arithmetic

Observe from the scatter plot that taking the ratio of the NIR level to red level would be one way to
locate pixels containing dense vegetation. However, the result would be noisy for dark pixels with
small values in both channels. Also notice that the difference between the NIR and red channels
should be larger for greater chlorophyll density. The Normalized Difference Vegetation Index (NDVI)
is motivated by this second observation. It takes the (NIR - red) difference and normalizes it to help
balance out the effects of uneven illumination such as the shadows of clouds or hills. In other words,


on a pixel-by-pixel basis subtract the value of the red channel from the value of the NIR channel and
divide by their sum.

ndvi = (NIR - R) ./ (NIR + R);

Notice how the array-arithmetic operators in MATLAB make it possible to compute an entire NDVI
image in one simple command. Recall that variables R and NIR have class single. This choice uses
less storage than class double but unlike an integer class also allows the resulting ratio to assume a
smooth gradation of values.

Variable ndvi is a 2-D array of class single with a theoretical maximum range of [-1 1]. You can
specify these theoretical limits when displaying ndvi as a grayscale image.

figure
imshow(ndvi,'DisplayRange',[-1 1])
title('Normalized Difference Vegetation Index')


The Seine River appears very dark in the NDVI image. The large light area near the left edge of the
image is the park (Bois de Boulogne) noted earlier.

Step 4: Locate Vegetation -- Threshold the NDVI Image

In order to identify pixels most likely to contain significant vegetation, apply a simple threshold to the
NDVI image.
threshold = 0.4;
q = (ndvi > threshold);

The percentage of pixels selected is thus


100 * numel(NIR(q(:))) / numel(NIR)

ans = 5.2204


or about 5 percent.

The park and other smaller areas of vegetation appear white by default when displaying the logical
(binary) image q.
imshow(q)
title('NDVI with Threshold Applied')

Step 5: Link Spectral and Spatial Content

To link the spectral and spatial content, you can locate above-threshold pixels on the NIR-red scatter
plot, re-drawing the scatter plot with the above-threshold pixels in a contrasting color (green) and
then re-displaying the threshold NDVI image using the same blue-green color scheme. As expected,
the pixels having an NDVI value above the threshold appear to the upper left of the rest and
correspond to the redder pixels in the CIR composite displays.


Create the scatter plot, then display the thresholded NDVI.

figure
subplot(1,2,1)
plot(R,NIR,'+b')
hold on
plot(R(q(:)),NIR(q(:)),'g+')
axis square
xlabel('red level')
ylabel('NIR level')
title('NIR vs. Red Scatter Plot')

subplot(1,2,2)
imshow(q)
colormap([0 0 1; 0 1 0]);
title('NDVI with Threshold Applied')

See Also
im2single | imshow | multibandread

Related Examples
• “Enhance Multispectral Color Composite Images” on page 8-90


More About
• “Images in MATLAB” on page 2-2


Image Coordinate Systems


You can access locations in images using several different image coordinate systems. You can specify
locations using discrete pixel indices because images are stored as arrays. You can also specify
locations using continuous spatial coordinates because images represent real-world scenes in
continuous space.

Pixel Indices
As described in “Images in MATLAB” on page 2-2, MATLAB stores most images as arrays. Each (row,
column) index of the array corresponds to a single pixel in the displayed image.

There is a one-to-one correspondence between pixel indices and subscripts for the first two matrix
dimensions. Similar to array indexing in MATLAB, pixel indices are integer values and range from 1
to the length of the row or column. The indices are ordered from top to bottom, and from left to right.

For example, the data for the pixel in the fifth row, second column is stored in the matrix element
(5,2). You use normal MATLAB matrix subscripting to access values of individual pixels. For example,
the MATLAB code

I(2,15)

returns the value of the pixel at row 2, column 15 of the single-channel image I. Similarly, the
MATLAB code

RGB(2,15,:)

returns the color values of the pixel at row 2, column 15 of the multi-channel image RGB.

Spatial Coordinates
In a spatial coordinate system, locations in an image are positions on a continuous plane. Locations
are described in terms of Cartesian x and y coordinates (not row and column indices as in the pixel
indexing system). From this Cartesian perspective, an (x,y) location such as (3.2, 5.3) is meaningful
and distinct from the coordinate (5, 3).

The Image Processing Toolbox defines two types of spatial coordinate systems depending on the
frame of reference. Intrinsic coordinates specify locations with respect to the image's frame of
reference. World coordinates specify locations with respect to an external world observer.


Intrinsic Coordinates

By default, the toolbox defines spatial image coordinates using the intrinsic coordinate system. This
spatial coordinate system corresponds to the image’s pixel indices. The intrinsic coordinates (x,y) of
the center point of any pixel are identical to the column and row indices for that pixel. For example,
the center point of the pixel in row 5, column 3 has spatial coordinates x = 3.0, y = 5.0. Be aware,
however, that the order of intrinsic coordinates (3.0, 5.0) is reversed relative to pixel indices (5,3).

The intrinsic coordinates of the center of every pixel are integer valued. The center of the upper left
pixel has intrinsic coordinates (1.0, 1.0). The center of the lower right pixel has intrinsic coordinates
(numCols, numRows), where numCols and numRows are the number of columns and rows in the
image. In general, the center of the pixel with pixel indices (m, n) falls at the point x = n, y = m in the
intrinsic coordinate system.

Because the size of each pixel in the intrinsic coordinate system is one unit, the boundaries of the
image have fractional coordinates. The upper left corner of the image is located at (0.5, 0.5), not at
(0, 0). Similarly, the lower right corner of the image is located at (numCols + 0.5, numRows + 0.5).

Several functions primarily work with spatial coordinates rather than pixel indices, but as long as you
are using the default spatial coordinate system (intrinsic coordinates), you can specify locations in
terms of their columns (x) and rows (y).
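
For example, this brief sketch, using a toolbox sample image, marks the spatial point (3.0, 5.0), which coincides with the center of the pixel at row 5, column 3 when the image is displayed with the default intrinsic coordinates.

I = imread('cameraman.tif');
imshow(I)
axis on, hold on
plot(3,5,'r+')    % marks the center of the pixel I(5,3)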

World Coordinates

A world coordinate system (also called a nondefault spatial coordinate system) relaxes several
constraints of the intrinsic coordinate system. In a world coordinate system, pixels can have any
length and width and they can be centered on any coordinate.

Some situations when you might want to use a world coordinate system include:

• When you perform a geometric transformation, such as translation, on an image and want to
preserve information about how the new position relates to the original position.
• When pixels are not square. For example, in magnetic resonance imaging (MRI), you can collect
data such that pixels have a higher sampling rate in one direction than an orthogonal direction.
• When you know how the extent of pixels aligns with positions in the real world. For example, in an
aerial photograph, every pixel might cover a specific 5-by-5 meter patch on the ground.
• When you want to reverse the direction of the x-axis or y-axis. This is a common technique to use
with geospatial data.

There are several ways to define a world coordinate system. You can use spatial referencing objects,
which encode the location of the image in a world coordinate system, the image resolution, and how
the image extent relates to intrinsic and world coordinates. You can also specify the maximum and


minimum coordinate in each dimension. For more information, see “Define World Coordinate System
of Image” on page 2-66.

See Also

Related Examples
• “Shift X- and Y-Coordinate Range of Displayed Image” on page 2-69

More About
• “Images in MATLAB” on page 2-2
• “Define World Coordinate System of Image” on page 2-66


Define World Coordinate System of Image


The world coordinate system is a continuous spatial coordinate system that specifies the location in
an image independently of the pixel indices of the image. For more information about coordinate
systems in Image Processing Toolbox, see “Image Coordinate Systems” on page 2-63.

Define Spatial Referencing Objects


To specify a world coordinate system for an image, you can use spatial referencing objects. Spatial
referencing objects define the location of the image in a world coordinate system and specify how the
extents of the image relate to intrinsic and world limits. You can use these objects to specify
nonsquare pixel dimensions by specifying a different image resolution in each dimension. Spatial
referencing objects also enable you to convert between coordinate systems.

Image Processing Toolbox includes two spatial referencing objects, imref2d and imref3d. The
table describes the properties of the 2-D spatial referencing object, imref2d. The 3-D spatial
referencing object, imref3d, includes these properties as well as corresponding properties for the Z
dimension.

Property               Description
XWorldLimits           Upper and lower bounds along the X dimension in world coordinates (nondefault spatial coordinates)
YWorldLimits           Upper and lower bounds along the Y dimension in world coordinates (nondefault spatial coordinates)
ImageSize              Size of the image, returned by the size function
PixelExtentInWorldX    Size of pixel along the X dimension
PixelExtentInWorldY    Size of pixel along the Y dimension
ImageExtentInWorldX    Size of image along the X dimension
ImageExtentInWorldY    Size of image along the Y dimension
XIntrinsicLimits       Upper and lower bounds along X dimension in intrinsic coordinates (default spatial coordinates)
YIntrinsicLimits       Upper and lower bounds along Y dimension in intrinsic coordinates (default spatial coordinates)

To illustrate spatial referencing, this sample code creates a spatial referencing object associated with
a 2-by-2 image. The code specifies the pixel extent in the horizontal and vertical directions as 4 units/
pixel and 2 units/pixel, respectively. The object calculates the world limits, image extent in world
coordinates, and image extent in intrinsic coordinates.
R = imref2d([2 2],4,2)

R =

imref2d with properties:

XWorldLimits: [2 10]
YWorldLimits: [1 5]
ImageSize: [2 2]
PixelExtentInWorldX: 4
PixelExtentInWorldY: 2


ImageExtentInWorldX: 8
ImageExtentInWorldY: 4
XIntrinsicLimits: [0.5000 2.5000]
YIntrinsicLimits: [0.5000 2.5000]

The figure illustrates how these properties map to elements of the image.

Specify Minimum and Maximum Image Extent


Image objects (such as obtained when displaying an image using imshow) define the world extent
using the XData and YData properties. Each property is a two-element vector that specifies the
center coordinate of the outermost pixels in that dimension. For more information, see Image.

By default, the intrinsic coordinates, world coordinates, and MATLAB axes coordinates of an image
coincide. For an image A, the default value of XData is [1 size(A,2)] and the default value of
YData is [1 size(A,1)]. For example, if A is a 100 row by 200 column image, the default XData is
[1 200] and the default YData is [1 100].

To define a nondefault world coordinate system for an image, specify the image XData and YData
properties with the range of coordinates spanned by the image in each dimension. When you do this,
the MATLAB axes coordinates become identical to the world coordinates and no longer coincide with
the intrinsic coordinates. For an example, see “Shift X- and Y-Coordinate Range of Displayed Image”
on page 2-69.

Note that the values in XData and YData are actually the coordinates for the center point of the
boundary pixels, not the outermost edge of the boundary pixels. Therefore, the actual coordinate
range spanned by the image is slightly larger. For instance, if XData is [1 200] and the image is 200
pixels wide, as for the intrinsic coordinate system, then each pixel is one unit wide and the interval in
X spanned by the image is [0.5 200.5]. Similarly, if XData is [1 200] and the image is 50 pixels wide,


as for a nondefault world coordinate system, then each pixel is four units wide and the interval in X
spanned by the image is [–1 202].
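
A short calculation confirms these (rounded) values for the 50-pixel-wide case:

xData = [1 200];                          % centers of the first and last columns
numCols = 50;                             % image width in pixels
pixelWidth = diff(xData)/(numCols-1)      % approximately 4.06 world units per pixel
xSpan = xData + [-0.5 0.5]*pixelWidth     % approximately [-1.03 202.03]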

You can set XData or YData such that the x-axis or y-axis is reversed. You would do this by placing
the larger value first. For example, set the XData to [200 1].

See Also
imwarp | imshow | imregtform | imregister | imref2d | imref3d

Related Examples
• “Shift X- and Y-Coordinate Range of Displayed Image” on page 2-69

More About
• “Image Coordinate Systems” on page 2-63


Shift X- and Y-Coordinate Range of Displayed Image

This example shows how to specify a nondefault world coordinate system by changing the XData and
YData properties of a displayed image.

Read an image.
I = imread("peppers.png");

Display the image using the intrinsic coordinate system, returning properties of the image in ax. Turn
on the axis to display the coordinate system.
figure
ax = imshow(I);
title("Image Displayed with Intrinsic Coordinates")
axis on

Check the range of the x- and y-coordinates, which are stored in the XData and YData properties of
ax. The ranges match the dimensions of the image.
xrange = ax.XData

xrange = 1×2


1 512

yrange = ax.YData

yrange = 1×2

1 384

Change the range of the x- and y-coordinates. This example shifts the image to the right by adding
100 to the x-coordinates and shifts the image up by subtracting 25 from the y-coordinates.
xrangeNew = xrange + 100;
yrangeNew = yrange - 25;

Display the image, specifying the shifted spatial coordinates.


figure
axNew = imshow(I,"XData",xrangeNew,"YData",yrangeNew);
title("Image Displayed with Nondefault Coordinates");
axis on

Confirm that the ranges of the x- and y-coordinates of the new image match the shifted ranges
specified by xrangeNew and yrangeNew.


axNew.XData

ans = 1×2

101 612

axNew.YData

ans = 1×2

-24 359

See Also

More About
• “Image Coordinate Systems” on page 2-63
• “Define World Coordinate System of Image” on page 2-66

3 Reading and Writing Image Data

This chapter describes how to get information about the contents of a graphics file, read image data
from a file, and write image data to a file, using standard graphics and medical file formats.

• “Read Image Data into the Workspace” on page 3-2


• “Read Multiple Images from a Single Graphics File” on page 3-4
• “Read and Write 1-Bit Binary Images” on page 3-5
• “Write Image Data to File in Graphics Format” on page 3-6
• “DICOM Support in Image Processing Toolbox” on page 3-7
• “Read Metadata from DICOM Files” on page 3-10
• “Read Image Data from DICOM Files” on page 3-12
• “Write Image Data to DICOM Files” on page 3-14
• “Remove Confidential Information from DICOM File” on page 3-16
• “Create New DICOM Series” on page 3-20
• “Create Image Datastore Containing DICOM Images” on page 3-24
• “Create Image Datastore Containing Single and Multi-File DICOM Volumes” on page 3-26
• “Add and Modify ROIs of DICOM-RT Contour Data” on page 3-29
• “Create and Display 3-D Mask of DICOM-RT Contour Data” on page 3-34
• “Mayo Analyze 7.5 Files” on page 3-40
• “Interfile Files” on page 3-41
• “Implement Digital Camera Processing Pipeline” on page 3-42
• “Work with High Dynamic Range Images” on page 3-52
• “Display High Dynamic Range Image” on page 3-54

Read Image Data into the Workspace

This example shows to read image data from a graphics file into the MATLAB® workspace using the
imread function.

Read a truecolor image into the workspace. The example reads the image data from a graphics file
that uses JPEG format.

RGB = imread("football.jpg");

If the image file format uses 8-bit pixels, imread returns the image data as an m-by-n-by-3 array of
uint8 values. For graphics file formats that support 16-bit data, such as PNG and TIFF, imread
returns an array of uint16 values.

whos

Name Size Bytes Class Attributes

RGB 256x320x3 245760 uint8

Read a grayscale image into the workspace. The example reads the image data from a graphics file
that uses the TIFF format. imread returns the grayscale image as an m-by-n array of uint8 values.

I = imread("cameraman.tif");
whos

Name Size Bytes Class Attributes

I 256x256 65536 uint8


RGB 256x320x3 245760 uint8

Read an indexed image into the workspace. imread uses two variables to store an indexed image in
the workspace: one for the image and another for its associated colormap. imread always reads the
colormap into a matrix of class double, even though the image array itself may be of class uint8 or
uint16.

[X,map] = imread("trees.tif");
whos

Name Size Bytes Class Attributes

I 256x256 65536 uint8


RGB 256x320x3 245760 uint8
X 258x350 90300 uint8
map 256x3 6144 double

In these examples, imread infers the file format to use from the contents of the file. You can also
specify the file format as an argument to imread. imread supports many common graphics file
formats, such as the Graphics Interchange Format (GIF), Joint Photographic Experts Group (JPEG),
Portable Network Graphics (PNG), and Tagged Image File Format (TIFF). For the latest information about the supported bit depths and image formats, see the imread and imformats reference pages.

pep = imread("peppers.png","png");
whos


Name Size Bytes Class Attributes

I 256x256 65536 uint8


RGB 256x320x3 245760 uint8
X 258x350 90300 uint8
map 256x3 6144 double
pep 384x512x3 589824 uint8

See Also
imread

More About
• “Image Types in the Toolbox” on page 2-3
• “Read Multiple Images from a Single Graphics File” on page 3-4
• “Write Image Data to File in Graphics Format” on page 3-6


Read Multiple Images from a Single Graphics File

This example shows how to read multiple images from a single graphics file. Some graphics file
formats allow you to store multiple images. You can read these images using format-specific
parameters with imread. By default, imread imports only the first image in the file.

Preallocate a 4-D array to hold the images to be read from a file.

mri = zeros([128 128 1 27],'uint8');

Read the images from the file, using a loop to read each image sequentially.

for frame = 1:27
    [mri(:,:,:,frame),map] = imread('mri.tif',frame);
end
whos

Name Size Bytes Class Attributes

frame 1x1 8 double


map 256x3 6144 double
mri 128x128x1x27 442368 uint8

See Also
imread

More About
• “Read Image Data into the Workspace” on page 3-2


Read and Write 1-Bit Binary Images

This example shows how to read and write 1-bit binary images.

Check the bit depth of the graphics file containing a binary image, text.png. Note that the file
stores the binary image in 1-bit format.

info = imfinfo('text.png');
info.BitDepth

ans = 1

Read the binary image from the file into the workspace. When you read a binary image stored in 1-bit
format, imread represents the data in the workspace as a logical array.

BW = imread('text.png');
whos

Name Size Bytes Class Attributes

BW 256x256 65536 logical


ans 1x1 8 double
info 1x1 4566 struct

Write the binary image to a file in 1-bit format. If the file format supports it, imwrite exports a
binary image as a 1-bit image, by default. To verify this, use imfinfo to get information about the
newly created file and check the BitDepth field. When writing binary files, imwrite sets the
ColorType field to grayscale.

imwrite(BW,'test.tif');
info = imfinfo('test.tif');
info.BitDepth

ans = 1
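Based on the ColorType behavior described above, you can also display the ColorType field of the info structure for the newly created file, which is already in the workspace.

info.ColorType

ans =
'grayscale'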

See Also
imread | imwrite | imfinfo

More About
• “Image Types in the Toolbox” on page 2-3
• “Read Image Data into the Workspace” on page 3-2


Write Image Data to File in Graphics Format

This example shows how to write image data from the workspace to a file in one of the supported
graphics file formats using the imwrite function.

Load image data into the workspace. This example loads the indexed image X from a MAT-file,
trees.mat, along with the associated colormap map.

load trees
whos

Name Size Bytes Class Attributes

X 258x350 722400 double


caption 1x66 132 char
map 128x3 3072 double

Export the image data as a bitmap file using imwrite, specifying the name of the variable and the
name of the output file you want to create. If you include an extension in the filename, imwrite
attempts to infer the desired file format from it. For example, the file extension .bmp specifies the
Microsoft Windows Bitmap format. You can also specify the format explicitly as an argument to
imwrite.

imwrite(X,map,'trees.bmp')

Use format-specific parameters with imwrite to control aspects of the export process. For example,
with PNG files, you can specify the bit depth. To illustrate, read an image into the workspace in TIFF
format and note its bit depth.

I = imread('cameraman.tif');
s = imfinfo('cameraman.tif');
s.BitDepth

ans = 8

Write the image to a graphics file in PNG format, specifying a bit depth of 4.

imwrite(I,'cameraman.png','Bitdepth',4)

Check the bit depth of the newly created file.

newfile = imfinfo('cameraman.png');
newfile.BitDepth

ans = 4

See Also
imwrite

More About
• “Image Types in the Toolbox” on page 2-3
• “Read Image Data into the Workspace” on page 3-2


DICOM Support in Image Processing Toolbox


Digital Imaging and Communications in Medicine (DICOM) is a highly standardized imaging format
used to store and transmit medical imaging files across devices and networks. The DICOM format
combines image data with metadata that describes the patient, the imaging procedure, and the
spatial referencing information. The structure, storage, and transmission of DICOM files is governed
by the DICOM standard, available on the official DICOM website. The standard defines separate
Information Object Definitions (IODs) for modalities and applications such as computed tomography
(CT), magnetic resonance imaging (MRI), and radiotherapy (RT).

MATLAB provides support for reading and writing DICOM files, as well as working with DICOM
image data and metadata. You can extract and process image data using toolbox functions, and you
can search and update the metadata attributes. MATLAB is compatible with most DICOM IODs, and
can write new DICOM files for certain IODs that fully conform to the DICOM standard. Medical
Imaging Toolbox™ provides additional tools for advanced processing of DICOM files. To get started,
see “Import and Spatial Referencing” (Medical Imaging Toolbox).

Note MATLAB supports working with DICOM files. There is no support for working with DICOM
network capabilities.

Read and Display DICOM Image Data


Explore directories with multiple DICOM series using the DICOM Browser app or dicomCollection
function. Read 2-D image data from a DICOM series by using the dicomread function or 3-D image
data by using the dicomreadVolume function. For more information, see “Read Image Data from
DICOM Files” on page 3-12. You can view imported DICOM images using toolbox display functions
such as imshow and volshow.
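
For example, the following minimal sketch reads and displays a 2-D image from the sample file CT-MONO2-16-ankle.dcm (included with the toolbox and used later in this chapter), and then reads and renders a multi-file volume. Here, dicomFolder is a placeholder for a folder that you supply containing one multi-file DICOM series.

I = dicomread("CT-MONO2-16-ankle.dcm");
imshow(I,DisplayRange=[])        % autoscale the signed 16-bit data for display

collection = dicomCollection(dicomFolder);
V = dicomreadVolume(collection,collection.Row{1});
volshow(squeeze(V))              % remove the singleton channel dimension, then render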

You can process the image data you read from DICOM files using operations such as image filtering,
registration, segmentation, and labeling. For an example that shows how to segment and calculate
region properties in medical image data, see “Segment Lungs from 3-D Chest Scan” on page 12-19.

Work with DICOM Metadata


Import DICOM metadata using the dicominfo function, which creates a MATLAB structure
specifying the name and value of each metadata attribute in the file. For more information, see “Read
Metadata from DICOM Files” on page 3-10.


List all attributes of a metadata structure in the Command Window by using the dicomdisp function,
or search for specific attributes by name using the dicomfind function. Update specific attribute
values using the dicomupdate function, or remove all personally identifying information from a
DICOM metadata structure using the dicomanon function. For an example that shows how to
anonymize a DICOM file, see “Remove Confidential Information from DICOM File” on page 3-16.

When processing DICOM files, MATLAB uses a data dictionary file that defines standard DICOM
metadata attributes. You can view or update the current data dictionary file using the dicomdict
function, or search the data dictionary for specific attributes using the dicomlookup function.
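
The following sketch strings these functions together. It assumes the sample file CT-MONO2-16-ankle.dcm, and StudyDate is simply one attribute name that appears in many DICOM files.

info = dicominfo("CT-MONO2-16-ankle.dcm");   % import the metadata as a structure
dicomdisp("CT-MONO2-16-ankle.dcm")           % list every attribute in the Command Window
attrInfo = dicomfind(info,"StudyDate");      % search the structure for a specific attribute
attrName = dicomlookup("0010","0010")        % look up the attribute defined at group 0010, element 0010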

Write New DICOM Files


Write images and metadata to new DICOM files using the dicomwrite function. The toolbox writes
the computed tomography, magnetic resonance, and secondary capture (a modality-independent
object definition) IODs with validation, which ensures that the new file contains all metadata
attributes required by the DICOM standard. For detailed information, see “Write Image Data to
DICOM Files” on page 3-14 and “Create New DICOM Series” on page 3-20.

Work with DICOM-RT Contour Data


The DICOM-RT Structure Set is an IOD specific to radiotherapy applications. The DICOM-RT
metadata includes contour data for ROIs, such as tumors and organs, used in radiation treatment
planning. You can extract ROI contour data to create a dicomContours object.

Plot contours, add or delete contours, and create a new DICOM-RT metadata structure using the
plotContour, addContour, deleteContour, and convertToInfo functions. For an example, see
“Add and Modify ROIs of DICOM-RT Contour Data” on page 3-29.

Use the createMask object function to convert contour data into a binary mask, such as to view
ROIs overlaid on image data or to label image pixels. For an example, see “Create and Display 3-D
Mask of DICOM-RT Contour Data” on page 3-34.
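
For example, this minimal sketch extracts and plots the contours stored in the DICOM-RT structure set file rtstruct.dcm, which is used in the example on page 3-29.

rtInfo = dicominfo("rtstruct.dcm");      % metadata of a DICOM-RT structure set file
rtContours = dicomContours(rtInfo);      % extract the ROI contour data
plotContour(rtContours)                  % plot all ROIs stored in the file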

Prepare DICOM Files for Deep Learning Workflows


You can use medical image data to train deep learning networks to perform tasks such as image
denoising, segmentation, and registration. You can use imageDatastore or pixelLabelDatastore
objects that contain DICOM files to train a deep learning network. For details, see “Create Image
Datastore Containing DICOM Images” on page 3-24 and “Create Image Datastore Containing Single
and Multi-File DICOM Volumes” on page 3-26. For more information about how to use image
datastores to train deep learning networks, see “Preprocess Images for Deep Learning” on page 19-
6.

These examples show applications of deep learning in medical image analysis.

• “Unsupervised Medical Image Denoising Using CycleGAN” on page 19-192


• “Unsupervised Medical Image Denoising Using UNIT” on page 19-206
• “3-D Brain Tumor Segmentation Using Deep Learning” on page 19-149


Tips
• Use the dicomFile, medicalVolume, and medicalImage objects and their related functions for
advanced processing of DICOM files.
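
For example, assuming that Medical Imaging Toolbox is installed and that dicomFolder is a placeholder for a folder containing one DICOM series, a minimal sketch is:

medVol = medicalVolume(dicomFolder);    % load the series as a spatially referenced volume
volshow(medVol.Voxels)                  % display the voxel data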

See Also
Apps
DICOM Browser

Functions
dicominfo | dicomread | dicomreadVolume | dicomwrite | dicomCollection

Objects
dicomFile | medicalVolume | medicalImage

More About
• “Read Metadata from DICOM Files” on page 3-10
• “Read Image Data from DICOM Files” on page 3-12
• “Create New DICOM Series” on page 3-20
• “Remove Confidential Information from DICOM File” on page 3-16
• “Write Image Data to DICOM Files” on page 3-14


Read Metadata from DICOM Files


DICOM files contain metadata that provide information about the image data, such as the size,
dimensions, bit depth, modality used to create the data, and equipment settings used to capture the
image. To read metadata from a DICOM file, use the dicominfo function. dicominfo returns the
information in a MATLAB structure where every field contains a specific piece of DICOM metadata.
You can use the metadata structure returned by dicominfo to specify the DICOM file you want to
read using dicomread — see “Read Image Data from DICOM Files” on page 3-12. If you just want
to view the metadata in a DICOM file, for debugging purposes, you can use the dicomdisp function.

The following example reads the metadata from a sample DICOM file that is included with the
toolbox.
info = dicominfo("CT-MONO2-16-ankle.dcm")

info =

Filename: [1x89 char]


FileModDate: '18-Dec-2000 11:06:43'
FileSize: 525436
Format: 'DICOM'
FormatVersion: 3
Width: 512
Height: 512
BitDepth: 16
ColorType: 'grayscale'
FileMetaInformationGroupLength: 192
FileMetaInformationVersion: [2x1 uint8]
MediaStorageSOPClassUID: '1.2.840.10008.5.1.4.1.1.7'
MediaStorageSOPInstanceUID: [1x50 char]
TransferSyntaxUID: '1.2.840.10008.1.2'
ImplementationClassUID: '1.2.840.113619.6.5'
.
.
.

Private DICOM Metadata


The DICOM specification defines many of these metadata fields, but files can contain additional fields,
called private metadata. This private metadata is typically defined by equipment vendors to provide
additional information about the data they provide.

When dicominfo encounters a private metadata field in a DICOM file, it returns the metadata, creating a generic name for the field based on the group and element tags of the metadata. For example, if the file contains private metadata at group 0009 and element 0006, dicominfo creates the field name Private_0009_0006. dicominfo attempts to interpret the private metadata, if it can. For example, if the metadata contains character data, dicominfo processes the data. If it cannot interpret the data, dicominfo returns a sequence of bytes.

If you need to process a DICOM file created by a manufacturer that uses private metadata, and you prefer to view the correct name of the field as well as the data, you can create your own copy of the DICOM data dictionary and update it to include definitions of the private metadata. You need information about the private metadata, which vendors typically provide in their DICOM compliance statements. For more information about updating the DICOM dictionary, see “Create Your Own Copy of DICOM Dictionary” on page 3-11.


Create Your Own Copy of DICOM Dictionary


MathWorks® uses a DICOM dictionary that contains definitions of thousands of standard DICOM metadata fields. If your DICOM file contains metadata that is not defined in this dictionary, you can update the dictionary, creating your own copy that includes these private metadata fields.

To create your own dictionary, perform this procedure:

1 Make a copy of the text version of the DICOM dictionary that is included with MATLAB. This file, called dicom-dict.txt, is located in matlabroot/toolbox/images/medformats or matlabroot/toolbox/images/iptformats, depending on which version of the Image Processing Toolbox software you are working with. Do not attempt to edit the MAT-file version of the dictionary, dicom-dict.mat.
2 Edit your copy of the DICOM dictionary, adding entries for the private metadata. Insert each new metadata field using the group and element tag, type, and other information. Follow the format of the other entries in the file. The creator of the metadata (such as an equipment vendor) must provide you with this information.
3 Save your copy of the dictionary.
4 Set MATLAB to use your copy of the DICOM dictionary by calling the dicomdict function, as shown in the sketch after this list.
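
A minimal sketch of step 4 follows, where my-dicom-dict.txt is a placeholder name for the copy you saved in step 3.

dicomdict("set","my-dicom-dict.txt")          % use your edited dictionary
info = dicominfo("CT-MONO2-16-ankle.dcm");    % subsequent DICOM functions use the new dictionary
dicomdict("factory")                          % restore the default dictionary when you are finished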

See Also
Apps
DICOM Browser

Functions
dicomread | dicominfo | dicomreadVolume | dicomdict | dicomdisp

More About
• “Read Image Data from DICOM Files” on page 3-12
• “Create New DICOM Series” on page 3-20
• “Remove Confidential Information from DICOM File” on page 3-16
• “Write Image Data to DICOM Files” on page 3-14


Read Image Data from DICOM Files


To read image data from a DICOM file, use the dicomread function. The dicomread function reads
files that comply with the DICOM specification but can also read certain common noncomplying files.

When using dicomread, you can specify the file name as an argument, as in the following example.
The example reads the sample DICOM file that is included with the toolbox.

I = dicomread("CT-MONO2-16-ankle.dcm");

You can also use the metadata structure returned by dicominfo to specify the file you want to read,
as in the following example.

info = dicominfo("CT-MONO2-16-ankle.dcm");
I = dicomread(info);

View DICOM Images


To view the image data imported from a DICOM file, use one of the toolbox image display functions
imshow or imtool. Note, however, that because the image data in this DICOM file is signed 16-bit
data, you must use the autoscaling syntax with either display function to make the image viewable.

imshow(I,DisplayRange=[])

See Also
Apps
DICOM Browser

Functions
dicomread | dicominfo | dicomreadVolume

More About
• “Read Metadata from DICOM Files” on page 3-10
• “Create New DICOM Series” on page 3-20


• “Remove Confidential Information from DICOM File” on page 3-16


• “Write Image Data to DICOM Files” on page 3-14


Write Image Data to DICOM Files


To write image data or metadata to a file in DICOM format, use the dicomwrite function. This
example writes the image I to the DICOM file ankle.dcm.
dicomwrite(I,"ankle.dcm")

Include Metadata with Image Data


When writing image data to a DICOM file, dicomwrite automatically includes the minimum set of metadata fields required by the type of DICOM information object definition (IOD) you are creating. dicomwrite supports the following DICOM IODs with full validation.

• Secondary capture (default)


• Magnetic resonance
• Computed tomography

dicomwrite can write many other types of DICOM data (such as X-ray, radiotherapy, or nuclear
medicine) to a file. However, dicomwrite does not perform any validation of this data.

You can also specify the metadata you want to write to the file by passing to dicomwrite an existing
DICOM metadata structure that you retrieved using dicominfo. In the following example, the
dicomwrite function writes the relevant information in the metadata structure info to the new
DICOM file.
info = dicominfo("CT-MONO2-16-ankle.dcm");
I = dicomread(info);
dicomwrite(I,"ankle.dcm",info)

Note that the metadata written to the file is not identical to the metadata in the info structure.
When writing metadata to a file, there are certain fields that dicomwrite must update. To illustrate,
look at the instance ID in the original metadata and compare it with the ID in the new file.
info.SOPInstanceUID

ans =

1.2.840.113619.2.1.2411.1031152382.365.1.736169244

Now, read the metadata from the newly created DICOM file, using dicominfo, and check the
SOPInstanceUID field.
info2 = dicominfo("ankle.dcm");

info2.SOPInstanceUID

ans =

1.2.841.113411.2.1.2411.10311244477.365.1.63874544

Note that the instance ID in the newly created file differs from the ID in the original file.

Specify Value Representation


Each field of DICOM metadata (known as an attribute or data element) includes a tag that identifies the attribute, information about the length of the attribute, and the attribute data. The attribute optionally includes a two-letter value representation (VR) that identifies the format of the attribute data. For example, the format can be a single-precision binary floating point number, a character vector that represents a decimal integer, or a character vector in the format of a date-time.

To include the VR in the attribute when using dicomwrite, specify the VR name-value argument as
"explicit". If you do not specify the VR, then dicomwrite infers the value representation from the
data dictionary.

The figure shows an attribute with and without the VR.
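
For example, assuming the image I and metadata structure info from the previous section are still in the workspace, the following call writes the attributes with explicit value representations.

dicomwrite(I,"ankle.dcm",info,VR="explicit")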

See Also
Apps
DICOM Browser

Functions
dicomread | dicomwrite | dicominfo | dicomuid | dicomanon

More About
• “Read Metadata from DICOM Files” on page 3-10
• “Read Image Data from DICOM Files” on page 3-12
• “Create New DICOM Series” on page 3-20
• “Remove Confidential Information from DICOM File” on page 3-16


Remove Confidential Information from DICOM File

This example shows how to anonymize a DICOM file.

When using a DICOM file as part of a training set, blinded study, or a presentation, you might want to
remove confidential patient information, a process called anonymizing the file. To do this, use the
dicomanon function.

Read an image from a DICOM file into the workspace.

dicomFile = 'CT-MONO2-16-ankle.dcm';
I = dicomread(dicomFile);

Display the image. Because the DICOM image data is signed 16-bit data, automatically scale the
display range so that the minimum pixel value is black and the maximum pixel value is white.

imshow(I,'DisplayRange',[])


Read the metadata from the DICOM file.

info = dicominfo(dicomFile);

The DICOM file in this example has already been anonymized for patient privacy. To create an informative test DICOM file, set the PatientName field to an artificial value that uses the Person Name (PN) value representation.

info.PatientName = 'Doe^John';

Write the image with modified metadata to a new DICOM file.

dicomFileNotAnon = 'ankle_notAnon.dcm';
dicomwrite(I,dicomFileNotAnon,info);


Read the metadata from the non-anonymous DICOM file, then confirm that the patient name in the
new file is not anonymous.

infoNotAnon = dicominfo(dicomFileNotAnon);
infoNotAnon.PatientName

ans = struct with fields:


FamilyName: 'Doe'
GivenName: 'John'

To identify the series to which the non-anonymous image belongs, display the value of the
SeriesInstanceUID field.

infoNotAnon.SeriesInstanceUID

ans =
'1.2.840.113619.2.1.2411.1031152382.365.736169244'

Anonymize the file using the dicomanon function. The function creates a new series with new study
values, changes some of the metadata, and then writes the image to a new file.

dicomFileAnon = 'ankle_anon.dcm'

dicomFileAnon =
'ankle_anon.dcm'

dicomanon(dicomFileNotAnon,dicomFileAnon);

Read the metadata from the anonymized DICOM file.

infoAnon = dicominfo(dicomFileAnon);

Confirm that the patient name information has been removed.

infoAnon.PatientName

ans = struct with fields:


FamilyName: ''
GivenName: ''
MiddleName: ''
NamePrefix: ''
NameSuffix: ''

Confirm that the anonymized image belongs to a new series by displaying the value of the SeriesInstanceUID field.

infoAnon.SeriesInstanceUID

ans =
'1.3.6.1.4.1.9590.100.1.2.100100869011374609916120487930268957064'

See Also
Apps
DICOM Browser


Functions
dicominfo | dicomread | dicomwrite | dicomuid | dicomanon

More About
• “Read Image Data from DICOM Files” on page 3-12
• “Read Metadata from DICOM Files” on page 3-10
• “Create New DICOM Series” on page 3-20
• “Write Image Data to DICOM Files” on page 3-14
• “DICOM Support in Image Processing Toolbox” on page 3-7


Create New DICOM Series

This example shows how to create a new DICOM series for a modified DICOM image.

In the DICOM standard, images can be organized into series. By default, when you write an image
with metadata to a DICOM file, dicomwrite puts the image in the same series. You typically only
start a new DICOM series when you modify the image in some way. To make the modified image the
start of a new series, assign a new DICOM unique identifier to the SeriesInstanceUID metadata
field.

Read an image from a DICOM file into the workspace.

I = dicomread('CT-MONO2-16-ankle.dcm');

Display the image. Because the DICOM image data is signed 16-bit data, automatically scale the
display range so that the minimum pixel value is black and the maximum pixel value is white.

imshow(I,'DisplayRange',[])


Read the metadata from the DICOM file.

info = dicominfo('CT-MONO2-16-ankle.dcm');

To identify the series an image belongs to, view the value of the SeriesInstanceUID metadata
field.

info.SeriesInstanceUID

ans =
'1.2.840.113619.2.1.2411.1031152382.365.736169244'

This example modifies the image by removing all of the text from the image. Text in the image
appears white. Find the maximum value of all pixels in the image, which corresponds to the white
text.


textValue = max(I(:));

The background of the image appears black. Find the minimum value of all pixels in the image, which
corresponds to the background.

backgroundValue = min(I(:));

To remove the text, set all pixels with the maximum value to the minimum value.

Imodified = I;
Imodified(Imodified == textValue) = backgroundValue;

View the processed image.

imshow(Imodified,'DisplayRange',[])


To write the modified image as a new series, you need a new DICOM unique identifier (UID).
Generate a new UID using the dicomuid function. dicomuid is guaranteed to generate a unique
UID.

uid = dicomuid

uid =
'1.3.6.1.4.1.9590.100.1.2.331387322612028437031085633731794306496'

Set the value of the SeriesInstanceUID field in the metadata associated with the original DICOM
file to the generated value.

info.SeriesInstanceUID = uid;

Write the modified image to a new DICOM file, specifying the modified metadata structure, info, as
an argument. Because you set the SeriesInstanceUID value, the written image is part of a new
series.

dicomwrite(Imodified,'ankle_newseries.dcm',info);

To verify this operation, view the image and the SeriesInstanceUID metadata field in the new file.

See Also
Apps
DICOM Browser

Functions
dicominfo | dicomread | dicomwrite | dicomuid | dicomanon

More About
• “Remove Confidential Information from DICOM File” on page 3-16
• “DICOM Support in Image Processing Toolbox” on page 3-7


Create Image Datastore Containing DICOM Images

This example shows how to create an image datastore from a collection of DICOM files containing 2-D
images.

Specify the location of a directory containing 2-D DICOM image files.

dicomDir = 'dog';

Create an imageDatastore, specifying the read function as a handle to the dicomread function.

dicomds = imageDatastore(dicomDir, ...
    'FileExtensions','.dcm','ReadFcn',@(x) dicomread(x));

Read and display the first image in the datastore.

I = read(dicomds);

Display the image. The image has signed 16-bit data, so scale the display range to the pixel values in
the image.

imshow(I,[])


See Also
dicomread | imageDatastore

More About
• “Create Image Datastore Containing Single and Multi-File DICOM Volumes” on page 3-26


Create Image Datastore Containing Single and Multi-File DICOM Volumes

This example shows how to create an image datastore containing volumetric DICOM data stored as a
single file and as multiple files.

Specify the location of a directory containing DICOM data. The data includes 2-D images, a 3-D
volume in a single file, and a multi-file 3-D volume.
dicomDir = fullfile(matlabroot,"toolbox","images","imdata");

Gather details about the DICOM files by using the dicomCollection function. This function returns
the details as a table, where each row represents a single study. For multi-file DICOM volumes, the
function aggregates the files into a single study.
collection = dicomCollection(dicomDir,IncludeSubfolders=true)

collection=5×14 table
              StudyDateTime               SeriesDateTime           PatientName     PatientSex
          ________________________    ________________________    ____________    __________

    s1    {0×0 double            }    {0×0 double            }    ""              ""
    s2    {[30-Apr-1993 11:27:24]}    {[30-Apr-1993 11:27:24]}    "Anonymized"    ""
    s3    {[03-Oct-2011 19:18:11]}    {[03-Oct-2011 18:59:02]}    ""              "M"
    s4    {[03-Oct-2011 19:18:11]}    {[03-Oct-2011 19:05:04]}    ""              "M"
    s5    {[30-Jan-1994 11:25:01]}    {0×0 double            }    "Anonymized"    ""

Create a temporary directory to store the processed DICOM volumes.


matFileDir = fullfile(pwd,"MATFiles");
if ~exist(matFileDir,"dir")
    mkdir(matFileDir)
end

Loop through each study in the collection.


for idx = 1:size(collection,1)

Get the file names that comprise the study. For multi-file DICOM volumes, the file names are listed as
a string array.
dicomFileName = collection.Filenames(idx);
if length(dicomFileName) > 1
    matFileName = fileparts(dicomFileName(1));
    matFileName = split(matFileName,filesep);
    matFileName = replace(strtrim(matFileName(end))," ","_");
else
    [~,matFileName] = fileparts(dicomFileName);
end
matFileName = fullfile(matFileDir,matFileName);

Read the data. Try different read functions because the images have a different number of
dimensions and are stored in different formats.

1) Try reading the data of the study by using the dicomreadVolume function.


• If the data is a multi-file volume, then dicomreadVolume runs successfully and returns the
complete volume in a single 4-D array. You can add this data to the datastore.
• If the data is contained in a single file, then dicomreadVolume does not run successfully.

2) Try reading the data of the study by using the dicomread function.

• If dicomread returns a 4-D array, then the study contains a complete 3-D volume. You can add
this data to the datastore.
• If dicomread returns a 2-D matrix or 3-D array, then the study contains a single 2-D image. Skip
this data and continue to the next study in the collection.

try
    data = dicomreadVolume(collection,collection.Row{idx});
catch ME
    data = dicomread(dicomFileName);
    if ndims(data)<4
        % Skip files that are not volumes
        continue;
    end
end

For complete volumes returned in a 4-D array, write the data and the absolute file name to a MAT file.

save(matFileName,"data","dicomFileName");

End the loop over the studies in the collection.

end

Create an imageDatastore from the MAT files containing the volumetric DICOM data. Specify the
ReadFcn property as the helper function matRead, defined at the end of this example.

imdsdicom = imageDatastore(matFileDir,FileExtensions=".mat", ...
    ReadFcn=@matRead);

Read the first DICOM volume from the image datastore.

[V,Vinfo] = read(imdsdicom);
[~,VFileName] = fileparts(Vinfo.Filename);

The DICOM volume is grayscale. Remove the singleton channel dimension by using the squeeze
function, then display the volume by using the volshow function.

V = squeeze(V);
volshow(V);


Supporting Functions

The matRead function loads data from the first variable of a MAT file with file name filename.

function data = matRead(filename)
    inp = load(filename);
    f = fields(inp);
    data = inp.(f{1});
end

See Also
dicominfo | dicomread | dicomreadVolume | dicomCollection | imageDatastore | volshow

More About
• “Create Image Datastore Containing DICOM Images” on page 3-24
• “Preprocess Volumes for Deep Learning” (Deep Learning Toolbox)


Add and Modify ROIs of DICOM-RT Contour Data

This example shows how to extract and modify contour data stored in a DICOM-RT structure set.

A DICOM-RT structure set is a DICOM Information Object Definition (IOD) specific to radiotherapy.
The metadata of a DICOM-RT structure set file includes contour data for regions of interest (ROIs)
relevant to radiation treatment planning. This example extracts ROI contour data into a
dicomContours object, and uses object functions to display, add, and remove ROIs and save the
modified contours to a new metadata structure. These steps are useful for exploring DICOM-RT
structure files and for writing new metadata structures if you update the contour data in MATLAB®.
This example uses a data set defining contours for a human torso and synthetic tumor and organ
regions.

Read the DICOM metadata from a DICOM-RT structure set file by using the dicominfo function.

info = dicominfo("rtstruct.dcm");

Extract the ROI data from the DICOM metadata. The output is a dicomContours object that stores
the extracted ROI data.

contourData = dicomContours(info);

Display the ROIs property of the dicomContours object. The ROIs property is a table that contains
the extracted ROI data.

contourData.ROIs

ans=2×5 table
Number Name ContourData GeometricType Color
______ _________________ ___________ _____________ ____________

1 {'Body_Contour' } {90x1 cell} {90x1 cell} {3x1 double}


2 {'Tumor_Contour'} {21x1 cell} {21x1 cell} {3x1 double}

Plot the ROI contour data from the dicomContours object.

figure
plotContour(contourData)
axis equal


Load the MAT file contours.mat into the workspace. The cell array contours specifies 3-D coordinates of the boundary points for a new ROI with 21 axial slices.

load("contours.mat")

To create an ROI sequence that contains the new ROI contour data, specify these attributes.

• ROI number
• User-defined name for the ROI
• Geometric type of the contours

The ROI number for the sequence must be unique, but the ROI name can be any user-defined name.
Because all points in the new ROI contour data are coplanar, and the last point is connected to the
first point, specify the geometric type as "Closed_planar". Specify the display color for plotting the
new ROI as blue-green.

number = 3;
name = "Organ";
geometricType = "Closed_planar";
color = [0; 127; 127];

Add the new ROI sequence to the ROIs property of the dicomContours object. The output is a
dicomContours object that contains the new ROI sequence, as well as the original ones.

contourData = addContour(contourData,number,name,contours,geometricType,color);
contourData.ROIs


ans=3×5 table
Number Name ContourData GeometricType Color
______ _________________ ___________ _____________ ____________

1 {'Body_Contour' } {90x1 cell} {90x1 cell} {3x1 double}


2 {'Tumor_Contour'} {21x1 cell} {21x1 cell} {3x1 double}
3 {'Organ' } {21x1 cell} {21x1 cell} {3x1 double}

Plot the updated contour data.


figure
plotContour(contourData)
axis equal

Delete the tumor ROI specified by ROI number 2.


contourData = deleteContour(contourData,2);
contourData.ROIs

ans=2×5 table
Number Name ContourData GeometricType Color
______ ________________ ___________ _____________ ____________

1 {'Body_Contour'} {90x1 cell} {90x1 cell} {3x1 double}


3 {'Organ' } {21x1 cell} {21x1 cell} {3x1 double}

Plot the final contour data.


figure
plotContour(contourData)
axis equal

Export the modified ROI data to a DICOM metadata structure.


info = convertToInfo(contourData);

Write the metadata to a DICOM-RT structure set file by using the dicomwrite function. If the
DICOM image associated with the ROI contour data is not available, specify the first input argument
value in the dicomwrite function as an empty array. Set the CreateMode name-value argument to
"copy" to copy the metadata to a new DICOM-RT structure set file, rtfile.dcm.
dicomwrite([],"rtfile.dcm",info,CreateMode="copy");

Verify that the new file includes the correct ROI contour data.
info_test = dicominfo("rtfile.dcm");
contour_test = dicomContours(info_test);
contour_test.ROIs

ans=2×5 table
Number Name ContourData GeometricType Color
______ ________________ ___________ _____________ ____________

1 {'Body_Contour'} {90x1 cell} {90x1 cell} {3x1 double}


3 {'Organ' } {21x1 cell} {21x1 cell} {3x1 double}


See Also
dicomContours | plotContour | addContour | deleteContour | convertToInfo | dicominfo


Create and Display 3-D Mask of DICOM-RT Contour Data

This example shows how to create a binary mask of ROI data stored in the metadata of a DICOM-RT
structure set file, and display the mask as an overlay on the underlying image data.

A DICOM-RT structure set is a DICOM Information Object Definition (IOD) specific to radiotherapy.
The metadata of a DICOM-RT structure set file includes contour data for regions of interest (ROIs)
relevant to radiation treatment planning. The metadata specifies ROI data as boundary coordinates,
with separate contours for each axial slice. This example converts the contours defining a brain
tumor ROI into a single 3-D binary mask. You can display the mask as an overlay by using the
volshow function or the Volume Viewer app.

Load Image Data and Metadata

This example uses a modified subset of data from the BraTS data set [1 on page 3-38] [2 on page 3-
38]. The data includes a brain MRI scan, as well as ROI contour data for a tumor within the scan
region.

Load the brain MRI image volume into the workspace. The image volume is stored in the MAT file
vol_001.mat.

load(fullfile(toolboxdir("images"),"imdata", ...
"BrainMRILabeled","images","vol_001.mat"));

Read the metadata from the DICOM-RT structure set file brainMRI_rt.dcm, attached to this
example as a supporting file. The file contains contour data for the brain tumor ROI.

info = dicominfo("brainMRI_rt.dcm");

Extract ROI Data from DICOM-RT Metadata

Create a dicomContours object containing the ROI contour data stored in the metadata structure
info.

rtContours = dicomContours(info);

Display the ROI information as a table. Each element in the ContourData cell array specifies the xy-
coordinates of the boundary points in one axial slice. The 'Brain Tumor' ROI consists of 56 closed
planar contours.

rtContours.ROIs

ans=1×5 table
Number Name ContourData GeometricType Color
______ _______________ ___________ _____________ ____________

1 {'Brain Tumor'} {56×1 cell} {56×1 cell} {3×1 double}

Plot the contours of the 'Brain Tumor' ROI by using the plotContour object function.

figure
plotContour(rtContours)


Create 3-D Binary Mask of ROI

Define the spatial referencing for the ROI by creating an imref3d object with the same number of
slices and pixel size as the brain MRI image data. The pixel size is 1-by-1-by-1 mm, meaning that each
pixel of the MRI image data corresponds to a section of the brain with length, width, and height of 1
mm each.

referenceInfo = imref3d(size(vol),1,1,1);

Create a 3-D logical mask of the 'Brain Tumor' ROI. This corresponds to the first ROI index in
rtContours, so specify the ROIindex input as 1. Specify the imref3d object to define spatial
information for the mask.

rtMask = createMask(rtContours,1,referenceInfo);

Display the mask by using the volshow function.

viewer = viewer3d(BackgroundColor="white",BackgroundGradient="off",CameraZoom=1.5);
maskDisp = volshow(rtMask,Parent=viewer);


Display ROI Mask as Image Overlay Using volshow

Display the binary tumor mask as an overlay on the MRI image data by using the volshow function.

viewer = viewer3d(BackgroundColor="white",BackgroundGradient="off",CameraZoom=1.5);
volDisp = volshow(vol,OverlayData=rtMask,Parent=viewer, ...
RenderingStyle="GradientOpacity",GradientOpacityValue=0.8, ...
Alphamap=linspace(0,0.2,256),OverlayAlphamap=0.8);


Display ROI Mask as Image Overlay Using Volume Viewer

You can also visualize the position of the ROI in the 3-D slice planes of the brain MRI image using the
Volume Viewer app. Load the brain as an intensity volume and the tumor mask as a labeled volume
into the Volume Viewer app by using the volumeViewer command.

volumeViewer(vol,rtMask);

To display the mask in the 3-D slice planes of the intensity volume, on the app toolstrip, select Slice
Planes. Use the scroll bars in the three slice panes to change the position of the slice planes in the 3-
D Volume window. For more details about viewing labeled volumes using the Volume Viewer app,
see “Explore 3-D Labeled Volumetric Data with Volume Viewer” on page 4-74.


References

[1] Isensee, Fabian, Philipp Kickingereder, Wolfgang Wick, Martin Bendszus, and Klaus H. Maier-Hein. “Brain Tumor Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge.” In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, edited by Alessandro Crimi, Spyridon Bakas, Hugo Kuijf, Bjoern Menze, and Mauricio Reyes, 10670:287–97. Cham: Springer International Publishing, 2018. https://doi.org/10.1007/978-3-319-75238-9_25.

[2] Medical Segmentation Decathlon. "Brain Tumours." Tasks. Accessed May 10, 2018. http://medicaldecathlon.com/.

The BraTS data set is provided by Medical Segmentation Decathlon under the CC-BY-SA 4.0 license. All warranties and representations are disclaimed. See the license for details. MathWorks® has modified the subset of data used in this example. This example uses the MRI and label data of one scan from the original data set. The MRI image data has been converted to a MAT file, and the tumor label data has been converted to a DICOM-RT structure set file.

See Also
dicomContours | createMask | plotContour | Volume Viewer


Mayo Analyze 7.5 Files


Analyze 7.5 is a file format, developed by the Mayo Clinic, for storing MRI data. An Analyze 7.5 data
set consists of two files:

• Header file (filename.hdr) — Provides information about dimensions, identification, and processing history. You use the analyze75info function to read the header information.
• Image file (filename.img) — Image data, whose data type and ordering are described by the header file. You use analyze75read to read the image data into the workspace.

Note The Analyze 7.5 format uses the same dual-file data set organization and the same file name
extensions as the Interfile format; however, the file formats are not interchangeable. To learn how to
read data from an Interfile data set, see “Interfile Files” on page 3-41.

The following example calls the analyze75info function to read the metadata from the Analyze 7.5
header file. The example then passes the info structure returned by analyze75info to the
analyze75read function to read the image data from the image file.

info = analyze75info("brainMRI.hdr");
X = analyze75read(info);

See Also
analyze75info | analyze75read


Interfile Files
Interfile is a file format that was developed for the exchange of nuclear medicine image data. For
more information, see interfileinfo or interfileread.

An Interfile data set consists of two files:

• Header file — Provides information about dimensions, identification and processing history. You
use the interfileinfo function to read the header information. The header file has the .hdr
file extension.
• Image file — Image data, whose data type and ordering are described by the header file. You use
interfileread to read the image data into the workspace. The image file has the .img file
extension.

Note The Interfile format uses the same dual-file data set organization and the same file name
extensions as the Analyze 7.5 format; however, the file formats are not interchangeable. To learn how
to read data from an Analyze 7.5 data set, see “Mayo Analyze 7.5 Files” on page 3-40.
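
The following sketch parallels the Analyze 7.5 example. Here, myScan.hdr is a placeholder for the header file of your own Interfile data set.

info = interfileinfo("myScan.hdr");
X = interfileread("myScan.hdr");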

See Also
interfileinfo | interfileread


Implement Digital Camera Processing Pipeline

This example shows how to implement a camera processing pipeline that renders an RGB image from
a RAW Bayer-pattern color filter array (CFA) image.

Digital single-lens reflex (DSLR) cameras, and many modern phone cameras, can save data collected
from the camera sensor directly to a RAW file. Each pixel of RAW data is the amount of light captured
by the corresponding camera photosensor. The data depends on fixed characteristics of the camera
hardware, such as the sensitivity of each photosensor to a particular range of wavelengths of the
electromagnetic spectrum. The data also depends on camera acquisition settings, such as exposure
time, and factors of the scene, such as the light source.

Depending on the information available in the metadata for your image, there can be multiple ways to
implement a pipeline that yields aesthetically pleasing results. The example shows one sequence of
operations in a traditional camera processing pipeline using a subset of the metadata associated with
the RAW data.

1 Import the RAW file contents


2 Linearize the CFA image
3 Scale the CFA data to a suitable range
4 Apply white-balance adjustment
5 Demosaic the Bayer pattern
6 Convert the demosaiced image to the sRGB color space

The example also shows how to create an RGB image from a RAW image when you do not have the
RAW file metadata.

Read RAW File Contents

The RAW files created by a digital camera contain:

• A CFA image recorded by the photosensor of the camera


• Metadata, which contains all information needed to render an RGB image

Read a Bayer-pattern CFA image from a RAW file using the rawread function.

fileName = "colorCheckerTestImage.NEF";
cfaImage = rawread(fileName);

List information about the cfaImage variable and display the image.

whos cfaImage

Name Size Bytes Class Attributes

cfaImage 4012x6034 48416816 uint16

imshow(cfaImage,[])
title("RAW CFA Image")


Read the RAW file metadata using the rawinfo function.

cfaInfo = rawinfo(fileName);

The metadata contains several fields with information used to process the RAW image. For example,
display the contents of the ImageSizeInfo field. The metadata indicates that there are more
columns in the CFA photosensor array than in the visible image. The difference typically arises when
a camera masks a portion of the photosensor to prevent those sections from capturing any light. This
enables an accurate measure of the black level of the sensor.

disp(cfaInfo.ImageSizeInfo)

CFAImageSize: [4012 6080]


VisibleImageSize: [4012 6034]
VisibleImageStartLocation: [1 1]
PixelAspectRatio: 1
ImageRotation: 0
RenderedImageSize: [4012 6034]

The RAW file metadata also includes information that enable the linearization, black level correction,
white balance, and other processing operations needed to convert the RAW data to RGB.

colorInfo = cfaInfo.ColorInfo

colorInfo = struct with fields:


LinearizationTable: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24


BlackLevel: [0 0 0 0]
WhiteLevel: [3827 3827 3827 3827]
CameraToXYZ: [3x3 double]
CameraTosRGB: [3x3 double]
CameraAsTakenWhiteBalance: [495 256 256 324]
D65WhiteBalance: [2.1900 0.9286 0.9286 1.0595]
ICCProfile: []

Linearize CFA Image

Many cameras apply nonlinear range compression to acquired signals before storing them in RAW
files. Cameras typically store this range compression as a lookup table.

Plot a representative subset of the values in the LinearizationTable field of the image metadata.
Values above the maxLinValue continue to increase linearly.

maxLinValue = 10^4;
linTable = colorInfo.LinearizationTable;
plot(0:maxLinValue-1,linTable(1:maxLinValue))
title("Linearization Table")

Processing operations on the digital camera processing pipeline are typically performed on linear
data. To generate linear data, you must reverse the nonlinear range compression. The rawread
function automatically performs this operation and returns linearized light values.


Scale Pixel Values to Suitable Range

Perform Black Level Correction

RAW images do not have a true black value. Even with the shutter closed, electricity flowing through
the sensors causes nonzero photon counts. Cameras use the value of the masked pixels to compute
the black level of the CFA image. To scale the image, subtract the measured black level from the CFA
image data.

RAW file formats can report this black level in different formats. The RAW file metadata for the image
in this example specifies black level as a vector, with one element per channel of the CFA image.
Other RAW file formats, such as DNG, specify black level as a repeated m-by-n matrix that starts at
the top left corner of the visible portion of the CFA.

Get the black level value of the RAW data from the BlackLevel metadata field.

blackLevel = colorInfo.BlackLevel;

To perform black level correction, first convert the black level vector to a 2-by-2 matrix. Omit this
step for RAW images that specify black level as a matrix.

blackLevel = reshape(blackLevel,[1 1 numel(blackLevel)]);


blackLevel = planar2raw(blackLevel);

Replicate the black level matrix to be the size of the visible image.

repeatDims = cfaInfo.ImageSizeInfo.VisibleImageSize ./ size(blackLevel);


blackLevel = repmat(blackLevel,repeatDims);

Subtract the black level matrix from the CFA image matrix.

cfaImage = cfaImage - blackLevel;

Clamp Negative Pixel Values

To correct for CFA data values less than the black-level value, clamp the values to 0.

cfaImage = max(0,cfaImage);

Scale Pixel Values

RAW file metadata often represent the white level as the maximum value allowed by the data type. If
this white level value is much higher than the highest intensity value in the image, then using this
white level value for scaling results in an image that is darker than it should be. To avoid this, scale
the CFA image using the maximum pixel value found in the image.

cfaImage = double(cfaImage);
maxValue = max(cfaImage(:))

maxValue = 3366

cfaImage = cfaImage ./ maxValue;

Adjust White Balance

White balance is the process of removing unrealistic color casts from a rendered image, such that it
appears closer to how human eyes would see the subject.


Get the white balance values from the metadata. There are two types of white balance metadata available. This step of the example uses the CameraAsTakenWhiteBalance field to scale the color channels and balance the linear pixel values. The example uses the D65WhiteBalance field later in the pipeline to adjust the colors to a D65 white point.

whiteBalance = colorInfo.CameraAsTakenWhiteBalance

whiteBalance = 1×4

495 256 256 324

Scale the multipliers so that the values of the green color channels are 1.

gLoc = strfind(cfaInfo.CFALayout,"G");
gLoc = gLoc(1);
whiteBalance = whiteBalance/whiteBalance(gLoc);

whiteBalance = reshape(whiteBalance,[1 1 numel(whiteBalance)]);


whiteBalance = planar2raw(whiteBalance);

Replicate the white balance matrix to be the size of the visible image.

whiteBalance = repmat(whiteBalance,repeatDims);
cfaWB = cfaImage .* whiteBalance;

Convert the CFA image to a 16-bit image.

cfaWB = im2uint16(cfaWB);

Demosaic

Convert the Bayer-encoded CFA image into a truecolor image by demosaicing. The truecolor image is
in linear camera space.

cfaLayout = cfaInfo.CFALayout;
imDebayered = demosaic(cfaWB,cfaLayout);
imshow(imDebayered)
title("Demosaiced RGB Image in Linear Camera Space")


Convert from Camera Color Space to RGB Color Space

The metadata enables two options for converting the image from the linear camera space to a
gamma-corrected RGB color space. You can use the CameraToXYZ metadata field to convert the data
from the linear camera space to the RGB color space through the XYZ profile connection space (PCS),
or you can use the CameraTosRGB metadata field to convert the image from the linear camera space
to the RGB color space directly.

Use Profile Connection Space Conversion

Get the transformation matrix between the linear camera space and the XYZ profile connection space from the CameraToXYZ metadata field. This matrix imposes an RGB channel order; that is, it maps a column vector of linear camera RGB values to XYZ through matrix multiplication:

[X,Y,Z]' = CameraToXYZ * [R,G,B]'

cam2xyzMat = colorInfo.CameraToXYZ

cam2xyzMat = 3×3

1.5573 0.1527 0.0794


0.6206 0.9012 -0.2690
0.0747 -0.3117 1.4731

Normalize the cam2xyzMat matrix according to a D65 white point. Get the XYZ normalization values
from the D65WhiteBalance metadata field.


whiteBalanceD65 = colorInfo.D65WhiteBalance

whiteBalanceD65 = 1×4

2.1900 0.9286 0.9286 1.0595

The white balance multipliers are ordered according to the CFALayout metadata field. Reorder the
multipliers to match the row ordering of the cam2xyzMat matrix.
cfaLayout = cfaInfo.CFALayout;
wbIdx(1) = strfind(cfaLayout,"R");
gidx = strfind(cfaLayout,"G");
wbIdx(2) = gidx(1);
wbIdx(3) = strfind(cfaLayout,"B");

wbCoeffs = whiteBalanceD65(wbIdx);
cam2xyzMat = cam2xyzMat ./ wbCoeffs;

Convert the image from the linear camera space to the XYZ color space using the imapplymatrix
function. Then, convert the image to the sRGB color space and apply gamma correction using the
xyz2rgb function.
imXYZ = imapplymatrix(cam2xyzMat,im2double(imDebayered));
srgbPCS = xyz2rgb(imXYZ,OutputType="uint16");
imshow(srgbPCS)
title("sRGB Image Using PCS")


Use Conversion Matrix from RAW File Metadata

Convert the image from the linear camera space to the linear RGB color space using the
transformation matrix in the CameraTosRGB metadata field.

cam2srgbMat = colorInfo.CameraTosRGB;
imTransform = imapplymatrix(cam2srgbMat,imDebayered,"uint16");

Apply gamma correction to bring the image from the linear sRGB color space to the sRGB color
space.

srgbTransform = lin2rgb(imTransform);
imshow(srgbTransform)
title("sRGB Image Using Transformation Matrix")

Convert RAW to RGB Without Metadata

You can convert RAW data directly to an RGB image using the raw2rgb function. The raw2rgb function provides results comparable to those of a custom processing pipeline tailored to your data and acquisition settings. However, the raw2rgb function does not provide the same fine-tuned precision and flexibility as a custom pipeline.

Convert the image data in the RAW file to the sRGB color space using the raw2rgb function.


srgbAuto = raw2rgb(fileName);
imshow(srgbAuto)
title("sRGB Image Using raw2rgb Function")

Compare the result of the raw2rgb conversion to the results obtained using the full camera processing pipeline, with both the PCS approach and the transformation matrix approach.

montage({srgbPCS,srgbTransform,srgbAuto},Size=[1 3]);
title("sRGB Image Using PCS, Transformation Matrix, and raw2rgb Function (Left to Right)")


See Also
raw2planar | rawinfo | planar2raw | raw2rgb | rawread

More About
• “Process Images Using Image Batch Processor App with File Metadata” on page 2-27


Work with High Dynamic Range Images


Dynamic range refers to the range of brightness levels, from dark to light. The dynamic range of real-
world scenes can be quite high. High dynamic range (HDR) images attempt to capture the whole
tonal range of real-world scenes (called scene-referred), using 32-bit floating-point values to store
each color channel. HDR images contain a high level of detail, close to the range of human vision. The
toolbox includes functions for reading, creating, and writing HDR images. The toolbox also includes
tone-map operators for creating low dynamic range (LDR) images from HDR images for processing
and display.

Read HDR Image


To read an HDR image into the MATLAB workspace, use the hdrread function.

hdr_image = hdrread("office.hdr");

The output image hdr_image is an m-by-n-by-3 image of data type single.

whos

Name Size Bytes Class Attributes

hdr_image 665x1000x3 7980000 single

The range of data exceeds the range [0, 1] expected of LDR data.

hdr_range = [min(hdr_image(:)) max(hdr_image(:))]

hdr_range =

1×2 single row vector

0 3.2813

Display and Process HDR Image


Many toolbox functions assume that images of data type single and double are LDR images with
data in the range [0, 1]. Since HDR data is not bound to the range [0, 1] and can contain Inf values,
you must examine the behavior of each function carefully when working with HDR data.

• Some functions clip values outside the expected range before continuing to process the data.
These functions can return unexpected results because clipping causes a loss of information.
• Some functions expect data in the range [0, 1] but do not adjust the data before processing it.
These functions can return incorrect results.
• Some functions expect real data. If your HDR image contains values of Inf, then these functions
can return unexpected results.
• Some functions have no limitations for the range of input data. These functions accept and
process HDR data correctly.

To work with functions that require LDR data, you can reduce the dynamic range of an image using a process called tone mapping. Tone mapping scales HDR data to the range [0, 1] while attempting to preserve the appearance of the original image. Tone mapping functions such as tonemap, tonemapfarbman, and localtonemap give more accurate results than simple linear rescaling such as performed by the rescale function. However, note that tone mapping incurs a loss of subtle information and detail.
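
For example, assuming the hdr_image variable read earlier in this section is still in the workspace, the following sketch tone maps it with tonemapfarbman. The tonemap-based workflow appears in “Display High Dynamic Range Image” on page 3-54.

rgb = tonemapfarbman(hdr_image);   % tone-mapped image suitable for display
imshow(rgb)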

To display HDR images, you must perform tone mapping. For an example, see “Display High Dynamic
Range Image” on page 3-54.

Create High Dynamic Range Image


To create an HDR image from a group of low dynamic range images, use the makehdr function. Note
that the low dynamic range images must be spatially registered and the image files must contain
EXIF metadata. Specify the low dynamic range images in a cell array.

hdr_image = makehdr(files);

Write High Dynamic Range Image to File


To write an HDR image from the workspace into a file, use the hdrwrite function.

hdrwrite(hdr,"filename");

See Also
hdrread | makehdr | hdrwrite | tonemap | tonemapfarbman | localtonemap

Related Examples
• “Display High Dynamic Range Image” on page 3-54

More About
• “Image Types in the Toolbox” on page 2-3


Display High Dynamic Range Image

This example shows how to display a high dynamic range (HDR) image. To view an HDR image, you
must first convert the data to a dynamic range that can be displayed correctly on a computer.

Read a high dynamic range (HDR) image using hdrread. If you try to display the HDR image, notice
that it does not display correctly.
hdr_image = hdrread("office.hdr");
imshow(hdr_image)

Convert the HDR image to a dynamic range that can be viewed on a computer, using the tonemap
function. This function converts the HDR image into an RGB image of data type uint8.
rgb = tonemap(hdr_image);
whos

Name Size Bytes Class Attributes

hdr_image 665x1000x3 7980000 single


rgb 665x1000x3 1995000 uint8

Display the RGB image.


imshow(rgb)
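As a sketch, you can substitute another tone-mapping operator, such as tonemapfarbman, which also
accepts the HDR image directly and returns a low dynamic range image that you can display with imshow.

rgbFarbman = tonemapfarbman(hdr_image);
imshow(rgbFarbman)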


See Also
tonemap | tonemapfarbman | localtonemap

More About
• “Work with High Dynamic Range Images” on page 3-52

4 Displaying and Exploring Images

This section describes the image display and exploration tools provided by the Image Processing
Toolbox software.

• “Choose Approach to Display 2-D and 3-D Images” on page 4-2


• “Display an Image in Figure Window” on page 4-10
• “Display Multiple Images” on page 4-14
• “View and Edit Collection of Images in Folder or Datastore” on page 4-19
• “Get Started with Image Viewer App” on page 4-25
• “Get Pixel Information in Image Viewer App” on page 4-33
• “Measure Distances and Areas Using Image Viewer App” on page 4-37
• “Adjust Image Contrast in Image Viewer App” on page 4-41
• “Crop Image Using Image Viewer App” on page 4-47
• “Get Started with Image Tool” on page 4-50
• “Explore 3-D Volumetric Data with Volume Viewer App” on page 4-60
• “Explore 3-D Labeled Volumetric Data with Volume Viewer” on page 4-74
• “Display Interior Labels by Clipping Volume Planes” on page 4-80
• “Display Interior Labels by Adjusting Volume Overlay Properties” on page 4-88
• “Display Volume Using Cinematic Rendering” on page 4-97
• “Remove Objects from Volume Display Using 3-D Scissors” on page 4-103
• “Display Translucent Volume with Advanced Light Scattering” on page 4-109
• “Display Large 3-D Images Using Blocked Volume Visualization” on page 4-114
• “View Image Sequences in Video Viewer” on page 4-122
• “Convert Multiframe Image to Movie” on page 4-127
• “Display Different Image Types” on page 4-128
• “Add Color Bar to Displayed Grayscale Image” on page 4-133
• “Print Images” on page 4-135
• “Manage Display Preferences” on page 4-136

Choose Approach to Display 2-D and 3-D Images


Image Processing Toolbox offers a variety of functions that display 2-D images, frames of image
sequences, 2-D slices of 3-D (volumetric) images, and 3-D renderings of volumetric images. You can
display images programmatically using functions or interactively using apps.

Display 2-D Images


You can display 2-D images captured by digital cameras in standard file formats such as JPEG, and
you can display the images that result from an image processing pipeline. You can also display
multiple 2-D images in the same figure.

2-D image display functions, such as imshow, support RGB, grayscale, and binary images.
Nonstandard formats include RAW data, medical DICOM images, high dynamic range (HDR) images,
and hyperspectral images. For 2-D images in a nonstandard format, you may need to convert the data
to a standard format, depending on the display approach that you choose.

When to use: You want to display a single 2-D image.
Approach: Use the imshow function. See "Display an Image in Figure Window" on page 4-10.

When to use: You want to display a single 2-D image and perform some common image processing tasks interactively.
Approach: Use the Image Viewer app. See "Get Started with Image Viewer App" on page 4-25.

When to use: You want to display multiple 2-D images next to each other. For example, you might want to compare an original image and a processed version of the image.
Approach: Use the montage function. See "Display Multiple Images" on page 4-14.

When to use: You want to display two 2-D images overlaid on each other. For example, you want to check the alignment of two images using falsecolor or alpha blending.
Approach: Use the imshowpair function. See "Display Multiple Images" on page 4-14.

When to use: You want to display the thumbnails of images in an image datastore or a folder.
Approach: Use the Image Browser app. See "View and Edit Collection of Images in Folder or Datastore" on page 4-19.

When to use: You want to display a single 2-D image that is too large to fit in memory, or a multiresolution (or multilevel) 2-D image.
Approach: Read the image as a blockedImage object, then display the image using bigimageshow.

When to use: You want to display hyperspectral data, including color or false-color representations of hyperspectral images.
Approach: Use the Hyperspectral Viewer app (requires Image Processing Toolbox Hyperspectral Imaging Library). See "Explore Hyperspectral Data in the Hyperspectral Viewer" on page 20-22.
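For example, a minimal sketch of the blockedImage approach, assuming tumor_091R.tif is the large,
multiresolution sample image included with the toolbox:

bim = blockedImage("tumor_091R.tif");   % references the file; pixel data stays on disk
bigimageshow(bim);                      % displays a resolution level appropriate for the current view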


Display 2-D Slices and Frames


3-D volumes and image sequences are both collections of related 2-D images. The 2-D images (slices)
in a 3-D volume are related along the third spatial dimension, depth. The 2-D images (frames) in an
image sequence are related by a nonspatial dimension, such as time.

You can display individual slices of a 3-D volume or frames of an image sequence. To see changes in
successive slices or frames, you can display the slices or frames sequentially or alongside each other
in a montage. For 3-D volumes, you can also display cross-sections along the three orthogonal spatial
planes.

When to use: You want to display a single frame of an image sequence, or a single slice of a 3-D volumetric image.
Approach: Index into the numeric array representing the image sequence or 3-D volumetric image, then display the indexed frame or slice by using the imshow function. See "Perform an Operation on a Sequence of Images" on page 2-18.

When to use: You want to display multiple frames of an image sequence next to each other, or multiple parallel slices of a 3-D volumetric image next to each other.
Approach: Use the montage function. See "Display Multiple Images" on page 4-14.

When to use: You want to display a slice of a 3-D volume and change the selected slice interactively using a slider.
Approach: Use the sliceViewer function.

When to use: You want to animate an image sequence or successive slices of a 3-D volume as a video.
Approach: Use the Video Viewer app. See "View Image Sequences in Video Viewer" on page 4-122.

When to use: You want to display orthogonal slices of a 3-D volume along the x, y, and z dimensions.
Approach: Use the orthosliceViewer function. You can adjust the slices interactively by dragging the yellow lines that indicate the planes of each cross-section.

When to use: You want to display orthogonal slices of a volumetric image or labeled volumetric image, and adjust the display properties interactively.
Approach: Use the Volume Viewer app. See "Explore 3-D Volumetric Data with Volume Viewer App" on page 4-60 and "Explore 3-D Labeled Volumetric Data with Volume Viewer" on page 4-74.
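For example, a minimal sketch of the slice-viewing approach, assuming the mristack sample MRI volume
that ships with the toolbox:

load mristack              % loads the grayscale volume mristack
sliceViewer(mristack);     % browse the slices with a slider
% orthosliceViewer(mristack) shows the three orthogonal cross-sections instead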

Display 3-D Renderings of 3-D Volumes


You can display a 3-D volumetric image in 3-D space. To improve the visibility of features within the
volume, you can move the camera position within the scene, adjust the transparency of the data, or
change rendering styles.

When to use: You want to display a volumetric image or labeled volumetric image, and adjust the display properties interactively.
Approach: Use the Volume Viewer app. See "Explore 3-D Volumetric Data with Volume Viewer App" on page 4-60 and "Explore 3-D Labeled Volumetric Data with Volume Viewer" on page 4-74.

When to use: You want to display a volumetric image using a 3-D rendering. You can optionally include pixel labels, specify the rendering style of the data and overlays, and interactively crop the volume or remove background objects using 3-D scissors.
Approach: Use the volshow function. Adjust the camera controls, scene lighting, and scene color using the viewer3d function. See "Display Interior Labels by Adjusting Volume Overlay Properties" on page 4-88 and "Display Interior Labels by Clipping Volume Planes" on page 4-80.

When to use: You want to display a single 3-D image that is too large to fit in memory, or a multiresolution (or multilevel) 3-D image.
Approach: Read the image as a blockedImage object, then display the image using volshow. See "Display Large 3-D Images Using Blocked Volume Visualization" on page 4-114.
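For example, a minimal sketch of the volshow approach, assuming the mri.mat sample data set that
ships with MATLAB:

load mri                   % loads the indexed volume D and its colormap map
V = squeeze(D);            % remove the singleton third dimension
viewer = viewer3d(BackgroundColor="black");
volshow(V,Parent=viewer);  % render the volume in the viewer3d scene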

See Also

Related Examples
• “Display Volume Using Cinematic Rendering” on page 4-97
• “Explore Slices from 3-D Image Volume with Anisotropic Voxel Spacing” on page 6-75

More About
• “Display Different Image Types” on page 4-128
• “Choose Approach for Medical Image Visualization” (Medical Imaging Toolbox)


Display an Image in Figure Window

Overview
To display image data, use the imshow function. The following example reads an image into the
workspace and then displays the image in a figure window using the imshow function.

moon = imread("moon.tif");
imshow(moon)


You can also pass imshow the name of a file containing an image.
imshow("moon.tif");

This syntax can be useful for scanning through images. Note, however, that when you use this syntax,
the image data is not stored in the workspace. If you want to bring the image into the
workspace, you must use the getimage function, which retrieves the image data from the current
image object. This example assigns the image data from moon.tif to the variable moon, if the figure
window in which it is displayed is currently active.

moon = getimage;

For more information about using imshow to display the various image types supported by the
toolbox, see “Display Different Image Types” on page 4-128.

Specifying the Initial Image Magnification


By default, imshow attempts to display an image in its entirety at 100% magnification (one screen
pixel for each image pixel). However, if an image is too large to fit in a figure window on the screen at
100% magnification, imshow scales the image to fit onto the screen and issues a warning message.

To override the default initial magnification behavior for a particular call to imshow, specify the
InitialMagnification parameter. For example, to view an image at 150% magnification, use this
code.

pout = imread("pout.tif");
imshow(pout,"InitialMagnification",150)

imshow attempts to honor the magnification you specify. However, if the image does not fit on the
screen at the specified magnification, imshow scales the image to fit. You can also specify "fit"
as the initial magnification value. In this case, imshow scales the image to fit the current size of the
figure window.

To change the default initial magnification behavior of imshow, set the ImshowInitialMagnification
toolbox preference. To set the preference, open the Image Processing Toolbox Preferences dialog box
by calling iptprefs or, on the MATLAB Home tab, in the Environment section, click Preferences.
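For example, this sketch sets the same preference programmatically instead of using the Preferences
dialog box.

iptsetpref("ImshowInitialMagnification","fit")   % or a numeric percentage, such as 100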

When imshow scales an image, it uses interpolation to determine the values for screen pixels that do
not directly correspond to elements in the image matrix. For more information about specifying
interpolation methods, see “Resize an Image” on page 6-2.

Controlling the Appearance of the Figure


By default, when imshow displays an image in a figure, it surrounds the image with a gray border.
You can change this default and suppress the border using the "Border" name-value argument, as
shown in the following example.

imshow("moon.tif","Border","tight")

The following figure shows the same image displayed with and without a border.


The "Border" argument affects only the image being displayed in the call to imshow. If you want all
the images that you display using imshow to appear without the gray border, set the Image
Processing Toolbox "ImshowBorder" preference to "tight". You can also use preferences to
include visible axes in the figure. For more information about preferences, see iptprefs.
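For example, this sketch sets the border preference programmatically for the current MATLAB session.

iptsetpref("ImshowBorder","tight")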

See Also

More About
• “Display Multiple Images” on page 4-14


Display Multiple Images


This section describes various ways you can view multiple images at the same time.

Display Multiple Images in Separate Figure Windows


The simplest way to display multiple images at the same time is to display them in separate figure
windows. MATLAB does not place any restrictions on the number of images you can display
simultaneously.

imshow always displays an image in the current figure. If you display two images in succession, the
second image replaces the first image. To view multiple figures with imshow, use the figure
command to explicitly create a new empty figure before calling imshow for the next image. The
following example views the first three frames in an array of grayscale images I.
imshow(I(:,:,:,1))
figure
imshow(I(:,:,:,2))
figure
imshow(I(:,:,:,3))

Display Multiple Images in a Montage


You can view multiple images as a single image object in a figure window using the montage
function. By default, montage scales the images, depending on the number of images and the size of
your screen, and arranges them to form a square. montage preserves the aspect ratio of the original
images. You can specify the size of the thumbnails using the ThumbnailSize name-value argument.

The images in the montage can be of different types and sizes. montage converts indexed images to
RGB using the colormap present in the file.

By default, the montage function does not include any blank space between the images in the
montage. You can specify the amount of blank space between the images using the BorderSize
parameter. You can also specify the color of the space between images using the BackgroundColor
parameter.

The following example shows how to view a sequence of images as a montage.

View Image Sequence as Montage

This example shows how to view multiple frames in a multiframe array at one time, using the
montage function. montage displays all the image frames, arranging them into a rectangular grid.
The montage of images is a single image object. The image frames can be grayscale, indexed, or
truecolor images. If you specify indexed images, they all must use the same colormap.

Create an array of truecolor images.


onion = imread('onion.png');
onionArray = repmat(onion, [ 1 1 1 4 ]);

Display all the images at once, in a montage. By default, the montage function displays the images in
a grid. The first image frame is in the first position of the first row, the next frame is in the second
position of the first row, and so on.


montage(onionArray);

To specify a different number of rows and columns, use the 'size' parameter. For example, to
display the images in one horizontal row, specify the 'size' parameter with the value [1 NaN].
Using other montage parameters you can specify which images you want to display and adjust the
contrast of the images displayed.

montage(onionArray,'size',[1 NaN]);


Display Images Individually in the Same Figure


You can use the imshow function with the subplot function to display multiple images in a single
figure window. For additional options, see “Work with Image Sequences as Multidimensional Arrays”
on page 2-15.

You can display multiple images with different colormaps in the same figure using imshow with the
tiledlayout and nexttile functions.

Note The Image Viewer app does not support this capability.

Divide a Figure Window into Multiple Display Regions

subplot divides a figure into multiple display regions. Using the syntax subplot(m,n,p), you
define an m-by-n matrix of display regions and specify which region, p, is active.

For example, you can use this syntax to display two images side by side.

[X1,map1]=imread("forest.tif");
[X2,map2]=imread("trees.tif");
subplot(1,2,1), imshow(X1,map1)
subplot(1,2,2), imshow(X2,map2)
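As a sketch of the tiledlayout approach mentioned earlier, this code displays the same two indexed
images in one figure, and each tile keeps its own colormap.

[X1,map1] = imread("forest.tif");
[X2,map2] = imread("trees.tif");
tiledlayout(1,2)
nexttile, imshow(X1,map1)
nexttile, imshow(X2,map2)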


Compare a Pair of Images


The imshowpair function displays a pair of images in the same figure window. This display can be
helpful when comparing images. imshowpair supports many visualization methods, including:

• falsecolor, in which the two images are overlaid in different color bands. Gray regions indicate
where the images have the same intensity, and colored regions indicate where the image intensity
values differ. RGB images are converted to grayscale before display in falsecolor.
• alpha blending, in which the intensity of the display is the mean of the two input images. Alpha
blending supports grayscale and truecolor images.
• checkerboard, in which the output image consists of alternating rectangular regions from the two
input images.
• the difference of the two images. RGB images are converted to grayscale.
• montage, in which the two images are displayed alongside each other. This visualization mode is
similar to the display using the montage function.

imshowpair uses optional spatial referencing information to display the pair of images.
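For example, this sketch compares an image with a slightly rotated copy of itself using the falsecolor
method.

A = imread("cameraman.tif");
B = imrotate(A,5,"bicubic","crop");
imshowpair(A,B,"falsecolor")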


See Also
imshow | imshowpair | montage

More About
• “Display an Image in Figure Window” on page 4-10
• “Display Different Image Types” on page 4-128


View and Edit Collection of Images in Folder or Datastore

This example shows how to use the Image Browser app to view a collection of images, inspect and
select images to send to another app, and export a subset of the collection to an image datastore.

Open Images in Image Browser

Open the Image Browser app from the MATLAB® toolstrip. On the Apps tab, in the Image Processing
and Computer Vision section, click Image Browser.

Load images into the app by clicking Add. You can add images that are in a folder excluding
subfolders, in a folder including subfolders, or in an image datastore in the workspace. This example
works with the images in the sample image folder, imdata, and excludes the images in subfolders of
imdata. Therefore, select Folder of images. The app displays a file explorer window. Navigate to
the folder you want to view.

You can also open the app at the command line using the imageBrowser function, specifying the
name of the folder you want to view. For example, to view all the images in the sample image folder,
use this command:

imageBrowser(fullfile(matlabroot,'toolbox/images/imdata/'))


Image Browser displays thumbnails of all the images in the folder.

Add images to the collection by using the Add button.

Delete an image from the collection by right-clicking the image and selecting Remove Selected from
the context menu. You can select a group of images to delete by using Ctrl+click or Shift+click.
Image Browser does not delete images from the file system—it only removes the thumbnails from
the display.

Explore Images in the Collection

To adjust the size of the image thumbnails, use the Thumbnail Size slider in the app toolstrip.


To get a closer look at an image, select the image and click Preview. You can also get a preview by
double-clicking an image. The app displays the image at a higher resolution in a Preview tab. To
explore the image displayed in the Preview tab, use the zoom and pan options visible over the top-
right corner of the image when you pause over the image.


You can select more than one image to preview by using Ctrl+click or Shift+click. The first image
appears in the preview panel. Advance and backtrack to the other selected images using the
navigation arrows under the image.

Launch Another App from Image Browser

You can use Image Browser to open a selected image in another app that supports the image data
type and file format:

• Image Viewer — Explore details of color, grayscale, and binary images


• Color Thresholder — Segment color images according to color values
• Image Segmenter — Segment color and grayscale images according to image features
• Image Region Analyzer — Measure properties of regions in binary images

For example, in the Image Browser app select the blobs binary image. In the app toolstrip, click
Image Region Analyzer. The Image Region Analyzer app opens with the blobs image.


Export Images to the Workspace or an Image Datastore

To export the entire collection to an image datastore, click Export > All on the app toolstrip and
specify the name of the image datastore.

You can export a subset of the collection to an image datastore. Select the images, click Export >
Selected on the app toolstrip, and specify the name of the image datastore. Alternatively, you can
right-click one of the selected images and select Export to workspace from the context menu.

To export an individual image from the folder, right-click the image, select Export image to
workspace from the context menu, and specify the name of the workspace variable.
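For reference, this sketch shows a programmatic equivalent of exporting the whole sample folder to an
image datastore.

imds = imageDatastore(fullfile(matlabroot,"toolbox","images","imdata"), ...
    FileExtensions=[".png" ".tif" ".jpg"]);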


See Also
Image Browser | imageDatastore

More About
• “Getting Started with Datastore”


Get Started with Image Viewer App


In this section...
“Open Image Viewer App” on page 4-26
“Navigate Image in Image Viewer App” on page 4-27
“Get Information About Image Data” on page 4-28
“Modify Image Data” on page 4-30
“Save and Export Results” on page 4-31

The Image Viewer app presents an integrated environment for displaying images and performing
common image processing tasks. The workflow for using Image Viewer typically involves a
combination of these steps:

• Open the app and read an image.


• Navigate the image.
• Get information about the image data.
• Modify the image data.
• Save and export results.

The figure shows an image displayed in Image Viewer, highlighting the contrast adjustment,
colormap, and pixel values view options.

Note You can access individual tools outside the Image Viewer app. To do so, display an image in a
figure window by using a function such as imshow, and then create one or more tools using toolbox
functions. You can build an image processing app with custom layout and behavior using a
combination of individual tools. For more information, see "Interactive Image Viewing and Processing
Tools” on page 5-2.

Open Image Viewer App


You can open the Image Viewer app programmatically or from the MATLAB toolstrip. In both cases,
you can load an image from a workspace variable or from a file. Image Viewer can open any file that
the imread function can read, plus formats such as DICOM, NITF, and RAW.

• Open the Image Viewer app from the command line by using the imageViewer function. Use this
function when you want to specify the image to display upon opening, as well as control various
aspects of the initial image display, such as the initial magnification, colormap, or display range,
by using name-value arguments. For example, this code opens Image Viewer and displays the
image with filename cameraman.tif at 200% of the original image dimensions.

imageViewer("cameraman.tif",InitialMagnification=200)
• Open the Image Viewer app from the Apps tab of the MATLAB toolstrip, under Image
Processing and Computer Vision. To import an image from a file, on the Viewer tab of the app
toolstrip, select Import Image > From File. To import image data from the workspace, click the
import button or select Import Image > From Workspace. Select the image you want to import from
the dialog box and click OK.


Navigate Image in Image Viewer App


After you open Image Viewer and import an image, you can use navigation aids and tools to explore
the image in detail.

See image overview: An image overview helps you navigate a zoomed in image. To open the Overview pane, on the Viewer tab of the app toolstrip, select Image Overview. The Overview pane displays the entire image with a rectangle that indicates which portion of the image is currently visible in the main display pane. You can pan the image in the main display by moving the rectangle in the Overview pane.

Magnify image: Zoom in and out using the mouse scroll wheel. You can also zoom in or out by pressing Ctrl+Plus or Ctrl+Minus keys, respectively. When Image Viewer scales an image, it uses interpolation to determine the values for screen pixels that do not directly correspond to pixel elements in the image matrix. You can change the interpolation method in the app toolstrip under Interpolation Method. Select nearest to view image data as a grid of pixel elements, such as when viewing pixel values. Select bilinear to view smoothed pixel intensities without visible pixel elements. For more information about the interpolation methods used for resizing images, see imshow.

Pan image: To pan an image that is too large to fit in the Image Viewer window, use the arrow keys or the Overview pane. Alternatively, in the top-right corner of the main display pane, select Pan, and then drag the image to move around.

Choose image colormap: To enhance the visibility of features in grayscale and indexed images, you can change the image colormap on the Colormap tab of the app toolstrip. You can select a predefined MATLAB colormap such as jet or parula, or a colormap variable from the workspace. You can also create a colormap by entering a MATLAB expression that, when evaluated, generates an m-by-3 colormap. Note: The Colormap tab is not available for RGB images.

Get Information About Image Data


Image Viewer provides information about pixel values and other aspects of the image data. These
tools are all available on the Viewer tab of the app toolstrip.


Image metadata: View information about the image by selecting Image Metadata in the app toolstrip. The app provides basic information about the width, height, number of color channels, data type, and image type of the image. If you load an image from a file, then the app also displays the file metadata. This metadata is the same information returned by the information function for that file format, such as the imfinfo, dicominfo, rawinfo, or nitfinfo function.

Pixel information: When you pause the mouse pointer on a pixel, the bottom-left corner of the main display pane shows the xy-coordinates and value of that pixel. For more information, see "Determine Individual Pixel Values in Image Viewer" on page 4-33.

Pixel values: When you zoom far enough into the image, the app superimposes the pixel values on the image in the main display pane. For more information, see "View Pixel Values in Image Region" on page 4-34. To view pixel values, zoom in on the image and, on the Viewer tab of the app toolstrip, select Show Pixel Value. You can manually zoom in until the pixel values appear, or, in the drop-down menu in the Zoom section of the app toolstrip, select Zoom To Pixels. When viewing pixel values, in the Viewer tab of the app toolstrip, set Interpolation Method to nearest.

Image height and width: The bottom-right corner of the main display pane shows the height and width of the image, in pixels.

Measure distance or area: Measure the Euclidean distance between two pixels, or the area within a polygon. For more information, see "Measure Distances and Areas Using Image Viewer App" on page 4-37.

Modify Image Data


Using Image Viewer, you can adjust image contrast and crop an image.

By default, when you close Image Viewer, the app does not save the modified image data. However,
you can export the modified image to a file or save the modified data in a workspace variable. For
more information, see “Save and Export Results” on page 4-31.

Adjust contrast: Adjust the contrast of grayscale images, on the Contrast tab of the app toolstrip, by using an interactive histogram, or by entering values for the window bounds, window width, and window center. For more information, see "Adjust Image Contrast in Image Viewer App" on page 4-41. Note: The Contrast tab is not available for RGB or binary images.

Crop image: To crop an image to a rectangular region of interest, on the Viewer tab of the app toolstrip, select Crop Image. For more information, see "Crop Image Using Image Viewer App" on page 4-47.

Save and Export Results


Image Viewer enables you to export image data to the workspace or save image data to a file. The
exported image data includes any cropping or contrast adjustment changes you perform in the app.
The exported image does not include any colormap changes. The exported image has the same
number of channels and is of the same data type as the imported image. You can also export area and
distance measurements to the workspace.

Export image as workspace variable: On the Viewer tab of the app toolstrip, click the export button or select Export > Export Image > To Workspace. In the Export To Workspace dialog box, specify a workspace variable name, and click OK.

Save image to file: On the Viewer tab of the app toolstrip, select Export > Export Image > To Image File. In the Save Image To File dialog box, navigate to the location where you want to save the image file. Specify a filename and format, and click Save.

Export measurements to workspace: On the Viewer tab of the app toolstrip, select Export > Export Measurements. In the Export To Workspace dialog box, select which measurements to export and specify a variable name for each measurement. Then, click OK.

See Also
Image Viewer

Related Examples
• “Get Pixel Information in Image Viewer App” on page 4-33
• “Measure Distances and Areas Using Image Viewer App” on page 4-37
• “Adjust Image Contrast in Image Viewer App” on page 4-41
• “Crop Image Using Image Viewer App” on page 4-47


Get Pixel Information in Image Viewer App


In this section...
“Determine Individual Pixel Values in Image Viewer” on page 4-33
“View Pixel Values in Image Region” on page 4-34

The Image Viewer app enables you to see the pixel values and coordinates of individual pixels and
groups of pixels.

Determine Individual Pixel Values in Image Viewer


Image Viewer shows the xy-location and value of an individual pixel from the image in the bottom-left
corner of the main display pane. The displayed information corresponds to the pixel that is currently
under the pointer. Image Viewer updates this information as you move the pointer over the image.

This figure shows Image Viewer displaying the location and value for a grayscale image pixel.

Note You can also obtain pixel value information from a figure created by imshow by using the
impixelinfo function.


For grayscale and indexed images, if you adjust the contrast using the DisplayRange name-value
argument of the imageViewer function or in the app, Image Viewer displays the original pixel value
and the adjusted pixel value. For more details about contrast adjustment, see “Adjust Image Contrast
in Image Viewer App” on page 4-41.

This figure shows Image Viewer displaying the location, the original imported value, and the contrast-
adjusted value for a grayscale image pixel.

The format of the pixel value depends on the image type.

• Intensity (grayscale): Numeric scalar representing the pixel intensity. If you adjust the contrast using the DisplayRange name-value argument of the imageViewer function or in the app, Image Viewer displays the original pixel value and the adjusted pixel value.
• Indexed: Numeric scalar representing an index into a colormap. If you adjust the contrast using the DisplayRange name-value argument of the imageViewer function or in the app, Image Viewer displays the original pixel value and the adjusted pixel value.
• Binary: Logical scalar (true or false).
• Truecolor (RGB): Three-element vector [R G B] in which each element specifies the intensity of a color channel.

View Pixel Values in Image Region


When you zoom in far enough on an image, you can view the pixel values overlaid on the image.

1 On the Viewer tab of the app toolstrip, in the Zoom section, open the drop-down and select Zoom
To Pixels. The app automatically zooms in until the individual pixels of the image are visible.
2 Select Show Pixel Values.
3 Set Interpolation Method to nearest to view the image data as a grid of pixel elements that
correspond to the value labels.


To locate the magnified pixels in the overall image, open the Overview pane by, on the Viewer tab of
the app toolstrip, selecting Image Overview. The Overview pane displays the entire image with a
rectangle indicating which part of the image is visible in the main display. You can pan the main
display by dragging the rectangle in the Overview pane, or by selecting Pan in the top-right
corner of the main display pane and dragging the image in the main display pane. The pixel values
update as you pan the image.

This figure shows Image Viewer with pixel values visible in the main display pane, and the Overview
pane indicating the currently displayed part of the image.

The format of the pixel values depends on the image type.

• Intensity (grayscale): Numeric scalar I, representing the pixel intensity. If you adjust the image contrast, the app displays the original imported value I and the adjusted value Adj.
• Indexed: Numeric scalar I, representing an index into a color map. If you adjust the image contrast, the app displays the original imported value I and the adjusted value Adj.
• Binary: Logical scalar I, which is either true or false.
• Truecolor (RGB): Three values, R, G, and B, for the three color channels.

Note You can also obtain pixel region information from a figure created by imshow by using the
impixelregion function.
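For example, this sketch adds both figure-based tools to a displayed image.

imshow("moon.tif")
impixelinfo      % adds a pixel information bar at the bottom of the figure
impixelregion    % opens a Pixel Region tool for the displayed image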

See Also
Image Viewer | impixelregion | impixelinfo

Related Examples
• “Get Started with Image Viewer App” on page 4-25
• “Adjust Image Contrast in Image Viewer App” on page 4-41
• “Crop Image Using Image Viewer App” on page 4-47
• “Measure Distances and Areas Using Image Viewer App” on page 4-37


Measure Distances and Areas Using Image Viewer App


In this section...
“Determine Distance Between Pixels” on page 4-37
“Determine Area of Polygon” on page 4-38
“Hide or Delete Measurements” on page 4-39
“Export Distance and Area Measurements” on page 4-39

The Image Viewer app enables you to measure the Euclidean distance between two pixels, or the area
within a polygon. The app displays the line or polygon region of interest (ROI) and measurement
labels, in units of pixels. You can create multiple measurements and export measurements to the
workspace.

Determine Distance Between Pixels


1 Load an image into Image Viewer. For more information about opening images in the app, see
“Open Image Viewer App” on page 4-26.
2 On the Viewer tab of the app toolstrip, select Measure Distance.
3 To draw the measurement line, move the pointer to the image, so the pointer changes to
crosshairs. Position the crosshairs at the first endpoint, left-click, and then drag the crosshairs
to the second endpoint and release the mouse button.
4 You can reposition individual endpoints or move the whole line by dragging the ends or middle
portion of the line, respectively.

This figure shows a distance line with the line endpoints and distance measurement label. The app
bases distance measurements on the image coordinates rather than the pixel grid, such that the value
includes decimals and is not rounded to the nearest whole pixel.


Note You can also measure distances in a figure created by imshow by using the imdistline
function.
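For example, this sketch adds a draggable distance line to an image displayed with imshow.

imshow("moon.tif")
h = imdistline;   % draws an adjustable line with a distance label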

Determine Area of Polygon


1 Load an image into Image Viewer. For more information about opening images in the app, see
“Open Image Viewer App” on page 4-26.
2 On the Viewer tab of the app toolstrip, select Measure Area.
3 To draw a polygon around the region of interest, move the pointer to the image, so the pointer
changes to crosshairs. Left-click to add a vertex to the polygon. To finish the polygon,
double-click.
4 You can reposition individual vertices or translate the whole polygon by dragging a vertex or the
inside of the polygon, respectively. To delete a single vertex, right-click the vertex and select
Delete Vertex. To add a vertex, right-click a line segment and select Add Vertex.

This figure shows a polygon with the area measurement label in pixels. The app bases area
measurements on the image coordinates rather than the pixel grid, such that the value includes
decimals and is not rounded to include or exclude whole pixels.


Hide or Delete Measurements


You can control whether the main display pane shows distance and area measurements.

• To temporarily hide all measurements, on the Viewer tab of the app toolstrip, clear Show All.
Select Show All again to show the measurements.
• To permanently delete all measurements, from the app toolstrip, select Delete All.

Export Distance and Area Measurements


On the Viewer tab of the app toolstrip, select Export > Export Measurements. Use the Export to
Workspace dialog box to select the measurements to export and to specify the names for each
corresponding workspace variable.


When you click OK, the app creates the variables in the workspace.

See Also
Image Viewer | imdistline

Related Examples
• “Get Started with Image Viewer App” on page 4-25
• “Get Pixel Information in Image Viewer App” on page 4-33
• “Adjust Image Contrast in Image Viewer App” on page 4-41
• “Crop Image Using Image Viewer App” on page 4-47


Adjust Image Contrast in Image Viewer App


In this section...
“Load Image into Image Viewer” on page 4-41
“Adjust Contrast and Brightness” on page 4-42
“View Imported and Adjusted Image Values” on page 4-44
“Export Contrast-Adjusted Image” on page 4-45

Image contrast refers to the difference in brightness between the lightest and darkest parts of a
grayscale image. An image lacks contrast when there is no sharp difference between the lowest and
highest pixel values. Contrast adjustment works by manipulating the display range, or the range of
pixel values that map to different display colors. Pixel values within the display range appear as
shades of gray. Pixel values less than or equal to the display range minimum appear black. Pixel
values greater than or equal to the display range maximum appear white.

This figure shows an image before and after contrast adjustment. In the original image, the pixel
values occupy fewer shades of gray, and the image has poor contrast. The adjusted image stretches
the pixel values to occupy the full range of colors from black to white, and the image has higher
contrast.

Load Image into Image Viewer


Load a grayscale image into the app by using the imageViewer function, or by selecting Import
Image from the app toolstrip. For more information about opening images in the app, see “Open
Image Viewer App” on page 4-26.

The Contrast tab of the app toolstrip displays information about the image contrast. The Data
Range section displays the minimum and maximum pixel values of the imported image. The Window
Bounds section displays the minimum, maximum, width, and center of the display range window, and
has a drop-down list of preset options.

By default, the drop-down value is Match Datatype Range, and the window bounds values match
the default display range for the data type. The default display range for integer data types is the full
range of that data type. For example, the default display range for uint8 is 0 to 255, such that 0
appears black and 255 appears white. The default display range for the single and double data
types is 0 to 1, such that 0 appears black and 1 appears white.


This figure shows the default contrast information for a uint8 image with a data range of 74 to 224.
The histogram shows a cluster of values in the middle of the default display range window.

Adjust Contrast and Brightness


To adjust the contrast of an image, modify the display range window using one of the methods listed
in this table.


Scale the display range automatically:
• Select Match Data Range to make the display range equal to the data range of the image.
• Select Eliminate Outliers (%) to saturate an equal percentage of pixels at the top and bottom of the display range. By default, the tool eliminates 2%, so the top 1% and the bottom 1% of pixel values. This is equivalent to the operation performed by the stretchlim function. Adjust the percentage by entering a value in the field below the drop down.
• Select Match Datatype Range to restore the default display range based on the data type of the image.
If you select an automatic preset, the other fields in the Window Bounds section update to reflect the new display range window.

Interactively adjust the minimum and maximum values of the display range in the Histogram pane by dragging the left and right edges of the window. Change the center of the window by dragging the interior of the window.

Enter specific values in the Window Bounds section of the app toolstrip. You can also define the minimum or maximum value by clicking the dropper button next to the corresponding field. When you do this and move the pointer over the image, the pointer becomes an eye dropper shape. Position the eye dropper shape over the pixel with the value that you want to be the minimum or maximum value and left-click.

This animation shows how to increase the image contrast by selecting Eliminate Outliers (%)
from the Window Bounds section of the app toolstrip. The display range window changes to 78 to
161. When you change the display range window, the image display updates in real time. You can
restore the default display range window by selecting Undo Changes.

View Imported and Adjusted Image Values


As you modify the display range window, Image Viewer calculates adjusted values for each pixel by
linearly scaling the display window to the data type range. This is similar to calling the imadjust
function and specifying the window bounds as the input contrast limits and the data type range as the
output contrast limits. For example, this figure shows a graphical representation of mapping imported
values between 78 and 161 to an adjusted value range of 0 to 255. Imported values less than or equal
to 78 map to an adjusted value of 0. Imported values greater than or equal to 161 map to 255.
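A minimal sketch of the equivalent programmatic mapping, assuming a uint8 grayscale image such as
the low-contrast sample image pout.tif:

I = imread("pout.tif");
Iadj = imadjust(I,[78 161]/255,[0 1]);   % imadjust expects limits scaled to [0, 1]
montage({I,Iadj})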


After you adjust the contrast, you can view both the imported and adjusted value for each pixel. When
you pause on the image, the bottom-left corner of the app window displays information about the
pixel that is under the cursor. After making contrast changes, the information includes the imported
and adjusted value for the pixel. If you zoom in far enough on the image, with Show Pixel Values
selected, Image Viewer displays the imported value I and adjusted value Adj overlaid on individual
pixels. For more details on zooming to view pixel values, see “Get Pixel Information in Image Viewer
App” on page 4-33.

Export Contrast-Adjusted Image


You can export the contrast-adjusted image to a file or to a workspace variable. On the Viewer tab of
the app toolstrip, select Export > Export Image, and select either To Workspace or To Image File.
The app saves the adjusted pixel values using the same data type as the imported image.

See Also
Image Viewer | imcontrast | imadjust | stretchlim

Related Examples
• “Contrast Enhancement Techniques” on page 8-69
• “Get Started with Image Viewer App” on page 4-25
• “Get Pixel Information in Image Viewer App” on page 4-33
• “Measure Distances and Areas Using Image Viewer App” on page 4-37


• “Crop Image Using Image Viewer App” on page 4-47


Crop Image Using Image Viewer App


Cropping is the process of creating a new image from a part of an original image. You can use image
cropping to extract a region of interest, or to remove unwanted areas. The Image Viewer app enables
you to crop an image interactively and save the cropped area to a new image file or to the workspace.

Define Cropping Rectangle


To crop an image loaded in Image Viewer, on the Viewer tab of the app toolstrip, select Crop Image.
A cropping rectangle appears over the image in the main display pane. The rectangle displays the
dimensions of the cropped region, in pixels. Resize or move the cropping region by dragging the
adjustment points or the inside of the region, respectively. When you are satisfied with the cropping
region, you can either crop the image to the rectangle in the app or export the region without
cropping the view in the app.

This figure shows an image with the cropping tool enabled. You cannot use other tools until you apply
or cancel the cropping operation.


View Cropped Image in App


To crop the image in the app, in the app toolstrip, select Apply. The app displays the cropped image.
You cannot reverse the crop operation. To view the full image, you must reload the image from the
workspace or file.

You can save the cropped image to the workspace or in a new image file. On the Viewer tab of the
app toolstrip, select Export > Export Image, and select either To Workspace or To Image File.
The app saves the cropped image using the same data type as the original image.

Export Cropped Region


Alternatively, you can export the image region inside the cropped rectangle without applying the crop
in the app. With the cropping tool open, right-click inside the cropping rectangle and select Export
Selected Region, then Export Selected Region To Workspace or Export Selected Image To
File. The app saves only the area inside the rectangle to the workspace or file, respectively. The full
image remains visible in the app.
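For comparison, this sketch crops programmatically by using the imcrop function with a crop rectangle
of the form [xmin ymin width height]; the rectangle values here are arbitrary.

I = imread("peppers.png");
Icropped = imcrop(I,[100 50 200 150]);
imshow(Icropped)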

See Also
Image Viewer | imcrop


Related Examples
• “Get Started with Image Viewer App” on page 4-25
• “Get Pixel Information in Image Viewer App” on page 4-33
• “Measure Distances and Areas Using Image Viewer App” on page 4-37
• “Adjust Image Contrast in Image Viewer App” on page 4-41


Get Started with Image Tool


In this section...
“Open Image Tool and Display Image” on page 4-51
“Navigate Image in Image Tool” on page 4-52
“Get Information about Image Data” on page 4-53
“Modify Image Data” on page 4-56
“Save and Export Results” on page 4-58

The Image Tool presents an integrated environment for displaying images and performing common
image processing tasks. The workflow for using Image Tool typically involves a combination of these
steps:

• Open the tool and display an image on page 4-51


• Navigate the image on page 4-52
• Get information about the image data on page 4-53
• Modify the image data on page 4-56
• Save and export results on page 4-58

Note The Image Viewer app is recommended over Image Tool in most situations. Image Viewer
offers more functionality and is easier to use than the Image Tool. Use the Image Tool only when you
need to customize the figure containing the tool. For more information, see “Get Started with Image
Viewer App” on page 4-25.

The figure shows an image displayed in an Image Tool with many of the related tools open and active.


Note You can also access individual tools outside the Image Tool. To do so, display an image in a
figure window using a function such as imshow, then create one or more tools using toolbox
functions. For example, you can build an image processing app with custom layout and behavior
using a combination of individual tools. For more information, see “Interactive Tool Workflow” on
page 5-6.

Open Image Tool and Display Image


There are three ways to open the Image Tool. In each case, you can select an image from a variable in
the workspace or specify the name of the image file. Image Tool can open any file that can be read by
imread.

• You can open the Image Tool from the command line by using the imtool function. You can
control various aspects of the initial image display, such as the initial magnification, colormap, or
display range. For example, this code opens Image Tool and loads the image with filename
cameraman.tif.

imtool("cameraman.tif")


• You can open the Image Tool from the command line by using the
images.compatibility.imtool.r2023b.imtool function. Use this option to return the
figure containing the Image Tool. For example, this code opens Image Tool, loads the image with
filename cameraman.tif, and returns the figure containing the tool.

hTool = images.compatibility.imtool.r2023b.imtool("cameraman.tif")
• You can start a new Image Tool from within an existing Image Tool by using the New option from
the File menu.

Navigate Image in Image Tool

After you open Image Tool, the image appears in the figure window. Image Tool provides navigation
aids and tools that can help you explore images in detail.

See image overview: The Overview tool displays the entire image with a superimposed detail rectangle that indicates which portion of the image is currently visible in Image Tool. You can pan and zoom the image visible in the Image Tool by moving and resizing the detail rectangle in the Overview tool. To get the current position and size of the detail rectangle, right-click anywhere inside the rectangle and select Copy Position from the context menu. The tool copies the position as a four-element vector of the form [xmin ymin width height] to the clipboard. To print the view of the image displayed in the Overview tool, select the Print to Figure option from the Overview tool File menu. See "Print Images" on page 4-135 for more information.

Magnify image: To enlarge or shrink an image by specifying a scale factor, use the Magnification option on the Tools menu. To enlarge or shrink an image by clicking the image, use the Zoom tool. The tool centers the new view of the image on the spot where you clicked. Note: You can also zoom by using the Ctrl+Plus or Ctrl+Minus keys. These are the Plus(+) and Minus(-) keys on the numeric keypad of your keyboard. When Image Tool scales an image, it uses interpolation to determine the values for screen pixels that do not directly correspond to elements in the image matrix. For more information about interpolation methods used for resizing images, see imresize.

Pan image: To pan an image that is too large to fit in the Image Tool, use scroll bars or the Pan tool.

Choose image colormap: To enhance the visibility of features in grayscale and indexed images, you can change the image colormap using the Choose Colormap tool. You can select a MATLAB colormap or a colormap variable from the workspace. You can also create a colormap by entering a MATLAB command. Image Tool does not provide a color bar. To add a color bar, open the image in another figure window. For more information, see "Save and Export Results" on page 4-58.

Get Information about Image Data

Image Tool provides tools that can help you get information about pixel values and other aspects of
the image data.


Pixel Information tool: Get the (x, y) coordinates and the value of a single pixel under the pointer. The format of the pixel information depends on the image type. To save the pixel location and value information, right-click a pixel in the image and choose the Copy pixel info option. The Image Tool copies the x- and y-coordinates and the pixel value to the clipboard. You can paste this pixel information into the workspace or another application.

Display Range tool: Determine the display range of grayscale image data. The tool is not enabled for RGB, indexed, or binary images. The display range is of the form [cmin cmax]. The cmin and cmax values specify the pixel intensities that map to the first and last colors in the colormap, respectively.

Pixel Region tool: Get information about a group of pixels. When the magnified pixels are sufficiently large, the Pixel Region tool overlays the pixel value over each pixel. For RGB images, this information includes three numeric values, one for each color channel. For indexed images, this information includes the index value and the associated RGB value. You can toggle the display of numeric pixel values by going to the Pixel Region tool Edit menu and changing the Superimpose Pixel Values option. The Pixel Region tool includes its own Pixel Information tool that enables you to get the (x, y) coordinates and pixel value of the pixel under the pointer in the Pixel Region tool. To save the position of the pixel region rectangle with respect to the image, select the Copy Position option from the Pixel Region tool Edit menu. The tool copies the position as a four-element vector of the form [xmin ymin width height] to the clipboard. To print the view of the image displayed in the Pixel Region tool, select the Print to Figure option from the Pixel Region tool File menu. See "Print Images" on page 4-135 for more information.

Distance tool: Measure the Euclidean distance between two pixels. The Distance tool displays a line between the pixels and a label with the Euclidean distance between endpoints of the line. The tool specifies the distance in data units determined by the XData and YData properties, which is pixels, by default. To customize aspects of the Distance tool appearance and behavior, use the Distance tool context menu. For example, you can toggle the distance label on and off, change the color of the distance line, and constrain the tool to the vertical or horizontal direction. To create variables in the workspace for the endpoint locations and distance information, right-click the Distance tool and choose the Export to Workspace option from the context menu.

Image Information tool: Get information about image and image file metadata. The Image Information tool always provides basic information about the width, height, data type, and image type. For grayscale and indexed images, this information includes the minimum and maximum intensity values. If you select an image to open in Image Tool by specifying a filename, then the Image Information tool also displays image metadata. This metadata is the same information returned by the imfinfo function or the dicominfo function.

Modify Image Data

Image Tool provides tools that can help you adjust the image contrast and crop an image.

By default, when you close Image Tool, the tool does not save the modified image data. However, you
can export the modified image to a file or save the modified data in a workspace variable. For more
information, see “Save and Export Results” on page 4-58.

Adjust Contrast tool: Adjust the contrast of an image by setting a window over a histogram of pixel values. Open the Adjust Contrast tool by clicking Adjust Contrast in the Image Tool toolbar.

You can adjust the display window by clicking and dragging the edges of the red window, or by entering values in the boxes for the minimum, maximum, center, and width of the window. Alternatively, let the Adjust Contrast tool scale the display range automatically. Select Match data range to make the display range equal to the data range of the image. Select Eliminate outliers to saturate an equal percentage of pixels at the top and bottom of the display range. By default, the tool eliminates a total of 2% of pixel values, saturating the top 1% and the bottom 1%.
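
A rough programmatic analogue of the Eliminate outliers option uses stretchlim with imadjust; a tolerance of 0.01 saturates the bottom 1% and top 1% of pixel values. A minimal sketch, assuming a grayscale image:

I = imread("pout.tif");
J = imadjust(I,stretchlim(I,0.01));   % saturate 1% of pixels at each end of the range
imshowpair(I,J,"montage")
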
Window/Level tool: Adjust the contrast of an image by interacting with the image.

To start the Window/Level tool, click the Window/Level button in the Image Tool toolbar or select the Window/Level option from the Image Tool Tools menu.

Move the pointer over the image. The pointer changes to the Window/Level cursor. To adjust the image contrast, click and drag the mouse horizontally. To adjust image brightness, click and drag the mouse vertically.

Crop Image tool: Crop an image to a rectangular region of interest. Start the Crop Image tool by clicking Crop Image in the Image Tool toolbar or by selecting Crop Image from the Image Tool Tools menu.

When you move the pointer over the image, the pointer changes to cross hairs. Define the rectangular crop region by clicking and dragging the mouse over the image. You can move or resize the crop rectangle using the mouse. Or, if you want to crop a different region, move to the new location and click and drag again.

When you are finished defining the crop region, double-click the left mouse button to crop the image.

Save and Export Results


Image Tool enables you to export image data to the workspace, save the image data to file, and open
images in a new figure window. When saving and exporting image data, changes to the display range
are not preserved. If you would like to preserve your changes, then use the imcontrast function.

Create workspace variable: There are three ways to create a workspace variable from the image data in Image Tool.

• You can use the Export to Workspace option on the Image Tool File
menu.
• If you open the tool by using the imtool function and specify a handle
to the tool, then you can use the getimage function and specify the
handle to the tool. For example, this code opens the image with file
name moon.tif in an Image Tool then exports the image to the
variable moon.

hTool = imtool("moon.tif");
moon = getimage(hTool);
• If you open the tool without specifying a handle to the tool, then you
can use the getimage function and specify a handle to the image
object within the figure. For example, this code opens the image with
file name moon.tif in an Image Tool then exports the image to the
variable moon.

imtool("moon.tif")
moon = getimage(imgca);

Save to file: Use the Save Image tool by selecting the Save as option on the Image Tool File menu. This tool enables you to navigate your file system to determine where to save the file, specify the file name, and choose the file format.
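
If you have already exported the image data to a workspace variable, you can also write it to a file programmatically with imwrite. A minimal sketch (the output file name is arbitrary):

hTool = imtool("moon.tif");
moon = getimage(hTool);
imwrite(moon,"moon_copy.png")   % save the exported image data to a PNG file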

Open new figure window: Select the Print to Figure option from the File menu. You can use this figure window to see a color bar and print the image. For more information, see “Add Color Bar to Displayed Grayscale Image” on page 4-133 and “Print Images” on page 4-135.

See Also
Image Viewer

Related Examples
• “Get Started with Image Viewer App” on page 4-25


Explore 3-D Volumetric Data with Volume Viewer App

This example shows how to look at and explore 3-D volumetric data using the Volume Viewer app.
Volume rendering is highly dependent on defining an appropriate alphamap so that structures in your
data that you want to see are opaque and structures that you do not want to see are transparent. To
illustrate, the example loads an MRI study of the human head into the Volume Viewer app and
explores the data using the visualization capabilities of the Volume Viewer.

• Load Volume Data into the Volume Viewer on page 4-60
• View Volume Data in Volume Viewer on page 4-62
• Adjust View of Volume Data in Volume Viewer on page 4-65
• Refine View with Rendering Editor on page 4-67
• Save Volume Viewer Rendering and Camera Configuration Settings on page 4-72

Load Volume Data into Volume Viewer

This part of the example shows how to load volumetric data into the Volume Viewer app.

Load the MRI data of a human head from a MAT-file into the workspace. The MRI data is a modified
subset of the BraTS data set [1 on page 4-72]. This operation creates a variable named D in your
workspace that contains the volumetric data. Use the squeeze command to remove the singleton
dimension from the data.

load mri
D = squeeze(D);
whos

  Name        Size                Bytes  Class     Attributes

  D         128x128x27           442368  uint8
  map        89x3                  2136  double
  siz         1x3                    24  double

Open the Volume Viewer app. From the MATLAB® toolstrip, open the Apps tab and, under Image Processing and Computer Vision, click Volume Viewer. You can also open the app using the volumeViewer command.

volumeViewer(D)

Load volumetric data into the Volume Viewer app. Click Import Volume. You can load an image by
specifying its file name or load a variable from the workspace. If you have volumetric data in a
DICOM format that uses multiple files to represent a volume, you can specify the DICOM folder
name. Choose the Import From Workspace option because the data is in the workspace.


Select the workspace variable in the Import Volume dialog box and click OK.

To start a new instance of the Volume Viewer app, click New Session.


When you create a new session, this option deletes all the data currently in the viewer. Click Yes to
create the new session.

View Volume Data in Volume Viewer

In this part of the example, you decide how you want to view your data. Volume Viewer offers several
options.

View the volume in the Volume Viewer app. Volume Viewer displays the data as a volume and as slice
planes. The MRI data displayed as a volume is recognizable as a human head. To explore the volume,
zoom in and out on the image using the mouse wheel or a right-click. You can also rotate the volume
by positioning the cursor in the image window, pressing and holding the mouse, and moving the
cursor. You are always zooming or rotating around the center of the volume. The position of the axes
in the Orientation Axes window reflects the spatial orientation of the image as you rotate it.


To change the background color used in the display window, open the 3D-Display tab, click
Background Color, and select a color.


View the MRI data as a set of slice planes. Click Slice Planes. You can also zoom in and rotate this
view of the data. Use the scroll functionality in the three slice windows to view individual slices in any
of the planes.


Continue using Volume Viewer capabilities until you achieve the best view of your data.

Adjust View of Volume Data in Volume Viewer

In this part of the example, you adjust the view of the volumetric data in the Volume Viewer app.

Click 3D Volume to return to viewing your data as a volume and use the capabilities of Volume
Viewer to get the best visualization of your data. Volume Viewer provides several spatial referencing
options that let you get a more realistic view of the head volume. (The head appears flattened in the
default view.)

• Specify Dimensions—In the 3D-Display tab you can specify the dimensions in the X, Y, and Z directions, as shown in the programmatic sketch after this list.
• Upsample To Cube—Volume Viewer calculates a scale factor that makes the number of samples in each dimension the same as the largest dimension in the volume. This setting can make anisotropically sampled data appear scaled more correctly.
• Use Volume Metadata—If the data file includes resolution data in its metadata, Volume Viewer uses the metadata and displays the volume true to scale. Volume Viewer selects the Use Volume Metadata option, by default, if metadata is present.
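
The following sketch shows a programmatic counterpart of specifying voxel dimensions, using a scaling transformation with volshow. The [1 1 2.5] voxel spacing is an assumed value used only for illustration:

load mri
D = squeeze(D);
spacing = [1 1 2.5];                 % assumed [x y z] voxel size, for illustration only
A = [spacing(1) 0 0 0; 0 spacing(2) 0 0; 0 0 spacing(3) 0; 0 0 0 1];
tform = affinetform3d(A);            % scale the volume to the assumed voxel dimensions
volshow(D,Transformation=tform);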

Refine View with Rendering Editor

This part of the example describes how to use the Volume Viewer Rendering Editor to modify your
view of the data. Using the Rendering Editor, you can:

• Choose the overall viewing approach: Volume Rendering, Maximum Intensity Projection, or
Isosurface.
• Modify the alphamap by specifying a preset alphamap, such as ct-bone, or by customizing the
alphamap using the Opacity/Image Intensity curve.
• Specify the colormap used in the visualization.
• Specify the lighting in the visualization.

Choose the Viewing Approach

Volume Viewer offers several viewing approaches for volumes. The Maximum Intensity Projection
(MIP) option looks for the voxel with the highest intensity value for each ray projected through the
data. MIP can be useful for revealing the highest intensity structure within a volume. You can also
view the volume as an Isosurface.


Specify the Alphamap

Volume rendering is highly dependent on defining an appropriate alphamap so that structures you want to see are opaque and structures you do not want to see are transparent. The Rendering Editor lets you define the opacity and transparency of voxel values throughout the volume. You can choose from a set of alphamap presets that automatically achieve certain well-defined effects. For example, to define a view that works well with CT bone data, select the CT Bone rendering preset. By default, Volume Viewer uses a simple linear relationship, but each preset changes the curve of the plot to give certain data values more or less opacity. You can also customize the alphamap by manipulating the plot directly.
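
The programmatic counterpart of the alphamap is the Alphamap property of volshow. A minimal sketch, assuming the volume D loaded earlier in this example and an illustrative linear opacity ramp:

alphaLinear = linspace(0,1,256)';       % simple linear opacity curve (illustrative)
hVol = volshow(D,Alphamap=alphaLinear);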


Specify the Colormap

Color, when used with voxel intensity and opacity, is an important element of volume visualization. In the Rendering Editor, you can select from a list of predefined MATLAB colormaps, such as jet and parula. You can also specify a custom colormap that you have defined as a variable in the workspace. You can also change the colormapping for any colormap by using the interactive color bar scale. For example, to lighten the color values in a visualization, click the color bar to create a circular slider. To modify the colormapping so that more values map to lighter colors, move the slider to the left. You can create multiple sliders on the color bar to define other colormappings.
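
Similarly, you can set a predefined or custom colormap programmatically through the Colormap property of the volume object. A sketch, assuming hVol was returned by volshow as in the previous sketch:

hVol.Colormap = parula(256);         % predefined colormap
hVol.Colormap = flipud(gray(256));   % or any 256-by-3 custom colormap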


Modify Lighting Effects

By default, Volume Viewer uses certain lighting effects on the volume display. You can turn off these
lighting effects by clearing the Lighting check box.

Save Volume Viewer Rendering and Camera Configuration Settings

After working in Volume Viewer to achieve the best view of your data, you can save your rendering settings and camera configuration. Volume Viewer stores this information in structures that it writes to the workspace. You can use these structures with the viewer3d and volshow functions to recreate the view you achieved in Volume Viewer.

To save rendering and camera configuration settings, click Export and click the Rendering and Camera Configurations option.

Specify the names for the structures that Volume Viewer creates or accept the default names, sceneConfig and objectConfig, then click OK.
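
The sketch below shows one possible way to reuse the exported settings. It assumes that the exported structures, here named sceneConfig and objectConfig, can be passed as configuration inputs to viewer3d and volshow; confirm the accepted syntaxes on the function reference pages:

viewer = viewer3d(sceneConfig);          % recreate the scene and camera settings (assumed syntax)
volshow(D,objectConfig,Parent=viewer);   % recreate the volume rendering settings (assumed syntax)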

References

[1] Medical Segmentation Decathlon. "Brain Tumours." Tasks. Accessed May 10, 2018. http://
medicaldecathlon.com/.

The BraTS data set is provided by Medical Segmentation Decathlon under the CC-BY-SA 4.0 license. All warranties and representations are disclaimed. See the license for details. MathWorks® has modified the subset of data used in this example. This example uses the MRI data of one scan from the original data set, saved to a MAT file.

See Also
Volume Viewer | volshow

Related Examples
• “Explore 3-D Labeled Volumetric Data with Volume Viewer” on page 4-74
• “Explore Slices from 3-D Image Volume with Anisotropic Voxel Spacing” on page 6-75


Explore 3-D Labeled Volumetric Data with Volume Viewer

This example shows how to explore 3-D labeled volumetric data using the Volume Viewer app. Using
the app, you can view the labeled volume by itself or view the labels as an overlay on the intensity
volume. To illustrate, the example loads an intensity volume that shows the human brain with labeled
areas that show the location and type of tumors found in the brain.

Load Labeled Volume and Intensity Volume

Load the MRI intensity data of a human brain and the labeled volume from MAT files into the
workspace. The MRI data is a modified subset of the BraTS data set [1]. This operation creates two
variables in the workspace: vol and label.

datadir = fullfile(toolboxdir("images"),"imdata","BrainMRILabeled");
load(fullfile(datadir,"images","vol_001.mat"));
load(fullfile(datadir,"labels","label_001.mat"));
whos

  Name          Size                    Bytes  Class     Attributes

  datadir       1x1                       230  string
  label       240x240x155             8928000  uint8
  vol         240x240x155            17856000  uint16

Open the Volume Viewer app. From the MATLAB toolstrip, open the Apps tab and under Image
Processing and Computer Vision, click Volume Viewer. You can also open the app using the
volumeViewer command.

volumeViewer(vol,label)

Load the labeled volume into the Volume Viewer app. Click Import Labeled Volume to open the labeled volume. You can load an image by specifying its file name or load a variable from the workspace. (If you have volumetric data in a DICOM format that uses multiple files to represent a volume, you can specify the DICOM folder name.) For this example, choose the Import From Workspace option.


Select the workspace variable associated with the labeled volume data in the Import from
Workspace dialog box and click OK.


View Labeled Volume in Volume Viewer

View the labeled volume in the Volume Viewer app. By default, the Volume Viewer displays the data
as a labeled volume but you can also view it as slice planes. To explore the labeled volume, zoom in
and out on the image using the mouse wheel or a right-click. You can also rotate the volume by
positioning the cursor in the image window, pressing and holding the mouse, and moving the cursor.
You are always zooming or rotating around the center of the volume. The position of the axes in the
Orientation Axes window reflects the spatial orientation of the labeled volume as you rotate it.

Refine your view of the labeled volume using the Rendering Editor part of the Volume Viewer. You
can use the Rendering Editor to view certain labels and hide others, change the color, and modify
the transparency of the labels. Label 000 is the background and it is typically not visible. When you
select the background label, the Show Labels check box is clear, by default. To select all the visible
labels at once, select the background label and click Invert Selection. To change the color of a label
(or all the labels), select the label in the Rendering Editor and specify the color in the Color
selector. You can also control the transparency of the label using the Opacity slider.

Embed Labeled Volume with Intensity Volume

In this part of the example, you view the labeled volume and the intensity volume in the Volume
Viewer at the same time. Viewing the labeled volume overlaid on the intensity volume can provide
context for the data.
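
The programmatic counterpart of this overlay view is the OverlayData property of volshow, described in a later example. A minimal sketch using the vol and label variables loaded earlier (the alpha value is illustrative):

hVol = volshow(vol,OverlayData=label);   % display the labels over the intensity volume
hVol.OverlayAlphamap = 0.6;              % adjust the label transparency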


With the labeled volume already in the Volume Viewer, load the intensity volume into the app. Click
Import Volume and choose the Import From Workspace option.

Select the workspace variable associated with the intensity volume in the Import from Workspace
dialog box and click OK.


The Volume Viewer displays the labeled volume over the intensity volumetric data. By default, the Volume Viewer displays the label data and the intensity data as volumes, but you can also view them as slice planes. To explore the labeled and intensity volumes, zoom in and out using the mouse wheel or a right-click. You can also rotate the volumes by positioning the cursor in the image window, pressing and holding the mouse, and moving the cursor. To view only the intensity volume, and hide the labeled volume, click View Volume.

Refine your view of the labeled volume and the intensity volume using options in the Rendering
Editor. To only view the labeled volume, and hide the intensity volume, clear the Embed Labels in
Volume check box. In the Labels area of the Rendering Editor, you can select any of the labels and
change its color or transparency. Label 000 is the background. By default, the background is set to
black and is not visible. The Show Labels check box is clear. To select all the labels, click on the
background and then click Invert Selection. If the intensity volume is visible, you can modify the
threshold value and transparency using the sliders in the Volume area of the Rendering Editor.


References

[1] Medical Segmentation Decathlon. "Brain Tumours." Tasks. Accessed May 10, 2018. http://
medicaldecathlon.com/.

The BraTS data set is provided by Medical Segmentation Decathlon under the CC-BY-SA 4.0 license.
All warranties and representations are disclaimed. See the license for details. MathWorks® has
modified the subset of data used in this example. This example uses the MRI data of one scan from
the original data set, saved to a MAT file.

See Also
Volume Viewer | volshow

Related Examples
• “Explore 3-D Volumetric Data with Volume Viewer App” on page 4-60
• “Explore Slices from 3-D Image Volume with Anisotropic Voxel Spacing” on page 6-75


Display Interior Labels by Clipping Volume Planes

This example shows how to interactively clip a quadrant of a volumetric image to expose a surface
within the volume.

Display Volumetric Image in Scene

Load a volumetric image and corresponding labels.

dataDir = fullfile(toolboxdir("images"),"imdata","BrainMRILabeled");
load(fullfile(dataDir,"images","vol_001.mat"));
load(fullfile(dataDir,"labels","label_001.mat"));

Create a 3-D scene with a white background color. Display the volume in the scene.

viewer = viewer3d(BackgroundColor="white",BackgroundGradient="off");
hVolume = volshow(vol,Parent=viewer);

Display Quadrant of Volume

To interactively add a clipping plane, click the clipping plane button in the axes toolbar. The
Viewer3d object adds the clipping plane in a default location and orientation in the center of the
scene.


You can manipulate the location of the plane using the context menu of the plane, or by interacting
directly with the plane on the screen.

• Translate the plane along the normal vector by clicking and dragging the plane.
• Rotate the plane by clicking and dragging on the push pin on the plane. While rotating, you can
hold down Shift to snap to 45 degree angles.
• Interactively remove a clipping plane by clicking "Remove Plane" in the context menu.

All interactive controls can be restricted using the ClippingInteractions property. You can turn
off clipping interactions entirely using the Interactions property.

Add another clipping plane by clicking the clipping plane button in the axes toolbar again. The colors
of the planes indicate that the planes are aligned with a particular axis. Oblique planes are black and
white. You can add up to six planes in the scene.


Remove Quadrant of Volume

By default, the Viewer3d object hides all regions that are clipped by any clipping plane. The two
perpendicular clipping planes cause the scene to display only a quadrant of the volume.

To hide only the regions that are clipped by all of the clipping planes, set the ClipIntersection property of the Viewer3d object to "on". For these two perpendicular clipping planes, the scene removes the quadrant of the volume that intersects both planes.

viewer.ClipIntersection = "on";


Add Surface Representing Labeled Object

Create a binary mask for the class with numeric label 3.

label3 = label==3;

Create a Surface object encompassing the binary mask and display the surface in the scene. The
scene applies the clipping planes to the surface.

surf = images.ui.graphics3d.Surface(viewer,Data=label3,Alpha=1);


Remove Clipping Planes from Surface

By default, the Viewer3d object applies clipping planes to all objects that are children of the object.
To apply different clipping planes to different children, set the GlobalClipping property of the
viewer to "off". The volume and the surface no longer appear clipped because their
ClippingPlanes property is empty.

viewer.GlobalClipping = "off";


The ClippingPlanes property of the viewer stores the position and orientation of the global
clipping planes. Copy these values to the ClippingPlanes property of the volume. The volume is
clipped, but the surface object remains intact.

hVolume.ClippingPlanes = viewer.ClippingPlanes;


Refine Volume Clipping Planes

Notice that the two interactive planes are no longer visible. When GlobalClipping is turned off,
you can interactively manipulate the clipping planes for only one object at a time. The active object
for interactive clipping planes is the object specified by the CurrentObject property of the viewer.
By default, the current object is the most recent object added to the scene, which in this case is the
surface.

To continue modifying the clipping planes of the volume interactively, set the CurrentObject
property of the viewer as the volume. Interacting with the clipping planes impacts the volume and not
the surface.

viewer.CurrentObject = hVolume;


See Also
viewer3d | volshow | Surface

Related Examples
• “Display Interior Labels by Adjusting Volume Overlay Properties” on page 4-88
• “Explore 3-D Labeled Volumetric Data with Volume Viewer” on page 4-74
• “Remove Objects from Volume Display Using 3-D Scissors” on page 4-103
• “Display Volume Using Cinematic Rendering” on page 4-97


Display Interior Labels by Adjusting Volume Overlay Properties

This example shows how to reveal labels on the interior of a labeled volumetric image by adjusting
the transparency of the volume.

By default, the volshow function renders the exterior of a volume as opaque. This visualization is not
very informative when there are labels inside of the volume. To see the labels inside the volume, you
can increase the transparency of the volume or change the rendering style. Using these approaches,
you can see the interior labels and still recognize the details of the exterior surface of the volume.

Display Discrete Labels with Label Overlay Rendering

Load a volumetric image and corresponding labels. The volume has three classes and a background
label, 0.

dataDir = fullfile(toolboxdir("images"),"imdata","BrainMRILabeled");
load(fullfile(dataDir,"images","vol_001.mat"));
load(fullfile(dataDir,"labels","label_001.mat"));

Display the volume with the labels as an overlay, and zoom in on the volume. The volume appears
opaque with the default visualization settings. When you have labeled data, the default value of the
OverlayRenderingStyle property is "LabelOverlay". This value is appropriate when the labels
are discrete, such as these labels.

hVolume = volshow(vol,OverlayData=label);
viewer = hVolume.Parent;
viewer.CameraZoom = 2;


Apply Gradient Opacity Rendering Style to Volume

Set the RenderingStyle property of the volume to "GradientOpacity". The gradient opacity rendering style applies a localized transparency when sequential voxels have similar intensities. The effect is that regions in the volume with uniform intensity are more transparent, while voxels that have large intensity gradients around them are relatively more opaque. This rendering style is particularly helpful when you want to view the inside of a volume while still visualizing the defining features of the volume. You can adjust the amount of transparency using the GradientOpacityValue property.

hVolume.RenderingStyle = "GradientOpacity";


The volume is more transparent. However, both the colored labels and the colored background
appear through the volume. To make the labels more distinguishable, change the background color to
white and remove the background gradient.

viewer.BackgroundColor="white";
viewer.BackgroundGradient="off";


Adjust Volume and Label Transparency

The labels inside the volume are now visible, but they are hard to distinguish from the rest of the
volume. Make the volume data more transparent by decreasing the Alphamap property.

hVolume.Alphamap = linspace(0,0.2,256);


You can also increase the opacity of the labels by increasing the OverlayAlphamap property.

hVolume.OverlayAlphamap = 0.8;


Only two labels are easily visible. The third label in yellow is mostly hidden within the red and green
labels, and only a few voxels of yellow are visible on the surface. You can hide individual labels by
adjusting the OverlayAlphamap property. Set all elements of OverlayAlphamap, except for the
element corresponding to the yellow label, to 0. The yellow label corresponds to the value 2 in the
label data and is the third element of the alphamap.

hVolume.OverlayAlphamap = [0 0 0.8 0];


Display Continuous Labels with Gradient Overlay Rendering

Instead of discrete labels, suppose you want to overlay continuous data over the volume. Continuous
data could be statistical voxel measurements such as heatmaps and activation maps, or image data
from other imaging modalities. In this case, suitable values of the OverlayRenderingStyle are
"VolumeOverlay" and "GradientOverlay".

Load another image modality and overlay it with the original volume data using the
"GradientOverlay" overlay rendering style. Specify the same scene display settings as for the
earlier sections of the example.

vol2 = load(fullfile(dataDir,"images","vol_003.mat"));
vol2 = vol2.vol;
viewerContinuous = viewer3d(BackgroundColor="white",BackgroundGradient="off",CameraZoom=2);
hVolumeContinuous = volshow(vol,OverlayData=vol2,Parent=viewerContinuous,Alphamap=linspace(0,0.2,256), ...
    OverlayRenderingStyle="GradientOverlay",RenderingStyle="GradientOpacity");


Increase the visibility of the interior labels by decreasing the values of the OverlayAlphamap
property.

hVolumeContinuous.OverlayAlphamap=linspace(0,0.5,256);


See Also
viewer3d | volshow

Related Examples
• “Display Interior Labels by Clipping Volume Planes” on page 4-80
• “Explore 3-D Labeled Volumetric Data with Volume Viewer” on page 4-74
• “Remove Objects from Volume Display Using 3-D Scissors” on page 4-103
• “Display Volume Using Cinematic Rendering” on page 4-97

Display Volume Using Cinematic Rendering

Cinematic rendering is an advanced 3-D visualization technique that simulates realistic lighting and
shadows. Cinematic rendering can improve the aesthetic of a display, and makes it easier for viewers
to visually perceive depths and the relative position of objects in a scene. Cinematic rendering is
particularly popular for displaying 3-D medical image volumes, because visual perception is
important for accurate diagnosis and planning.

Download Image Volume Data

This example uses a subset of the Medical Segmentation Decathlon data set [1]. The subset of data
includes two CT chest volumes and corresponding label images, stored in the NIfTI file format.

Run this code to download the MedicalVolumeNIfTIData.zip file from the MathWorks® website, then unzip the file. The size of the data file is approximately 76 MB.

zipFile = matlab.internal.examples.downloadSupportFile("medical","MedicalVolumeNIfTIData.zip");
filepath = fileparts(zipFile);
unzip(zipFile,filepath)

The folder dataFolder contains the downloaded and unzipped data.

dataFolder = fullfile(filepath,"MedicalVolumeNIfTIData");

Specify the filenames of the volume and label images used in this example.

dataFile = fullfile(dataFolder,"lung_043.nii.gz");
labelDataFile = fullfile(dataFolder,"LabelData","lung_043.nii.gz");

Import Image Volume

Read the image data and the metadata from the image file.

V = niftiread(dataFile);
info = niftiinfo(dataFile);


Extract the voxel spacing from the file metadata, and define the transformation to display the volume
with correct dimensions.

voxelSize = info.PixelDimensions;
sx = voxelSize(2);
sy= voxelSize(1);
sz = voxelSize(3);
A = [sx 0 0 0; 0 sy 0 0; 0 0 sz 0; 0 0 0 1];

tform = affinetform3d(A);

Define a transparency map and colormap for this volume. The values used in this example have been
determined using manual trial and error.

alpha = [0 0 0.7 0.9];
color = [0 0 0; 200 140 75; 231 208 141; 255 255 255]./255;
intensity = [-3024 -700 -400 3071];
queryPoints = linspace(min(intensity),max(intensity),256);
alphamap = interp1(intensity,alpha,queryPoints)';
colormap = interp1(intensity,color,queryPoints);

Display Volume Using Default Volume Rendering

Create a viewer window in which to display the volume.

viewer = viewer3d(BackgroundColor="black",BackgroundGradient="off");

Display the volume using the default VolumeRendering rendering style. Specify the camera zoom
level to focus on the rib cage.

vol = volshow(V, ...
    Parent=viewer, ...
    RenderingStyle="VolumeRendering", ...
    Transformation=tform, ...
    Colormap=colormap, ...
    Alphamap=alphamap);

viewer.CameraZoom = 1.5;


Display Volume Using Cinematic Rendering

Change the rendering style to cinematic rendering. Cinematic rendering applies iterative
postprocessing to more realistically model the diffuse light source in the scene. The default number of
iterations is 100. Changing the style to cinematic rendering also automatically applies denoising to
the viewer display.

vol.RenderingStyle="CinematicRendering";


Pause to apply all postprocessing iterations before updating the display in Live Editor.

pause(4)
drawnow

Adjust Display Settings

You can further refine the cinematic rendering display by adjusting volume and viewer properties.

You can adjust the number of iterations. During each iteration, the viewer samples a ray of light from
the light source. A greater number of iterations generally models a real-world light source more
realistically, but increases rendering times. For this example, keep the default value.

vol.CinematicNumIterations = 100;

The specular reflectance of the volume controls the amount of light reflected off the surface. A value
close to 0 makes the volume appear less shiny. A value closer to 1 makes the volume appear more
shiny. The default value is 0.4. For this example, increase the value to make the ribcage appear
shinier.

vol.SpecularReflectance = 0.7;

Specifying the cinematic rendering style automatically applies denoising to the viewer by setting the
Denoising property to on. You can refine the kernel size and strength of the denoising using the
DenoisingDegreeOfSmoothing and DenoisingSigma viewer properties. For this example, keep
the default denoising settings.


viewer.Denoising = "on";
viewer.DenoisingDegreeOfSmoothing = 0.1;
viewer.DenoisingSigma = 2;

The size of the light source affects the softness of the shadows in the scene. Change the size by
specifying the Size property of the light source, which is defined in the Lights property of the
viewer. For this example, decrease the size of the light to make the shadows visually sharper.

viewer.Lights.Size = 0.05;

Pause to apply all postprocessing iterations before updating the display in Live Editor.

pause(4)
drawnow

Optionally, you can clean up the viewer window by using the 3-D Scissors tool to remove the patient
bed. For an example, see “Remove Objects from Volume Display Using 3-D Scissors” on page 4-103.
This image shows the final volume after bed removal.


References
[1] Medical Segmentation Decathlon. "Lung." Tasks. Accessed May 10, 2018. http://
medicaldecathlon.com/. The Medical Segmentation Decathlon data set is provided under the
CC-BY-SA 4.0 license. All warranties and representations are disclaimed. See the license for
details.

See Also
viewer3d | volshow | Light

Related Examples
• “Display Translucent Volume with Advanced Light Scattering” on page 4-109
• “Remove Objects from Volume Display Using 3-D Scissors” on page 4-103
• “Display Interior Labels by Adjusting Volume Overlay Properties” on page 4-88
• “Display Interior Labels by Clipping Volume Planes” on page 4-80

Remove Objects from Volume Display Using 3-D Scissors

This example shows how to interactively remove unwanted regions in a 3-D display, such as a patient
bed in a CT scan, by using 3-D scissors.

Download Image Volume Data

This example uses a subset of the Medical Segmentation Decathlon data set [1]. The subset of data
includes two CT chest volumes and their corresponding label images, stored in the NIfTI file format.

Run this code to download the MedicalVolumeNIfTIData.zip file from the MathWorks® website, then unzip the file. The size of the data file is approximately 76 MB.

zipFile = matlab.internal.examples.downloadSupportFile("medical","MedicalVolumeNIfTIData.zip");
filepath = fileparts(zipFile);
unzip(zipFile,filepath)

Specify the path to the folder that contains the downloaded and unzipped data.

dataFolder = fullfile(filepath,"MedicalVolumeNIfTIData");

Specify the filenames of the volume and label images used in this example.

dataFile = fullfile(dataFolder,"lung_043.nii.gz");
labelDataFile = fullfile(dataFolder,"LabelData","lung_043.nii.gz");

Import Image Volume

Read the image data and the metadata from the image file.

V = niftiread(dataFile);
info = niftiinfo(dataFile);

Extract the voxel spacing from the file metadata, and define the transformation to display the volume
with correct dimensions.


voxelSize = info.PixelDimensions;
sx = voxelSize(2);
sy= voxelSize(1);
sz = voxelSize(3);
A = [sx 0 0 0; 0 sy 0 0; 0 0 sz 0; 0 0 0 1];

tform = affinetform3d(A);

Define a transparency map and colormap for this volume. The values used in this example have been
determined using manual trial and error.

alpha = [0 0 0.7 0.9];
color = [0 0 0; 200 140 75; 231 208 141; 255 255 255]./255;
intensity = [-3024 -700 -400 3071];
queryPoints = linspace(min(intensity),max(intensity),256);
alphamap = interp1(intensity,alpha,queryPoints)';
colormap = interp1(intensity,color,queryPoints);

Display Volume

Create a viewer window in which to display the volume.

viewer = viewer3d;

Display the volume using the cinematic rendering style with the specified colormap and alphamap. To
learn more about cinematic rendering, see “Display Volume Using Cinematic Rendering” on page 4-
97.

vol = volshow(V, ...
    Parent=viewer, ...
    RenderingStyle="CinematicRendering", ...
    Transformation=tform, ...
    Colormap=colormap, ...
    Alphamap=alphamap);


Remove Bed Using 3-D Scissors

Interactively remove the bed from the display by using the 3-D scissors tool. First, rotate the volume
to the plane in which you want to define the cut region. The 3-D scissors tool removes everything
within the cut out region along the direction perpendicular to the plane. For this example, click the Z
axes display indicator in the bottom-left corner of the viewer to rotate the image.


To enable the 3-D scissors tool, in the viewer window toolstrip, click the scissors icon. Then, in the viewer window, draw a polygon to define a 2-D cut region. Click to place each vertex. To undo the most recent vertex placement, press Backspace. To cancel and exit the drawing operation, press Esc. Complete the drawing action by double-clicking. When you complete the drawing, the viewer automatically removes everything within the polygon.


Restore the initial view position by clicking the home icon in the viewer window toolstrip. The
bed has been successfully removed from the volume display. Note that you cannot undo a 3-D scissors
operation. If you want to restore the full volume, reset the Data property of the Volume object, or
call volshow again to create a new Volume object.


References
[1] Medical Segmentation Decathlon. "Lung." Tasks. Accessed May 10, 2018. http://
medicaldecathlon.com/. The Medical Segmentation Decathlon data set is provided under the
CC-BY-SA 4.0 license. All warranties and representations are disclaimed. See the license for
details.

See Also
viewer3d | volshow

Related Examples
• “Display Interior Labels by Adjusting Volume Overlay Properties” on page 4-88
• “Display Interior Labels by Clipping Volume Planes” on page 4-80
• “Display Volume Using Cinematic Rendering” on page 4-97

Display Translucent Volume with Advanced Light Scattering

This example shows how to display translucent volumes using realistic light scattering.

The volshow function provides several rendering styles for displaying volumes, including standard
volume rendering, cinematic rendering, and light scattering. The light scattering style most
accurately models light propagation through an object, but is computationally expensive. Consider
using the light scattering style when displaying translucent volumes with soft opacity gradients. For
opaque volumes, consider the cinematic rendering style, which is less computationally expensive and
generates photorealistic displays. For an example, see “Display Volume Using Cinematic Rendering”
on page 4-97.

In this example, you use light scattering to accurately model light shining through a cloud.

Download Image Volume Data

This example uses a version of the Walt Disney Animation Studios Cloud Data Set [1] saved as a MAT
file. Run this code to download the lightScatteringCloud.mat file from the MathWorks®
website. The size of the data file is approximately 6 MB.

file = matlab.internal.examples.downloadSupportFile("image","data/lightScatteringCloud.mat");

Load the file into the workspace. The vol variable contains the cloud image volume.

load(file)
whos vol

  Name      Size                   Bytes  Class     Attributes

  vol     307x250x189           58023000  single

Display Using Volume Rendering

Create a viewer window in which to display the volume.

viewer = viewer3d;

By default, the viewer contains ambient light and one light source located above and to the right of
the camera.

viewer.LightPositionMode

ans =
"right"

Specify the colormap and alphamap for displaying the volume.

alpha = [0.0 repmat(0.1,[1 255])];
color = "white";

Display the volume using the default rendering style, VolumeRendering. Specify the
SpecularReflectance value as 0 so the cloud does not appear shiny.

obj = volshow(vol, ...
    Parent=viewer, ...
    RenderingStyle="VolumeRendering", ...
    SpecularReflectance=0, ...
    Alphamap=alpha, ...
    Colormap=color);

Adjust the camera position to view the front of the cloud.

viewer.CameraPosition = [-313.2140 154 95];
viewer.CameraZoom = 1.8;

Change the rendering style to light scattering. The display appears more realistic.

obj.RenderingStyle = "LightScattering";


Move the light behind the cloud to simulate light shining through the cloud.

viewer.LightPositionMode = "target-behind";


Manage Light Scattering Performance

You can balance rendering quality and performance by adjusting the LightScatteringQuality
property of the Volume object. By default, the LightScatteringQuality value is auto, and the
software automatically decreases quality during interactions, such as pan, zoom, and rotate, and
maximizes quality between interactions. This animation shows the temporary change in quality
during an interaction.


If your application requires that the light scattering quality remain constant, specify the
LightScatteringQuality value as a numeric scalar from 0 to 1, where a greater value
corresponds to higher quality. For example, this code specifies a value that balances quality and
performance.

obj.LightScatteringQuality = 0.5;

References
[1] Walt Disney Animation Studios. “Walt Disney Animation Studios - Clouds.” Accessed June 30,
2023. https://disneyanimation.com/resources/clouds/. The Clouds data set is licensed under
the CC-BY-SA 3.0 license. See the license for details.

See Also
viewer3d | volshow | Light

Related Examples
• “Display Volume Using Cinematic Rendering” on page 4-97
• “Display Interior Labels by Clipping Volume Planes” on page 4-80
• “Display Interior Labels by Adjusting Volume Overlay Properties” on page 4-88


Display Large 3-D Images Using Blocked Volume Visualization

This example shows how to display large 3-D image volumes using a blockedImage object and the
volshow function.

The volshow function typically displays 3-D image volumes stored as numeric arrays. However, large
volumes can require too much memory to store in the MATLAB workspace, or can exceed the size
limitations of graphics hardware. If you do not need to view the full volume at full resolution, you can
downsample the image resolution or crop the volume to your region of interest, but to view the full
volume at full resolution, you can create a blockedImage object and display it using volshow. A
blockedImage object manages large images as a group of smaller, discrete blocks. The
blockedImage object points to an image source, which can be either an in-memory numeric array or
an image file saved outside of MATLAB. When you pass a blockedImage object to volshow, the
function reads and renders the image one block at a time, avoiding out-of-memory issues.

Display Blocked Image Volume

Create a large 500-by-500-by-2500 voxel image volume. If your machine does not have enough
memory to create and store the 2.5 GB volume, decrease imSize before running this example.

imSize = [500 500 2500];

Create a simulated 3-D image of bubbles, V. This can take several minutes.

V = rand(imSize,"single");
BW = false(size(V));
BW(V < 0.000001) = true;
V = bwdist(BW);
V(V <= 20) = 1;
V(V > 20) = 0;

If you try to display V directly, volshow returns an error that the volume is too large. Instead, create
a blockedImage object that points to V and has a block size of 500-by-500-by-500 voxels. In general,
a smaller block size uses less memory, but results in slower rendering times, while a large block size
renders more quickly, but can exceed memory or graphics hardware limitations. For most image
volumes and most hardware, a block size between 300 and 500 voxels in each dimension is
appropriate.

bim = blockedImage(V,BlockSize=[500 500 500]);

Display the blockedImage using volshow. The volshow function reads blocks into memory one at a
time and stitches the individual block renderings to produce the final high-resolution volume
rendering. The function creates a BlockedVolume object, which you can use to query and modify
display properties.

bVol = volshow(bim);


Pause to load all blocks before updating the display in Live Editor.

pause(4)
drawnow

Interact with Blocked Image Volume

You can interact with the volume using the rotate, zoom, and pan tools. When you interact with the
volume, the viewer temporarily switches to a lower rendering resolution while it rerenders each block
in the new view. Zoom in on the volume.


Crop Blocked Volume

To focus on a region of interest and improve performance between interactions, crop the volume
using a rectangular crop box. You can add a crop box interactively from the viewer toolstrip. Pause on
the Add Clipping Plane icon and, from the list, select the Add Crop Box icon.

The crop box appears in the viewer. To change the size of the cropped volume, drag a face of the crop
box. To move the crop box without changing its size, hold Ctrl while you drag. To undo the cropping,
right-click the crop box and select Remove crop region. This image shows the viewer after dragging
the top of the crop box to shorten it in the Z-direction. After cropping, the display can update faster
between interactions because the viewer rerenders only the visible blocks.


Alternatively, if you know the coordinates of your desired region of interest, you can create the crop
box programmatically. Specify the coordinates for the crop box as a matrix of the form [xmin ymin
zmin; xmax ymax zmax]. Create a viewer containing the crop box.

cropRegion = [0 0 1000; 500 500 1500];
viewer = viewer3d(CropRegion=cropRegion);

Render the blocked volume. The volshow function renders only the blocks contained within the crop
box, which is faster than rendering the full volume.

bVolCropped = volshow(bim,Parent=viewer);


Pause to load the blocks before updating the display in Live Editor.

pause(3)
drawnow

View Blocked Volume Clipping Planes

You can view inside the blocked image volume using clipping planes. To interactively add a clipping
plane, in the axes toolbar, select the clipping plane button. The viewer rerenders the clipped volume
at full resolution. To improve performance, volshow does not rerender any blocks that are
completely cropped out by the clipping plane. To learn more about working with clipping planes, see
“Display Interior Labels by Clipping Volume Planes” on page 4-80.


Display Multilevel Blocked Image Volumes

A blockedImage object can point to an image with a single resolution level or multiple resolution
levels. You can create a new multilevel blockedImage object from a single resolution
blockedImage object by using the makeMultiLevel3D function. Create a multilevel
blockedImage object for the bubbles image. The makeMultiLevel3D function adds several lower
resolution levels to the original data in bim.

multibim = makeMultiLevel3D(bim);

Update the data displayed by the BlockedVolume object bVol to the new multilevel image. For
multilevel blockedImage objects, volshow defaults to dynamically selecting the resolution level to
display based on the figure window size, camera positioning, and the size of the volume. This can
improve rendering speed by showing lower resolution levels when the rendering is too small to see
small details.

bVol.Data = multibim;

You can set a fixed resolution level by setting the ResolutionLevel property of the
BlockedVolume object.

bVol.ResolutionLevel = 4;


Pause to load all blocks before updating the display in Live Editor.

pause(1)
drawnow

Display File-Backed Blocked Image Volumes

A file-backed blockedImage object points to an image source file saved outside of MATLAB. Use file-
backed blocked images when your image is too large to read into the MATLAB workspace. The
volshow function reads and renders the image data from the file one block at a time. Rendering
performance is best when you read file-backed images from locally stored files. Reading files from
remote servers can significantly increase rendering times.

To control whether volshow stores some, none, or all of the blocks in memory once it reads them
from the file, set the CachingStrategy property. By default, the CachingStrategy is "auto", and
volshow stores a subset of blocks based on the amount of memory available to MATLAB. Storing
blocks in memory improves performance when you interact with the volume or update display
properties, because volshow does not need to reread all of the file data while it rerenders the
volume. You can also set the CachingStrategy to "none" to discard all blocks from memory after
reading, or to "all" to store all blocks in memory. You can also specify CachingStrategy as a
numeric scalar to allocate a specific amount of memory for block storage, in GB.

Allocate approximately 5 GB of CPU memory to visualize the blocked volume bim.

bVol = volshow(bim,CachingStrategy=5);


Pause to load all blocks before updating the display in Live Editor.

pause(4)
drawnow

The volshow function scales the intensity range of a volume using the DataLimits and
DataLimitsMode properties. By default, the DataLimitsMode value is "auto", and volshow
automatically sets the data limits for scaling. For file-backed blocked volumes that do not have a
resolution level smaller than 512 voxels in all dimensions, volshow scales the data to the range of
the underlying data type. For example, if the ClassUnderlying property value of the blocked image
is "single", then volshow scales the data values to the range [0, 1]. Automatic scaling can result
in poor visualization results. Therefore, if you know the intensity range of your blocked image
volume, specify it directly using the DataLimits property. When you specify DataLimits, the
DataLimitsMode value automatically changes from "auto" to "manual".

bVol.DataLimits = [0, 1];

See Also
volshow | BlockedVolume Properties | blockedImage | bigimageshow

Related Examples
• “Display Interior Labels by Clipping Volume Planes” on page 4-80


View Image Sequences in Video Viewer


This section describes how to use the Video Viewer app to view image sequences and provides
information about configuring the app.

Open Data in Video Viewer


This example shows how to view multi-slice volumetric data in Video Viewer.

Load the image sequence into the MATLAB workspace. For this example, load the MRI data from the
file mristack.mat, which is included in the imdata folder. This creates a variable named mristack
in your workspace. The variable is an array of 21 grayscale frames containing MRI images of the
brain. Each frame is a 256-by-256 array of uint8 data.

load mristack

  mristack      256x256x21          1376256  uint8

Click Video Viewer in the apps gallery and select the Import from workspace option on the File
menu. You can also call implay, specifying the name of the image sequence variable as an argument.

implay(mristack)

Video Viewer opens, displaying the first frame of the image sequence. Note how Video Viewer
displays information about the image sequence, such as the size of each frame and the total number
of frames, at the bottom of the window.

Explore Image Sequence Using Playback Controls

To view the image sequence or video as an animation, click the Play button in the Playback toolbar, select Play from the Playback menu, or press P or the Space bar. By default, Video Viewer plays the image sequence forward, once in its entirety, but you can view the frames in the image sequence in many ways, described in the following list. As you view an image sequence, Video Viewer updates the Status Bar at the bottom of the window.


• Specify the direction in which to play the image sequence. Click the Playback mode button in the Playback toolbar or select Playback Modes from the Playback menu. You can select Forward, Backward, or AutoReverse. As you click the playback mode button, Video Viewer cycles through these options and the appearance changes to indicate the current selection. Video Viewer uses plus (+) and minus (-) signs to indicate playback direction in the Current Frame display. When you play a video backwards, the frame numbers displayed are negative. When you are in AutoReverse mode, Video Viewer uses plus signs to indicate the forward direction. Keyboard shortcut: A.

• View the sequence repeatedly. Click the Repeat button in the Playback toolbar or select Playback Modes > Repeat from the Playback menu. You toggle this option on or off. Keyboard shortcut: R.

• Jump to a specific frame in the sequence. Click the Jump to button in the Playback toolbar or select Jump to from the Playback menu. This option opens a dialog box in which you can specify the number of the frame. Keyboard shortcut: J.

• Stop the sequence. Click the Stop button in the Playback toolbar or select Stop from the Playback menu. This button is only enabled when an image sequence is playing. Keyboard shortcut: S.

• Step through the sequence, one frame at a time, or jump to the beginning or end of the sequence (rewind). Click one of the navigation buttons in the Playback toolbar, in the desired direction, or select an option, such as Fast Forward or Rewind, from the Playback menu. Keyboard shortcuts: Arrow keys, Page Up/Page Down, L (last frame), F (first frame).

Examine Frame More Closely


Video Viewer supports several tools listed in the Tools menu and on the Toolbar that you can use to
examine the frames in the image sequence more closely.

• Zoom in or out on the image, and pan to change the view. Click one of the zoom buttons in the toolbar or select Zoom In or Zoom Out from the Tools menu. Click the Pan button in the toolbar or select Pan from the Tools menu. If you click the Maintain fit to window button in the toolbar or select Maintain fit to window from the Tools menu, the zoom and pan buttons are disabled.

• Examine an area of the current frame in detail. Click the Pixel region button in the Playback toolbar or select Pixel Region from the Tools menu.

• Export a frame to Image Viewer. Click the Export to Image Viewer button in the Playback toolbar or select Export to Image Viewer from the File menu. Video Viewer opens an Image Viewer app containing the current frame.

Specify Frame Rate

To decrease or increase the playback rate, select Frame Rate from the Playback menu, or use the
keyboard shortcut T. The Frame Rate dialog box displays the frame rate of the source, lets you
change the rate at which Video Viewer plays the image sequence or video, and displays the actual
playback rate. The playback rate is the number of frames that Video Viewer processes per second.

If you want to increase the actual playback rate, but your system's hardware cannot keep up with the
desired rate, select the Allow frame drop to achieve desired playback rate check box. This
parameter enables Video Viewer to achieve the playback rate by dropping frames. When you select
this option, the Frame Rate dialog box displays several additional options that you can use to specify
the minimum and maximum refresh rates. If your hardware allows it, increase the refresh rate to
achieve a smoother playback. However, if you specify a small range for the refresh rate, the
computed frame replay schedule may lead to a choppy replay, and a warning will appear.
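
You can also set the initial playback rate when you open the viewer programmatically. A minimal sketch, assuming mristack is in the workspace and using an illustrative rate of 10 frames per second:

implay(mristack,10)   % open Video Viewer and play at 10 frames per second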

Specify Colormap

To specify the colormap to apply to the intensity values, select Colormap from the Tools menu, or use
the keyboard shortcut C. Video Viewer displays a dialog box that enables you to change the colormap.

Use the Colormap parameter to specify a particular colormap.

If you know that the pixel values do not use the entire data type range, you can select the Specify
range of displayed pixel values check box and enter the range for your data. The dialog box
automatically displays the range based on the data type of the pixel values.


Get Information about an Image Sequence

To view basic information about the image data, click the Video Information button in the toolbar
or select Video Information from the Tools menu. Video Viewer displays a dialog box containing
basic information about the image sequence, such as the size of each frame, the frame rate, and the
total number of frames.

Configure Video Viewer App


The Configuration dialog box enables you to change the appearance and behavior of the player. To
open the Configuration dialog box, select File > Configuration > Edit. To load a preexisting
configuration set, select File > Configuration > Load.

The Configuration dialog box contains four tabs: Core, Sources, Visuals, and Tools. On each tab,
select a category and then click Properties to view configuration settings.

The following list describes the options that are available for each category on each pane.

Core pane:
• General UI category: Display the full path check box — Select to display the full path to the video data source in the title bar. By default, Video Viewer displays a shortened name in the title bar.
• Source UI category: Keyboard commands respect playback mode check box — Select to make keyboard shortcut keys aware of your playback mode selection. If you clear this check box, the keyboard shortcut keys behave as if the playback mode is set to Forward play and Repeat is set to off. Recently used sources list parameter — Specifies the number of sources listed in the File menu.

Sources pane:
• File category: Default open file path parameter — Specify the directory that is displayed in the Connect to File dialog box when you click File > Open.
• Workspace category: There are no options associated with this selection.
• Simulink category: Load Simulink model if not open check box — You must have Simulink installed. Connect scope on selection of — Signal lines only, or signal lines and blocks. You must have Simulink installed.

Visuals pane:
• Video category: There are no options associated with this selection.

Tools pane:
• Instrumentation Sets category: There are no options associated with this selection.
• Image Viewer category: Open new Image Viewer app for each export check box — Opens a new Image Viewer for each exported frame.
• Pixel Region category: There are no options associated with this selection.
• Image Navigation Tools category: There are no options associated with this selection.

Save Video Viewer App Configuration Settings

To save your configuration settings for future use, select File > Configuration Set > Save as.

Note By default, Video Viewer uses the configuration settings from the file implay.cfg. If you want
to store your configuration settings in this file, you should first create a backup copy of the file.

See Also
Video Viewer | Volume Viewer | Image Viewer | montage

Related Examples
• “Explore 3-D Volumetric Data with Volume Viewer App” on page 4-60
• “Get Started with Image Viewer App” on page 4-25


Convert Multiframe Image to Movie


To create a MATLAB movie from a multiframe image array, use the immovie function. This example
creates a movie from a multiframe indexed image.

mov = immovie(X,map);

In the example, X is a four-dimensional array of images that you want to use for the movie.

To play the movie, use the implay function. This function opens the multiframe image array in a
Video Viewer app.

implay(mov);

This example loads the multiframe image mri.tif and makes a movie out of it.

mri = uint8(zeros(128,128,1,27));
for frame = 1:27
    [mri(:,:,:,frame),map] = imread("mri.tif",frame);
end

mov = immovie(mri,map);
implay(mov);

Note To view a MATLAB movie, you must have MATLAB software installed. To make a movie that can
be run outside the MATLAB environment, use the VideoWriter class to create a movie in a standard
video format, such as AVI.
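
For instance, the movie created above can be written to an AVI file with the VideoWriter class. The following is a minimal sketch; the output file name and frame rate are arbitrary choices, not part of the original example.

% Write the movie frames returned by immovie to an AVI file.
v = VideoWriter("mri_movie.avi");
v.FrameRate = 10;
open(v)
writeVideo(v,mov)   % mov is the movie structure created by immovie
close(v)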


Display Different Image Types


In this section...
“Display Indexed Images” on page 4-128
“Display Grayscale Images” on page 4-128
“Display Binary Images” on page 4-130
“Display Truecolor Images” on page 4-131

If you need help determining what type of image you are working with, see “Image Types in the
Toolbox” on page 2-3.

Display Indexed Images


To display an indexed image using either the imshow function or the Image Viewer app, specify both the
image matrix and the colormap. This sample code uses the variable name X to represent an indexed
image in the workspace, and map to represent the colormap.

imshow(X,map)

or

imageViewer(X,Colormap=map)

For each pixel in X, these functions display the color stored in the corresponding row of map. If the
image matrix data is of data type double, the value 1 points to the first row in the colormap, the
value 2 points to the second row, and so on. However, if the image matrix data is of data type uint8
or uint16, the value 0 (zero) points to the first row in the colormap, the value 1 points to the second
row, and so on. This offset is handled automatically by the Image Viewer app and the imshow
function.

If the colormap contains a greater number of colors than the image, the functions ignore the extra
colors in the colormap. If the colormap contains fewer colors than the image requires, the functions
set all image pixels over the limits of the colormap's capacity to the last color in the colormap. For
example, if an image of data type uint8 contains 256 colors, and you display it with a colormap that
contains only 16 colors, all pixels with a value of 15 or higher are displayed with the last color in the
colormap.
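
For example, the following minimal sketch reads an indexed sample image and its associated colormap, then displays the image. It assumes the trees.tif example file, which ships with MATLAB, is available.

% Read an indexed image and its colormap, then display them together.
[X,map] = imread("trees.tif");
imshow(X,map)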

Display Grayscale Images


To display a grayscale image, call the imshow function or open the Image Viewer app. This
documentation uses the variable name I to represent a grayscale image in the workspace.

Both functions display the image by scaling the intensity values to serve as indices into a grayscale
colormap.

If I is double, a pixel value of 0.0 is displayed as black, a pixel value of 1.0 is displayed as white, and
pixel values in between are displayed as shades of gray. If I is uint8, then a pixel value of 255 is
displayed as white. If I is uint16, then a pixel value of 65535 is displayed as white.
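
As a quick illustration of this scaling, the same image displays identically whether it is stored as uint8 or converted to double. This sketch uses the pout.tif example image.

% Display the same grayscale image as uint8 and as double.
I8 = imread("pout.tif");   % uint8, values in [0, 255]
Id = im2double(I8);        % double, values in [0, 1]
figure, imshow(I8)
figure, imshow(Id)         % both figures look the same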

Grayscale images are similar to indexed images in that each uses an m-by-3 RGB colormap, but you
normally do not specify a colormap for a grayscale image. MATLAB displays grayscale images by


using a grayscale system colormap (where R=G=B). By default, the number of levels of gray in the
colormap is 256 on systems with 24-bit color, and 64 or 32 on other systems. (See “Display Colors” on
page 16-2 for a detailed explanation.)

Display Grayscale Images with Unconventional Ranges

In some cases, the image data you want to display as a grayscale image could have a display range
that is outside the conventional toolbox range (that is, [0, 1] for single or double arrays, [0, 255]
for uint8 arrays, [0, 65535] for uint16 arrays, or [-32768, 32767] for int16 arrays). For example, if
you filter a grayscale image, some of the output data could fall outside the range of the original data.

To display unconventional range data as an image, you can specify the display range directly, using
this syntax for both the imshow function and Image Viewer app.

imshow(I,DisplayRange=[low high])

or

imageViewer(I,DisplayRange=[low high])

If you use an empty matrix ([]) for the display range, these functions scale the data automatically,
setting low and high to the minimum and maximum values in the array.

The next example filters a grayscale image, creating unconventional range data. The example calls
imageViewer to display the image in Image Viewer, using the automatic scaling option. If you
execute this example, note the display range specified in the lower right corner of the Image Viewer
window.

I = imread("testpat1.png");
J = filter2([1 2;-1 -2],I);
imageViewer(J,DisplayRange=[]);


Display Binary Images


In MATLAB, a binary image is of data type logical. Binary images contain only 0's and 1's. Pixels
with the value 0 are displayed as black; pixels with the value 1 are displayed as white.

Note For the toolbox to interpret the image as binary, it must be of data type logical. Grayscale
images that happen to contain only 0's and 1's are not binary images.

To display a binary image, call the imshow function or open the Image Viewer app. For example, this
code reads a binary image into the MATLAB workspace and then displays the image. The sample code
uses the variable name BW to represent a binary image in the workspace.

BW = imread("circles.png");
imshow(BW)


Change Display Colors of Binary Image

You might prefer to invert binary images when you display them, so that 0 values are displayed as
white and 1 values are displayed as black. To do this, use the NOT (~) operator in MATLAB. (In this
figure, a box is drawn around the image to show the image boundary.) For example:

imshow(~BW)

You can also display a binary image using the indexed image colormap syntax. For example, the
following command specifies a two-row colormap that displays 0's as red and 1's as blue.

imshow(BW,[1 0 0; 0 0 1])

Display Truecolor Images


Truecolor images, also called RGB images, represent color values directly, rather than through a
colormap. A truecolor image is an m-by-n-by-3 array. For each pixel (r,c) in the image, the color is
represented by the triplet (r,c,1:3).
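
For example, this minimal sketch inspects the color triplet of a single pixel in a sample truecolor image; the row and column indices are arbitrary.

% Inspect the red, green, and blue values of the pixel at row 100, column 200.
RGB = imread("peppers.png");
pixelColor = squeeze(RGB(100,200,1:3))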


To display a truecolor image, call the imshow function or open the Image Viewer app. For example,
this code reads a truecolor image into the MATLAB workspace and then displays the image. This
sample code uses the variable name RGB to represent a truecolor image in the workspace.

RGB = imread("peppers.png");
imshow(RGB)

Systems that use 24 bits per screen pixel can display truecolor images directly, because they allocate
8 bits (256 levels) each to the red, green, and blue color planes. On systems with fewer colors,
imshow displays the image using a combination of color approximation and dithering. See “Display
Colors” on page 16-2 for more information.

Note If you display a color image and it appears in black and white, check if the image is an indexed
image. With indexed images, you must specify the colormap associated with the image. For more
information, see “Display Indexed Images” on page 4-128.


Add Color Bar to Displayed Grayscale Image

This example shows how to display a grayscale image with a color bar that indicates the mapping of
data values to colors. Seeing the correspondence between data values and the colors displayed by
using a color bar is especially useful if you are displaying unconventional range data as an image.

Read a grayscale image into the workspace.

I = imread('liftingbody.png');

Convert the image to data type double. Data is in the range [0, 1].

I = im2double(I);
dataRangeI = [min(I(:)) max(I(:))]

dataRangeI = 1×2

0 1

Filter the image using an edge detection filter. The filtered data exceeds the default range [0, 1]
because the filter is not normalized.

h = [1 2 1; 0 0 0; -1 -2 -1];
J = imfilter(I,h);
dataRangeJ = [min(J(:)) max(J(:))]

dataRangeJ = 1×2

-2.5961 2.5451

Display the filtered image using the full display range of the filtered data. imshow displays the
minimum data value as black and the maximum data value as white.

imshow(J,[])

Use the colorbar function to add the color bar to the image.

colorbar


See Also
imshow

More About
• “Display Grayscale Images” on page 4-128


Print Images
If you want to output a MATLAB image to use in another application (such as a word-processing
program or graphics editor), use imwrite to create a file in the appropriate format. See “Write
Image Data to File in Graphics Format” on page 3-6 for details.

If you want to print an image, use imshow to display the image in a MATLAB figure window. If you
are using Image Tool (imtool), then you must use the Print to Figure option on the File menu.
When you choose this option, Image Tool opens a separate figure window and displays the image in it.
You can access the standard MATLAB printing capabilities in this figure window. You can also use the
Print to Figure option to print the image displayed in the Overview tool and the Pixel Region tool.

Once the image is displayed in a figure window, you can use either the MATLAB print command or
the Print option from the File menu of the figure window to print the image. When you print from
the figure window, the output includes non-image elements such as labels, titles, and other
annotations.
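
For instance, this minimal sketch displays an image in a figure and then prints the figure to a PDF file with the print command; the output file name is an arbitrary choice.

% Display an image, then print the figure to a PDF file.
I = imread("peppers.png");
imshow(I)
print(gcf,"printed_image.pdf","-dpdf")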

Graphics Object Properties That Impact Printing


The output reflects the settings of various properties of graphic objects. In some cases, you might
need to change the settings of certain properties to get the results you want. Here are some tips that
could be helpful when you print images:

• Image colors print as shown on the screen. This means that images are not affected by the figure
object's InvertHardcopy property.
• To ensure that printed images have the proper size and aspect ratio, set the figure object's
PaperPositionMode property to auto. When PaperPositionMode is set to auto, the width
and height of the printed figure are determined by the figure's dimensions on the screen. By
default, the value of PaperPositionMode is manual. If you want the default value of
PaperPositionMode to be auto, you can add this line to your startup.m file.

set(0,"DefaultFigurePaperPositionMode","auto")

For detailed information about printing with File/Print or the print command, see “Print Figure
from File Menu”. For a complete list of options for the print command, enter help print at the
MATLAB command-line prompt or see the print command reference page.

See Also
imshow | images.compatibility.imtool.r2023b.imtool | print

More About
• “Write Image Data to File in Graphics Format” on page 3-6
• “Print Figure from File Menu”


Manage Display Preferences


In this section...
“Retrieve Toolbox Preferences” on page 4-136
“Set Toolbox Preferences” on page 4-136
“Control Image Display Using Preferences and Name-Value Arguments” on page 4-136

You can control how Image Processing Toolbox displays images either by using preferences or by
using function name-value arguments. Preferences control the default display settings of all images
for imshow and the Image Tool (imtool). You can also use name-value arguments, in functions such
as imshow, to override default preferences for individual images.

This page shows you how to use preferences and name-value arguments to control how images
display.

Retrieve Toolbox Preferences


Retrieve the current preference values using one of these options:

• Interactively — Use the Preferences window. To access the window, on the Home tab, in the
Environment section, click Preferences. You can also open the Preferences window from the
Image Tool (imtool), under File > Preferences.
• Programmatically — Use the iptgetpref function. For example, this code retrieves the value of
the ImtoolInitialMagnification preference.
iptgetpref("ImtoolInitialMagnification")

ans =

100

Set Toolbox Preferences


Set Image Processing Toolbox preferences using one of these options:

• Interactively — Use the Preferences window. To access the window, on the Home tab, in the
Environment section, click Preferences. You can also open the Preferences window from the
Image Tool (imtool), under File > Preferences.

Note In MATLAB Online™, access the imshow display settings in the Preferences window under
MATLAB > Image Display.
• Programmatically — Use the iptsetpref function. For example, this code specifies that, by
default, imshow resize the figure window tightly around displayed images.
iptsetpref("ImshowBorder","tight");

For a complete list of toolbox preferences, see the iptsetpref reference page.
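
For reference, this minimal sketch retrieves all current preference values as a structure and then restores one preference to its default value.

% List all current Image Processing Toolbox preferences.
prefs = iptgetpref

% Restore the ImshowBorder preference to its default value, "loose".
iptsetpref("ImshowBorder","loose")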

Control Image Display Using Preferences and Name-Value Arguments


This example shows how to control image magnification in the Image Tool using preferences versus
name-value arguments. Image Processing Toolbox preferences set the default display behavior for a
function. You can use name-value arguments to override the default preference for individual images.

Display Image Using Factory Default

The imtool function opens images in the Image Tool. By default, the app displays images at the
magnification specified by the ImtoolInitialMagnification preference. The original default
value is 100, meaning the app loads images at 100% magnification.

imtool("peppers.png")

Update Default Using Preferences

To display images at 80% magnification by default, update the ImtoolInitialMagnification


value by using the Preferences window or the iptsetpref function. Note that the Image Tool now
displays images at 80% magnification by default.

iptsetpref("ImtoolInitialMagnification",80);
imtool("peppers.png")


Override Preference Using Name-Value Argument

To display a single image at 50% magnification, without changing the default value, include the
InitialMagnification name-value argument when you call imtool. You do not need to update
the ImtoolInitialMagnification preference.

imtool("peppers.png",InitialMagnification=50)


See Also
iptprefs | iptgetpref | iptsetpref

5 Building GUIs with Modular Tools

This chapter describes how to use interactive modular tools and create custom image processing
applications.

• “Interactive Image Viewing and Processing Tools” on page 5-2


• “Interactive Tool Workflow” on page 5-6
• “Add Scroll Panel to Figure” on page 5-10
• “Get Handle to Target Image” on page 5-13
• “Create Pixel Region Tool” on page 5-15
• “Build App to Display Pixel Information” on page 5-19
• “Build App for Navigating Large Images” on page 5-21
• “Build Image Comparison Tool” on page 5-24
• “Create Angle Measurement Tool Using ROI Objects” on page 5-28

Interactive Image Viewing and Processing Tools


The toolbox includes several tools that you can use to interact with an image displayed in a MATLAB
figure window. For example, you can use tools to adjust the display of the image, get information
about the image data, and adjust the contrast or crop the image.

You can use the tools independently or in combination. You can create custom image processing apps
that open a combination of tools and initialize their display and interactions. For more information,
see “Interactive Tool Workflow” on page 5-6.

You can also access all tools using Image Tool (imtool).

• Adjust Contrast tool: Displays a histogram of the target image and enables interactive adjustment of contrast and brightness by manipulation of the display range. Use the imcontrast function to create the tool in a separate figure window and associate it with an image.

• Choose Colormap tool: Allows you to change the colormap of the target figure. You can select one of the MATLAB colormaps, select a colormap variable from the MATLAB workspace, or enter a custom MATLAB function. Use the imcolormaptool function to launch the tool in a separate figure window.

• Crop Image tool: Displays a draggable, resizable rectangle on an image. You can move and resize the rectangle to define the crop region. Double-click to perform the crop operation or select Crop Image from the context menu. Use the imcrop function to create the tool and associate it with an image.

• Display Range tool: Displays the display range values of the associated image. Use the imdisplayrange function to create the tool, associate it with an image, and embed it in a figure or panel.

• Distance tool: Displays a draggable, resizable line on an image. Superimposed on the line is the distance between the two endpoints of the line. The distance is measured in units specified by the XData and YData properties, which is pixels by default. Use the imdistline function to create the tool and associate it with an image.

• Image Information tool: Displays basic attributes about the target image. If the image displayed was specified as a graphics file, the tool displays any metadata that the image file might contain. Use the imageinfo function to create the tool in a separate figure window and associate it with an image.

• Magnification box: Creates a text edit box containing the current magnification of the target image. Users can change the magnification of the image by entering a new magnification value. Use immagbox to create the tool, associate it with an image, and embed it in a figure or panel. Note: The target image must be contained in a scroll panel.

• Overview tool: Displays the target image in its entirety with the portion currently visible in the scroll panel outlined by a rectangle superimposed on the image. Moving the rectangle changes the portion of the target image that is currently visible in the scroll panel. Use imoverview to create the tool in a separate figure window and associate it with an image. Use imoverviewpanel to create the tool in a panel that can be embedded within another figure or panel. Note: The target image must be contained in a scroll panel.

• Pixel Information tool: Displays information about the pixel the mouse is over in the target image. Use impixelinfo to create the tool, associate it with an image, and display it in a figure or panel. If you want to display only the pixel values, without the Pixel info label, use impixelinfoval.

• Pixel Region tool: Displays pixel values for a specified region in the target image. Use impixelregion to create the tool in a separate figure window and associate it with an image. Use impixelregionpanel to create the tool as a panel that can be embedded within another figure or panel.

• Save Image tool: Displays the Save Image dialog window. In the window, navigate to the desired directory, specify the name of the output image, and choose the file format used to store the image. Use imsave to create the tool in a separate figure window and associate it with an image.

• Scroll Panel tool: Displays the target image in a scrollable panel. Use imscrollpanel to add a scroll panel to an image displayed in a figure window.

See Also

More About
• “Interactive Tool Workflow” on page 5-6
• “Get Started with Image Tool” on page 4-50


Interactive Tool Workflow


In this section...
“Display Target Image in Figure Window” on page 5-6
“Create the Tool” on page 5-6
“Position Tools” on page 5-7
“Add Navigation Aids” on page 5-8
“Customize Tool Interactivity” on page 5-8

Using the interactive tools typically involves the following steps.

Display Target Image in Figure Window


Display the image to be processed (called the target image) in a MATLAB figure window. The imshow
function is recommended because it optimizes figure, axes, and image object properties for image
display, but you can also use the image or imagesc functions.

Some of the tools add themselves to the figure window containing the image. Prevent the tools from
displaying over the image by including a border. If you are using the imshow function, then make
sure that the Image Processing Toolbox ImshowBorder preference is set to "loose" (this is the
default setting).

Create the Tool


After you display an image in a figure window, create one or more tools using the corresponding tool
creation functions. For a list of available tools, see “Interactive Image Viewing and Processing Tools”
on page 5-2. The functions create the tools and automatically set up the interactivity connection
between the tool and the target image.

Associate Tool with Target Image

When you create a tool, you can specify the target image or you can let the tool pick a suitable target
image.

• To specify the target image, provide a handle to the target image as an input argument to the tool
creation function. The handle can be a specific image object, or a figure, axes, or panel object that
contains an image.
• To let the tool pick the target image, call the tool creation function with no input arguments. By
default, the tool uses the image in the current figure as the target image. If the current figure
contains multiple images, then the tool associates with the first image in the figure object's
children (the last image created). Note that not all tools offer a no-argument syntax.

Some tools can work with multiple images in a figure. These are impixelinfo, impixelinfoval,
and imdisplayrange.
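
As a small illustration of these two approaches, this sketch (using the pout.tif example image) creates the Pixel Information tool with and without an explicit target.

% Display an image and create the Pixel Information tool for it.
himage = imshow("pout.tif");
hinfo = impixelinfo(himage);   % explicit handle to the target image
% Equivalent here, because the image is in the current figure:
% hinfo = impixelinfo;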

Specify Parent of Tool

When you create a tool, you can optionally specify the object that you want to be the parent of the
tool. By specifying the parent, you determine where the tool appears on your screen. Using this


syntax of the tool creation functions, you can add the tool to the figure window containing the target
image, open the tool in a separate figure window, or create some other combination.

Specifying the parent is optional. When you do not specify the parent, the tools use default behavior.

• Some of the smaller tools, such as the Display Range tool and Pixel Information tool, use the
parent of the target image as their parent, inserting themselves in the same figure window as the
target image.
• Other tools, such as the Adjust Contrast tool and Choose Colormap tool, open in separate figures
of their own.
• Two tools, the Overview tool and Pixel Region tool, have different creation functions for specifying
the parent figure. Their primary creation functions, imoverview and impixelregion, open the
tools in a separate figure window. To specify a different parent, you must use the
imoverviewpanel and impixelregionpanel functions. For an example, see “Create Pixel
Region Tool” on page 5-15.

Note The Overview tool and the Pixel Region tool provide additional capabilities when created in
their own figure windows. For example, both tools include zoom buttons that are not part of their
panel versions.

Position Tools
Each tool has default positioning behavior. For example, the impixelinfo function creates the tool
as a panel object that is the full width of the figure window, positioned in the lower left corner of the
target image figure window.

Because the tools are constructed from graphics objects, such as panel objects, you can change their
default positioning or other characteristics by setting properties of the objects. To specify the position
of a tool or other graphics object, set the Position property as a four-element position vector [left
bottom width height]. The values of left and bottom specify the distance from the lower left
corner of the parent container object, such as a figure. The values of width and height specify the
dimensions of the object.

When you specify a position vector, you can specify the units of the values in the vector by setting the
value of the Units property of the object. To allow better resizing behavior, use normalized units
because they specify the relative position of the tool, not the exact location in pixels.

For example, when you first create an embedded Pixel Region tool in a figure, it appears to take over
the entire figure because, by default, the position vector is set to [0 0 1 1], in normalized units.
This position vector tells the tool to align itself with the bottom left corner of its parent and fill the
entire object. To accommodate the image as well as the Pixel Information and Display Range tools, change the position of the Pixel Region tool so that it occupies only the lower half of the figure window, leaving room at the bottom for the Pixel Information and Display Range tools. Here is the position vector for the Pixel Region tool.

set(hpixreg,"Units","normalized","Position",[0 .08 1 .4])

To accommodate the Pixel Region tool, reposition the target image so that it fits in the upper half of
the figure window, using the following position vector. To reposition the image, you must specify the
Position property of the axes object that contains it; image objects do not have a Position
property.


set(hax,"Units","normalized","Position",[0 0.5 1 0.5])

For an example, see “Create Pixel Region Tool” on page 5-15.

Add Navigation Aids


The toolbox includes tools that you can use to add navigation aids to a GUI application.

The scroll panel is the primary navigation tool and is a prerequisite for the other navigation tools.
When you display an image in a scroll panel, the tool displays only a portion of the image, if it is too
big to fit into the figure window. When only a portion of the image is visible, the scroll panel adds
horizontal and vertical scroll bars, to enable viewing of the parts of the image that are not currently
visible.

Once you create a scroll panel, you can optionally add the other navigation tools: the Overview tool
and the Magnification tool. The Overview tool displays a view of the entire image, scaled to fit, with a
rectangle superimposed over it that indicates the part of the image that is currently visible in the
scroll panel. The Magnification Box displays the current magnification of the image and can be used
to change the magnification.

Adding a scroll panel to an image display changes the relationship of the graphics objects used in the
display. For more information, see “Add Scroll Panel to Figure” on page 5-10.

Note The toolbox navigation tools are incompatible with standard MATLAB figure window navigation
tools. When using these tools in a GUI, suppress the toolbar and menu bar in the figure windows to
avoid conflicts between the tools.

Customize Tool Interactivity


When you create a tool and associate it with a target image, the tool automatically makes the
necessary connections between the target image and the tool.

Some tools have a one-way connection to the target image. These tools get updated when you
interact with the target image, but you cannot use the tool to modify the target image. For example,
the Pixel Information tool receives information about the location and value of the pixel currently
under the pointer.

Other tools have a two-way connection to the target image. These tools get updated when you
interact with the target image, and you can update the target image by interacting with the tools. For
example, the Overview tool sets up a two-way connection to the target image. For this tool, if you
change the visible portion of the target image by scrolling, panning, or by changing the
magnification, then the Overview tool changes the size and location of the detail rectangle to match
the portion of the image that is now visible. Conversely, if you move the detail window in the
Overview tool, then the tool updates the visible portion of the target image in the scroll panel.

The tools accomplish this interactivity by using callback properties of the graphics objects. For
example, the figure object supports a WindowButtonMotionFcn callback that executes whenever
the pointer moves within the figure window. You can customize the connectivity of a tool by using the application
programmer interface (API) associated with the tool to set up callbacks to get notification of events.
For more information, see “Create Callbacks for Graphics Objects” and “Overview Events and
Listeners”. For an example, see “Build Image Comparison Tool” on page 5-24.


For example, the Magnification box supports a single API function: setMagnification. You can use
this API function to set the magnification value displayed in the Magnification box. The Magnification
box automatically notifies the scroll panel to change the magnification of the image based on the
value. The scroll panel also supports an extensive set of API functions. To get information about these
APIs, see the reference page for each tool.
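
For instance, this minimal sketch (using the pout.tif example image) creates a scroll panel with a Magnification box, then calls the setMagnification API function to display the image at 200% magnification.

% Create a scroll panel and a Magnification box, then set the magnification.
hfig = figure("Toolbar","none","Menubar","none");
himage = imshow("pout.tif");
hpanel = imscrollpanel(hfig,himage);
hmagbox = immagbox(hfig,himage);
apiMag = iptgetapi(hmagbox);
apiMag.setMagnification(2)   % a value of 2 corresponds to 200% magnification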

See Also

Related Examples
• “Create Pixel Region Tool” on page 5-15
• “Build App for Navigating Large Images” on page 5-21
• “Build App to Display Pixel Information” on page 5-19
• “Build Image Comparison Tool” on page 5-24

More About
• “Interactive Image Viewing and Processing Tools” on page 5-2
• “Add Scroll Panel to Figure” on page 5-10


Add Scroll Panel to Figure


The primary navigation tool of a figure is a scroll panel. When you display an image in a scroll panel,
the tool displays only a portion of the image, if it is too big to fit into the figure window. When only a
portion of the image is visible, the scroll panel adds horizontal and vertical scroll bars, to enable
viewing of the parts of the image that are not currently visible.

When you display an image in a scroll panel, it changes the object hierarchy of your displayed image.
This diagram illustrates the typical object hierarchy for an image displayed in an axes object in a
figure object.

Object Hierarchy of Image Displayed in a Figure

When you call the imscrollpanel function to put the target image in a scrollable window, this
object hierarchy changes. imscrollpanel inserts a new object into the hierarchy between the
figure object and the axes object containing the image. The figure shows the object hierarchy after
the call to imscrollpanel.


Object Hierarchy of Image Displayed in Scroll Panel

After you add a scroll panel to a figure, you can change the image data displayed in the scroll panel by
using the replaceImage function in the imscrollpanel API.
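
A minimal sketch of this workflow, assuming the pout.tif and coins.png example images are available:

% Create a scroll panel, then swap in a different image with replaceImage.
hfig = figure("Toolbar","none","Menubar","none");
himage = imshow("pout.tif");
hpanel = imscrollpanel(hfig,himage);
api = iptgetapi(hpanel);
api.replaceImage(imread("coins.png"))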

The scroll panel navigation tool is not compatible with the figure window toolbar and menu bar. When
you add a scroll panel to an image displayed in a figure window, suppress the toolbar and menu bar
from the figure. This sample code demonstrates one way to do this.

hfig = figure("Toolbar","none","Menubar","none");
himage = imshow("foggysf1.jpg");
hpanel = imscrollpanel(hfig,himage);

See Also
imscrollpanel | immagbox | imoverview

Related Examples
• “Build App for Navigating Large Images” on page 5-21


More About
• “Interactive Image Viewing and Processing Tools” on page 5-2


Get Handle to Target Image

This example shows several ways to get the handle to the image displayed in a figure window,
referred to as the target image. This can be useful when creating apps with the modular interactive
tools.

Get the handle when you initially display the image in a figure window using the imshow syntax that
returns a handle.

hfig = figure;
himage = imshow('moon.tif')


himage =
Image with properties:

CData: [537x358 uint8]


CDataMapping: 'scaled'

Use GET to show all properties

Get the handle after you have displayed the image in a figure window using the imhandles function.
You must specify a handle to the figure window as a parameter.

himage2 = imhandles(hfig)

himage2 =
Image with properties:

CData: [537x358 uint8]


CDataMapping: 'scaled'

Use GET to show all properties

See Also
imhandles

More About
• “Interactive Tool Workflow” on page 5-6


Create Pixel Region Tool

This example shows how to create a Pixel Region tool in a separate figure window and embedded in
an existing figure window.

Create Pixel Region Tool in Separate Figure Window

Read an image into the workspace.

I = imread("pout.tif");

Display the image in a figure window. Return a handle to the target image, himage.

himage = imshow('pout.tif');

To create the Pixel Region tool in a separate window, use the impixelregion function.

hpixreg = impixelregion(himage);


Embed Pixel Region Tool in Existing Figure

Create a new figure window and return a handle to the figure.

fig = figure;

Create an axes and display the target image in the axes.

ax = axes;
img = imshow(I);

To create the Pixel Region tool in the same figure as the target image, use the
impixelregionpanel function. Specify the target image's parent figure, fig, as the parent of the
Pixel Region tool.

pixregionobj = impixelregionpanel(fig,img);


The Pixel Region tool overlaps and hides the original image. To see both the image and the tool, shift
their positions so that they do not overlap.

set(ax,'Units','normalized','Position',[0 .5 1 .5]);
set(pixregionobj,'Units','normalized','Position',[0 .04 1 .4]);


See Also
impixelregion | impixelregionpanel

More About
• “Position Tools” on page 5-7
• “Interactive Image Viewing and Processing Tools” on page 5-2


Build App to Display Pixel Information

This example shows how to create a simple app that provides information about pixels and features in
an image using modular pixel information tools.

First, define a function that builds the app. This example uses a function called my_pixinfo_tool,
which is attached at the end of the example.

After you define the function that builds the app, test the app. Read an image into the workspace.
I = imread('pears.png');

Display the image with pixel information tools in the app.


my_pixinfo_tool(I)

App Creation Function

The my_pixinfo_tool function accepts an image as an argument and displays the image in a figure
window with a Pixel Information tool, Display Range tool, Distance tool, and Pixel Region tool. Note
that the function suppresses the toolbar and menu bar in the figure window because scrollable
navigation is incompatible with standard MATLAB™ figure window navigation tools.
function my_pixinfo_tool(im)
% Create figure, setting up properties
fig = figure('Toolbar','none', ...
    'Menubar','none', ...
    'Name','My Pixel Info Tool', ...
    'NumberTitle','off', ...
    'IntegerHandle','off');

% Create axes and reposition the axes
% to accommodate the Pixel Region tool panel
ax = axes('Units','normalized', ...
    'Position',[0 .5 1 .5]);

% Display image in the axes
img = imshow(im);

% Add Distance tool, specifying axes as parent
distool = imdistline(ax);

% Add Pixel Information tool, specifying image as parent
pixinfo = impixelinfo(img);

% Add Display Range tool, specifying image as parent
drange = imdisplayrange(img);

% Add Pixel Region tool panel, specifying figure as parent
% and image as target
pixreg = impixelregionpanel(fig,img);

% Reposition the Pixel Region tool to fit in the figure
% window, leaving room for the Pixel Information and
% Display Range tools
set(pixreg, 'units','normalized','position',[0 .08 1 .4])

end

See Also
imdistline | impixelinfo | imdisplayrange | impixelregionpanel

Related Examples
• “Create Pixel Region Tool” on page 5-15
• “Build Image Comparison Tool” on page 5-24
• “Build App for Navigating Large Images” on page 5-21

More About
• “Interactive Tool Workflow” on page 5-6
• “Interactive Image Viewing and Processing Tools” on page 5-2


Build App for Navigating Large Images

This example shows how to build an app using modular tools that displays an image with navigation
aids, including scroll bars, an overview window, and a magnification box.

First, define a function that builds the app. This example defines a function called
my_large_image_display at the end of the example.

After you define the function that builds the app, test the app. Read an image into the workspace.

I = imread('car1.jpg');

Display the image with navigation aids in the app.

my_large_image_display(I)


App Creation Function

The my_large_image_display function accepts an image as an argument and displays the image
in a figure window with scroll bars, an Overview tool, and a magnification box. Note that the function
suppresses the toolbar and menu bar in the figure window because scrollable navigation is
incompatible with standard MATLAB™ figure window navigation tools.

function my_large_image_display(im)

% Create a figure without toolbar and menubar
hfig = figure('Toolbar','none', ...
    'Menubar','none', ...
    'Name','My Large Image Display Tool', ...
    'NumberTitle','off', ...
    'IntegerHandle','off');

% Display the image in a figure with imshow
himage = imshow(im);

% Add the scroll panel
hpanel = imscrollpanel(hfig,himage);

% Position the scroll panel to accommodate the other tools
set(hpanel,'Units','normalized','Position',[0 .1 1 .9]);

% Add the magnification box
hMagBox = immagbox(hfig,himage);

% Position the magnification box
pos = get(hMagBox,'Position');
set(hMagBox,'Position',[0 0 pos(3) pos(4)]);

% Add the Overview tool
hovervw = imoverview(himage);

end

See Also
imscrollpanel | immagbox | imoverview

Related Examples
• “Build App to Display Pixel Information” on page 5-19
• “Create Image Comparison Tool Using ROIs” on page 15-43

More About
• “Interactive Tool Workflow” on page 5-6
• “Interactive Image Viewing and Processing Tools” on page 5-2
• “Add Scroll Panel to Figure” on page 5-10


Build Image Comparison Tool

This example shows how to make a GUI that displays two images side by side in scroll panels that are
synchronized in location and magnification.

First, define a function that builds the app. This example uses a function called
my_image_compare_tool, which is attached at the end of the example.

After you define the function that builds the app, test the app. Get two images.

I = imread('flamingos.jpg');
L = rgb2lightness(I);
Iedge = edge(L,'Canny');

Display the images in the app. When you move the detail rectangle in the Overview tool or change the
magnification in one image, both images respond.

my_image_compare_tool(I,Iedge);


App Creation Function

The my_image_compare_tool function accepts two images as input arguments and displays the
images in scroll panels. The custom tool also includes an Overview tool and a Magnification box. Note
that the function suppresses the toolbar and menu bar in the figure window because scrollable
navigation is incompatible with standard MATLAB™ figure window navigation tools.

To synchronize the scroll panels, the function makes the connections between tools using callbacks
and the Scroll Panel API functions. The function specifies a callback function that executes every time
the magnification changes. The function specified is the setMagnification API function of the
other scroll panel. Thus, whenever the magnification changes in one of the scroll panels, the other
scroll panel changes its magnification to match. The tool sets up a similar connection for position
changes.
function my_image_compare_tool(left_image,right_image)

% Create the figure
hFig = figure('Toolbar','none',...
    'Menubar','none',...
    'Name','My Image Compare Tool',...
    'NumberTitle','off',...
    'IntegerHandle','off');

% Display left image
subplot(121)
hImL = imshow(left_image);

% Display right image
subplot(122)
hImR = imshow(right_image);

% Create a scroll panel for left image
hSpL = imscrollpanel(hFig,hImL);
set(hSpL,'Units','normalized',...
    'Position',[0 0.1 .5 0.9])

% Create scroll panel for right image
hSpR = imscrollpanel(hFig,hImR);
set(hSpR,'Units','normalized',...
    'Position',[0.5 0.1 .5 0.9])

% Add a Magnification box
hMagBox = immagbox(hFig,hImL);
pos = get(hMagBox,'Position');
set(hMagBox,'Position',[0 0 pos(3) pos(4)])

% Add an Overview tool
imoverview(hImL)

% Get APIs from the scroll panels
apiL = iptgetapi(hSpL);
apiR = iptgetapi(hSpR);

% Synchronize the left and right scroll panels
apiL.setMagnification(apiR.getMagnification())
apiL.setVisibleLocation(apiR.getVisibleLocation())

% When the magnification changes on the left scroll panel,
% tell the right scroll panel
apiL.addNewMagnificationCallback(apiR.setMagnification);

% When the magnification changes on the right scroll panel,
% notify the left scroll panel
apiR.addNewMagnificationCallback(apiL.setMagnification);

% When the location changes on the left scroll panel,
% notify the right scroll panel
apiL.addNewLocationCallback(apiR.setVisibleLocation);

% When the location changes on the right scroll panel,
% notify the left scroll panel
apiR.addNewLocationCallback(apiL.setVisibleLocation);

end

See Also
imscrollpanel | immagbox | imoverview

Related Examples
• “Build App to Display Pixel Information” on page 5-19
• “Build App for Navigating Large Images” on page 5-21


More About
• “Interactive Tool Workflow” on page 5-6
• “Interactive Image Viewing and Processing Tools” on page 5-2
• “Add Scroll Panel to Figure” on page 5-10


Create Angle Measurement Tool Using ROI Objects

Note The impoly function used in this example is not recommended. Use the new drawpolyline
function and Polyline ROI object instead. See “Use Polyline to Create Angle Measurement Tool” on
page 15-78.

This example shows how to create an angle measurement tool using interactive tools and ROI objects.
The example displays an image in a figure window and overlays a simple angle measurement tool
over the image. When you move the lines in the angle measurement tool, the function calculates the
angle formed by the two lines and displays the angle in a title.

Create a function that accepts an image as an argument and displays an angle measurement tool over
the image in a figure window. This code includes a second function used as a callback function that
calculates the angle and displays the angle in the figure.

function my_angle_measurement_tool(im)
% Create figure, setting up properties
figure("Name","My Angle Measurement Tool",...
    "NumberTitle","off",...
    "IntegerHandle","off")

% Display image in the axes
imshow(im)

% Get size of image.
m = size(im,1);
n = size(im,2);

% Get center point of image for initial positioning.
midy = ceil(m/2);
midx = ceil(n/2);

% Position first point vertically above the middle.
firstx = midx;
firsty = midy - ceil(m/4);
lastx = midx + ceil(n/4);
lasty = midy;

% Create a two-segment right-angle polyline centered in the image.
h = impoly(gca,[firstx,firsty;midx,midy;lastx,lasty],"Closed",false);
api = iptgetapi(h);
initial_position = api.getPosition()

% Display initial position
updateAngle(initial_position)

% Set up callback to update angle in title.
api.addNewPositionCallback(@updateAngle);
fcn = makeConstrainToRectFcn("impoly",get(gca,"XLim"),get(gca,"YLim"));
api.setPositionConstraintFcn(fcn);

% Callback function that calculates the angle and updates the title.
% Function receives an array containing the current x,y position of
% the three vertices.
function updateAngle(p)
% Create two vectors from the vertices.
% v1 = [x1 - x2, y1 - y2]
% v2 = [x3 - x2, y3 - y2]
v1 = [p(1,1)-p(2,1), p(1,2)-p(2,2)];
v2 = [p(3,1)-p(2,1), p(3,2)-p(2,2)];
% Find the angle.
theta = acos(dot(v1,v2)/(norm(v1)*norm(v2)));
% Convert it to degrees.
angle_degrees = (theta * (180/pi));
% Display the angle in the title of the figure.
title(sprintf("(%1.0f) degrees",angle_degrees))

Read an image into the workspace.

I = imread("gantrycrane.png");

Open the angle measurement tool, specifying the image as an argument. The tool opens a figure
window, displaying the image with the angle measure tool centered over the image in a right angle.
Move the pointer over any of the vertices of the tool to measure any angle in the image. In the
following figure, the tool is measuring an angle in the image. Note the size of the angle displayed in
the title of the figure.

my_angle_measurement_tool(I);


See Also

More About
• “Create ROI Shapes” on page 15-7
• “ROI Migration” on page 15-18
• “Use Polyline to Create Angle Measurement Tool” on page 15-78

6 Geometric Transformations

A geometric transformation (also known as a spatial transformation) modifies the spatial relationship
between pixels in an image, mapping pixel locations in a moving image to new locations in an output
image. The toolbox includes functions that perform certain specialized geometric transformations,
such as resizing and rotating an image. In addition, the toolbox includes functions that you can use to
perform many types of 2-D and N-D geometric transformations, including custom transformations.

• “Resize an Image” on page 6-2


• “Resize Image and Preserve Aspect Ratio” on page 6-7
• “Rotate an Image” on page 6-13
• “Crop an Image” on page 6-15
• “Translate an Image Using imtranslate Function” on page 6-17
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-20
• “Migrate Geometric Transformations to Premultiply Convention” on page 6-25
• “Matrix Representation of Geometric Transformations” on page 6-27
• “Create Composite 2-D Affine Transformations” on page 6-32
• “Specify Fill Values in Geometric Transformation Output” on page 6-36
• “Perform Simple 2-D Translation Transformation” on page 6-38
• “N-Dimensional Spatial Transformations” on page 6-41
• “Register Two Images Using Spatial Referencing to Enhance Display” on page 6-43
• “Create a Gallery of Transformed Images” on page 6-48
• “Exploring a Conformal Mapping” on page 6-63
• “Explore Slices from 3-D Image Volume with Anisotropic Voxel Spacing” on page 6-75
• “Padding and Shearing an Image Simultaneously” on page 6-85

Resize an Image

This example shows how to resize an image using the imresize function.

Start by reading and displaying an image.

I = imread("circuit.tif");
imshow(I)

Specify Magnification Value

Resize the image, using the imresize function. In this example, you specify a magnification factor.
To enlarge an image, specify a magnification factor greater than 1.

magnificationFactor = 1.25;
J = imresize(I,magnificationFactor);

Display the original and enlarged image in a montage.

imshowpair(I,J,method="montage")


Specify Size of the Output Image

Resize the image again, this time specifying the desired size of the output image, rather than a
magnification value. Pass imresize a vector that contains the number of rows and columns in the
output image. If the specified size does not produce the same aspect ratio as the input image, the
output image will be distorted. If you specify one of the elements in the vector as NaN, imresize
calculates the value for that dimension to preserve the aspect ratio of the image. To perform the
resizing required for multi-resolution processing, use impyramid.

K = imresize(I,[100 150]);
imshowpair(I,K,method="montage")
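
As a brief illustration of the impyramid function mentioned above, this minimal sketch reduces the image twice to build a small multiresolution pyramid; the number of levels is an arbitrary choice.

% Build a three-level image pyramid by repeatedly reducing the image.
I0 = imread("circuit.tif");
I1 = impyramid(I0,"reduce");   % roughly half the size of I0
I2 = impyramid(I1,"reduce");   % roughly one quarter the size of I0
montage({I0,I1,I2})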


Specify Interpolation Method

Resize the image again, this time specifying the interpolation method. When you enlarge an image,
the output image contains more pixels than the original image. imresize uses interpolation to
determine the values of these pixels, computing a weighted average of some set of pixels in the
vicinity of the pixel location. imresize bases the weightings on the distance each pixel is from the
point. By default, imresize uses bicubic interpolation, but you can specify other interpolation
methods or interpolation kernels. You can also specify your own custom interpolation kernel. This
example uses nearest neighbor interpolation.

L = imresize(I,magnificationFactor,"nearest");

Display the resized image using bicubic interpolation, J, and the resized image using nearest
neighbor interpolation, L, in a montage.

imshowpair(J,L,method="montage")


Prevent Aliasing When Shrinking an Image

Resize the image again, this time shrinking the image. When you reduce the size of an image, you
lose some of the original pixels because there are fewer pixels in the output image. This can
introduce artifacts, such as aliasing. The aliasing that occurs as a result of size reduction normally
appears as stair-step patterns (especially in high-contrast images), or as moire (ripple-effect) patterns
in the output image. By default, imresize uses antialiasing to limit the impact of aliasing on the
output image for all interpolation types except nearest neighbor. To turn off antialiasing, specify the
"Antialiasing" name-value argument and set the value to false. Even with antialiasing turned
on, resizing can introduce artifacts because information is always lost when you reduce the size of an
image.

magnificationFactor = 0.66;
M = imresize(I,magnificationFactor);
N = imresize(I,magnificationFactor,Antialiasing=false);

Display the resized image with and without antialiasing in a montage.

imshowpair(M,N,method="montage")


See Also
imresize

More About
• “Resize Image and Preserve Aspect Ratio” on page 6-7
• “Crop an Image” on page 6-15


Resize Image and Preserve Aspect Ratio

This example shows how to resize an image while maintaining the ratio of width to height. This
example covers two common situations:

1 You want to resize the image and specify the exact height or width.
2 You want to resize an image to be as large as possible without the width and height exceeding a
maximum value.

Start by reading and displaying an image.

I = imread("lighthouse.png");
imshow(I)


Get the size of the image. The aspect ratio of this image is 3:4, meaning that the width is 3/4 of the
height.

[heightI,widthI,~] = size(I)

heightI = 640


widthI = 480

Resize Image to Specified Height

Specify the desired height of the image.


targetHeight = 300;

Resize the image, specifying the output size of the image. Because the second element of
targetSize is NaN, imresize automatically calculates the number of rows needed to preserve the
aspect ratio.
targetSize = [targetHeight NaN];
J = imresize(I,targetSize);
[h,w,~] = size(J)

h = 300

w = 225

imshow(J)

Resize Image to Specified Width

Specify the desired width of the image.


targetWidth = 300;

Resize the image, specifying the output size of the image. Because the first element of targetSize
is NaN, imresize automatically calculates the number of columns needed to preserve the aspect
ratio.


targetSize = [NaN targetWidth];


K = imresize(I,targetSize);
[h,w,~] = size(K)

h = 400

w = 300

imshow(K)

Resize Image Within Maximum Size

You can resize an image to be as large as possible without the width and height exceeding a
maximum value.

Specify the maximum dimensions of the image.


maxHeight = 256;
maxWidth = 256;

Determine which dimension requires a larger scaling factor.


scaleWidth = widthI/maxWidth

scaleWidth = 1.8750


scaleHeight = heightI/maxHeight

scaleHeight = 2.5000

Select the output size based on whether the height or width requires a greater scale factor to fit
within the maximum image dimensions. If the height requires a greater scale factor, then specify the
target height as maxHeight. Conversely, if the width requires a greater scale factor, then specify the
target width as maxWidth. Let imresize automatically calculate the size of the other dimension to
preserve the aspect ratio.

if scaleHeight>scaleWidth
targetSize = [maxHeight NaN];
else
targetSize = [NaN maxWidth];
end

Resize the image, specifying the output size of the image.

L = imresize(I,targetSize);
[h,w,~] = size(L)

h = 256

w = 192

imshow(L)

See Also
imresize


More About
• “Crop an Image” on page 6-15
• “Resize an Image” on page 6-2


Rotate an Image

This example shows how to rotate an image and adjust the size of the resulting image.

When you rotate an image using the imrotate function, you specify the image to be rotated and the
rotation angle, in degrees. If you specify a positive rotation angle, the image rotates
counterclockwise; if you specify a negative rotation angle, the image rotates clockwise.

Rotate an Image Counterclockwise

Read an image into the workspace.

I = imread("circuit.tif");

Rotate the image 35 degrees counterclockwise using bilinear interpolation.

J = imrotate(I,35,"bilinear");

Display the original image and the rotated image. By default, the output image is large enough to
include the entire original image. Pixels that fall outside the boundaries of the original image are set
to 0 and appear as a black background in the output image.

figure
imshowpair(I,J,"montage")

Crop a Rotated Image

Rotate the original image again and specify that the rotated image be cropped to the same size as the
original image.

K = imrotate(I,35,"bilinear","crop");


Display the original image and the new image.

figure
imshowpair(I,K,"montage")

See Also
imrotate

Related Examples
• “Crop an Image” on page 6-15


Crop an Image

Note You can also crop an image interactively using the Image Viewer app. For details, see “Crop
Image Using Image Viewer App” on page 4-47.

To extract a rectangular portion of an image, use the imcrop function. Using imcrop, you can
specify the crop region interactively using the mouse or programmatically by specifying the size and
position of the crop region.

This example illustrates an interactive syntax. The example reads an image into the MATLAB
workspace and calls imcrop specifying the image as an argument. imcrop displays the image in a
figure window and waits for you to draw the crop rectangle on the image. When you move the pointer

over the image, the shape of the pointer changes to cross hairs. Click and drag the pointer to
specify the size and position of the crop rectangle. You can move and adjust the size of the crop
rectangle using the mouse. When you are satisfied with the crop rectangle, double-click to perform
the crop operation, or right-click inside the crop rectangle and select Crop Image from the context
menu. imcrop returns the cropped image in J.
I = imread("circuit.tif")
J = imcrop(I);

You can also specify the size and position of the crop rectangle as parameters when you call imcrop.
Specify the crop rectangle as a four-element position vector, [xmin ymin width height].

In this example, you call imcrop specifying the image to crop, I, and the crop rectangle. imcrop
returns the cropped image in J.
I = imread("circuit.tif");
J = imcrop(I,[60 40 100 90]);


See Also
imcrop

More About
• “Crop Image Using Image Viewer App” on page 4-47
• “Resize an Image” on page 6-2
• “Resize Image and Preserve Aspect Ratio” on page 6-7


Translate an Image Using imtranslate Function

This example shows how to perform a translation operation on an image using the imtranslate
function. A translation operation shifts an image by a specified number of pixels in either the x- or y-
direction, or both.

Read an image into the workspace.

I = imread("cameraman.tif");

Display the image. The size of the image is 256-by-256 pixels. By default, imshow displays the image
with the upper-left corner at (0,0).

figure
imshow(I)
title("Original Image")

Translate the image, shifting the image by 15 pixels in the x-direction and 25 pixels in the y-direction.
Note that, by default, imtranslate displays the translated image within the boundaries (or limits) of
the original 256-by-256 image. This results in some of the translated image being clipped.

J = imtranslate(I,[15, 25]);

Display the translated image. The size of the image is 256-by-256 pixels.

figure
imshow(J)
title("Translated Image")


Use the OutputView name-value argument set to "full" to prevent clipping the translated image.
The size of the new image is 281-by-271 pixels.

K = imtranslate(I,[15, 25],"OutputView","full");

Display the translated image.

figure
imshow(K)
title("Translated Image, Unclipped")


2-D and 3-D Geometric Transformation Process Overview


To perform a 2-D or 3-D geometric transformation, first create a geometric transformation object that
stores information about the transformation. Then, pass the image to be transformed and the
geometric transformation object to the imwarp function. You optionally can provide spatial
referencing information about the input image to imwarp.

imwarp uses the geometric transformation to map coordinates in the output image to the
corresponding coordinates in the input image (inverse mapping). Then, imwarp uses the coordinate
mapping to interpolate pixel values within the input image and calculate the output pixel value.
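
For instance, a minimal sketch of this workflow (using the cameraman.tif sample image and an
arbitrary rigid transformation chosen only for illustration) might look like this:

I = imread("cameraman.tif");          % sample image shipped with the toolbox
tform = rigidtform2d(20,[25 15]);     % rotate by 20 degrees, then translate
Rin = imref2d(size(I));               % optional spatial referencing for the input
[J,Rout] = imwarp(I,Rin,tform);       % inverse mapping and interpolation happen here
imshow(J,Rout)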

Create Geometric Transformation Object


Different types of geometric transformation objects store different information about the
transformation.

• Several objects store a transformation matrix that represents a specific type of linear geometric
transformation. These objects include: affinetform2d, affinetform3d, rigidtform2d,
rigidtform3d, simtform2d, simtform3d, transltform2d, transltform3d, and
projtform2d.
• The geometricTransform2d and geometricTransform3d objects store an inverse point-wise
mapping function, and optionally a forward point-wise mapping function.
• The PolynomialTransformation2D object stores an inverse point mapping in the form of a 2-D
polynomial.
• The LocalWeightedMeanTransformation2D and PiecewiseLinearTransformation2D
objects represent different forms of locally-varying point-wise mapping functions.


There are several ways to create a geometric transformation object.

The list shows each approach and the geometric transformation objects that you can create with it.

• “Specify Translation, Rotation, or Scale Parameters” on page 6-21: transltform2d,
transltform3d, rigidtform2d, rigidtform3d, simtform2d, and simtform3d.
• “Specify Transformation Matrix” on page 6-22: transltform2d, transltform3d,
rigidtform2d, rigidtform3d, simtform2d, simtform3d, affinetform2d, affinetform3d,
and projtform2d.
• “Specify Custom Point-Wise Mapping Function” on page 6-22: geometricTransform2d and
geometricTransform3d.
• “Estimate Transformation from Control Point Pairs” on page 6-23: simtform2d,
affinetform2d, and projtform2d (2-D only), as well as other geometric transformations such as
PolynomialTransformation2D, PiecewiseLinearTransformation2D, and
LocalWeightedMeanTransformation2D.
• “Estimate Transformation Using Similarity Optimization” on page 6-23: transltform2d,
rigidtform2d, simtform2d, and affinetform2d (2-D).
• “Estimate Transformation Using Phase Correlation” on page 6-24: transltform2d,
rigidtform2d, and simtform2d (2-D).
• “Generate Random Affine Transformations” on page 6-24: affinetform2d and affinetform3d.

Specify Translation, Rotation, or Scale Parameters

If you know the amount of translation, the rotation angle, and the scale factor, then you can create a
transformation by specifying these parameters.

• Specify translation to create transltform2d and transltform3d objects that represent


translation transformations.
• Specify translation, rotation angles, or both to create rigidtform2d and rigidtform3d objects
that represent rigid transformations.
• Specify any combination of translation, rotation, and an isotropic scale factor to create
simtform2d and simtform3d objects that represent nonreflective similarity transformations.

The following example defines a translation and rotation angle, then creates a rigidtform2d
geometric transformation object from the specified parameters.


theta = 30;
translation = [10 20.5];
tform = rigidtform2d(theta,translation)

tform =

rigidtform2d with properties:

Dimensionality: 2
RotationAngle: 30
Translation: [10 20.5000]
R: [2×2 double]
A: [3×3 double]

Specify Transformation Matrix

For more complex linear geometric transformations, you can represent the transformation as a
matrix. For example, use a matrix representation for projective transformations or for affine
transformations involving reflection, anisotropic scaling, shear, or compositions of linear
transformations. Specify the transformation matrix to create an affinetform2d, affinetform3d,
or projtform2d object. For more information about creating a transformation matrix, see “Matrix
Representation of Geometric Transformations” on page 6-27.

The following example defines the transformation matrix for anisotropic scaling and reflection across
the x-axis (which negates the y-coordinates), then creates an affinetform2d geometric transformation
object from the transformation matrix.
scaleX = 0.8;
scaleY = 1.5;
A = [scaleX 0 0; 0 -scaleY 0; 0 0 1];
tform = affinetform2d(A)

tform =

affinetform2d with properties:

Dimensionality: 2
A: [3×3 double]

Specify Custom Point-Wise Mapping Function

If you have an inverse point-wise mapping function, then you can create a custom 2-D and 3-D
geometric transformation using the geometricTransform2d and the geometricTransform3d
objects respectively.

The following example specifies an inverse mapping function that accepts and returns 2-D points in
packed (x,y) format. Then, the example creates a geometricTransform2d geometric transformation
object from the inverse mapping function.
inversefn = @(c) [c(:,1)+c(:,2),c(:,1).^2]

inversefn =

function_handle with value:

@(c)[c(:,1)+c(:,2),c(:,1).^2]

tform = geometricTransform2d(inversefn)


tform =

geometricTransform2d with properties:

InverseFcn: [function_handle]
ForwardFcn: []
Dimensionality: 2

Similarly, the following example creates a geometricTransform3d geometric transformation object


using the inverse mapping function. The example specifies an inverse mapping function that accepts
and returns 3-D points in packed (x,y,z) format.

inversefn = @(c)[c(:,1)+c(:,2),c(:,1)-c(:,2),c(:,3).^2]

inversefn =

function_handle with value:

@(c)[c(:,1)+c(:,2),c(:,1)-c(:,2),c(:,3).^2]

tform = geometricTransform3d(inversefn)

tform =

geometricTransform3d with properties:

InverseFcn: [function_handle]
ForwardFcn: []
Dimensionality: 3

Estimate Transformation from Control Point Pairs

You can create a geometric transformation object by passing pairs of control points to the
fitgeotform2d function. The fitgeotform2d function automatically estimates the transformation
from these points and returns one of the geometric transformation objects.

Different transformations require a varying number of points. For example, affine transformations
require three non-collinear points in each image (a triangle) and projective transformations require
four points (a quadrilateral).

This example defines three pairs of control points, then uses the fitgeotform2d function to create an
affinetform2d geometric transformation object.

movingPoints = [11 11;21 11; 21 21];


fixedPoints = [51 51;61 51;61 61];
tform = fitgeotform2d(movingPoints,fixedPoints,"affine")

tform =

affinetform2d with properties:

Dimensionality: 2
A: [3×3 double]

Estimate Transformation Using Similarity Optimization

If you have a fixed image and a moving image that are slightly misaligned, then you can use the
imregtform function to estimate an affine geometric transformation that aligns the images.


imregtform optimizes the mean squares or Mattes mutual information similarity metric of the two
images, using a regular step gradient descent or one-plus-one evolutionary optimizer. For more
information, see “Create an Optimizer and Metric for Intensity-Based Image Registration” on page
7-26.
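
As a rough sketch, assuming fixed and moving are grayscale images of the same scene captured with
the same sensor, the optimization workflow might look like this:

[optimizer,metric] = imregconfig("monomodal");
tform = imregtform(moving,fixed,"affine",optimizer,metric);
registered = imwarp(moving,tform,OutputView=imref2d(size(fixed)));
imshowpair(fixed,registered,"blend")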

Estimate Transformation Using Phase Correlation

If you have a fixed image and a moving image that are severely misaligned, then you can use the
imregcorr function to estimate an affine geometric transformation that improves the image
alignment. You can refine the resulting transformation by using similarity optimization.
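
A minimal sketch, again assuming fixed and moving are misaligned 2-D grayscale images of the same
scene, might look like this:

tform = imregcorr(moving,fixed,"similarity");   % coarse alignment from phase correlation
registered = imwarp(moving,tform,OutputView=imref2d(size(fixed)));
% Optionally refine tform further with imregtform (similarity optimization).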

Generate Random Affine Transformations

You can create an affine geometric transformation with randomized transformation parameters using
the randomAffine2d and randomAffine3d functions. These functions support all affine parameters
including reflection about each axis, rotation, shearing, and anisotropic scale factors. Randomized
affine transformations are commonly used as a data augmentation technique for deep learning.
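
For example, a short sketch of random augmentation, with parameter ranges that are arbitrary
illustrations, might look like this:

I = imread("cameraman.tif");
tform = randomAffine2d(Rotation=[-45 45],Scale=[0.9 1.1],XReflection=true);
J = imwarp(I,tform);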

See Also
imwarp

Related Examples
• “Perform Simple 2-D Translation Transformation” on page 6-38
• “Create Composite 2-D Affine Transformations” on page 6-32

More About
• “Matrix Representation of Geometric Transformations” on page 6-27
• “N-Dimensional Spatial Transformations” on page 6-41


Migrate Geometric Transformations to Premultiply Convention


Starting in R2022b, functions that create and perform geometric transformations use a premultiply
matrix convention.

A new set of objects enable geometric transformations using a premultiply convention. There are no
plans to remove the old geometric transformation objects that support a postmultiply convention.

About the Premultiply and Postmultiply Conventions


Using the previous 2-D postmultiply matrix convention, you transform the point (u,v) in the input
coordinate space to the point (x,y) in the output coordinate space using the convention:

    [x y 1] = [u v 1] * T

The geometric transformation matrix T is represented by a 3-by-3 matrix:

    T = [a d 0
         b e 0
         c f 1]

In the 2-D premultiply matrix convention, you transform the point (u,v) in the input coordinate space
to the point (x,y) in the output coordinate space using the convention:

    [x; y; 1] = A * [u; v; 1]

The geometric transformation matrix A is represented by a 3-by-3 matrix that is the transpose of
matrix T:

    A = [a b c
         d e f
         0 0 1]
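
As a quick numeric check (the rotation angle, translation, and test point below are arbitrary
illustrations, not taken from the text), you can verify that transposing a postmultiply matrix gives the
equivalent premultiply matrix:

T = [cosd(30) sind(30) 0; -sind(30) cosd(30) 0; 10 20 1];   % postmultiply form
A = T';                                                     % premultiply form
xyPost = [5 7 1]*T;     % point (5,7) transformed using the postmultiply convention
xyPre = A*[5 7 1]';     % the same point transformed using the premultiply convention
% xyPost' and xyPre agree (up to floating-point round-off).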

Create New Geometric Transformation Objects from Previous Geometric Transformation Objects
If your code uses one of the previous geometric transformation objects, then you can update your
code by using a new geometric transformation object that supports the premultiply convention.

1 Select a type of new geometric transformation object that performs your desired transformation.
The affine and rigid postmultiply geometric transformation objects support multiple types of new
premultiply geometric transformation objects. The list shows the available geometric
transformation objects that you can use instead of the previous objects.

• affine2d: Use affinetform2d instead. To create a 2-D affine transformation that represents a
purely rigid, similar, or translation transformation, use rigidtform2d, simtform2d, or
transltform2d, respectively.
• affine3d: Use affinetform3d instead. To create a 3-D affine transformation that represents a
purely rigid, similar, or translation transformation, use rigidtform3d, simtform3d, or
transltform3d, respectively.
• rigid2d: Use rigidtform2d instead. To create a 2-D rigid transformation that represents pure
translation, use transltform2d.
• rigid3d: Use rigidtform3d instead. To create a 3-D rigid transformation that represents pure
translation, use transltform3d.
• projective2d: Use projtform2d instead.
2 Create the object using the transpose of the transformation matrix stored in the old object. For
example, this code shows how to convert a 2-D affine transformation represented by an
affine2d object named tformPost to an affinetform2d object named tformPre.

T = tformPost.T;
A = T';
tformPre = affinetform2d(A)
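
You can optionally confirm that the two objects represent the same transformation by mapping a test
point with each of them. This check is a sketch, not part of the original workflow:

p = [3 4];
pPost = transformPointsForward(tformPost,p);
pPre = transformPointsForward(tformPre,p);
max(abs(pPost-pPre))    % expected to be zero, or negligibly small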

See Also

More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-20
• “Matrix Representation of Geometric Transformations” on page 6-27


Matrix Representation of Geometric Transformations


You can represent a linear geometric transformation as a numeric matrix. Each type of
transformation, such as translation, scaling, rotation, and reflection, is defined using a matrix whose
elements follow a specific pattern. You can combine multiple transformations by taking a composite of
the matrices representing the transformations. For more information, see “Create Composite 2-D
Affine Transformations” on page 6-32.

2-D Affine Transformations


The table lists 2-D affine transformations with the transformation matrix used to define them. For 2-D
affine transformations, the last row must be [0 0 1].

• Use combinations of 2-D translation matrices to create a transltform2d object representing a


translation transformation.
• Use combinations of 2-D translation and rotation matrices to create a rigidtform2d object
representing a nonreflective rigid transformation.
• Use combinations of 2-D translation, rotation, and scaling matrices to create a simtform2d object
representing a nonreflective similarity transformation.
• Use any combination of 2-D transformation matrices to create an affinetform2d object
representing a general affine transformation.

Translation

    [1 0 tx
     0 1 ty
     0 0 1]

tx specifies the displacement along the x axis. ty specifies the displacement along the y axis. For
more information about pixel coordinates, see “Image Coordinate Systems” on page 2-63.

Scale

    [sx 0  0
     0  sy 0
     0  0  1]

sx specifies the scale factor along the x axis. sy specifies the scale factor along the y axis.

Shear

    [1   shx 0
     shy 1   0
     0   0   1]

shx specifies the shear factor along the x axis. shy specifies the shear factor along the y axis.

Reflection

    [cosd(2φ)  sind(2φ) 0
     sind(2φ) -cosd(2φ) 0
     0         0        1]

φ specifies the angle of the axis of reflection, in degrees. Two common reflections are vertical and
horizontal reflection. Vertical reflection is reflection about the x-axis, so φ is 0 and the reflection
matrix simplifies to [1 0 0; 0 -1 0; 0 0 1]. Horizontal reflection is reflection about the y-axis,
so φ is 90 and the reflection matrix simplifies to [-1 0 0; 0 1 0; 0 0 1].

Rotation

    [cosd(θ) -sind(θ) 0
     sind(θ)  cosd(θ) 0
     0        0       1]

θ specifies the angle of rotation about the origin, in degrees.
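
For example, a brief sketch that applies the shear matrix above to a checkerboard test image (the
shear factor is an arbitrary illustration):

I = checkerboard(10,4,4);
shx = 0.5;                      % shear factor along the x axis
A = [1 shx 0; 0 1 0; 0 0 1];    % 2-D shear matrix in the premultiply convention
tform = affinetform2d(A);
J = imwarp(I,tform);
imshowpair(I,J,"montage")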

2-D Projective Transformations


Projective transformation enables the plane of the image to tilt. Parallel lines can converge towards a
vanishing point, creating the appearance of depth.

The transformation is a 3-by-3 matrix. Unlike affine transformations, there are no restrictions on the
last row of the transformation matrix. Use any composition of 2-D affine and projective
transformation matrices to create a projtform2d object representing a general projective
transformation.


Tilt

    [1 0 0
     0 1 0
     E F 1]

E and F influence the vanishing point. When E and F are large, the vanishing point comes closer to
the origin and thus parallel lines appear to converge more quickly.
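
A short sketch of a tilt, with arbitrary small values of E and F, might look like this:

I = checkerboard(10,4,4);
E = 0.002;
F = 0.0005;
A = [1 0 0; 0 1 0; E F 1];
tform = projtform2d(A);
J = imwarp(I,tform);
imshow(J)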

3-D Affine Transformations


The table lists the 3-D affine transformations with the transformation matrix used to define them.
Note that in the 3-D case, there are multiple matrices, depending on how you want to rotate or shear
the image. For 3-D affine transformations, the last row must be [0 0 0 1].

• Use combinations of 3-D translation matrices to create a transltform3d object representing a


translation transformation.
• Use combinations of 3-D translation and rotation matrices to create a rigidtform3d object
representing a nonreflective rigid transformation.
• Use combinations of 3-D translation, rotation, and scaling matrices to create a simtform3d object
representing a nonreflective similarity transformation.
• Use any combination of 3-D transformation matrices to create an affinetform3d object
representing a general affine transformation.

Translation

Translation by amount tx, ty, and tz in the x, y, and z directions, respectively:

    [1 0 0 tx
     0 1 0 ty
     0 0 1 tz
     0 0 0 1]

Scale

Scale by scale factor sx, sy, and sz in the x, y, and z dimensions, respectively:

    [sx 0  0  0
     0  sy 0  0
     0  0  sz 0
     0  0  0  1]

Shear

Shear within the y-z plane:

    [1    0 0 0
     shxy 1 0 0
     shxz 0 1 0
     0    0 0 1]

such that x′ = x, y′ = y + shxy*x, z′ = z + shxz*x.

Shear within the x-z plane:

    [1 shyx 0 0
     0 1    0 0
     0 shyz 1 0
     0 0    0 1]

such that x′ = x + shyx*y, y′ = y, z′ = z + shyz*y.

Shear within the x-y plane:

    [1 0 shzx 0
     0 1 shzy 0
     0 0 1    0
     0 0 0    1]

such that x′ = x + shzx*z, y′ = y + shzy*z, z′ = z.

Reflection

Reflection across the y-z plane, negating the x coordinate:

    [-1 0 0 0
      0 1 0 0
      0 0 1 0
      0 0 0 1]

Reflection across the x-z plane, negating the y coordinate:

    [1  0 0 0
     0 -1 0 0
     0  0 1 0
     0  0 0 1]

Reflection across the x-y plane, negating the z coordinate:

    [1 0  0 0
     0 1  0 0
     0 0 -1 0
     0 0  0 1]

Rotation

Rotation within the y-z plane, by angle θx about the x axis, in degrees:

    [1 0         0         0
     0 cosd(θx) -sind(θx)  0
     0 sind(θx)  cosd(θx)  0
     0 0         0         1]

Rotation within the x-z plane, by angle θy about the y axis, in degrees:

    [ cosd(θy) 0 sind(θy) 0
      0        1 0        0
     -sind(θy) 0 cosd(θy) 0
      0        0 0        1]

Rotation within the x-y plane, by angle θz about the z axis, in degrees:

    [cosd(θz) -sind(θz) 0 0
     sind(θz)  cosd(θz) 0 0
     0         0        1 0
     0         0        0 1]
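
For instance, a minimal sketch that rotates a 3-D volume about the z axis (the volume here is random
test data used only for illustration):

D = rand(64,64,64);
thetaZ = 30;
A = [cosd(thetaZ) -sind(thetaZ) 0 0
     sind(thetaZ)  cosd(thetaZ) 0 0
     0             0            1 0
     0             0            0 1];
tform = affinetform3d(A);
Dout = imwarp(D,tform);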


3-D Projective and N-D Transformations


The imwarp function does not support 3-D projective transformations or N-D affine and projective
transformations. Instead, you can create a spatial transformation structure from a geometric
transformation matrix using the maketform function. Then, apply the transformation to an image
using the tformarray function. For more information, see “N-Dimensional Spatial Transformations”
on page 6-41.

The dimensions of the transformation matrix must be (N+1)-by-(N+1). The maketform and
tformarray functions use the postmultiply matrix convention. Geometric transformation matrices in
the postmultiply convention are the transpose of matrices in the premultiply convention. Therefore,
for N-D affine transformation matrices, the last column must contain [zeros(N,1); 1] and there
are no restrictions on the values of the last row.
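
As a brief sketch, assuming A3 is a 3-D affine matrix in the premultiply convention (such as one from
the table above) and D is a 3-D volume:

T = A3';                               % transpose to the postmultiply convention
tpost = maketform("affine",T);
R = makeresampler("linear","fill");
Dout = tformarray(D,tpost,R,1:3,1:3,size(D),[],0);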

See Also
imwarp | fitgeotform2d | affinetform2d | affinetform3d | projtform2d

Related Examples
• “Create Composite 2-D Affine Transformations” on page 6-32
• “Perform Simple 2-D Translation Transformation” on page 6-38

More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-20


Create Composite 2-D Affine Transformations

This example shows how to create a composite of 2-D translation and rotation transformations. The
order of the transformation matters, so there are two approaches to creating a composition:

1) Create a matrix that represents the individual transformations, then create the composite
transformation by multiplying the matrices together, and finally store the transformation matrix as an
affinetform2d object. This approach has the most flexibility because you can use all types of affine
transformations including shear, reflection, anisotropic scaling, and other composite affine
transformations. You can also control the order of transformations through the order of the matrix
multiplication.

2) Create a simtform2d, rigidtform2d, or transltform2d geometric object. This approach


makes it easy to create a composite transformation when you know the scale factor, rotation angle,
and translation distance of a nonreflective similarity transformation. However, the geometric
transformation objects always perform the relevant transformations in a fixed order: scaling first,
then rotation, then translation. If you need to perform the transformations in a different order, then
you must use matrices and create an affine transformation object.

This example demonstrates both approaches for a rigid transformation consisting of translation and
rotation.

Create a checkerboard image that will undergo transformation. Also create a spatial reference object
for the image.

cb = checkerboard(4,2);
cb_ref = imref2d(size(cb));

To illustrate the spatial position of the image, create a flat background image. Overlay the
checkerboard over the background, highlighting the position of the checkerboard in green.

background = zeros(150);
imshowpair(cb,cb_ref,background,imref2d(size(background)))

Create a translation matrix that shifts an image horizontally and vertically.


tx = 100;
ty = 0;
T = [1 0 tx; 0 1 ty; 0 0 1]

T = 3×3

1 0 100
0 1 0
0 0 1

Create a rotation matrix that rotates an image clockwise about the origin.

theta = 30;
R = [cosd(theta) -sind(theta) 0; sind(theta) cosd(theta) 0; 0 0 1]

R = 3×3

0.8660 -0.5000 0
0.5000 0.8660 0
0 0 1.0000

Rotation Followed by Translation using Matrices

Perform rotation first and translation second. Using the premultiply matrix convention, the
translation matrix T is on the left and the rotation matrix R is on the right.

TR = T*R

TR = 3×3

0.8660 -0.5000 100.0000


0.5000 0.8660 0
0 0 1.0000

Store the composite transformation as an affine transformation, then warp the original checkerboard
image. Display the result with spatial referencing.

tform_tr = affinetform2d(TR);
[out,out_ref] = imwarp(cb,cb_ref,tform_tr);
imshowpair(out,out_ref,background,imref2d(size(background)))


Translation Followed by Rotation using Matrices

Reverse the order of the transformations: perform translation first and rotation second. Using the
premultiply matrix convention, the rotation matrix R is on the left and the translation matrix T is on
the right.

RT = R*T

RT = 3×3

0.8660 -0.5000 86.6025


0.5000 0.8660 50.0000
0 0 1.0000

Store the composite transformation as an affine transformation, then warp the original checkerboard
image. Display the result with spatial referencing. Notice how the spatial position of the transformed
image is different than when rotation was followed by translation.

tform_rt = affinetform2d(RT);
[out,out_ref] = imwarp(cb,cb_ref,tform_rt);
imshowpair(out,out_ref,background,imref2d(size(background)))


Rotation Followed by Translation using Rigid Transformation Object

Create a rigid geometric transformation object with the rotation angle and translation distance that
were specified earlier in the example.

tform_rigid = rigidtform2d(theta,[tx ty]);

View the geometric transformation matrix by querying the A property. The matrix is equal to the
matrix TR as expected, because a rigid transformation object performs rotation before translation.

tform_rigid.A

ans = 3×3

0.8660 -0.5000 100.0000


0.5000 0.8660 0
0 0 1.0000

Confirm the order of operations by warping the test image and displaying the result with spatial
referencing. The result is identical to the result obtained through matrix multiplication.

[out,out_ref] = imwarp(cb,cb_ref,tform_rigid);
imshowpair(out,out_ref,background,imref2d(size(background)))

See Also
imwarp

More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-20
• “Matrix Representation of Geometric Transformations” on page 6-27


Specify Fill Values in Geometric Transformation Output

This example shows how to specify the fill values used by imwarp when it performs a geometric
transformation.

When you perform a transformation, there are often pixels in the output image that are not part of
the original input image. These pixels must be assigned some value, called a fill value. By default,
imwarp sets these pixels to zero and they display as black. You can specify a different value using the
FillValues name-value argument. If the image being transformed is a grayscale image, specify a
scalar value that specifies a shade of gray. If the image being transformed is an RGB image, you can
use either a scalar value or a 1-by-3 vector. If you specify a scalar, imwarp uses that shade of gray for
each plane of the RGB image. If you specify a 1-by-3 vector, imwarp interprets the values as an RGB
color value.

Read a color image into workspace.


rgb = imread("onion.png");

Specify an amount of translation, and create a geometric transformation object that represents a
translation transformation.
translation = [15 40];
tform = transltform2d(translation);

Create a 2-D spatial referencing object. This object specifies aspects of the coordinate system of the
output space so that the area needing fill values is visible. By default, imwarp sizes the output image
to be just large enough to contain the entire transformed image but not the entire output coordinate
space.
Rout = imref2d(size(rgb));
Rout.XWorldLimits(2) = Rout.XWorldLimits(2)+translation(1);
Rout.YWorldLimits(2) = Rout.YWorldLimits(2)+translation(2);
Rout.ImageSize = Rout.ImageSize+fliplr(translation); % ImageSize is [rows columns]; translation is [x y]

Perform the transformation using the imwarp function.


cb_rgb = imwarp(rgb,tform,"OutputView",Rout);
imshow(cb_rgb)


Now perform the transformation again, this time specifying a fill value.

fillValue = [187 192 57];


cb_fill = imwarp(rgb,tform,"OutputView",Rout,"FillValues",fillValue);
imshow(cb_fill)

See Also
imwarp | imref2d | transltform2d | affinetform2d

More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-20
• “Matrix Representation of Geometric Transformations” on page 6-27


Perform Simple 2-D Translation Transformation

This example shows how to perform a simple affine transformation called a translation. In a
translation, you shift an image in coordinate space by adding a specified value to the x- and y-
coordinates. (You can also use the imtranslate function to perform translation.)

Read the image to be transformed. This example creates a checkerboard image using the
checkerboard function.

cb = checkerboard;
imshow(cb)

Get spatial referencing information about the image. This information is useful when you want to
display the result of the transformation.

cb_ref = imref2d(size(cb))

cb_ref =
imref2d with properties:

XWorldLimits: [0.5000 80.5000]


YWorldLimits: [0.5000 80.5000]
ImageSize: [80 80]
PixelExtentInWorldX: 1
PixelExtentInWorldY: 1
ImageExtentInWorldX: 80
ImageExtentInWorldY: 80
XIntrinsicLimits: [0.5000 80.5000]
YIntrinsicLimits: [0.5000 80.5000]

Specify the amount of translation in the x and y directions.

tx = 20;
ty = 30;

Create a transltform2d geometric transformation object that represents the translation


transformation. For other types of geometric transformations, you can use other types of objects.

tform = transltform2d(tx,ty);

Perform the transformation. Call the imwarp function, specifying the image you want to transform
and the geometric transformation object. imwarp returns the transformed image, cb_translated.


This example also returns the optional spatial referencing object, cb_translated_ref, which
contains spatial referencing information about the transformed image.

[cb_translated,cb_translated_ref] = imwarp(cb,tform);

View the original and the transformed image side-by-side. When viewing the translated image, it
might appear that the transformation had no effect. The transformed image looks identical to the
original image. The reason that no change is apparent in the visualization is because imwarp sizes
the output image to be just large enough to contain the entire transformed image but not the entire
output coordinate space. Notice, however, that the coordinate values have been changed by the
transformation.

figure
subplot(1,2,1)
imshow(cb,cb_ref)
subplot(1,2,2)
imshow(cb_translated,cb_translated_ref)

To see the entirety of the transformed image in the same relation to the origin of the coordinate space
as the original image, use imwarp with the "OutputView" name-value argument, specifying a
spatial referencing object. The spatial referencing object specifies the size of the output image and
how much of the output coordinate space is included in the output image. To do this, the example
makes a copy of the spatial referencing object associated with the original image and modifies the
world coordinate limits to accommodate the full size of the transformed image. The example sets the
limits of the output image in world coordinates to include the origin from the input.


cb_translated_ref = cb_ref;
cb_translated_ref.XWorldLimits(2) = cb_translated_ref.XWorldLimits(2)+tx;
cb_translated_ref.YWorldLimits(2) = cb_translated_ref.YWorldLimits(2)+ty;

[cb_translated,cb_translated_ref] = imwarp(cb,tform, ...


OutputView=cb_translated_ref);

figure
subplot(1,2,1)
imshow(cb,cb_ref)
subplot(1,2,2)
imshow(cb_translated,cb_translated_ref);

See Also
imwarp | imref2d | transltform2d | affinetform2d

More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-20
• “Matrix Representation of Geometric Transformations” on page 6-27


N-Dimensional Spatial Transformations


You can perform N-D geometric transformations using the tformarray function. You can also use
tformarray to perform mixed-dimensional transformations, in which the input and output arrays do
not have the same dimensions. The output can have either a lower or higher number of dimensions
than the input. For example, if you are sampling 3-D data on a 2-D slice or manifold, the input array
might have a lower dimensionality. The output dimensionality might be higher, for example, if you
combine multiple 2-D transformations into a single 2-D to 3-D operation.

Before using the tformarray function, prepare the input arguments required to perform the
geometric transformation.

• Create the spatial transformation using the maketform function. If you create the spatial
transformation from a matrix, maketform expects the matrix to be in the postmultiply convention.
• Create the resampling structure using the makeresampler function. A resampler structure
defines how to interpolate values of the input array at specified locations. For example, you could
specify your own separable interpolation kernel, build a custom resampler around the interp2 or
interp3 functions, or even implement an advanced antialiasing technique. The resampling
structure also controls the edge behavior when interpolating.

Next, apply the geometric transformation to an image using the tformarray function, specifying the
spatial transformation structure and the resampling structure. You can also transform individual
points and lines to explore the geometric effects of a transformation. Use the tformfwd and
tforminv functions to perform forward and inverse transformations, respectively.

This example uses tformarray to perform a projective transformation of a checkerboard image, and
makeresampler to create a resampling structure with a standard bicubic interpolation method.

I = checkerboard(20,1,1);
figure
imshow(I)
T = maketform("projective",[1 1; 41 1; 41 41; 1 41], ...
[5 5; 40 5; 35 30; -10 30]);
R = makeresampler("cubic","circular");
J = tformarray(I,T,R,[1 2],[2 1],[100 100],[],[]);
figure
imshow(J)

The makeresampler and tformarray functions enable you to control many aspects of the
transformation. For example, note how tformarray created an output image that is larger than the
size of the original image. Further, notice that the transformed image appears to contain multiple
copies of the original image. This is accomplished by specifying a padding method in the resampling
structure that extends the input image by repeating the pixels in a circular pattern.


See Also

More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-20


Register Two Images Using Spatial Referencing to Enhance Display

This example shows how to use spatial referencing objects to understand the spatial relationship
between two images in image registration and display them effectively. This example brings one of
the images, called the moving image, into alignment with the other image, called the fixed image.

Read the two images of the same scene that are slightly misaligned.
fixed = imread("westconcordorthophoto.png");
moving = imread("westconcordaerial.png");

Display the moving (unregistered) image.


iptsetpref(ImshowAxesVisible="on")
imshow(moving)
text(size(moving,2),size(moving,1)+35,"Image courtesy of mPower3/Emerge", ...
FontSize=7,HorizontalAlignment="right")

Load a MAT file that contains preselected control points for the fixed and moving images.
load westconcordpoints


Fit a projective geometric transformation to the control point pairs using the fitgeotform2d
function.

tform = fitgeotform2d(movingPoints,fixedPoints,"projective");

Perform the transformation necessary to register the moving image with the fixed image, using the
imwarp function. This example uses the optional "FillValues" name-value argument to specify a
fill value (white), which will help when displaying the fixed image over the transformed moving
image, to check registration. Notice that the full content of the geometrically transformed moving
image is present, now called registered. Also note that there are no blank rows or columns.

registered = imwarp(moving,tform,FillValues=255);
imshow(registered)

Overlay the transformed image, registered, over the fixed image using the imshowpair function.
Notice how the two images appear misregistered. This happens because imshowpair assumes that
the images are both in the default intrinsic coordinate system. The next steps provide two ways to
remedy this display problem.

imshowpair(fixed,registered,"blend");


Constrain the transformed image, registered, to the same number of rows and columns, and the
same spatial limits, as the fixed image. This ensures that the registered image appears registered
with the fixed image but areas of the registered image that would extrapolate beyond the extent of
the fixed image are discarded. To do this, create a default spatial referencing object that specifies the
size and location of the fixed image, and use the "OutputView" name-value argument of imwarp to
create a constrained resampled image registered1. Display the registered image over the fixed
image. In this view, the images appear to have been registered, but not all of the unregistered image
is visible.

Rfixed = imref2d(size(fixed));
registered1 = imwarp(moving,tform,FillValues=255,OutputView=Rfixed);
imshowpair(fixed,registered1,"blend");


As an alternative, use the optional imwarp syntax that returns the output spatial referencing object
that indicates the position of the full transformed image in the same default intrinsic coordinate
system as the fixed image. Display the registered image over the fixed image and note that now the
full registered image is visible.

[registered2,Rregistered] = imwarp(moving,tform,FillValues=255);
imshowpair(fixed,Rfixed,registered2,Rregistered,"blend")


Clean up.

iptsetpref("ImshowAxesVisible","off")


Create a Gallery of Transformed Images

This example shows many properties of geometric transformations by applying different


transformations to a checkerboard image.

A two-dimensional geometric transformation is a mapping that associates each point in a Euclidean


plane with another point in a Euclidean plane. In these examples, the geometric transformation is
defined by a rule that tells how to map the point with Cartesian coordinates (x, y) to another point
with Cartesian coordinates (u, v). A checkerboard pattern is helpful in visualizing a coordinate grid in
the plane of the input image and the type of distortion introduced by each transformation.

Create a sample checkerboard image using the checkerboard function. The image has rectangular
tiles and four unique corners, which makes it easy to see how the checkerboard image gets distorted
by geometric transformations. After you run this example once, try changing the image I to your
favorite image.

sqsize = 60;
I = checkerboard(sqsize,4,4);
imshow(I)
title("Original")


Get the size of the image, and specify a fill value for the background.

nrows = size(I,1);
ncols = size(I,2);
fill = 0.3;

Similarity Transformation

Similarity transformations can include rotation, isotropic scaling, and translation, but not reflection.
Shapes and angles are preserved. Parallel lines remain parallel and straight lines remain straight.

Specify the rotation angle, scale factor, and translation amounts in the x and y directions. Then create
a similarity geometric transformation object.

scale = 1.2;
angle = 40;
tx = 0;
ty = 0;
t_sim = simtform2d(scale,angle,[tx ty]);

Apply the similarity geometric transformation to the image and display the result.
I_similarity = imwarp(I,t_sim,FillValues=fill);

imshow(I_similarity);
title("Similarity")

If you change either tx or ty to a nonzero value, you will notice that it has no effect on the output
image. If you want to see the coordinates that correspond to your transformation, including the
translation, include spatial referencing information:


[I_similarity,RI] = imwarp(I,t_sim,FillValues=fill);

imshow(I_similarity,RI)
axis on
title("Similarity (Spatially Referenced)")

Notice that passing the output spatial referencing object RI from imwarp reveals the translation. To
specify what part of the output image you want to see, use the OutputView name-value argument in
the imwarp function.

Reflective Similarity Transformation

In a reflective similarity transformation, similar triangles map to similar triangles.


Specify the rotation angle, scale factor, and translation amounts in the x and y directions, and
reflection coefficient. Specify the reflection coefficient r as -1 to perform reflection, and 1 otherwise.
Create a 3-by-3 matrix that represents the transformation.

scale = 1.5;
angle = 10;
tx = 0;
ty = 0;
r = -1;

sc = scale*cosd(angle);
ss = scale*sind(angle);

A = [ sc r*ss tx;
-ss r*sc ty;
0 0 1];

Because reflective similarities are a subset of affine transformations, create an affine geometric
transformation object from the geometric transformation matrix.

t_reflective = affinetform2d(A);

Apply the reflective similarity transformation to the image and display the result with the output
spatial referencing information.

[I_reflective,RI] = imwarp(I,t_reflective,FillValues=fill);
imshow(I_reflective,RI)
axis on
title("Reflective Similarity")


Affine Transformation

In an affine transformation, the x and y dimensions can be scaled or sheared independently and there
can be translation, reflection, and rotation. Parallel lines remain parallel. Straight lines remain
straight.

Specify a general affine transformation matrix. All six elements of the first and second columns can
be different, and the third row must be [0 0 1]. Create an affine geometric transformation object from
the geometric transformation matrix.

A = [1   1 0;
     0.3 2 0;
     0   0 1];
t_aff = affinetform2d(A);

Apply the generic affine transformation to the image and display the result with the output spatial
referencing information.

I_affine = imwarp(I,t_aff,FillValues=fill);
imshow(I_affine)
title("Affine")


Projective Transformation

In a projective transformation, quadrilaterals map to quadrilaterals. Straight lines remain straight but
parallel lines do not necessarily remain parallel.

Specify a general projective transformation matrix. All nine elements of the matrix can be different,
with no constraints on the last row. Create a projective geometric transformation object from the
geometric transformation matrix.


A = [ 1 1 0;
0 1 0;
0.002 0.0002 1];
t_proj = projtform2d(A);

Apply the projective transformation to the image and display the result with the output spatial
referencing information.

I_projective = imwarp(I,t_proj,FillValues=fill);
imshow(I_projective)
title("Projective")

Piecewise Linear Transformation

In a piecewise linear transformation, affine transformations are applied separately to regions of the
image. In this example, the top-left, top-right, and bottom-left points of the checkerboard remain
unchanged, but the triangular region at the lower-right of the image is stretched so that the bottom-
right corner of the transformed image is 50% further to the right and 20% lower than the original
coordinate.


movingPoints = [0 0; 0 nrows; ncols 0; ncols nrows;];


fixedPoints = [0 0; 0 nrows; ncols 0; ncols*1.5 nrows*1.2];
t_piecewise_linear = fitgeotform2d(movingPoints,fixedPoints,"pwl");

I_piecewise_linear = imwarp(I,t_piecewise_linear,FillValues=fill);
imshow(I_piecewise_linear)
title("Piecewise Linear")

Sinusoidal Transformation

This example and the following two examples show how you can create an explicit mapping to
associate each point in a regular grid (xi,yi) with a different point (ui,vi). This mapping is stored in a
geometricTransform2d object, which is used by imwarp to transform the image.

In this sinusoidal transformation, the x-coordinate of each pixel is unchanged. The y-coordinate of
each row of pixels is shifted up or down following a sinusoidal pattern.

a = ncols/12; % Try varying the amplitude of the sinusoid


ifcn = @(xy) [xy(:,1), xy(:,2) + a*sin(2*pi*xy(:,1)/nrows)];
tform = geometricTransform2d(ifcn);


I_sinusoid = imwarp(I,tform,FillValues=fill);
imshow(I_sinusoid);
title("Sinusoid")

Barrel Transformation

Barrel distortion perturbs an image radially outward from its center. Distortion is greater farther
from the center, resulting in convex sides.

First, define a function that maps pixel indices to distance from the center. Use the meshgrid
function to create arrays of the x-coordinate and y-coordinate of each pixel, with the origin in the
upper-left corner of the image.
[xi,yi] = meshgrid(1:ncols,1:nrows);

Shift the origin to the center of the image. Then, convert the Cartesian x- and y-coordinates to
polar angle (theta) and radius (r) coordinates using the cart2pol function. r changes linearly
as distance from the center pixel increases.


xt = xi - ncols/2;
yt = yi - nrows/2;
[theta,r] = cart2pol(xt,yt);

Define the amplitude, a, of the cubic term. This parameter is adjustable. Then, add a cubic term to r
so that r changes nonlinearly with distance from the center pixel.

a = 1; % Try varying the amplitude of the cubic term.


rmax = max(r(:));
s1 = r + r.^3*(a/rmax.^2);

Convert back to the Cartesian coordinate system. Shift the origin back to the upper-left corner of
the image.

[ut,vt] = pol2cart(theta,s1);
ui = ut + ncols/2;
vi = vt + nrows/2;

Store the mapping between (xi,yi) and (ui,vi) in a geometricTransform2d object. Use imwarp
to transform the image according to the pixel mapping.

ifcn = @(c) [ui(:) vi(:)];


tform = geometricTransform2d(ifcn);

I_barrel = imwarp(I,tform,FillValues=fill);
imshow(I_barrel)
title("Barrel")


Pin Cushion Transformation

Pin-cushion distortion is the inverse of barrel distortion because the cubic term has a negative
amplitude. Distortion is still greater farther from the center but the distortion appears as concave
sides.

You can begin with the same theta and r values as for the barrel transformation. Define a different
amplitude, b, of the cubic term. This parameter is adjustable. Then, subtract a cubic term from r so that
r changes nonlinearly with distance from the center pixel.

b = 0.4; % Try varying the amplitude of the cubic term.


s = r - r.^3*(b/rmax.^2);

Convert back to the Cartesian coordinate system. Shift the origin back to the upper-left corner of
the image.


[ut,vt] = pol2cart(theta,s);
ui = ut + ncols/2;
vi = vt + nrows/2;

Store the mapping between (xi,yi) and (ui,vi) in a geometricTransform2d object. Use imwarp
to transform the image according to the pixel mapping.
ifcn = @(c) [ui(:) vi(:)];
tform = geometricTransform2d(ifcn);
I_pin = imwarp(I,tform,FillValues=fill);
imshow(I_pin)
title("Pin Cushion")

Summary: Display All Geometric Transformations of Checkerboard


figure
subplot(3,3,1),imshow(I),title("Original")
subplot(3,3,2),imshow(I_similarity),title("Similarity")
subplot(3,3,3),imshow(I_reflective),title("Reflective Similarity")
subplot(3,3,4),imshow(I_affine),title("Affine")
subplot(3,3,5),imshow(I_projective),title("Projective")
subplot(3,3,6),imshow(I_piecewise_linear),title("Piecewise Linear")
subplot(3,3,7),imshow(I_sinusoid),title("Sinusoid")
subplot(3,3,8),imshow(I_barrel),title("Barrel")
subplot(3,3,9),imshow(I_pin),title("Pin Cushion")

Note that subplot changes the scale of the images being displayed.

See Also
Functions
checkerboard | imwarp | fitgeotform2d | makeresampler | tformarray

Objects
affinetform2d | projtform2d | PiecewiseLinearTransformation2D |
PolynomialTransformation2D | LocalWeightedMeanTransformation2D

More About
• “Matrix Representation of Geometric Transformations” on page 6-27
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-20


Exploring a Conformal Mapping

This example shows how to explore a conformal mapping. Geometric image transformations are
useful in understanding a conformal mapping that is important in fluid-flow problems, and the
mapping itself can be used to transform imagery for an interesting special effect.

Step 1: Select a Conformal Transformation

Conformal transformations, or mappings, have many important properties and uses. One property
relevant to image transformation is the preservation of local shape (except sometimes at isolated
points).

This example uses a 2-D conformal transformation to warp an image. The mapping from output to
input, g: R^2 -> R^2, is defined in terms of a complex analytic function G: C -> C, where

G(z) = (z + 1/z) / 2.

We define g via a direct correspondence between each point (x,y) in R^2 (the Euclidean plane) and
the point z = x + i*y in C (the complex plane),

g(x,y) = (Re(w),Im(w)) = (u,v)

where

w = u + i*v = G(x + i*y).

This conformal mapping is important in fluid mechanics because it transforms lines of flow around a
circular disk (or cylinder, if we add a third dimension) to straight lines. (See pp. 340-341 in Strang,
Gilbert, Introduction to Applied Mathematics, Wellesley-Cambridge Press, Wellesley, MA, 1986.)

A note on the value of complex variables: although we could express the definition of g directly in
terms of x and y, that would obscure the underlying simplicity of the transformation. This
disadvantage would come back to haunt us in Step 3 below. There, if we worked purely in real
variables, we would need to solve a pair of simultaneous nonlinear equations instead of merely
applying the quadratic formula!

Step 2: Warp an Image Using the Conformal Transformation

We start by loading the peppers image, extracting a 300-by-500 subimage, and displaying it.

A = imread('peppers.png');
A = A(31:330,1:500,:);
figure
imshow(A)
title('Original Image','FontSize',14)


Then use maketform to make a custom tform struct with a handle to function conformalInverse
as its INVERSE_FCN argument:

conformal = maketform('custom', 2, 2, [], @conformalInverse, []);

To view conformalInverse use:

type conformalInverse.m

function U = conformalInverse(X, ~)
% conformalInverse Inverse conformal transformation.
%
% Supports conformal transformation example, ConformalMappingImageExample.m
% ("Exploring a Conformal Mapping").

% Copyright 2005-2013 The MathWorks, Inc.

Z = complex(X(:,1),X(:,2));
W = (Z + 1./Z)/2;
U(:,2) = imag(W);
U(:,1) = real(W);

Horizontal and vertical bounds are needed for mapping the original and transformed images to the
input and output complex planes. Note that the proportions in uData and vData match the height-to-
width ratio of the original image (3/5).

uData = [ -1.25 1.25]; % Bounds for REAL(w)


vData = [ 0.75 -0.75]; % Bounds for IMAG(w)


xData = [ -2.4 2.4 ]; % Bounds for REAL(z)


yData = [ 2.0 -2.0 ]; % Bounds for IMAG(z)

We apply imtransform using the SIZE parameter to ensure an aspect ratio that matches the
proportions in xData and yData (6/5), and view the result.
B = imtransform( A, conformal, 'cubic', ...
'UData', uData,'VData', vData,...
'XData', xData,'YData', yData,...
'Size', [300 360], 'FillValues', 255 );
figure
imshow(B)
title('Transformed Image','FontSize',14)

Compare the original and transformed images. Except that the edges are now curved, the outer
boundary of the image is preserved by the transformation. Note that each feature from the original
image appears twice in the transformed image (look at the various peppers). And there is a hole in
the middle of the transformed image with four regular cusps around its edges.

In fact, every point in the input w-plane is mapped to two points in the output z-plane, one inside the
unit circle and one outside. The copies inside the unit circle are much smaller than those outside. It's
clear that the cusps around the central hole are just the copies of the four image corners that mapped
inside the unit circle.

Step 3: Construct Forward Transformations

If the transformation created with maketform has a forward function, then we can apply tformfwd
to regular geometric objects (in particular, to rectangular grids and uniform arrays of circles) to
obtain further insight into the transformation. In this example, because G maps two output points to
each input point, there is no unique forward transformation. But we can proceed if we are careful and
work with two different forward functions.

Letting w = (z + 1/z)/2 and solving the quadratic equation that results,

z^2 - 2*w*z + 1 = 0,

we find that

z = w +/- sqrt(w^2 - 1).

The positive and the negative square roots lead to two separate forward transformations. We
construct the first using maketform and a handle to the function, conformalForward1.
t1 = maketform('custom', 2, 2, @conformalForward1, [], []);

To view conformalForward1 use:


type conformalForward1.m

function X = conformalForward1(U, ~)
% conformalForward1 Forward transformation with positive square root.
%
% Supports conformal transformation example, ConformalMappingImageExample.m
% ("Exploring a Conformal Mapping").

% Copyright 2005-2013 The MathWorks, Inc.

W = complex(U(:,1),U(:,2));
Z = W + sqrt(W.^2 - 1);
X(:,2) = imag(Z);
X(:,1) = real(Z);

We construct the second transformation with another function that is identical to


conformalForward1 except for a sign change.
t2 = maketform('custom', 2, 2, @conformalForward2, [], []);

type conformalForward2.m

function X = conformalForward2(U, ~)
% conformalForward2 Forward transformation with negative square root.
%
% Supports conformal transformation example, ConformalMappingImageExample.m
% ("Exploring a Conformal Mapping").

% Copyright 2005-2013 The MathWorks, Inc.

W = complex(U(:,1),U(:,2));
Z = W - sqrt(W.^2 - 1);
X(:,2) = imag(Z);
X(:,1) = real(Z);

Step 4: Explore the Mapping Using Grid Lines

With the two forward transformations, we can illustrate the mapping of a grid of lines, using
additional helper functions.


f3 = figure('Name','Conformal Transformation: Grid Lines');


axIn = conformalSetupInputAxes( subplot(1,2,1));
axOut = conformalSetupOutputAxes(subplot(1,2,2));
conformalShowLines(axIn, axOut, t1, t2)

% Reduce wasted vertical space in figure


f3.Position = [1 1 1 0.7] .* f3.Position;

You can see that the grid lines are color-coded according to their quadrants in the input plane before
and after the transformations. The colors also follow the transformed grids to the output planes. Note
that each quadrant transforms to a region outside the unit circle and to a region inside the unit circle.
The right-angle intersections between grid lines are preserved under the transformation -- evidence
of the shape-preserving property of conformal mappings -- except for the points at +1 and -1 on the
real axis.

Step 5: Explore the Mapping Using Packed Circles

Under a conformal transformation, small circles should remain nearly circular, changing only in
position and size. Again applying the two forward transformations, this time we map a regular array
of uniformly-sized circles.

f4 = figure('Name','Conformal Transformation: Circles');


axIn = conformalSetupInputAxes( subplot(1,2,1));
axOut = conformalSetupOutputAxes(subplot(1,2,2));
conformalShowCircles(axIn, axOut, t1, t2)

% Reduce wasted vertical space in figure


f4.Position = [1 1 1 0.7] .* f4.Position;


You can see that the circles transform to a circle packing where tangencies have been preserved. In this
example, the color coding indicates use of the positive (green) or negative (blue) square root of w^2
- 1. Note that the circles change dramatically but that they remain circles (shape-preservation, once
again).

Step 6: Explore the Mapping Using Images

To further explore the conformal mapping, we can place the input and transformed images on the
pair of axes used in the preceding examples and superpose a set of curves as well.

First we display the input image, rendered semi-transparently, over the input axes of the conformal
map, along with a black ellipse and a red line along the real axis.

figure
axIn = conformalSetupInputAxes(axes);
conformalShowInput(axIn, A, uData, vData)
title('Original Image Superposed on Input Plane','FontSize',14)


Next we display the output image over the output axes of the conformal map, along with two black
circles and one red circle. Again, the image is semi-transparent.

figure
axOut = conformalSetupOutputAxes(axes);
conformalShowOutput(axOut, B, xData, yData)
title('Transformed Image Superposed on Output Plane','FontSize',14)


MATLAB® graphics made it easy to shift and scale the original and transformed images to superpose
them on the input (w-) and output (z-) planes, respectively. The use of semi-transparency makes it
easier to see the ellipse, line, and circles. The ellipse in the w-plane has intercepts at 5/4 and -5/4 on
the horizontal axis and 3/4 and -3/4 on the vertical axis. G maps two circles centered on the origin to
this ellipse: the one with radius 2 and the one with radius 1/2. And, as shown in red, G maps the unit
circle to the interval [-1 1] on the real axis.
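
You can verify these statements numerically (this check is not part of the original example; the sample
points are arbitrary). Points on the circle of radius 2 map to the ellipse with semi-axes 5/4 and 3/4,
and points on the unit circle map onto the real interval [-1 1]:

theta = linspace(0,2*pi,100);
z2 = 2*exp(1i*theta);                  % circle of radius 2
w2 = (z2 + 1./z2)/2;
[max(real(w2)) max(imag(w2))]          % approximately 1.25 and 0.75

z1 = exp(1i*theta);                    % unit circle
max(abs(imag((z1 + 1./z1)/2)))         % essentially zero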

Step 7: Obtain a Special Effect by Masking Parts of the Output Image

If the inverse transform function within a custom tform struct returns a vector filled with NaN for a
given output image location, then imtransform (and also tformarray) assign the specified fill
value at that location. In this step we repeat Step 1, but modify our inverse transformation function
slightly to take advantage of this feature.
type conformalInverseClip.m

function U = conformalInverseClip( X, ~)
% conformalInverseClip Inverse conformal transformation with clipping.
%
% This is a modification of conformalInverse in which points in X
% inside the circle of radius 1/2 or outside the circle of radius 2 map to
% NaN + i*NaN.
%
% Supports conformal transformation example, ConformalMappingImageExample.m
% ("Exploring a Conformal Mapping").


% Copyright 2000-2013 The MathWorks, Inc.

Z = complex(X(:,1),X(:,2));
W = (Z + 1./Z)/2;
q = 0.5 <= abs(Z) & abs(Z) <= 2;
W(~q) = complex(NaN,NaN);
U(:,2) = imag(W);
U(:,1) = real(W);

This is the same as the function defined in Step 2, except for the two additional lines:

q = 0.5 <= abs(Z) & abs(Z) <= 2;


W(~q) = complex(NaN,NaN);

which cause the inverse transformation to return NaN at any point not between the two circles with
radii of 1/2 and 2, centered on the origin. The result is to mask that portion of the output image with
the specified fill value.

ring = maketform('custom', 2, 2, [], @conformalInverseClip, []);


Bring = imtransform( A, ring, 'cubic',...
'UData', uData, 'VData', vData,...
'XData', [-2 2], 'YData', yData,...
'Size', [400 400], 'FillValues', 255 );
figure
imshow(Bring)
title('Transformed Image With Masking','FontSize',14);


The result is identical to our initial transformation except that the outer corners and inner cusps have
been masked away to produce a ring effect.

Step 8: Repeat the Effect on a Different Image

Applying the "ring" transformation to an image of winter greens (hemlock and alder berries) leads to
an aesthetic special effect.

Load the image greens.jpg, which already has a 3/5 height-to-width ratio, and display it.

C = imread('greens.jpg');
figure
imshow(C)
title('Winter Greens Image','FontSize',14);


Transform the image and display the result, this time creating a square output image.

D = imtransform( C, ring, 'cubic',...
    'UData', uData, 'VData', vData,...
    'XData', [-2 2], 'YData', [-2 2],...
    'Size', [400 400], 'FillValues', 255 );
figure
imshow(D)
title('Transformed and Masked Winter Greens Image','FontSize',14);


Notice that the local shapes of objects in the output image are preserved. The alder berries stayed
round!


Explore Slices from 3-D Image Volume with Anisotropic Voxel Spacing

This example shows how to display slices from an anisotropic 3-D MRI volume. In an anisotropic
image volume, the spacing between voxels, or volume pixels, varies between spatial dimensions. For
example, if a 1-by-1-by-1 voxel maps to a 1-by-1-by-2 mm region in world coordinates, the voxel
spacing is anisotropic. By default, Image Processing Toolbox™ functions display images in voxel units,
and assume that spacing is uniform in world coordinates. As a result, anisotropic volumes can appear
distorted.

In this example, you apply two approaches for displaying slices along each dimension without
distortions:

• Apply Display Settings on page 6-75 — Set name-value arguments in display functions to
customize the display without modifying the underlying image.
• Transform Image Data on page 6-80 — Apply resampling and geometric transformations to the
underlying image data before displaying it.

Medical Imaging Toolbox™ extends the functionality of Image Processing Toolbox™ to automatically
manage spatial referencing between voxels, world coordinates, and anatomical axes. To learn more,
see “Choose Approach for Medical Image Visualization” (Medical Imaging Toolbox).

Load MRI Data

Load an MRI data set that contains a numeric array D and a grayscale colormap map. The numeric
array contains the MRI image data. The voxel spacing is anisotropic, with transverse slices that are
2.5 times thicker in world coordinates than slices along the coronal and sagittal axes. You can use the
colormap to display the volume with sufficient contrast. Remove singleton dimensions by using the
squeeze function.
load mri
DTransverse = squeeze(D);
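
The scale factors used throughout this example follow from the ratio of the voxel spacings. This short
calculation is illustrative and assumes 1-by-1-by-2.5 unit voxel spacing for this data set:

voxelSpacing = [1 1 2.5];
scaleFactors = voxelSpacing/min(voxelSpacing)   % [1 1 2.5], used as ScaleFactors below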

Apply Display Settings

In this section, you use display function name-value arguments to display the MRI volume without
distortion. These name-value arguments update only the aspect ratio of the display, and do not modify
the underlying image data. Adjusting only the display is beneficial when you do not want to modify
the raw intensity values of the data using interpolation. A limitation of this approach is that you must
set the name-value arguments each time you display the volume. Additionally, some functions, such as
montage, do not have arguments for scaling anisotropic volumes.

Display Orthogonal Slices

Display orthogonal slice planes by using the orthosliceViewer object. Specify the ScaleFactors
name-value argument to apply a 2.5 scale factor to the third dimension. Each slice view includes a
crosshair that you can use to navigate the volume. To navigate slices, pause on one of the crosshair

axes until the cursor changes to the fleur shape, and then click and drag to a new position. The
other slice views update automatically.
figure
orthosliceViewer(DTransverse,ScaleFactors=[1 1 2.5]);


Display Stack of Slices

Display a scrollable stack of slices by using the sliceViewer object. By default, sliceViewer
displays slices along the third dimension. In this example, voxel spacing in the first two dimensions is
equal, so the 2-D slices do not appear distorted.

sliceViewer(DTransverse,Parent=figure);
title("Transverse Slices")

Change the slice direction and apply scale factors using the SliceDirection and ScaleFactors
name-value arguments, respectively. Display a stack of sagittal slices by specifying the slice direction
as "X" with a 2.5 scale factor for the third dimension.

sliceViewer(DTransverse,SliceDirection="X",ScaleFactors=[1 1 2.5],Parent=figure);
title("Sagittal Slices")


Display Slices as 3-D Object

Display the slices in 3-D by using the volshow function with the SlicePlanes rendering style. To
display the anisotropic voxels without distortion, specify the Transformation name-value argument
as an affinetform3d object that scales the volume by a factor of 2.5 in the third dimension. Click
and drag a slice to navigate between slices along that axis. To rotate the volume to a desired
orientation, click any empty space in the figure window and drag. To snap the view to a particular
slice plane, click the X, Y, or Z label on the axis indicator in the bottom-left corner of the viewer.

sx = 1;
sy = 1;
sz = 2.5;
A = [sx 0 0 0; 0 sy 0 0; 0 0 sz 0; 0 0 0 1];
tform = affinetform3d(A);

vol = volshow(DTransverse,RenderingStyle="SlicePlanes",Transformation=tform);


You can also view the volume as a 3-D object with advanced lighting using the cinematic rendering
style.

vol.RenderingStyle = "CinematicRendering";


Explore Slices and 3-D Volume Using Volume Viewer

Explore image volumes interactively using the Volume Viewer app. Launch the app by using the
volumeViewer function. To correctly display the anisotropic voxels, specify the ScaleFactors
name-value argument.

volumeViewer(DTransverse,ScaleFactors=[1 1 2.5])


You can also specify scale factors in the app. In the 3D-Display tab of the app toolstrip, under
Spatial Referencing, enter scale factor values for each axis. When you enter values, the option to
the left of the values automatically changes to Specify Dimensions.

Transform Image Data

In this section, you use resampling and geometric transformation to modify an image volume before
displaying it. Resampling interpolates the anisotropic voxel grid to an isotropic grid that displays the
images without scaling distortions. You can also rotate the volume or apply other geometric
transformations to orient the volume to a desired display convention. A benefit of this approach is
that you do not need to specify name-value arguments each time you display the volume. Additionally,
this approach enables you to use functions such as montage, which do not support changing the
default scaling or slice direction using name-value arguments.


Resample Volume to Isotropic Voxel Spacing

Resample the original volume to an isotropic voxel grid by using the imresize3 function. Calculate
the target number of slices for the isotropic volume using a 2.5 scale factor.

numSlices = round(2.5*size(DTransverse,3));
DTransverseIsotropic = imresize3(DTransverse,[128 128 numSlices]);

You can accurately display the resampled volume using display functions without applying scale
factors. Display the resampled volume using orthosliceViewer without the ScaleFactors name-
value argument.

figure
orthosliceViewer(DTransverseIsotropic);

Display Montage of Oriented Slices

The montage function displays image volumes as a stack of slices along the third dimension. This
section shows how to modify the image data to view slices along all three dimensions in the expected
orientation for each anatomical plane.

Display the slices of the resampled transverse volume. Apply the colormap map to display the image
with sufficient contrast.

figure
montage(DTransverseIsotropic,map)
title("Transverse Slices")


To view the volume as a stack of sagittal slices, reorder the dimensions of DTransverseIsotropic.

DSagittal = permute(DTransverseIsotropic,[1 3 2]);

Rotate the volume to match the expected orientation of a sagittal slice, with the top of the head at the
top of the image and the anterior side of the head on the left side of the image.

DSagittalRotated = imrotate3(DSagittal,90,[0 0 1],"cubic");

Display the sagittal slices as a montage. Include the colormap to set the display range. Use the
Indices name-value argument to display slices 20 to 100, to exclude empty slices.

figure
montage(DSagittalRotated,map,Indices=20:100)
title("Sagittal Slices")


To view the volume as a stack of coronal slices, reorder the dimensions of DTransverseIsotropic.

DCoronal = permute(DTransverseIsotropic,[2 3 1]);

Rotate the volume to match the expected orientation for a coronal slice, with the top of the head at
the top of the image and the left side of the head on the left side of the image.

DCoronalRotated = imrotate3(DCoronal,90,[0 0 1],"cubic");

Display the coronal slices as a montage. Include the colormap to set the display range. Use the
Indices name-value argument to display slices 17 to 127, to exclude empty slices.

figure
montage(DCoronalRotated,map,Indices=17:127)
title("Coronal Slices")


See Also
Volume Viewer | Medical Image Labeler | sliceViewer | orthosliceViewer | volshow |
imresize3 | imrotate3

More About
• “Explore 3-D Volumetric Data with Volume Viewer App” on page 4-60
• “Choose Approach for Medical Image Visualization” (Medical Imaging Toolbox)
• “Display Volume Using Cinematic Rendering” on page 4-97


Padding and Shearing an Image Simultaneously

This example shows how to construct a tform struct that represents a simple shear transformation
and then applies it to an image. We explore how the transformation affects straight lines and circles,
and then use it as a vehicle to explore the various options for image padding that can be used with
imtransform and tformarray.

Step 1: Transform an Image Using Simple Shear

In two dimensions, a simple shear transformation that maps a pair of input coordinates [u v] to a
pair of output coordinates [x y] has the form

x = u + a*v

y=v

where a is a constant.

Any simple shear is a special case of an affine transformation. You can easily verify that

                    [1 0 0]
[x y 1] = [u v 1] * [a 1 0]
                    [0 0 1]

yields the values for x and y given by the first two equations.

Setting a = 0.45, we construct an affine tform struct using maketform.

a = 0.45;
T = maketform('affine', [1 0 0; a 1 0; 0 0 1] );
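
As a quick check (not part of the original example; the sample point is arbitrary), applying T with
tformfwd reproduces x = u + a*v and y = v:

tformfwd(T,[10 20])   % returns [19 20], since 10 + 0.45*20 = 19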

We select, read, and view an image to transform.

A = imread('football.jpg');
h1 = figure; imshow(A); title('Original Image');


We choose a shade of orange as our fill value.

orange = [255 127 0]';

We are ready to use T to transform A. We could call imtransform as follows:

B = imtransform(A,T,'cubic','FillValues',orange);

but this is wasteful since we would apply cubic interpolation along both columns and rows. (With our
pure shear transform, we really only need to interpolate along each row.) Instead, we create and use
a resampler that applies cubic interpolation along the rows but simply uses nearest neighbor
interpolation along the columns, then call imtransform and display the result.

R = makeresampler({'cubic','nearest'},'fill');
B = imtransform(A,T,R,'FillValues',orange);
h2 = figure; imshow(B);
title('Sheared Image');


Step 2: Explore the Transformation

Transforming a grid of straight lines or an array of circles with tformfwd is a good way to
understand a transformation (as long as it has both forward and inverse functions).

Define a grid of lines covering the original image, and display it over the image. Then use tformfwd
to apply the pure shear to each line in the grid, and display the result over the sheared image.

[U,V] = meshgrid(0:64:320,0:64:256);
[X,Y] = tformfwd(T,U,V);
gray = 0.65 * [1 1 1];

figure(h1);
hold on;
line(U, V, 'Color',gray);
line(U',V','Color',gray);


figure(h2);
hold on;
line(X, Y, 'Color',gray);
line(X',Y','Color',gray);

You can do the same thing with an array of circles.


gray = 0.65 * [1 1 1];


for u = 0:64:320
for v = 0:64:256
theta = (0 : 32)' * (2 * pi / 32);
uc = u + 20*cos(theta);
vc = v + 20*sin(theta);
[xc,yc] = tformfwd(T,uc,vc);
figure(h1); line(uc,vc,'Color',gray);
figure(h2); line(xc,yc,'Color',gray);
end
end


Step 3: Compare the 'fill', 'replicate', and 'bound' Pad Methods

When we applied the shear transformation, imtransform filled in the orange triangles to the left and
right, where there was no data. That's because we specified a pad method of 'fill' when calling
makeresampler. There are a total of five different pad method choices ('fill', 'replicate',
'bound', 'circular', and 'symmetric'). Here we compare the first three.

First, to get a better look at how the 'fill' option worked, use the 'XData' and 'YData' options
in imtransform to force some additional space around the output image.

R = makeresampler({'cubic','nearest'},'fill');

Bf = imtransform(A,T,R,'XData',[-49 500],'YData',[-49 400],...
    'FillValues',orange);

figure, imshow(Bf);
title('Pad Method = ''fill''');


Now, try the 'replicate' method (no need to specify fill values in this case).

R = makeresampler({'cubic','nearest'},'replicate');
Br = imtransform(A,T,R,'XData',[-49 500],'YData', [-49 400]);

figure, imshow(Br);
title('Pad Method = ''replicate''');


And try the 'bound' method.

R = makeresampler({'cubic','nearest'}, 'bound');
Bb = imtransform(A,T,R,'XData',[-49 500],'YData',[-49 400],...
'FillValues',orange);
figure, imshow(Bb);
title('Pad Method = ''bound''');


Results with 'fill' and 'bound' look very similar, but look closely and you'll see that the edges are
smoother with 'fill'. That's because the input image is padded with the fill values, then the cubic
interpolation is applied across the edge, mixing fill and image values. In contrast, 'bound'
recognizes a strict boundary between the inside and outside of the input image. Points falling outside
are filled. Points falling inside are interpolated, using replication when they're near the edge. A close
up look helps show this more clearly. We choose XData and YData to bracket a point near the lower
right corner of the image, in the output image space, the resize with 'nearest' to preserve the
appearance of the individual pixels.

R = makeresampler({'cubic','nearest'},'fill');
Cf = imtransform(A,T,R,'XData',[423 439],'YData',[245 260],...
'FillValues',orange);

R = makeresampler({'cubic','nearest'},'bound');
Cb = imtransform(A,T,R,'XData',[423 439],'YData',[245 260],...
'FillValues',orange);

Cf = imresize(Cf,12,'nearest');
Cb = imresize(Cb,12,'nearest');


figure;
subplot(1,2,1); imshow(Cf); title('Pad Method = ''fill''');
subplot(1,2,2); imshow(Cb); title('Pad Method = ''bound''');

Step 4: Exercise the 'circular' and 'symmetric' Pad Methods

The remaining two pad methods are 'circular' (circular repetition in each dimension) and
'symmetric' (circular repetition of the image with an appended mirror image). To show more of the
pattern that emerges, we redefine the transformation to cut the scale in half.

Thalf = maketform('affine',[1 0; a 1; 0 0]/2);

R = makeresampler({'cubic','nearest'},'circular');
Bc = imtransform(A,Thalf,R,'XData',[-49 500],'YData',[-49 400],...
'FillValues',orange);
figure, imshow(Bc);
title('Pad Method = ''circular''');


R = makeresampler({'cubic','nearest'},'symmetric');
Bs = imtransform(A,Thalf,R,'XData',[-49 500],'YData',[-49 400],...
'FillValues',orange);
figure, imshow(Bs);
title('Pad Method = ''symmetric''');

7

Image Registration

This chapter describes the image registration capabilities of the Image Processing Toolbox software.
Image registration is the process of aligning two or more images of the same scene. Image
registration is often used as a preliminary step in other image processing applications.

• “Choose Image Registration Technique” on page 7-2


• “Register Images Using Registration Estimator App” on page 7-6
• “Load Images, Spatial Referencing Information, and Initial Transformation” on page 7-14
• “Tune Registration Settings in Registration Estimator” on page 7-17
• “Export Results from Registration Estimator App” on page 7-20
• “Techniques Supported by Registration Estimator” on page 7-22
• “Intensity-Based Automatic Image Registration” on page 7-24
• “Create an Optimizer and Metric for Intensity-Based Image Registration” on page 7-26
• “Use Phase Correlation as Preprocessing Step in Registration” on page 7-27
• “Register Multimodal MRI Images” on page 7-32
• “Register Multimodal 3-D Medical Images” on page 7-42
• “Registering an Image Using Normalized Cross-Correlation” on page 7-51
• “Control Point Registration” on page 7-57
• “Geometric Transformation Types for Control Point Registration” on page 7-59
• “Control Point Selection Procedure” on page 7-61
• “Start the Control Point Selection Tool” on page 7-63
• “Find Visual Elements Common to Both Images” on page 7-65
• “Select Matching Control Point Pairs” on page 7-68
• “Export Control Points to the Workspace” on page 7-73
• “Find Image Rotation and Scale” on page 7-75
• “Use Cross-Correlation to Improve Control Point Placement” on page 7-79
• “Register Images with Projection Distortion Using Control Points” on page 7-80

Choose Image Registration Technique


Image registration is the process of aligning two or more images of the same scene. This process
involves designating one image as the reference image, also called the fixed image, and applying
geometric transformations or local displacements to the other images so that they align with the
reference. Images can be misaligned for a variety of reasons. Commonly, images are captured under
variable conditions that can change the camera perspective or the content of the scene. Misalignment
can also result from lens and sensor distortions or differences between capture devices.

Image registration is often used as a preliminary step in other image processing applications. For
example, you can use image registration to align satellite images or medical images captured with
different diagnostic modalities, such as MRI and SPECT. Image registration enables you to compare
common features in different images. For example, you might discover how a river has migrated, how
an area became flooded, or whether a tumor is visible in an MRI or SPECT image.

Image Processing Toolbox offers three image registration approaches: the interactive Registration
Estimator app, intensity-based automatic image registration, and control point registration. Computer
Vision Toolbox™ offers automated feature detection and matching.

Capability                     Registration        Intensity-Based      Control Point       Automated Feature
                               Estimator App       Automatic Image      Registration        Detection and
                               (page 7-2)          Registration         (page 7-4)          Matching (page 7-5)
                                                   (page 7-3)                               (requires Computer
                                                                                            Vision Toolbox)

Interactive registration       X
Automated intensity-based      X                   X
  registration
Automated feature detection    X                                                            X
Manual feature selection                                                X
Automated feature matching     X                                        X                   X
Nonrigid transformation        X                   X                    X
Fully automatic workflow                           X                                        X
Supports 3-D images                                X

The Registration Estimator App


The Registration Estimator app enables you to register 2-D images interactively. You can compare
different registration techniques, tune settings, and visualize the registered image. The app provides

7-2
Choose Image Registration Technique

a quantitative measure of quality, and it returns the registered image and the transformation matrix.
The app also generates code with your selected registration technique and settings, so you can apply
an identical transformation to multiple images.

Registration Estimator offers six feature-based techniques, three intensity-based techniques, and one
nonrigid registration technique. For a more detailed comparison of the available techniques, see
“Techniques Supported by Registration Estimator” on page 7-22.

Intensity-Based Automatic Image Registration


“Intensity-Based Automatic Image Registration” on page 7-24 maps pixels in each image based on
relative intensity patterns. You can register both monomodal and multimodal image pairs, and you
can register 2-D and 3-D images. This approach is useful for:

• Registering a large collection of images


• Automated registration

To register images using an intensity-based technique, use imregister and specify the type of
geometric transformation to apply to the moving image. imregister iteratively adjusts the
transformation to optimize the similarity of the two images.

Alternatively, you can estimate a localized displacement field and apply a nonrigid transformation to
the moving image using imregdemons.


Control Point Registration


“Control Point Registration” on page 7-57 enables you to select common features in each image
manually. Control point registration is useful when:

• You want to prioritize the alignment of specific features, rather than the entire set of features
detected using automated feature detection. For example, when registering two medical images,
you can focus the alignment on desired anatomical features and disregard matched features that
correspond to less informative anatomical structures.
• Images have repeated patterns that provide an unclear mapping using automated feature
matching. For example, photographs of buildings with many windows, or aerial photographs of
gridded city streets, have many similar features that are challenging to map automatically. In this
case, manual selection of control point pairs can provide a clearer mapping of features, and thus a
better transformation to align the feature points.

Control point registration can apply many types of transformations to the moving image. Global
transformations, which act on the entire image uniformly, include affine, projective, and polynomial
geometric transformations. Nonrigid transformations, which act on local regions, include piecewise
linear and local weighted mean transformations.

Use the Control Point Selection Tool to select control points. Start the tool with cpselect.
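
The following is a minimal command-line sketch, assuming moving and fixed are 2-D grayscale images
already in the workspace (the variable names are illustrative). Select matching point pairs
interactively, fit a geometric transformation to them, and apply it to the moving image:

[mp,fp] = cpselect(moving,fixed,Wait=true);
tform = fitgeotform2d(mp,fp,"affine");
registered = imwarp(moving,tform,OutputView=imref2d(size(fixed)));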


Automated Feature Detection and Matching


Automated “Feature Detection and Extraction” (Computer Vision Toolbox) detects features such as
corners and blobs, matches corresponding features in the moving and fixed images, and estimates a
geometric transform to align the matched features.

For an example, see “Find Image Rotation and Scale Using Automated Feature Matching” (Computer
Vision Toolbox). You must have Computer Vision Toolbox to use this method.

Note The Registration Estimator app offers six feature-based techniques to register a single pair of
images. However, the app does not provide an automated workflow to register multiple images.

See Also
imregister | imwarp

Related Examples
• “Register Images Using Registration Estimator App” on page 7-6
• “Register Multimodal MRI Images” on page 7-32
• “Register Images with Projection Distortion Using Control Points” on page 7-80


Register Images Using Registration Estimator App

This example shows how to align a pair of images using the Registration Estimator app. Registration
Estimator offers several registration techniques using feature-based, intensity-based, and nonrigid
registration algorithms. For more information, see “Techniques Supported by Registration Estimator”
on page 7-22.

Create two misaligned images in the workspace. This example creates the moving image J by
rotating the fixed image I clockwise by 30 degrees.

I = imread("cameraman.tif");
J = imrotate(I,-30);

Open Registration Estimator

In this example, you can open Registration Estimator from the command window because the
images have no spatial referencing information or initial transformation estimate. Specify the moving
image and the fixed image as the two input arguments.

registrationEstimator(J,I)

If your images have spatial referencing information, or if you want to specify an initial transformation
estimate, then you must load the images using a dialog window. For more information, see “Load
Images, Spatial Referencing Information, and Initial Transformation” on page 7-14.

You can also open Registration Estimator from the MATLAB™ Toolstrip. Open the Apps tab and
click Registration Estimator under Image Processing and Computer Vision. If you open the app
from the toolstrip, you must load the images using a dialog window.

Obtain Initial Registration Estimate

After you load the images, the app displays an overlay of the images and creates three registration
trials: Phase Correlation, MSER, and SURF. These trials appear as drafts in the history list. You
can click on each trial to adjust the registration settings. To create a trial for a different registration
technique, select a technique from the Technique menu.

The default Green-Magenta overlay style shows the fixed image in green and the moving image in
magenta. The overlay looks gray in areas where the two images have similar intensity. Additional
overlay styles assist with visualizing the results of the registration. When you click a feature-based
technique in the history list, the image overlay displays a set of red and green dots connected by
yellow lines. These points are the matched features used to align the images.


Run the three default registration trials with the default settings. Click each trial in the history list,
then click Register Images.

After the registration finishes, the trial displays a quality score and computation time. The quality
score is based loosely on the ssim function and provides an overall estimate of registration quality. A
score closer to 1 indicates a higher quality registration. Different registration techniques and settings
can yield similar quality scores but show error in different regions of the image. Inspect the image
overlay to confirm which registration technique is the most acceptable. Colors in the image overlay
indicate residual misalignment.

Note: due to randomness in the registration optimizer, the quality score, registered image, and
geometric transformation can vary slightly between trials despite identical registration settings.


Refine Registration Settings

After you have an initial registration estimate, adjust registration settings to improve the quality of
the alignment. For more information on available settings, see “Tune Registration Settings in
Registration Estimator” on page 7-17. If you know the conditions under which the images were
obtained, then you can select a different transformation type or clear the Has Rotation option. Post-
processing using nonrigid transformations is available for advanced workflows.

Adjust the settings of the MSER trial. Try increasing the number of detected features and the quality
of matched features independently to see if either improves the quality of the registration.

To increase the number of detected features, click the MSER trial, numbered 2, in the history list. In
the Current Registration Settings panel, drag the Number of Detected Features slider to the right.
When you change the setting, the app creates a new trial, numbered 2.1, in the history list. The
image overlay shows more matched features, as expected.


To run the registration with these settings, click Register Images. The quality metric of this trial is
less than the quality of the original MSER trial with the default number of matched features. The
image overlay of this trial has an overall magenta tint and a thick green strip along the top of the
man's head and shoulder. Therefore, increasing the number of detected features does not necessarily
improve the quality of the registration.


To see the effect of increasing the quality of matched features, click the MSER trial 2 (not 2.1) in the
history list. In the Current Registration Settings panel, drag the Quality of Matched Features
slider to the right. When you change the setting, the app creates a new trial, numbered 2.2, in the
history list. The image overlay displays a smaller number of high quality matched points.


To see the registration with these settings, click Register Images. Compared to the other MSER
trials, this trial has the best quality score. There is not a noticeable difference in the visual quality of
the image compared to the original MSER trial with default settings. If you want to see which pixels
differ between the default MSER trial and this trial, change the overlay style to Difference and toggle
between the two trials.


Export Registration Results

When you find an acceptable registration, export the registered image and the geometric
transformation to the workspace. You can use the registration results to apply a similar registration
to multiple frames in an image sequence. To learn more, see “Export Results from Registration
Estimator App” on page 7-20.

This example exports trial 2.2 because it has the best quality score and no severe regions of
misalignment. Click trial 2.2 in the history list, then click Export and select Export Images. In the
Export to Workspace dialog box, assign a name to the registration output. The output is a structure
that contains the final registered image and the geometric transformation.

See Also
Registration Estimator


More About
• “Choose Image Registration Technique” on page 7-2
• “Techniques Supported by Registration Estimator” on page 7-22


Load Images, Spatial Referencing Information, and Initial Transformation
You can load images into Registration Estimator from file or from the workspace. You can also provide
optional spatial referencing information and an optional initial geometric transformation.

Load Images from File or Workspace


You can load images into Registration Estimator from file or from the workspace. Although you can
load grayscale or color images, the app converts all RGB images to grayscale by using the rgb2gray
function. Registration Estimator supports only 2-D images.

Loading images from file supports only BMP, JPG, JPEG, TIF, TIFF, PNG, and DCM file types. To work
with a wider range of file formats, load images from the workspace. Registration Estimator supports
any image read into the workspace by the imread function and any DICOM image read into the
workspace using the dicomread function.
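
For example, this minimal sketch (the file names are hypothetical) reads one image with imread and one
with dicomread, and then opens the app so that you can select the variables with the Load from
workspace option:

moving = imread("movingImage.png");
fixed = dicomread("fixedImage.dcm");
registrationEstimator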

Load images into the app by clicking the Load Images icon.

• To load images from a file, select the Load from file option. In the dialog box, specify the file
path of the moving and fixed images. Use Browse to navigate to a folder.


• To load variables from the workspace, select the Load from workspace option. In the dialog
box, select the name of the variable containing the moving image from the Moving Image menu
and the variable containing the fixed image from the Fixed Image menu.

Provide Spatial Referencing Information


If you have 2-D spatial referencing objects in your workspace, or if you load DICOM images from a
file, then you can provide optional spatial referencing information. Spatial referencing information is
useful if you want to orient the images to a world coordinate system. For more information about
spatial referencing objects, see imref2d.

Note If you load DICOM images into the workspace using dicomread, spatial referencing
information in the metadata is no longer associated with the image data. To preserve spatial
referencing information with DICOM images, either load the images from file or create an imref2d
object from the image metadata. For more information about DICOM metadata, see “Read Metadata
from DICOM Files” on page 3-10.
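
For instance, this minimal sketch builds an imref2d object from DICOM metadata. It assumes that the
metadata includes the standard PixelSpacing attribute, and the file name is illustrative:

info = dicominfo("knee1.dcm");
X = dicomread(info);
% PixelSpacing is [row spacing; column spacing] in millimeters
R = imref2d(size(X),info.PixelSpacing(2),info.PixelSpacing(1));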

If you do not have spatial referencing information, then the Spatial Referencing Object and DICOM
Metadata radio buttons are inactive.


Provide an Initial Geometric Transformation


You can provide an optional initial geometric transformation using affine2d and projective2d
geometric transformation objects in your workspace. An initial geometric transformation is useful if
you are processing a batch of images with similar initial misalignment. Once the first moving image
has been registered, you can export the geometric transformation to the workspace and apply the
transformation to other images in the series. See “Export Results from Registration Estimator App”
on page 7-20.

If you do not have a geometric transformation object in your workspace, the Initial Transformation
Object selection box is inactive.

Note If you have a 2-D geometric transformation object that uses the premultiply convention, such
as an affinetform2d or projtform2d object, then you can convert the object to a projective2d
object for use with Registration Estimator. First, get the A property of your geometric transformation
object, then create a projective2d object using the transpose of the A property. For example,
suppose you have a premultiply geometric transformation object called myTform:

A = myTform.A;
newTform = projective2d(A');

See Also
Functions
imread | dicomread

Classes
affinetform2d | projtform2d | affine2d | projective2d | imref2d

Related Examples
• “Register Images Using Registration Estimator App” on page 7-6


Tune Registration Settings in Registration Estimator


Adjust settings in Registration Estimator based on your registration technique.

Note Due to randomness in the registration optimizer, the quality metric, registered image, and
geometric transformation can vary slightly between trials despite identical registration settings.

Geometric Transformations Supported by Registration Estimator


All feature-based and intensity-based registration techniques allow you to set the transformation
type. For more details about each type of transformation matrix, see “Matrix Representation of
Geometric Transformations” on page 6-27.

• Translation transformations preserve the size and orientation of the image. Each pixel in the
image is displaced the same amount in the same direction.
• Rigid transformations include rotation and translation. Rigid transformations preserve length.

Note Although reflection is a type of rigid transformation, Registration Estimator does not
support reflection.
• Similarity transformations include isotropic scaling, rotation, and translation. Similarity
transformations preserve shape, but not size. When used with a feature-based registration
technique, at least two matched pairs of points are required.
• Affine transformations include shear and all supported similarity transformations. Affine
transformations preserve parallel lines, but not necessarily angles between lines or distances
between points. When used with a feature-based registration technique, at least three matched
pairs of points are required.
• Projective transformations allow tilting in addition to all supported affine transformations. When
used with a feature-based registration technique, at least four matched pairs of points are
required.

Registration Technique          Translation    Rigid    Similarity    Affine    Projective

All Feature-Based Techniques                            X             X         X
Monomodal Intensity             X              X        X             X
Multimodal Intensity            X              X        X             X
Phase Correlation               X              X        X

Feature-Based Registration Settings


Feature-based registration allows you to adjust three settings in addition to the geometric
transformation type:


• Number of detected features. The transformation type determines the minimum number of
matched features required to perform a registration. Similarity transformations require two or
more matched features. Affine transformations require three or more matched features. Projective
transformations require four or more matched features.
• Quality of matched features. The quality value is a combination of matched features options.
• Rotation. By default, feature-based registration allows the moving image to rotate. However, some
imaging scenarios, such as stereoscopy, produce images with identical rotation. If your images
have the same rotation, clearing this option can improve the accuracy of the registration.

Intensity-Based Registration Settings


All intensity-based registration techniques allow you to select the geometric transformation type.
Additional settings are available depending on the registration technique.

Monomodal and multimodal intensity-based registration provide three common settings:

• Normalize. This option scales the pixel values of both images to the same dynamic range.
• Apply Gaussian blur. Smoothing the images with a Gaussian blur can help the optimizer find the
global maximum or minimum of the solution surface. However, smoothing changes the shape of
the surface, and over-smoothing can shift the position of the extrema. Large amounts of blurring
are useful when the images are severely misaligned at the start of the registration, to help the
optimizer search the correct basin of attraction. Small amounts of blurring are useful when the
images start with close alignment.
• Align centers. This option provides an initial transformation that aligns the world coordinates of
the centers of the two images. The geometric option aligns the geometric centers, based on the
spatial referencing information of the images. The center of mass option aligns the centers of
mass, calculated from the weighted mean of pixel intensities.

Monomodal registration enables you to adjust the properties of the regular step gradient descent
optimizer. For more information about the properties of this optimizer, see
RegularStepGradientDescent.
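
At the command line, the equivalent optimizer object can be tuned directly. This sketch is illustrative;
the property values are assumptions rather than recommended settings, and moving and fixed are
assumed to be images in the workspace:

[optimizer,metric] = imregconfig("monomodal");
optimizer.MaximumStepLength = 0.04;
optimizer.MaximumIterations = 200;
movingRegistered = imregister(moving,fixed,"rigid",optimizer,metric);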

Multimodal registration enables you to adjust the properties of the one plus one evolutionary
optimizer. For more information about the properties of this optimizer, see
OnePlusOneEvolutionary.

Phase correlation enables you to choose to window the frequency-domain representation of the
images. Windowing increases the stability of registration results. If the common features you are
trying to align in your images are oriented along the edges, clearing this option can improve
registration results. For more information about using phase correlation to transform an image, see
imregcorr.
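
At the command line, the corresponding setting is the Window name-value argument of imregcorr. In
this illustrative sketch, moving and fixed are assumed workspace images:

tformEstimate = imregcorr(moving,fixed,"similarity",Window=false);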

Nonrigid and Post-Processing Settings


Every registration technique in Registration Estimator allows for nonrigid transformations to refine
the registration fit locally. For more information about estimating a displacement field for nonrigid
transformations, see imregdemons.

The nonrigid settings available in Registration Estimator are:

• Number of iterations. This value is the number of iterations on each pyramid level.


• Pyramid levels. The value represents the number of Gaussian pyramid reduction levels. The
maximum number of pyramid levels depends on the size of each dimension in the images. For
example, when the shortest dimension of the fixed and moving images is 256 pixels, at most eight
pyramid levels can be used. For more information about pyramid reduction, see impyramid.
• Smoothing. The value represents the standard deviation of Gaussian smoothing and remains the
same at each pyramid level. Values are in the range [0.5, 3]. Larger values result in smoother
output displacement fields. Smaller values result in more localized deformation in the output
displacement field.

Note Although isotropic scaling and shearing are nonrigid transformations from a mathematical
perspective, these transformations act globally on an image. Enable scaling and shearing in the
Registration Estimator app by selecting an affine or projective transformation type, not by applying a
nonrigid transformation.

See Also
imregcorr | imregdemons

Related Examples
• “Register Images Using Registration Estimator App” on page 7-6

More About
• “Techniques Supported by Registration Estimator” on page 7-22
• “Matrix Representation of Geometric Transformations” on page 6-27


Export Results from Registration Estimator App


When you find an acceptable registration from Registration Estimator, export the results. You can use
the exported results to apply similar registration to other frames in an image sequence. There are
two options to export the results:

• Export the registered image and the geometric transformation to the workspace. Apply an
identical geometric transformation to other images using imwarp.
• Generate a function with the desired registration technique and settings. Call this function to
register other images using the same settings.

Export Results to the Workspace


To export the registration results to the workspace, click the desired trial in the history list, then click
Export and select Export Images. In the Export to Workspace dialog box, assign a name to the
registration output. The output is a structure that contains the final registered image, the spatial
referencing object, and the geometric transformation used for the registration.
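
For example, assuming tformReg and Rfixed hold the exported geometric transformation and spatial
referencing object, and newMoving is another frame with similar misalignment (all names are
illustrative), you can apply the same registration with imwarp:

newRegistered = imwarp(newMoving,tformReg,OutputView=Rfixed);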

Generate a Function
To generate MATLAB code that registers images using the desired registration technique and
settings, click the corresponding trial in the history list, then click Export. Select the Generate
Function option. The app opens the MATLAB editor containing a function with the autogenerated
code. To save the code, click Save in the MATLAB editor.

Note If you generate a function using a feature-based registration technique, then you must have
Computer Vision Toolbox to run the function.

The generated function accepts a moving and a fixed image as inputs. The function returns a
structure that contains the final registered image, the spatial referencing object, and the geometric
transformation of the registered image. If you generate a function using a feature-based registration
technique, then the output structure has two additional fields for the moving matched features and
the fixed matched features.

See Also
imwarp

Related Examples
• “Register Images Using Registration Estimator App” on page 7-6


More About
• “2-D and 3-D Geometric Transformation Process Overview” on page 6-20


Techniques Supported by Registration Estimator

Feature-Based Registration
Feature-based registration techniques automatically detect distinct image features such as sharp
corners, blobs, or regions of uniform intensity. The moving image undergoes a single global
transformation to provide the best alignment of corresponding features with the fixed image.

FAST detects corner features, especially in scenes of human origin such as streets and indoor
rooms. FAST supports single-scale images and point-tracking.

MinEigen also detects corner features. MinEigen supports single-scale images and point-
tracking.

Harris also detects corner features, using a more efficient algorithm than MinEigen. Harris
supports single-scale images and point-tracking.

BRISK also detects corner features. Unlike the preceding algorithms, BRISK supports changes
in scale and rotation, and point-tracking.

ORB detects corners in images with changes in scale and/or rotation.

SURF detects blobs in images and supports changes in scale and rotation.

KAZE detects multiscale blob features from a scale space constructed using nonlinear diffusion.

MSER detects regions of uniform intensity. MSER supports changes in scale and rotation, and is
more robust to affine transformations than the other feature-based algorithms.

In Registration Estimator, you can register images and generate functions for all feature-based
techniques without a Computer Vision Toolbox license. However, to run an autogenerated function
that uses a feature-based registration technique, you must have Computer Vision Toolbox. For more
information, see “Export Results from Registration Estimator App” on page 7-20.

Intensity-Based Registration
Intensity-based registration techniques correlate image intensity in the spatial or frequency domain.
The moving image undergoes a single global transformation to maximize the correlation of its
intensity with the intensity of the fixed image.


Monomodal intensity registers images with similar brightness and contrast that are captured on
the same type of scanner or sensor. For example, use monomodal intensity to register MRI scans
taken of similar subjects using the same imaging sequence.

Multimodal intensity registers images with different brightness and contrast. These images can
come from two different types of devices, such as two camera models or two types of medical imaging
systems (such as CT and MRI). These images can also come from a single device. For example, use
multimodal intensity to register images taken with the same camera using different exposure
settings, or to register MRI images acquired during a single session using different imaging
sequences.

Phase correlation registers images in the frequency domain. Like multimodal intensity, phase
correlation is invariant to image brightness. Phase correlation is more robust to noise than the other
intensity-based registration techniques.

Note Phase correlation provides better results when the aspect ratio of each image is square.

Nonrigid Registration

Nonrigid registration applies nonglobal transformations to the moving image. Nonrigid
transformations generate a displacement field, in which each pixel location in the fixed image is
mapped to a corresponding location in the moving image. The moving image is then warped
according to the displacement field and resampled using linear interpolation. For more information
about estimating a displacement field for nonrigid transformations, see imregdemons.

See Also

Related Examples
• “Register Images Using Registration Estimator App” on page 7-6
• “Choose Image Registration Technique” on page 7-2


Intensity-Based Automatic Image Registration


Intensity-based automatic image registration is an iterative process. It requires that you specify a
pair of images, a metric, an optimizer, and a transformation type. The metric defines the image
similarity metric for evaluating the accuracy of the registration. This image similarity metric takes
two images and returns a scalar value that describes how similar the images are. The optimizer
defines the methodology for minimizing or maximizing the similarity metric. The transformation type
defines the type of 2-D transformation that aligns the misaligned image (called the moving image)
with the reference image (called the fixed image).

The process begins with the transformation type you specify and an internally determined
transformation matrix. Together, they determine the specific image transformation that is applied to
the moving image with bilinear interpolation.

Next, the metric compares the transformed moving image to the fixed image and a metric value is
computed.

Finally, the optimizer checks for a stop condition. A stop condition is anything that warrants the
termination of the process. In most cases, the process stops when it reaches a point of diminishing
returns or when it reaches the specified maximum number of iterations. If there is no stop condition,
the optimizer adjusts the transformation matrix to begin the next iteration.

Perform intensity-based image registration with the following steps:


1 Read the images into the workspace with imread or dicomread.
2 Create the optimizer and metric. See “Create an Optimizer and Metric for Intensity-Based Image
Registration” on page 7-26.


3 Register the images with imregister.


4 View the results with imshowpair or save a copy of an image showing the results with imfuse.
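
The following minimal sketch strings the four steps together (the file names are hypothetical, and the
monomodal configuration and rigid transformation type are assumptions):

fixed = imread("fixedImage.tif");
moving = imread("movingImage.tif");
[optimizer,metric] = imregconfig("monomodal");
movingRegistered = imregister(moving,fixed,"rigid",optimizer,metric);
imshowpair(fixed,movingRegistered)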

See Also

Related Examples
• “Register Multimodal MRI Images” on page 7-32

More About
• “Use Phase Correlation as Preprocessing Step in Registration” on page 7-27
• “Choose Image Registration Technique” on page 7-2


Create an Optimizer and Metric for Intensity-Based Image Registration
You can pass an image similarity metric and an optimizer technique to imregister. An image
similarity metric takes two images and returns a scalar value that describes how similar the images
are. The optimizer you pass to imregister defines the methodology for minimizing or maximizing
the similarity metric.

imregister supports two similarity metrics:

• Mattes mutual information


• Mean squared error

In addition, imregister supports two techniques for optimizing the image metric:

• One-plus-one evolutionary
• Regular step gradient descent

You can pass any combination of metric and optimizer to imregister, but some pairs are better
suited for some image classes. Refer to the table for help choosing an appropriate starting point.

Capture Scenario    Metric                     Optimizer

Monomodal           MeanSquares                RegularStepGradientDescent
Multimodal          MattesMutualInformation    OnePlusOneEvolutionary

Use imregconfig to create the default metric and optimizer for a capture scenario in one step. For
example, the following command returns the optimizer and metric objects suitable for registering
monomodal images.

[optimizer,metric] = imregconfig('monomodal');

Alternatively, you can create the objects individually. This enables you to create alternative
combinations to address specific registration issues. The following code creates the same monomodal
optimizer and metric combination.

optimizer = registration.optimizer.RegularStepGradientDescent();
metric = registration.metric.MeanSquares();

Getting good results from optimization-based image registration can require modifying optimizer or
metric settings. For an example of how to modify and use the metric and optimizer with imregister,
see “Register Multimodal MRI Images” on page 7-32.

See Also
imregister | imregconfig


Use Phase Correlation as Preprocessing Step in Registration

This example shows how to use phase correlation as a preliminary step for automatic image
registration. In this process, you perform phase correlation using imregcorr, and then pass the
result of that registration as the initial condition of an optimization-based registration using
imregister. Phase correlation and optimization-based registration are complementary algorithms.
Phase correlation is good for finding gross alignment, even for severely misaligned images.
Optimization-based registration is good for finding precise alignment, given a good initial condition.

Read an image that will be the reference image in the registration.


fixed = imread("cameraman.tif");
imshow(fixed)

Create an unregistered image by deliberately distorting this image using rotation, isotropic scaling,
and shearing in the y direction.
theta = 170;
rot = [
cosd(theta) -sind(theta) 0; ...
sind(theta) cosd(theta) 0; ...
0 0 1];
sc = 2.3;
scale = [sc 0 0; 0 sc 0; 0 0 1];
sh = 0.1;
shear = [1 sh 0; 0 1 0; 0 0 1];

tform = affinetform2d(shear*scale*rot);
moving = imwarp(fixed,tform);

Add noise to the image, and display the result.


moving = imnoise(moving,"gaussian");
imshow(moving)

Estimate the registration required to bring these two images into alignment. imregcorr returns a
simtform2d object that defines the transformation.

tformEstimate = imregcorr(moving,fixed)

tformEstimate =
simtform2d with properties:

Dimensionality: 2
Scale: 0.4300
RotationAngle: -169.1579
Translation: [257.9866 302.4839]
R: [2x2 double]


A: [3x3 double]

Apply the estimated geometric transform to the misaligned image. Specify the OutputView name-
value argument to ensure the registered image is the same size as the reference image.

Rfixed = imref2d(size(fixed));
movingReg = imwarp(moving,tformEstimate,OutputView=Rfixed);

Display the original image and the registered image in a montage. You can see that imregcorr has
done a good job handling the rotation and scaling differences between the images. The registered
image, movingReg, is very close to being aligned with the original image, fixed. However, some
misalignment remains. imregcorr can handle rotation and scale distortions well, but not shear
distortion.

imshowpair(fixed,movingReg,"montage");

View the aligned image overlaid on the original image, using imshowpair. In this view, imshowpair
uses color to highlight areas of misalignment.

imshowpair(fixed,movingReg,"falsecolor");


To finish the registration, use imregister, passing the estimated transformation returned by
imregcorr as the initial condition. imregister is more effective if the two images are roughly in
alignment at the start of the operation. The transformation estimated by imregcorr provides this
information for imregister. The example uses the default optimizer and metric values for a
registration of two images taken with the same sensor, which is a monomodal configuration.

[optimizer,metric] = imregconfig("monomodal");
movingRegistered = imregister(moving,fixed,"affine", ...
optimizer,metric,InitialTransformation=tformEstimate);

Display the result of this registration. Note that imregister achieves a very accurate registration,
given the good initial condition provided by imregcorr.

imshowpair(fixed,movingRegistered,Scaling="joint");


See Also
imregcorr | imregister | imregconfig | imwarp


Register Multimodal MRI Images

This example shows how you can align two magnetic resonance (MRI) images to a common
coordinate system using intensity-based image registration. This approach does not find features or
use control points. Intensity-based registration is often well-suited for medical and remotely sensed
imagery.

Step 1: Load Images

This example uses two MRI images of a knee. The fixed image is a spin echo image, while the moving
image is a spin echo image with inversion recovery. The two sagittal slices were acquired at the same
time but are slightly out of alignment.

fixed = dicomread("knee1.dcm");
moving = dicomread("knee2.dcm");

The imshowpair function is useful to visualize images during every part of the registration process.
Use it to see the two images individually in a montage or display them overlapping to show the
amount of misregistration.

imshowpair(moving,fixed,"montage")
title("Unregistered")

In the overlapping image from imshowpair, gray areas correspond to areas that have similar
intensities, while magenta and green areas show places where one image is brighter than the other.
In some image pairs, green and magenta areas do not always indicate misregistration, but in this
example it is easy to use the color information to see where they do.

imshowpair(moving,fixed)
title("Unregistered")


Step 2: Set up the Initial Registration

The imregconfig function makes it easy to pick the correct optimizer and metric configuration to
use with imregister. The optimizer and metric variables are objects whose properties control the
registration. For more information, see “Create an Optimizer and Metric for Intensity-Based Image
Registration” on page 7-26.

These two images have different intensity distributions, which suggests a multimodal configuration.

[optimizer,metric] = imregconfig("multimodal");

The distortion between the two images includes scaling, rotation, and possibly shear. Use an affine
transformation to register the images.

registeredDefault = imregister(moving,fixed,"affine",optimizer,metric);


Display the result. It is very rare that imregister will align images perfectly with the default
settings. Nevertheless, using them is a useful way to decide which properties to tune first.
imshowpair(registeredDefault,fixed)
title("A: Default Registration")

Step 3: Improve Registration by Tuning Optimizer and Metric

The initial registration is not very good. There are still significant regions of poor alignment,
particularly along the right edge. Try to improve the registration by adjusting the optimizer and
metric configuration properties.
disp(optimizer)

registration.optimizer.OnePlusOneEvolutionary


Properties:
GrowthFactor: 1.050000e+00
Epsilon: 1.500000e-06
InitialRadius: 6.250000e-03
MaximumIterations: 100

disp(metric)

registration.metric.MattesMutualInformation

Properties:
NumberOfSpatialSamples: 500
NumberOfHistogramBins: 50
UseAllPixels: 1

The InitialRadius property of the optimizer controls the initial step size used in parameter space
to refine the geometric transformation. When multimodal registration problems do not converge with
the default parameters, InitialRadius is a good first parameter to adjust. Start by reducing the
default value of InitialRadius by a scale factor of 3.5.

optimizer.InitialRadius = optimizer.InitialRadius/3.5;
registeredAdjustedInitialRadius = imregister(moving,fixed,"affine",optimizer,metric);

Display the result. Adjusting InitialRadius has a positive impact. There is a noticeable
improvement in the alignment of the images at the top and right edges.

imshowpair(registeredAdjustedInitialRadius,fixed)
title("B: Adjusted InitialRadius")


The MaximumIterations property of the optimizer controls the maximum number of iterations that
the optimizer will be allowed to take. Increasing MaximumIterations allows the registration search
to run longer and potentially find better registration results. Does the registration continue to
improve if the InitialRadius from the last step is used with a large number of iterations?

optimizer.MaximumIterations = 300;
registeredAdjustedInitialRadius300 = imregister(moving,fixed,"affine",optimizer,metric);

Display the results. Further improvement in registration was achieved by reusing the
InitialRadius optimizer setting from the previous registration and allowing the optimizer to take a
large number of iterations.

imshowpair(registeredAdjustedInitialRadius300,fixed)
title("C: Adjusted InitialRadius, MaximumIterations = 300")


Step 4: Improve Registration Using Initial Conditions

Optimization-based registration works best when you can provide a good initial condition that relates
the moving and fixed images. A useful technique for improving registration results is to start with
simpler transformation types, such as rigid or similarity transformations, and then use the resulting
transformation as the initial condition for more complicated transformation types, such as affine.

The function imregtform uses the same algorithm as imregister, but returns a geometric
transformation object as output instead of a registered output image. Use imregtform to get an
initial transformation estimate based on a similarity transformation consisting of translation, rotation,
and isotropic scaling. Use the tuned optimizer settings.

tformSimilarity = imregtform(moving,fixed,"similarity",optimizer,metric)


tformSimilarity =
simtform2d with properties:

Dimensionality: 2
Scale: 1.0390
RotationAngle: -6.1345
Translation: [-51.1491 6.9891]
R: [2x2 double]
A: [3x3 double]

Because the registration is being solved in the default coordinate system, also known as the intrinsic
coordinate system, obtain the default spatial referencing object that defines the location and
resolution of the fixed image.

Rfixed = imref2d(size(fixed));

Use imwarp to apply the geometric transformation output from imregtform to the moving image to
align it with the fixed image. Use the OutputView name-value argument in imwarp to assign to the
moving image the same resolution and world limits as the fixed image.

registeredSimilarity = imwarp(moving,tformSimilarity,OutputView=Rfixed);

Display the result.

imshowpair(registeredSimilarity,fixed)
title("D: Registration Based on Similarity Transformation Model")


Refine the registration by using an affine transformation model and specifying the similarity
transformation as the initial condition. The refined estimate for the registration includes the
possibility of shear.

registeredAffineWithIC = imregister(moving,fixed,"affine",optimizer,metric, ...
    InitialTransformation=tformSimilarity);

Display the result. Refining the registration with a similarity initial condition yields a nice registration
result.

imshowpair(registeredAffineWithIC,fixed)
title("E: Registration from Affine Model Based on Similarity Initial Condition")


Step 5: Decide When Enough is Enough

Comparing the results of running imregister with different configurations and initial conditions, it
becomes apparent that there are a large number of input parameters that can be varied in
imregister, each of which may lead to different registration results.

It can be difficult to quantitatively compare registration results because there is no one quality metric
that accurately describes the alignment of two images. Often, registration results must be judged
qualitatively by visualizing the results. In the results above, the registration results in C) and E) are
both very good and are difficult to tell apart visually.

Step 6: Alternate Visualizations

Often, as the quality of a multimodal registration improves, it becomes more difficult to judge the
quality of the registration visually, because intensity differences between the modalities can obscure
areas of misalignment.


Sometimes switching to a different display mode for imshowpair exposes hidden details. (This is not
always the case.)
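
For instance, you might try a difference view of the affine result from the previous step. This is an optional sketch, not part of the original workflow.

imshowpair(registeredAffineWithIC,fixed,"diff")
title("Difference View of Registration Result")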

See Also
imregister | imregconfig | imwarp | imref2d | OnePlusOneEvolutionary |
MattesMutualInformation | MeanSquares | RegularStepGradientDescent


Register Multimodal 3-D Medical Images

This example shows how to automatically align two volumetric images using intensity-based
registration. In this example, you register multimodal images acquired using magnetic resonance
imaging (MRI) and computed tomography (CT). Multimodal images can be misaligned due to
differences in patient positioning (translation or rotation) and pixel size (scaling).

In image registration, there is a fixed image and a moving image. The position of the fixed image does
not change. A geometric transformation is applied to the moving image to align it with the fixed
image. Intensity-based image registration techniques use information about the pixel intensities to
calculate the geometric transformation. Intensity-based registration is often appropriate for medical
images and remote sensing images. To learn more about alternative registration techniques, such as
feature-based and control point registration, see “Choose Image Registration Technique” on page 7-2.
This example uses two intensity-based approaches for 3-D image registration:

• Register the images directly using imregister.
• Estimate the geometric transformation required to map the moving image to the fixed image using
  imregtform, then apply the transformation using imwarp.

Medical Imaging Toolbox™ provides objects and functions that simplify this workflow to automatically
manage spatial referencing in the patient coordinate system. To get started, see “Medical Image
Registration” (Medical Imaging Toolbox).

Load Images

This example uses a CT image and a T1 weighted MRI image collected from the same patient. The 3-
D CT and MRI data sets are from The Retrospective Image Registration Evaluation (RIRE) Dataset,
provided by Dr. Michael Fitzpatrick. For more information, see the RIRE Project homepage.

In this example, the MRI scan is the fixed image and the CT image is the moving image. The images
are stored in the file formats used by the RIRE Project. Use the helperReadHeaderRIRE helper
function to obtain the metadata associated with each image. The helper function is attached to this
example as a supporting file.
fixedHeader = helperReadHeaderRIRE("rirePatient007MRT1.header");
movingHeader = helperReadHeaderRIRE("rirePatient007CT.header");

Use multibandread to read the binary files that contain the image data.
fixedVolume = multibandread("rirePatient007MRT1.bin", ...
[fixedHeader.Rows fixedHeader.Columns fixedHeader.Slices], ...
"int16=>single",0,"bsq","ieee-be");

movingVolume = multibandread("rirePatient007CT.bin", ...
    [movingHeader.Rows movingHeader.Columns movingHeader.Slices], ...
    "int16=>single",0,"bsq","ieee-be");

Display Unregistered Images

Judge the alignment of the unregistered volumes by displaying the middle transverse slice of each
volume using imshowpair. The MRI slice is magenta, and the CT slice is green. The scaling
differences indicate that the images have different pixel spacing.
centerFixed = size(fixedVolume,3)/2;
centerMoving = size(movingVolume,3)/2;


figure
imshowpair(movingVolume(:,:,centerMoving),fixedVolume(:,:,centerFixed))
title("Unregistered Transverse Slice")

You can also display the volumes as 3-D objects. Create a viewer3d object, in which you can display
multiple volumes.
viewerUnregistered = viewer3d(BackgroundColor="black",BackgroundGradient="off");

Display the medicalVolume objects as 3-D isosurfaces by using the volshow function. You can
rotate the volumes by clicking and dragging in the viewer window. When you plot the 3-D isosurfaces
in intrinsic coordinates, they appear vertically distorted because the slice thickness is different than
the in-plane pixel spacing in world coordinates.
volshow(fixedVolume,Parent=viewerUnregistered,RenderingStyle="Isosurface", ...
Colormap=[1 0 1],Alphamap=1);


volshow(movingVolume,Parent=viewerUnregistered,RenderingStyle="Isosurface", ...
Colormap=[0 1 0],Alphamap=1);

Define Spatial Referencing

You can improve the display and registration results for images with different pixel spacing by using
spatial referencing. For this data set, the header files define the pixel spacing of the CT and MRI
images. Use this metadata, plus the size of each volume in voxels, to create imref3d spatial
referencing objects. The imref3d object properties define the position of an image volume in its
world coordinate system and the pixel spacing in each dimension. For example, the
PixelExtentInWorldX properties of the fixed and moving imref3d objects indicate pixel spacing
along the X-axes of the fixed and moving volumes of 1.25 mm and 0.6536 mm, respectively. Units are
in millimeters because the header information used to construct the spatial referencing is in
millimeters.

Rfixed = imref3d(size(fixedVolume),fixedHeader.PixelSize(2), ...
    fixedHeader.PixelSize(1),fixedHeader.SliceThickness)

Rfixed =
imref3d with properties:

XWorldLimits: [0.6250 320.6250]
YWorldLimits: [0.6250 320.6250]
ZWorldLimits: [2 106]
ImageSize: [256 256 26]
PixelExtentInWorldX: 1.2500
PixelExtentInWorldY: 1.2500
PixelExtentInWorldZ: 4
ImageExtentInWorldX: 320
ImageExtentInWorldY: 320
ImageExtentInWorldZ: 104
XIntrinsicLimits: [0.5000 256.5000]
YIntrinsicLimits: [0.5000 256.5000]
ZIntrinsicLimits: [0.5000 26.5000]

Rmoving = imref3d(size(movingVolume),movingHeader.PixelSize(2), ...
    movingHeader.PixelSize(1),movingHeader.SliceThickness)

Rmoving =
imref3d with properties:

XWorldLimits: [0.3268 334.9674]
YWorldLimits: [0.3268 334.9674]
ZWorldLimits: [2 114]
ImageSize: [512 512 28]
PixelExtentInWorldX: 0.6536
PixelExtentInWorldY: 0.6536
PixelExtentInWorldZ: 4
ImageExtentInWorldX: 334.6406
ImageExtentInWorldY: 334.6406
ImageExtentInWorldZ: 112
XIntrinsicLimits: [0.5000 512.5000]
YIntrinsicLimits: [0.5000 512.5000]
ZIntrinsicLimits: [0.5000 28.5000]

Approach 1: Register Images Using imregister

The imregister function performs registration and returns the registered moving image volume.
Use this approach when you want the registered image data and do not need the geometric
transformation used to perform the registration.

The imregister function sets the value of fill pixels added to the registered volume to 0. To improve
the display of registration results, scale the CT intensities to the range [0, 1], so that the fill value
is equal to the minimum of the image data range.

rescaledMovingVolume = rescale(movingVolume);

The imregister function requires you to specify an optimizer and metric configuration for the
registration calculation. Define the configuration by using the imregconfig function. Specify the
modality as "multimodal" because the CT and MRI images are from different modalities.

[optimizer,metric] = imregconfig("multimodal");

Change the value of the InitialRadius property of the optimizer to achieve better convergence
during registration.

optimizer.InitialRadius = 0.004;

Align the images using imregister. Specify the spatial referencing information so the function
converges to better results more quickly. The misalignment between the two spatially referenced
volumes includes translation and rotation, so use a rigid transformation to register the images.


movingRegisteredVolume = imregister(rescaledMovingVolume,Rmoving,fixedVolume,Rfixed, ...
    "rigid",optimizer,metric);

To assess the registration results, display the middle transverse slices of the fixed volume and
registered moving volume. The images appear more aligned. The imregister function also adjusts
the spatial referencing of the moving image to match the fixed image, so the images have the same
number of voxels in each dimension without scaling differences.

figure
imshowpair(movingRegisteredVolume(:,:,centerMoving),fixedVolume(:,:,centerFixed))
title("Registered Transverse Slice - imregister")

Display the registered volumes as 3-D isosurfaces using volshow. You can zoom and rotate the
display to assess the registration results.

viewerRegistered1 = viewer3d(BackgroundColor="black",BackgroundGradient="off");
volFixed1 = volshow(fixedVolume,Parent=viewerRegistered1,RenderingStyle="Isosurface", ...
Colormap=[1 0 1],Alphamap=1);
volRegistered1 = volshow(movingRegisteredVolume,Parent=viewerRegistered1,RenderingStyle="Isosurface", ...
    Colormap=[0 1 0],Alphamap=1);


Optionally, you can view the registered volumes in world coordinates with correct voxel spacing by
setting the Transformation property of the Volume objects created by volshow. Specify the
Transformation value to scale the volumes based on the voxel spacing from the fixed volume
header file. Use the spacing from the fixed volume file for both volumes because the registered
moving volume has been resampled in the voxel grid of the fixed volume.

sx = fixedHeader.PixelSize(1);
sy= fixedHeader.PixelSize(2);
sz = fixedHeader.SliceThickness;

A = [sx 0 0 0; 0 sy 0 0; 0 0 sz 0; 0 0 0 1];
tformVol = affinetform3d(A);

volFixed1.Transformation = tformVol;
volRegistered1.Transformation = tformVol;


Approach 2: Estimate and Apply 3-D Geometric Transformation

The imregister function registers images, but does not return information about the geometric
transformation applied to the moving image. When you are interested in the estimated geometric
transformation, you can use the imregtform function to get a geometric transformation object that
stores information about the transformation. The imregtform function has the same input
arguments and uses the same algorithm as imregister.
tform = imregtform(rescaledMovingVolume,Rmoving,fixedVolume,Rfixed, ...
"rigid",optimizer,metric)

tform =
rigidtform3d with properties:

Dimensionality: 3
Translation: [-15.8648 -17.5692 29.1830]
R: [3×3 double]
A: [4×4 double]

The A property of tform specifies the 3-D affine transformation matrix used to align the moving
image to the fixed image. Because the inputs to imregtform are spatially referenced, the geometric
transformation maps points in the world coordinate system from moving to fixed.
tform.A

ans = 4×4

    0.9704    0.0228    0.2404  -15.8648
   -0.0143    0.9992   -0.0369  -17.5692
   -0.2410    0.0324    0.9700   29.1830
         0         0         0    1.0000

You can apply the geometric transformation from imregtform to the moving image volume by using
the imwarp function. Specify the moving volume, spatial referencing for the moving volume, and
transformation from imregtform. The OutputView name-value argument specifies the spatial
referencing for the transformed output image volume. To produce the same results as imregister,
specify the OutputView as the imref3d object for the fixed image. This creates a registered image
volume with the same spatial referencing as the fixed volume.

movingRegisteredVolume = imwarp(rescaledMovingVolume,Rmoving,tform, ...
    "bicubic",OutputView=Rfixed);

To assess the registration results, display the middle transverse slices of the fixed volume and
transformed moving volume. The results are the same as the results from imregister.

figure
imshowpair(movingRegisteredVolume(:,:,centerFixed), ...
fixedVolume(:,:,centerFixed))
title("Registered Transverse Slice - imregtform")

Display the registered volumes as 3-D isosurfaces using volshow. View the volume in world
coordinates by setting the Transformation property as a name-value argument.

viewerRegistered2 = viewer3d(BackgroundColor="black",BackgroundGradient="off");
volshow(fixedVolume,Parent=viewerRegistered2,RenderingStyle="Isosurface", ...
Colormap=[1 0 1],Alphamap=1,Transformation=tformVol);
volshow(movingRegisteredVolume,Parent=viewerRegistered2,RenderingStyle="Isosurface", ...
Colormap=[0 1 0],Alphamap=1,Transformation=tformVol);


See Also
imregister | imregconfig | imwarp | imregtform | imref3d | imshowpair | volshow

Related Examples
• “Register Multimodal Medical Image Volumes with Spatial Referencing” (Medical Imaging
Toolbox)
• “Register Multimodal MRI Images” on page 7-32

More About
• “Choose Image Registration Technique” on page 7-2
• “Create an Optimizer and Metric for Intensity-Based Image Registration” on page 7-26


Registering an Image Using Normalized Cross-Correlation

This example shows how to find a template image within a larger image. Sometimes one image is a
subset of another. Normalized cross-correlation can be used to determine how to register or align the
images by translating one of them.

Step 1: Read Image

onion = imread("onion.png");
peppers = imread("peppers.png");

imshow(onion)

imshow(peppers)


Step 2: Choose Subregions of Each Image

It is important to choose regions that are similar. The image sub_onion will be the template, and
must be smaller than the image sub_peppers. You can get these subregions using either the non-
interactive script below or the interactive script.

% non-interactively
rect_onion = [111 33 65 58];
rect_peppers = [163 47 143 151];
sub_onion = imcrop(onion,rect_onion);
sub_peppers = imcrop(peppers,rect_peppers);

% OR

% interactively
%[sub_onion,rect_onion] = imcrop(onion); % choose the pepper below the onion
%[sub_peppers,rect_peppers] = imcrop(peppers); % choose the whole onion

% display sub images
imshow(sub_onion)


imshow(sub_peppers)

Step 3: Do Normalized Cross-Correlation and Find Coordinates of Peak

Calculate the normalized cross-correlation and display it as a surface plot. The peak of the cross-
correlation matrix occurs where the subimages are best correlated. normxcorr2 only works on
grayscale images, so we pass it the red plane of each subimage.

c = normxcorr2(sub_onion(:,:,1),sub_peppers(:,:,1));
figure
surf(c)
shading flat


Step 4: Find the Total Offset Between the Images

The total offset or translation between images depends on the location of the peak in the cross-
correlation matrix, and on the size and position of the subimages.
% offset found by correlation
[max_c,imax] = max(abs(c(:)));
[ypeak,xpeak] = ind2sub(size(c),imax(1));
corr_offset = [(xpeak-size(sub_onion,2))
(ypeak-size(sub_onion,1))];

% relative offset of position of subimages
rect_offset = [(rect_peppers(1)-rect_onion(1))
(rect_peppers(2)-rect_onion(2))];

% total offset
offset = corr_offset + rect_offset;
xoffset = offset(1);
yoffset = offset(2);

Step 5: See if Onion Image was Extracted from Peppers Image

Figure out where onion falls inside of peppers.


xbegin = round(xoffset + 1);
xend = round(xoffset + size(onion,2));
ybegin = round(yoffset + 1);
yend = round(yoffset + size(onion,1));


% extract region from peppers and compare to onion
extracted_onion = peppers(ybegin:yend,xbegin:xend,:);
if isequal(onion,extracted_onion)
disp("onion.png was extracted from peppers.png")
end

onion.png was extracted from peppers.png

Step 6: Pad Onion Image to Size of Peppers Image

Pad the onion image to overlay on peppers, using the offset determined above.

recovered_onion = uint8(zeros(size(peppers)));
recovered_onion(ybegin:yend,xbegin:xend,:) = onion;
imshow(recovered_onion)

Step 7: Use Alpha Blending to Show Images Together

Display one plane of the peppers image with the recovered_onion image using alpha blending.

imshowpair(peppers(:,:,1),recovered_onion,"blend")


Control Point Registration


Image Processing Toolbox provides tools to support point mapping to determine the parameters of
the transformation required to bring an image into alignment with another image. In point mapping,
you pick pairs of control points in two images that identify the same features or landmarks in the
images. Then, infer a geometric transformation from the positions of these control points. Finally, you
apply the geometric transformation to the moving image, resulting in an image that is aligned with
the fixed image.

The figure provides an illustration of this process. See “Register Images with Projection Distortion
Using Control Points” on page 7-80 for an extended example.


You may need to perform several iterations of this process, experimenting with different types of
transformations, before you achieve a satisfactory result. Sometimes, you can perform successive
registrations, removing gross global distortions first, and then removing smaller local distortions in
subsequent passes.
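
A minimal sketch of this workflow in code (assuming moving and fixed images are already in the workspace, and using a projective transformation purely as an example) looks like this:

[mp,fp] = cpselect(moving,fixed,Wait=true);            % pick matching control point pairs interactively
tform = fitgeotform2d(mp,fp,"projective");             % infer the geometric transformation
Rfixed = imref2d(size(fixed));
registered = imwarp(moving,tform,OutputView=Rfixed);   % align the moving image with the fixed image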

See Also
cpselect | cpcorr | fitgeotform2d | imwarp

Related Examples
• “Register Images with Projection Distortion Using Control Points” on page 7-80

More About
• “Control Point Selection Procedure” on page 7-61
• “Choose Image Registration Technique” on page 7-2


Geometric Transformation Types for Control Point Registration


The fitgeotform2d function can infer the parameters from control point pairs for the following
types of geometric transformations, listed in order of complexity.

"similarity" (minimum 2 control point pairs)
    Use this transformation when shapes in the moving image are unchanged, but the image is distorted
    by some combination of translation, rotation, and isotropic scaling. Straight lines remain
    straight, and parallel lines are still parallel.

"reflectivesimilarity" (minimum 3 control point pairs)
    Same as "similarity" with the addition of optional reflection.

"affine" (minimum 3 control point pairs)
    Use this transformation when shapes in the moving image exhibit shearing. Straight lines remain
    straight, and parallel lines remain parallel, but rectangles become parallelograms.

"projective" (minimum 4 control point pairs)
    Use this transformation when the scene appears tilted. Straight lines remain straight, but
    parallel lines converge toward a vanishing point.

"polynomial" (minimum 6 control point pairs for order 2, 10 for order 3, 15 for order 4)
    Use this transformation when objects in the image are curved. The higher the order of the
    polynomial, the better the fit, but the result can contain more curves than the fixed image.

"pwl" (minimum 4 control point pairs)
    Use this piecewise linear transformation when parts of the image appear distorted differently.

"lwm" (minimum 6 control point pairs, 12 recommended)
    Use this local weighted mean transformation when the distortion varies locally and piecewise
    linear is not sufficient.

The first five transformations, "similarity", "reflectivesimilarity", "affine",
"projective", and "polynomial", are global transformations. In these transformations, a single
mathematical expression applies to an entire image. The last two transformations, "pwl" (piecewise
linear) and "lwm" (local weighted mean), are local transformations. In these transformations,
different mathematical expressions apply to different regions within an image. When exploring how
different transformations affect the images you are working with, try the global transformations first.
If these transformations are not satisfactory, try the local transformations: the piecewise linear
transformation first, and then the local weighted mean transformation.

Your choice of transformation type affects the number of control point pairs you must select. For
example, a similarity transformation without reflection requires at least two control point pairs. A
fourth order polynomial transformation requires 15 control point pairs. For more information about
these transformation types, and the special syntaxes they require, see cpselect.
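
As a rough sketch (mp and fp here are assumed to be control point arrays with enough pairs for each fit; the calls mirror the types in the list above):

tformSimilarity = fitgeotform2d(mp,fp,"similarity");     % at least 2 pairs
tformAffine = fitgeotform2d(mp,fp,"affine");             % at least 3 pairs
tformProjective = fitgeotform2d(mp,fp,"projective");     % at least 4 pairs
tformPoly3 = fitgeotform2d(mp,fp,"polynomial",3);        % third-order polynomial, at least 10 pairs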


See Also
fitgeotform2d

Related Examples
• “Register Images with Projection Distortion Using Control Points” on page 7-80

More About
• “Control Point Registration” on page 7-57
• “Control Point Selection Procedure” on page 7-61
• “Choose Image Registration Technique” on page 7-2


Control Point Selection Procedure

To specify control points in a pair of images interactively, use the Control Point Selection Tool,
cpselect. The tool displays the image you want to register, called the moving image, next to the
reference image, called the fixed image.

Specifying control points is a four-step process:

1 Start the tool on page 7-63, specifying the moving image and the fixed image.
2 Use navigation aids to explore the image on page 7-65, looking for visual elements that you can
identify in both images. cpselect provides many ways to navigate around the image. You can
pan and zoom to view areas of the image in more detail.
3 Specify matching control point pairs on page 7-68 in the moving image and the fixed image.
4 Save the control points on page 7-73 in the workspace.

The following figure shows the default appearance of the tool when you first start it.


See Also
cpselect

More About
• “Start the Control Point Selection Tool” on page 7-63
• “Control Point Registration” on page 7-57


Start the Control Point Selection Tool


To use the Control Point Selection Tool, enter the cpselect command at the MATLAB prompt. As
arguments, specify the image you want to register (the moving image) and the image you want to
compare it to (the fixed image).

The cpselect command has other optional arguments. You can import existing control points, so
that you can use the Control Point Selection Tool to modify, delete, or add to existing control points.
For example, you can restart a control point selection session by including a cpstruct structure as
the third argument. For more information about restarting sessions, see “Export Control Points to the
Workspace” on page 7-73.
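
For instance, a restart call might look like this, where cpstruct stands for a structure previously exported from the tool, and moving and fixed are the same images used in the earlier session:

cpselect(moving,fixed,cpstruct);   % resume the earlier point-selection session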

For simplicity, this example uses the same image as the moving and the fixed image, and no prior
control points are imported. To walk through an example of an actual registration, see “Register
Images with Projection Distortion Using Control Points” on page 7-80.

moon_fixed = imread("moon.tif");
moon_moving = moon_fixed;
cpselect(moon_moving, moon_fixed);

When the Control Point Selection Tool starts, it contains three primary components:

• Detail windows—The two windows displayed at the top of the tool are called the Detail windows.
These windows show a close-up view of a portion of the images you are working with. The moving
image is on the left and the fixed image is on the right.
• Overview windows—The two windows displayed at the bottom of the tool are called the Overview
windows. These windows show the images in their entirety, at the largest scale that fits the
window. The moving image is on the left and the fixed image is on the right. You can control
whether the Overview window appears by using the View menu.
• Detail rectangles—Superimposed on the images displayed in the two Overview windows is a
rectangle, called the Detail rectangle. This rectangle controls the part of the image that is visible
in the Detail window. By default, at startup, the detail rectangle covers one quarter of the entire
image and is positioned over the center of the image. You can move the Detail rectangle to change
the portion of the image displayed in the Detail windows.

The following figure shows these components of the Control Point Selection Tool.


The next step is to use navigation aids to explore the image, looking for visual elements shared by
both images. For more information, see “Find Visual Elements Common to Both Images” on page 7-65.

See Also
cpselect

More About
• “Find Visual Elements Common to Both Images” on page 7-65
• “Export Control Points to the Workspace” on page 7-73
• “Control Point Selection Procedure” on page 7-61


Find Visual Elements Common to Both Images


To find visual elements that are common to both images, you can change the section of the image
displayed in the Detail view. You can also zoom in on a part of the image to view it in more detail. The
following sections describe the different ways to change your view of the images in the Control Point
Selection Tool.

Use Scroll Bars to View Other Parts of an Image


To view parts of an image that are not visible in the Detail or Overview windows, use the scroll bars
provided for each window.

As you scroll the image in the Detail window, note how the Detail rectangle moves over the image in
the Overview window. The position of the Detail rectangle always shows the portion of the image in
the Detail window.

Use the Detail Rectangle to Change the View


To get a closer view of any part of the image, move the Detail rectangle in the Overview window over
that section of the image. The Control Point Selection Tool displays that section of the image in the
Detail window at a higher magnification than the Overview window.

To move the detail rectangle,

1 Move the pointer into the Detail rectangle. The cursor changes to the fleur shape.
2 Press and hold the mouse button to drag the detail rectangle anywhere on the image.

As you move the Detail rectangle over the image in the Overview window, the view of the image
displayed in the Detail window changes.

Pan the Image Displayed in the Detail Window


To change the section of the image displayed in the Detail window, use the pan tool to move the
image in the window.

To use the pan tool,


1 Click the Pan button in the Control Point Selection Tool toolbar or select Pan from the
Tools menu.
2 Move the pointer over the image in the Detail window. The cursor changes to the hand shape.
3 Press and hold the mouse button. The cursor changes to a closed fist shape. Use the mouse
to move the image in the Detail window.

As you move the image in the Detail window, the Detail rectangle in the Overview window moves.

Zoom In and Out on an Image


To enlarge an image to get a closer look or shrink an image to see the whole image in context, you
can zoom in or zoom out on the images displayed. You can also zoom in or out on an image by
changing the magnification. See “Specify the Magnification of the Images” on page 7-66 for more
information.

To zoom in or zoom out on the fixed or moving images,

1 Click the appropriate magnifying glass button on the Control Point Selection Tool toolbar or
select Zoom In or Zoom Out from the Tools menu.

2 Move the pointer over the image in the Detail window that you want to zoom in or out on. The
cursor changes to the appropriate magnifying glass shape. Position the cursor over a location in
the image and click the mouse. With each click, the Control Point Selection Tool changes the
magnification of the image by a preset amount. (See “Specify the Magnification of the Images” on
page 7-66 for a list of some of these magnifications.) cpselect centers the new view of the image
on the spot where you clicked.

Another way to use the Zoom tool to zoom in on an image is to position the cursor over a location
in the image. While pressing and holding the mouse button, draw a rectangle defining the area
you want to zoom in on. The Control Point Selection Tool magnifies the image so that the chosen
section fills the Detail window. The tool resizes the detail rectangle in the Overview window as
well.

The size of the Detail rectangle in the Overview window changes as you zoom in or out on the
image in the Detail window.

To keep the relative magnifications of the fixed and moving images synchronized as you zoom in
or out, click the Lock ratio check box. See “Lock the Relative Magnification of the Moving and
Fixed Images” on page 7-67 for more information.

Specify the Magnification of the Images


To enlarge an image to get a closer look or to shrink an image to see the whole image in context, use
the magnification edit box. (You can also use the Zoom buttons to enlarge or shrink an image. See
“Zoom In and Out on an Image” on page 7-65 for more information.)

To change the magnification of an image:

1 Move the cursor into the magnification edit box of the window you want to change. The cursor
changes to the text entry cursor.
2 Type a new value in the magnification edit box and press Enter, or click the menu associated
with the edit box and choose from a list of preset magnifications. The Control Point Selection Tool
changes the magnification of the image and displays the new view in the appropriate window. To
keep the relative magnifications of the fixed and moving images synchronized as you change the
magnification, click the Lock ratio check box. See “Lock the Relative Magnification of the
Moving and Fixed Images” on page 7-67 for more information.


Lock the Relative Magnification of the Moving and Fixed Images


To keep the relative magnification of the moving and fixed images automatically synchronized in the
Detail windows, click the Lock Ratio check box.

When the Lock Ratio check box is selected, the Control Point Selection Tool changes the
magnification of both the moving and fixed images when you zoom in or out on either one of the
images on page 7-65 or specify a magnification value on page 7-66 for either of the images.

The next step is to specify matching control point pairs. For more information, see “Select Matching
Control Point Pairs” on page 7-68.

See Also

More About
• “Start the Control Point Selection Tool” on page 7-63
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-61


Select Matching Control Point Pairs


The Control Point Selection Tool enables you to pick control points in the image to be registered (the
moving image) and the reference image (the fixed image). When you start cpselect, point selection
is enabled, by default.

You specify control points by pointing and clicking in the moving and fixed images, in either the Detail
or the Overview windows. Each point you specify in the moving image must have a match in the fixed
image. The following sections describe the ways you can use the Control Point Selection Tool to
choose control point pairs.

Pick Control Point Pairs Manually


To specify a pair of control points in your images,

1 Click the Control Point Selection button in the Control Point Selection Tool toolbar or select
Add Points from the Tools menu. Control point selection mode is active by default. The cursor
changes to a cross-hairs shape.
2 Position the cursor over a feature you have visually selected in any of the images displayed and
click the mouse button. cpselect places a control point symbol at the position you specified, in
both the Detail window and the corresponding Overview window. cpselect numbers the points as you
select them. The appearance of the control point symbol indicates its current state. The circle
around the point indicates that it is the currently selected point. The number identifies control
point pairs.

Note Depending on where in the image you select control points, the symbol for the point may
be visible in the Overview window, but not in the Detail window.
3 You can select another point in the same image or you can move to the corresponding image and
create a match for the point. To create the match for this control point, position the cursor over
the same feature in the corresponding Detail or Overview window and click the mouse button.
cpselect places a control point symbol at the position you specified, in both the Detail and
Overview windows. You can work in either direction: picking control points in either of the Detail
windows, moving or fixed, or in either of the Overview windows, moving or fixed.

To match an unmatched control point, select it, and then pick a point in the corresponding window.
You can move on page 7-71 or delete on page 7-71 control points after you create them.

The following figure illustrates control points in several states.


Use Control Point Prediction


Instead of picking matching control points yourself, you can let the Control Point Selection Tool
estimate the match for the control points you specify, automatically. The Control Point Selection Tool
determines the position of the matching control point based on the geometric relationship of the
previously selected control points, not on any feature of the underlying images.

To illustrate point prediction, this figure shows four control points selected in the moving image,
where the points form the four corners of a square. (The control point selections in the figure do not
attempt to identify any landmarks in the image.) The figure shows the picking of a fourth point, in the
left window, and the corresponding predicted point in the right window. Note how the Control Point
Selection Tool places the predicted point at the same location relative to the other control points,
forming the bottom right corner of the square.


Note By default, the Control Point Selection Tool does not include predicted points in the set of valid
control points returned in movingPoints or fixedPoints. To include predicted points, you must
accept them by selecting the points and fine-tuning their position with the cursor. When you move a
predicted point, the Control Point Selection Tool changes the symbol to indicate that it has changed
to a standard control point. For more information, see “Move Control Points” on page 7-71.

To use control point prediction,


1 Click the Control Point Prediction button.

Note The Control Point Selection Tool predicts control point locations based on the locations of
previous control points. You cannot use point prediction until you have a minimum of two pairs of
matched points. Until this minimum is met, the Control Point Prediction button is disabled.
2 Position the cursor anywhere in any of the images displayed. The cursor changes to the
cross-hairs shape.

You can pick control points in either of the Detail windows, moving or fixed, or in either of the
Overview windows, moving or fixed. You also can work in either direction: moving-to-fixed image
or fixed-to-moving image.
3 Click either mouse button. The Control Point Selection Tool places a control point symbol at the
position you specified and places another control point symbol for a matching point in all the
other windows. The symbol for the predicted point contains the letter P, indicating that it is
a predicted control point.
4 To accept a predicted point, select it with the cursor and move it. The Control Point Selection
Tool removes the P from the point.

Move Control Points


To move a control point,

1 Click the Control Point Selection button.
2 Position the cursor over the control point you want to move. The cursor changes to the fleur
shape.
3 Press and hold the mouse button and drag the control point. The state of the control point
changes to selected when you move it.

If you move a predicted control point, the state of the control point changes to a regular
(nonpredicted) control point.

Delete Control Points


To delete a control point, and its matching point, if one exists,

1 Click the Control Point Selection button.
2 Click the control point you want to delete. Its state changes to selected. If the control point has a
match, both points become active.
3 Delete the point (or points) using one of these methods:

• Pressing the Backspace key
• Pressing the Delete key
• Choosing one of the delete options from the Edit menu

Using this menu, you can delete individual points or pairs of matched points, in the moving or
fixed images.


See Also

More About
• “Find Visual Elements Common to Both Images” on page 7-65
• “Export Control Points to the Workspace” on page 7-73
• “Control Point Selection Procedure” on page 7-61


Export Control Points to the Workspace


After you specify control points, you must save them in the workspace to make them available for the
next step in image registration, processing by fitgeotform2d.

To save control points to the workspace, select File on the Control Point Selection Tool menu bar,
then choose the Export Points to Workspace option. The Control Point Selection Tool displays this
dialog box:

By default, the Control Point Selection Tool saves the coordinates of valid control points. The Control
Point Selection Tool does not include unmatched and predicted points in the movingPoints and
fixedPoints arrays. The arrays are n-by-2 arrays, where n is the number of valid control point pairs
you selected. The two columns represent the x- and y-coordinates of the control points, respectively,
in the intrinsic coordinate system of the image.

This example shows the movingPoints array containing four valid pairs of control points.

movingPoints =

215.6667 262.3333
225.7778 311.3333
156.5556 340.1111
270.8889 368.8889

To save the current state of the Control Point Selection Tool, including unpaired and predicted control
points, select the Structure with all points check box.

This option saves the positions of all control points and their current state in a cpstruct structure.


cpstruct =

inputPoints: [4x2 double]
basePoints: [4x2 double]
inputBasePairs: [4x2 double]
ids: [4x1 double]
inputIdPairs: [4x2 double]
baseIdPairs: [4x2 double]
isInputPredicted: [4x1 double]
isBasePredicted: [4x1 double]

You can use the cpstruct to restart a control point selection session at the point where you left off.
This option is useful if you are picking many points over a long time and want to preserve unmatched
and predicted points when you resume work.

To extract the arrays of valid control point coordinates from a cpstruct, use the cpstruct2pairs
function.
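
For example, assuming cpstruct is the structure you exported from the tool:

[movingPoints,fixedPoints] = cpstruct2pairs(cpstruct);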

The Control Point Selection Tool also asks if you want to save your control points when you exit the
tool.

See Also
cpselect | cpstruct2pairs | fitgeotform2d

More About
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-61
• “Image Coordinate Systems” on page 2-63


Find Image Rotation and Scale

This example shows how to align or register two images that differ by a rotation and a scale change.
You can calculate the rotation angle and scale factor and transform the distorted image to recover the
original image.

Step 1: Read Image

Read an image into the workspace.

original = imread("cameraman.tif");
imshow(original)
text(size(original,2),size(original,1)+15, ...
"Image courtesy of Massachusetts Institute of Technology", ...
FontSize=7,HorizontalAlignment="right")

Step 2: Resize and Rotate the Image

Create a distorted version of the image by resizing and rotating the image. Note that imrotate
rotates images in a counterclockwise direction when you specify a positive angle of rotation.

scaleFactor = 0.7;
distorted = imresize(original,scaleFactor);

theta = 30;
distorted = imrotate(distorted,theta);
imshow(distorted)


Step 3: Select Control Points

This example specifies three pairs of control points.

movingPoints = [128.6 75.4; 151.9 163.9; 192.1 118.6];
fixedPoints = [169.1 73.6; 135.6 199.9; 217.1 171.9];

If you want to pick the control points yourself, then you can use the Control Point Selection Tool.
Open this tool by using the cpselect function.

[movingPoints,fixedPoints] = cpselect(distorted,original,"Wait",true);

Step 4: Estimate Affine Transformation

Fit a geometric transformation to your control points using the fitgeotform2d function. This
example fits a similarity transformation because the distortion consists only of rotation and isotropic
scaling.

tform = fitgeotform2d(movingPoints,fixedPoints,"similarity");

Step 5: Recover Scale Factor and Rotation Angle

The geometric transformation, tform, represents how to transform the moving image to the fixed
image. If you want to determine the scale factor and rotation angle that you applied to the fixed
image to create the moving image, then use the inverse of the geometric transformation.

tformInv = invert(tform)

tformInv =
simtform2d with properties:

Dimensionality: 2
Scale: 0.7014
RotationAngle: -29.6202
Translation: [0.0051 89.0695]
R: [2x2 double]
A: [3x3 double]

The values of the Scale property should match the value of scaleFactor that you set in Step 2:
Resize and Rotate the Image.

The value of the RotationAngle property should have the same magnitude as the angle theta that
you set in Step 2: Resize and Rotate the Image. However, the angle in RotationAngle has the
opposite sign from theta. The sign is opposite because the simtform2d object stores the rotation
angle as the amount of rotation from the positive x-axis toward the positive y-axis in intrinsic
coordinates. For images, the positive x direction points to the right and the positive y direction
points downward; therefore, a positive rotation angle is in the clockwise direction. A positive
rotation angle in the clockwise direction corresponds to a negative rotation angle in the
counterclockwise direction, and vice versa.

Step 6: Recover Original Image

Recover the original image by transforming distorted, the rotated-and-scaled image, using the
geometric transformation tform and what you know about the spatial referencing of original. Use
the OutputView name-value argument to specify the resolution and grid size of the resampled
output image.
Roriginal = imref2d(size(original));
recovered = imwarp(distorted,tform,OutputView=Roriginal);

Compare recovered to original by looking at them side-by-side in a montage.


montage({original,recovered})

The recovered (right) image quality does not match the original (left) image because of the distortion
and recovery process. In particular, the image shrinking causes information loss. The artifacts around
the edges are due to the limited accuracy of the transformation. If you were to pick more points in
Step 3: Select Control Points, the transformation would be more accurate.

See Also
imresize | imrotate | cpselect | fitgeotform2d | imwarp | imref2d

More About
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-61
• “Find Image Rotation and Scale Using Automated Feature Matching” (Computer Vision Toolbox)


Use Cross-Correlation to Improve Control Point Placement


You can fine-tune the control points you selected using cpselect. Using cross-correlation, you can
sometimes improve the points you selected by eye using the Control Point Selection Tool.

To use cross-correlation, pass sets of control points in the moving and fixed images, along with the
images themselves, to the cpcorr function.

moving_pts_adj = cpcorr(movingPoints,fixedPoints,moving,fixed);

The cpcorr function defines 11-by-11 pixel regions around each control point in the moving image
and around the matching control point in the fixed image. The function then calculates the
correlation between the values at each pixel in the region. Next, the cpcorr function finds the
position with the highest correlation value and uses it as the optimal position of the control point. The
function only moves control points up to four pixels based on the results of the cross-correlation.

Note Features in the two images must be at the same scale and have the same orientation. They
cannot be rotated relative to each other.

If cpcorr cannot correlate some of the control points, it returns their unmodified values in
movingPoints.
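
Putting the pieces together, a minimal sketch of the fine-tuning workflow (assuming grayscale moving and fixed images already in the workspace) might look like this:

[movingPoints,fixedPoints] = cpselect(moving,fixed,Wait=true);          % pick control points by eye
movingPointsAdjusted = cpcorr(movingPoints,fixedPoints,moving,fixed);   % fine-tune with cross-correlation
tform = fitgeotform2d(movingPointsAdjusted,fixedPoints,"affine");       % fit a transformation to the adjusted points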

See Also
cpselect | cpcorr | cpstruct2pairs

More About
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-61


Register Images with Projection Distortion Using Control Points

This example shows how to register two images by selecting control points common to both images
and inferring a geometric transformation that aligns the control points.

Read Images

Read the image westconcordorthophoto.png into the workspace. This image is an orthophoto
that has already been registered to the ground.

ortho = imread("westconcordorthophoto.png");
imshow(ortho)
text(size(ortho,2),size(ortho,1)+15, ...
"Image courtesy of Massachusetts Executive Office of Environmental Affairs", ...
FontSize=7,HorizontalAlignment="right");

Read the image westconcordaerial.png into the workspace. This image was taken from an
airplane and is distorted relative to the orthophoto. Because the unregistered image was taken from a
distance and the topography is relatively flat, it is likely that most of the distortion is projective.

unregistered = imread("westconcordaerial.png");
imshow(unregistered)
text(size(unregistered,2),size(unregistered,1)+15, ...
"Image courtesy of mPower3/Emerge", ...
FontSize=7,HorizontalAlignment="right");


Select Control Point Pairs

To select control points interactively, open the Control Point Selection tool by using the cpselect
function. Control points are landmarks that you can find in both images, such as a road intersection
or a natural feature. Select at least four pairs of control points so that cpselect can fit a projective
transformation to the control points. After you have selected corresponding moving and fixed points,
close the tool to return to the workspace.

[mp,fp] = cpselect(unregistered,ortho,Wait=true);


Infer Geometric Transformation

Find the parameters of the projective transformation that best aligns the moving and fixed points by
using the fitgeotform2d function.
t = fitgeotform2d(mp,fp,"projective")

t =
projtform2d with properties:

Dimensionality: 2
A: [3×3 double]

Transform Unregistered Image

To apply the transformation to the unregistered aerial image, use the imwarp function. Specify that
the size and position of the transformed image match the size and position of the ortho image by
using the OutputView name-value argument.
Rfixed = imref2d(size(ortho));
registered = imwarp(unregistered,t,OutputView=Rfixed);


See the result of the registration by overlaying the transformed image over the original orthophoto.

imshowpair(ortho,registered,"blend")

See Also
cpselect | cpcorr | cpstruct2pairs | fitgeotform2d

More About
• “Select Matching Control Point Pairs” on page 7-68
• “Control Point Selection Procedure” on page 7-61

8
Designing and Implementing Linear Filters for Image Data

The Image Processing Toolbox software provides a number of functions for designing and
implementing two-dimensional linear filters for image data. This chapter describes these functions
and how to use them effectively.

• “What Is Image Filtering in the Spatial Domain?” on page 8-2


• “Filter Grayscale and Truecolor (RGB) Images Using imfilter Function” on page 8-5
• “imfilter Boundary Padding Options” on page 8-9
• “Change Filter Strength Radially Outward” on page 8-12
• “Noise Removal” on page 8-18
• “Apply Gaussian Smoothing Filters to Images” on page 8-24
• “Reduce Noise in Image Gradients” on page 8-30
• “What is Guided Image Filtering?” on page 8-39
• “Perform Flash/No-flash Denoising with Guided Filter” on page 8-40
• “Segment Thermographic Image After Edge-Preserving Filtering” on page 8-44
• “Integral Image” on page 8-48
• “Apply Multiple Filters to Integral Image” on page 8-50
• “Filter Images Using Predefined Filter” on page 8-55
• “Generate HDL Code for Image Sharpening” on page 8-58
• “Adjust Image Intensity Values to Specified Range” on page 8-65
• “Gamma Correction” on page 8-67
• “Contrast Enhancement Techniques” on page 8-69
• “Specify Contrast Adjustment Limits” on page 8-73
• “Adjust Image Contrast Using Histogram Equalization” on page 8-75
• “Adaptive Histogram Equalization” on page 8-80
• “Enhance Color Separation Using Decorrelation Stretching” on page 8-83
• “Enhance Multispectral Color Composite Images” on page 8-90
• “Low-Light Image Enhancement” on page 8-100
• “Design Linear Filters in the Frequency Domain” on page 8-107

What Is Image Filtering in the Spatial Domain?


Filtering is a technique for modifying or enhancing an image. For example, you can filter an image to
emphasize certain features or remove other features. Image processing operations implemented with
filtering include smoothing, sharpening, and edge enhancement.

Filtering is a neighborhood operation, in which the value of any given pixel in the output image is
determined by applying some algorithm to the values of the pixels in the neighborhood of the
corresponding input pixel. A pixel's neighborhood is some set of pixels, defined by their locations
relative to that pixel. (See “Neighborhood or Block Processing: An Overview” on page 18-2 for a
general discussion of neighborhood operations.) Linear filtering is filtering in which the value of an
output pixel is a linear combination of the values of the pixels in the input pixel's neighborhood.

Convolution
Linear filtering of an image is accomplished through an operation called convolution. Convolution is a
neighborhood operation in which each output pixel is the weighted sum of neighboring input pixels.
The matrix of weights is called the convolution kernel, also known as the filter. A convolution kernel
is a correlation kernel that has been rotated 180 degrees.

For example, suppose the image is

A = [17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9]

and the correlation kernel is

h = [8 1 6
3 5 7
4 9 2]

You would use the following steps to compute the output pixel at position (2, 4):

1 Rotate the correlation kernel 180 degrees about its center element to create a convolution
kernel.
2 Slide the center element of the convolution kernel so that it lies on top of the (2, 4) element of A.
3 Multiply each weight in the rotated convolution kernel by the pixel of A underneath.
4 Sum the individual products from step 3.

Hence the (2, 4) output pixel is

1*2 + 8*9 + 15*4 + 7*7 + 14*5 + 16*3 + 13*6 + 20*1 + 22*8 = 575

The calculation is shown in the following figure.


Computing the (2, 4) Output of Convolution
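
You can reproduce this hand calculation with conv2. The following is a minimal sketch that assumes A and h are defined as shown above; with the "same" option, the output is the same size as A, so its (2, 4) element is the value computed here.

C = conv2(A,h,"same");   % 2-D convolution; conv2 rotates h by 180 degrees internally
C(2,4)                   % returns 575, matching the hand calculation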

Correlation
The operation called correlation is closely related to convolution. In correlation, the value of an
output pixel is also computed as a weighted sum of neighboring pixels. The difference is that the
matrix of weights, in this case called the correlation kernel, is not rotated during the computation.
The Image Processing Toolbox filter design functions return correlation kernels.

The following figure shows how to compute the (2, 4) output pixel of the correlation of A, assuming h
is a correlation kernel instead of a convolution kernel, using these steps:

1 Slide the center element of the correlation kernel so that it lies on top of the (2, 4) element of A.
2 Multiply each weight in the correlation kernel by the pixel of A underneath.
3 Sum the individual products.

The (2, 4) output pixel from the correlation is

1*8 + 8*1 + 15*6 + 7*3 + 14*5 + 16*7 + 13*4 + 20*9 + 22*2 = 585

Computing the (2, 4) Output of Correlation
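
Similarly, you can check this value with filter2, which performs correlation and does not rotate the kernel. This sketch again assumes A and h from above.

R = filter2(h,A);   % 2-D correlation; "same" output size is the default
R(2,4)              % returns 585, matching the hand calculation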

See Also
imfilter | conv2 | convn


Related Examples
• “Design Linear Filters in the Frequency Domain” on page 8-107
• “Noise Removal” on page 8-18


Filter Grayscale and Truecolor (RGB) Images Using imfilter Function

This example shows how to filter a 2-D grayscale image with a 5-by-5 filter containing equal weights
(often called an averaging filter) using imfilter. The example also shows how to filter a truecolor
(RGB) image with the same filter. A truecolor image is a 3-D array of size m-by-n-by-3, where the last
dimension represents the three color channels. Filtering a truecolor image with a 2-D filter is
equivalent to filtering each plane of the image individually with the same 2-D filter.

There are several MATLAB® functions that perform 2-D and multidimensional filtering that can be
compared to imfilter. The function filter2 performs two-dimensional correlation, conv2
performs two-dimensional convolution, and convn performs multidimensional convolution. However,
each of these filtering functions always converts the input to double, and the output is always
double. Also, these MATLAB® filtering functions always assume the input is zero padded, and they
do not support other padding options. In contrast, imfilter does not convert input images to
double. The imfilter function also offers a flexible set of boundary padding options.
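
As a quick check of the plane-by-plane equivalence described above, the following sketch (added for illustration) filters a truecolor image in one call and then filters each color plane separately; the two results are identical.

rgbCheck = imread("peppers.png");
hCheck = ones(5,5)/25;
out1 = imfilter(rgbCheck,hCheck);                  % filter all three planes at once
out2 = cat(3,imfilter(rgbCheck(:,:,1),hCheck), ...
             imfilter(rgbCheck(:,:,2),hCheck), ...
             imfilter(rgbCheck(:,:,3),hCheck));    % filter each plane individually
isequal(out1,out2)                                 % returns logical 1 (true)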

Filter 2-D Grayscale Image with an Averaging Filter

Read a grayscale image into the workspace.

I = imread("coins.png");

Display the original image.

figure
imshow(I)
title("Original Image")

Create a normalized, 5-by-5, averaging filter.


h = ones(5,5)/25;

Apply the averaging filter to the grayscale image using imfilter.

I2 = imfilter(I,h);

Display the filtered image.

figure
imshow(I2)
title("Filtered Image")

Filter Multidimensional Truecolor (RGB) Image Using imfilter

Read a truecolor image into the workspace.

rgb = imread("peppers.png");
imshow(rgb);


Create a filter. This averaging filter contains equal weights, and causes the filtered image to look
more blurry than the original.

h = ones(5,5)/25;

Filter the image using imfilter and display it.

rgb2 = imfilter(rgb,h);
figure
imshow(rgb2)


See Also
imfilter

More About
• “What Is Image Filtering in the Spatial Domain?” on page 8-2


imfilter Boundary Padding Options


When computing an output pixel at the boundary of an image, a portion of the convolution or
correlation kernel is usually off the edge of the image, as illustrated in the following figure.

When the Values of the Kernel Fall Outside the Image

The imfilter function normally fills in these off-the-edge image pixels by assuming that they are 0.
This is called zero padding and is illustrated in the following figure.

Zero Padding of Outside Pixels

When you filter an image, zero padding can result in a dark band around the edge of the image, as
shown in this example.

I = imread("eight.tif");
h = ones(5,5) / 25;
I2 = imfilter(I,h);


imshow(I), title("Original Image");


figure, imshow(I2), title("Filtered Image with Black Border")

To eliminate the zero-padding artifacts around the edge of the image, imfilter offers an alternative
boundary padding method called border replication. In border replication, the value of any pixel
outside the image is determined by replicating the value from the nearest border pixel. This is
illustrated in the following figure.

Replicated Boundary Pixels

To filter using border replication, pass the additional optional argument "replicate" to imfilter.

I3 = imfilter(I,h,"replicate");
figure, imshow(I3);
title("Filtered Image with Border Replication")


The imfilter function supports other boundary padding options, such as "circular" and
"symmetric".

See Also
imfilter

Related Examples
• “Design Linear Filters in the Frequency Domain” on page 8-107


Change Filter Strength Radially Outward

This example shows how to create filters that blur and darken pixels in proportion to the distance
from the center of the image.

Read and display an image.

I = imread("peppers.png");
I = im2double(I);
imshow(I)

Blur Image Using Gaussian Weighting Function

Create a blurry copy of the image using a Gaussian filter with standard deviation of 2.

Iblurred = imgaussfilt(I,2);
imshow(Iblurred)


Create a weight image as a Gaussian filter of the same size as the image. To increase the portion of
the image that appears sharp, increase the value of filterStrength.

filterStrength = 50;   % example value; increase to enlarge the sharp central region
weights = fspecial("gaussian",[size(I,1) size(I,2)],filterStrength);
imshow(weights,[])


Normalize the weight image to the range [0, 1] by using the rescale function.

weights = rescale(weights);

Create a weighted blurred image that is a weighted sum of the original image and blurry image.
MATLAB automatically replicates the weight matrix for each of the R, G, and B color channels.

IweightedBlurred = I.*weights + Iblurred.*(1-weights);

Display the result. The image is sharp in the center and becomes more blurry radially outwards.

imshow(IweightedBlurred)


Vignette Image Using 1/R^2 Weighting Function

Get the size of the image.

sizex = size(I,2);
sizey = size(I,1);

Specify the center of the vignette.

xcenter = size(I,2)/2;
ycenter = size(I,1)/2;

Define the x- and y-coordinates of the surface.

[X,Y] = meshgrid((1:sizex)-xcenter,(1:sizey)-ycenter);

Define the radius from the center at each (x,y) coordinate.

R2 = X.^2 + Y.^2;

Rescale the squared radius to the range [0, 1], and define the weighting function as 1 minus this rescaled value, so that the weights decrease from 1 at the center to 0 at the edges.

R2 = rescale(R2);
weights = (1-R2);
imshow(weights)


Apply the weighting function to the image and display the result.

I2 = I.*weights;
imshow(I2)


See Also
fspecial | imgaussfilt


Noise Removal
In this section...
“Remove Noise by Linear Filtering” on page 8-18
“Remove Noise Using an Averaging Filter and a Median Filter” on page 8-18
“Remove Noise By Adaptive Filtering” on page 8-21

Digital images are prone to various types of noise. Noise is the result of errors in the image
acquisition process that result in pixel values that do not reflect the true intensities of the real scene.
There are several ways that noise can be introduced into an image, depending on how the image is
created. For example:

• If the image is scanned from a photograph made on film, the film grain is a source of noise. Noise
can also be the result of damage to the film, or be introduced by the scanner itself.
• If the image is acquired directly in a digital format, the mechanism for gathering the data (such as
a CCD detector) can introduce noise.
• Electronic transmission of image data can introduce noise.

To simulate the effects of some of the problems listed above, the toolbox provides the imnoise
function, which you can use to add various types of noise to an image. The examples in this section
use this function.

Remove Noise by Linear Filtering


You can use linear filtering to remove certain types of noise. Certain filters, such as averaging or
Gaussian filters, are appropriate for this purpose. For example, an averaging filter is useful for
removing grain noise from a photograph. Because each pixel gets set to the average of the pixels in
its neighborhood, local variations caused by grain are reduced.

See “What Is Image Filtering in the Spatial Domain?” on page 8-2 for more information about linear
filtering using imfilter.
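
As a brief illustration of this approach, the following sketch adds simulated grain-like Gaussian noise to an image and smooths it with a 3-by-3 averaging filter; the noise variance of 0.005 is an arbitrary illustrative value.

I = imread("eight.tif");
Inoisy = imnoise(I,"gaussian",0,0.005);              % simulated grain noise (illustrative)
Ismoothed = imfilter(Inoisy,fspecial("average",3));  % 3-by-3 averaging filter
montage({Inoisy,Ismoothed})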

Remove Noise Using an Averaging Filter and a Median Filter

This example shows how to remove salt and pepper noise from an image using an averaging filter and
a median filter, so that you can compare the results. Both types of filtering set the value of each
output pixel based on the pixel values in a neighborhood around the corresponding input pixel. An
averaging filter sets the output pixel to the mean of the neighborhood values, whereas a median
filter sets it to the median of the neighborhood values. The median is much less sensitive than the
mean to extreme values (called outliers). Median filtering is therefore better able to remove these
outliers without reducing the sharpness of the image.

Note: Median filtering is a specific case of order-statistic filtering, also known as rank filtering. For
information about order-statistic filtering, see the reference page for the ordfilt2 function.

Read image into the workspace and display it.


I = imread('eight.tif');
figure
imshow(I)


For this example, add salt and pepper noise to the image. This type of noise consists of random pixels
being set to black or white (the extremes of the data range).

J = imnoise(I,'salt & pepper',0.02);


figure
imshow(J)

Filter the noisy image, J, with an averaging filter and display the results. The example uses a 3-by-3
neighborhood.


Kaverage = filter2(fspecial('average',3),J)/255;
figure
imshow(Kaverage)

Now use a median filter to filter the noisy image, J. The example also uses a 3-by-3 neighborhood.
Display the two filtered images side-by-side for comparison. Notice that medfilt2 does a better job
of removing noise, with less blurring of edges of the coins.

Kmedian = medfilt2(J);
imshowpair(Kaverage,Kmedian,'montage')
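
Because the median is the middle order statistic, you can also express this median filter with ordfilt2; the following sketch selects the 5th of the 9 sorted values in each 3-by-3 neighborhood and, with the default zero padding used by both functions, is expected to reproduce the medfilt2 result.

Kordstat = ordfilt2(J,5,ones(3,3));   % 5th of 9 sorted neighbors = 3-by-3 median
isequal(Kordstat,Kmedian)             % expected to return true with default zero padding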


Remove Noise By Adaptive Filtering

This example shows how to use the wiener2 function to apply a Wiener filter (a type of linear filter)
to an image adaptively. The Wiener filter tailors itself to the local image variance. Where the variance
is large, wiener2 performs little smoothing. Where the variance is small, wiener2 performs more
smoothing.

This approach often produces better results than linear filtering. The adaptive filter is more selective
than a comparable linear filter, preserving edges and other high-frequency parts of an image. In
addition, there are no design tasks; the wiener2 function handles all preliminary computations and
implements the filter for an input image. wiener2, however, does require more computation time
than linear filtering.

wiener2 works best when the noise is constant-power ("white") additive noise, such as Gaussian
noise. The example below applies wiener2 to an image of Saturn with added Gaussian noise.

Read the image into the workspace.

RGB = imread('saturn.png');

Convert the image from truecolor to grayscale.

I = im2gray(RGB);

Add Gaussian noise to the image.

J = imnoise(I,'gaussian',0,0.025);

Display the noisy image. Because the image is quite large, display only a portion of the image.

imshow(J(600:1000,1:600));
title('Portion of the Image with Added Gaussian Noise');


Remove the noise using the wiener2 function.

K = wiener2(J,[5 5]);

Display the processed image. Because the image is quite large, display only a portion of the image.

figure
imshow(K(600:1000,1:600));
title('Portion of the Image with Noise Removed by Wiener Filter');


See Also
imfilter | imguidedfilter | imgaussfilt | locallapfilt | nlfilter | imbilatfilt

More About
• “What Is Image Filtering in the Spatial Domain?” on page 8-2


Apply Gaussian Smoothing Filters to Images

This example shows how to apply different Gaussian smoothing filters to images using imgaussfilt.
Gaussian smoothing filters are commonly used to reduce noise.

Read an image into the workspace.

I = imread('cameraman.tif');

Filter the image with isotropic Gaussian smoothing kernels of increasing standard deviations.
Gaussian filters are generally isotropic, that is, they have the same standard deviation along both
dimensions. An image can be filtered by an isotropic Gaussian filter by specifying a scalar value for
sigma.

Iblur1 = imgaussfilt(I,2);
Iblur2 = imgaussfilt(I,4);
Iblur3 = imgaussfilt(I,8);

Display the original image and all the filtered images.

figure
imshow(I)
title('Original image')

figure
imshow(Iblur1)
title('Smoothed image, \sigma = 2')


figure
imshow(Iblur2)
title('Smoothed image, \sigma = 4')

figure


imshow(Iblur3)
title('Smoothed image, \sigma = 8')

Filter the image with anisotropic Gaussian smoothing kernels. imgaussfilt allows the Gaussian
kernel to have different standard deviations along row and column dimensions. These are called axis-
aligned anisotropic Gaussian filters. Specify a 2-element vector for sigma when using anisotropic
filters.

IblurX1 = imgaussfilt(I,[4 1]);
IblurX2 = imgaussfilt(I,[8 1]);
IblurY1 = imgaussfilt(I,[1 4]);
IblurY2 = imgaussfilt(I,[1 8]);

Display the filtered images.

figure
imshow(IblurX1)
title('Smoothed image, \sigma_x = 4, \sigma_y = 1')


figure
imshow(IblurX2)
title('Smoothed image, \sigma_x = 8, \sigma_y = 1')

figure


imshow(IblurY1)
title('Smoothed image, \sigma_x = 1, \sigma_y = 4')

figure
imshow(IblurY2)
title('Smoothed image, \sigma_x = 1, \sigma_y = 8')


Suppress the horizontal bands visible in the sky region of the original image. Anisotropic Gaussian
filters can suppress horizontal or vertical features in an image. Extract a section of the sky region of
the image and use a Gaussian filter with higher standard deviation along the X axis (direction of
increasing columns).

I_sky = imadjust(I(20:50,10:70));
IblurX1_sky = imadjust(IblurX1(20:50,10:70));

Display the original patch of sky with the filtered version.

figure
imshow(I_sky), title('Sky in original image')

figure
imshow(IblurX1_sky), title('Sky in filtered image')


Reduce Noise in Image Gradients

This example demonstrates how to reduce noise associated with computing image gradients. Image
gradients are used to highlight interesting features in images and are used in many feature detection
algorithms like edge/corner detection. Reducing noise in gradient computations is crucial to
detecting accurate features.

Read an image into the workspace and convert it to grayscale.

originalImage = imread("yellowlily.jpg");
originalImage = im2gray(originalImage);

imshow(originalImage)


To simulate noise for this example, add some Gaussian noise to the image.


noisyImage = imnoise(originalImage,"gaussian");
imshow(noisyImage)


Compute the magnitude of the gradient by using the imgradient and imgradientxy functions.
imgradient finds the gradient magnitude and direction, and imgradientxy finds directional image
gradients.

sobelGradient = imgradient(noisyImage);
imshow(sobelGradient,[])
title("Sobel Gradient Magnitude")


Looking at the gradient magnitude image, it is clear that the image gradient is very noisy. The effect
of noise can be minimized by smoothing before gradient computation. imgradient already offers
this capability for small amounts of noise by using the Sobel gradient operator. The Sobel gradient
operators are 3-by-3 filters as shown below. They can be generated using the fspecial function.

hy = -fspecial("sobel")

hy = 3×3

-1 -2 -1
0 0 0
1 2 1

hx = hy'

hx = 3×3

-1 0 1
-2 0 2
-1 0 1

The hy filter computes a gradient along the vertical direction while smoothing in the horizontal
direction. hx smooths in the vertical direction and computes a gradient along the horizontal direction.
The "Prewitt" and "Roberts" method options also provide this capability.

Even with the use of Sobel, Roberts or Prewitt gradient operators, the image gradient may be too
noisy. To overcome this, smooth the image using a Gaussian smoothing filter before computing image
gradients. Use the imgaussfilt function to smooth the image. The standard deviation of the
Gaussian filter varies the extent of smoothing. Since smoothing is taken care of by Gaussian filtering,
the central or intermediate differencing gradient operators can be used.

sigma = 2;
smoothImage = imgaussfilt(noisyImage,sigma);
smoothGradient = imgradient(smoothImage,"CentralDifference");
imshow(smoothGradient,[])
title("Smoothed Gradient Magnitude")


See Also
imnoise | imgradient | imgaussfilt | fspecial

More About
• “What Is Image Filtering in the Spatial Domain?” on page 8-2


What is Guided Image Filtering?


The imguidedfilter function performs edge-preserving smoothing on an image, using the content
of a second image, called a guidance image, to influence the filtering. The guidance image can be the
image itself, a different version of the image, or a completely different image. Guided image filtering
is a neighborhood operation, like other filtering operations, but takes into account the statistics of a
region in the corresponding spatial neighborhood in the guidance image when calculating the value
of the output pixel.

If the guidance image is the same as the image to be filtered, the structures are the same: an edge in
the original image is also an edge in the guidance image. If the guidance image is different, structures
in the guidance image will impact the filtered image, in effect imprinting these structures on the
original image. This effect is called structure transference.
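
A minimal sketch of self-guided filtering, using an arbitrary sample image, looks like the following; when you omit the guidance argument, imguidedfilter uses the input image as its own guidance.

A = imread("pout.tif");
B = imguidedfilter(A);      % self-guided, edge-preserving smoothing
imshowpair(A,B,"montage")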

See Also
imguidedfilter

Related Examples
• “Perform Flash/No-flash Denoising with Guided Filter” on page 8-40


Perform Flash/No-flash Denoising with Guided Filter

This example shows how to use a guided filter to smooth an image, reducing noise, while preserving
edges. The example uses two pictures of the same scene, one taken with a flash and the other without
a flash. The version without a flash preserves colors but is noisy due to the low-light conditions. This
example uses the version taken with a flash as the guidance image.

Read the image that you want to filter into the workspace. This example uses an image of some toys
taken without a flash. Because of the low light conditions, the image contains a lot of noise.

A = imread('toysnoflash.png');
figure;
imshow(A);
title('Input Image - Camera Flash Off')

Read the image that you want to use as the guidance image into the workspace. In this example, the
guidance image is a picture of the same scene taken with a flash.

G = imread('toysflash.png');
figure;


imshow(G);
title('Guidance Image - Camera Flash On')

Perform the guided filtering operation. Using the imguidedfilter function, you can specify the size
of the neighborhood used for filtering. The default is a 5-by-5 square. This example uses a 3-by-3
neighborhood. You can also specify the amount of smoothing performed by the filter. The value can be
any positive number. One way to approach this is to use the default first and view the results. If you
want less smoothing and more edge preservation, use a lower value for this parameter. For more
smoothing, use a higher value. This example sets the value of the smoothing parameter.

nhoodSize = 3;
smoothValue = 0.001*diff(getrangefromclass(G)).^2;
B = imguidedfilter(A, G, 'NeighborhoodSize',nhoodSize, 'DegreeOfSmoothing',smoothValue);
figure, imshow(B), title('Filtered Image')


Examine a close up of an area of the original image and compare it to the filtered image to see the
effect of this edge-preserving smoothing filter.

figure;
h1 = subplot(1,2,1);
imshow(A), title('Region in Original Image'), axis on
h2 = subplot(1,2,2);
imshow(B), title('Region in Filtered Image'), axis on
linkaxes([h1 h2])
xlim([520 660])
ylim([150 250])


See Also
imguidedfilter

More About
• “What is Guided Image Filtering?” on page 8-39


Segment Thermographic Image After Edge-Preserving Filtering

This example shows how to work with thermal images, demonstrating a simple segmentation.
Thermal images are obtained from thermographic cameras, which detect radiation in the infrared
range of the electromagnetic spectrum. Thermographic images capture infrared radiation emitted by
all objects above absolute zero.

Read a thermal image into the workspace and use whos to understand more about the image data.
I = imread("hotcoffee.tif");

whos I

Name Size Bytes Class Attributes

I 240x320 307200 single

Compute the dynamic range occupied by the data to see the range of temperatures occupied by the
image. The pixel values in this image correspond to actual temperatures on the Celsius scale.
range = [min(I(:)) max(I(:))]

range = 1x2 single row vector

22.4729 77.3727

Display the thermal image. Because the thermal image is a single-precision image with a dynamic
range outside 0 to 1, you must use the imshow auto-scaling capability to display the image.
figure
imshow(I,[])
colormap(gca,hot)
title("Original image")


Apply edge-preserving smoothing to the image to remove noise while still retaining image details.
This is a preprocessing step before segmentation. Use the imguidedfilter function to perform
smoothing under self-guidance. The DegreeOfSmoothing name-value argument controls the amount
of smoothing and is dependent on the range of the image. Adjust the DegreeOfSmoothing to
accommodate the range of the thermographic image. Display the filtered image.
smoothValue = 0.01*diff(range).^2;
J = imguidedfilter(I,"DegreeOfSmoothing",smoothValue);

figure
imshow(J,[])
colormap(gca,hot)
title("Guided filtered image")

Determine threshold values to use in segmentation. The image has 3 distinct regions - the person, the
hot object and the background - that appear well separated in intensity (temperature). Use
multithresh to compute a 2-level threshold for the image. This partitions the image into 3 regions
using Otsu's method.
thresh = multithresh(J,2)

thresh = 1x2 single row vector

27.0018 47.8220

Threshold the image using the values returned by multithresh. The threshold values are at 27 and
48 Celsius. The first threshold separates the background intensity from the person and the second
threshold separates the person from the hot object. Segment the image and fill holes.
L = imquantize(J,thresh);
L = imfill(L);

figure


imshow(label2rgb(L))
title("Label matrix from 3-level Otsu")

Draw a bounding box around the foreground regions in the image and put the mean temperature
value of the region in the box. The example assumes that the largest region is the background. Use
the regionprops function to get information about the regions in the segmented image.

props = regionprops(L,I,["Area","BoundingBox","MeanIntensity","Centroid"]);

% Find the index of the background region.


[~,idx] = max([props.Area]);

figure
imshow(I,[])
colormap(gca,hot)
title("Segmented regions with mean temperature")
for n = 1:numel(props)
% If the region is not background
if n ~= idx
% Draw bounding box around region
rectangle("Position",props(n).BoundingBox,"EdgeColor","c")

% Draw text displaying mean temperature in Celsius


T = num2str(props(n).MeanIntensity,3)+" \circ C";
text(props(n).Centroid(1),props(n).Centroid(2),T,...
"Color","c","FontSize",12)
end
end


See Also
imquantize | imguidedfilter | multithresh | imfill

More About
• “What is Guided Image Filtering?” on page 8-39


Integral Image
In an integral image, every pixel is the summation of the pixels above and to the left of it.

To illustrate, the following shows an image and its corresponding integral image. The integral image
is padded to the left and the top to allow for the calculation. The pixel value at (2, 1) in the original
image becomes the pixel value (3, 2) in the integral image after adding the pixel value above it (2+1)
and to the left (3+0). Similarly, the pixel at (2, 2) in the original image with the value 4 becomes the
pixel at (3, 3) in the integral image with the value 12 after adding the pixel value above it (4+5) and
adding the pixel to the left of it (9+3).

Using an integral image, you can rapidly calculate summations over image subregions. Integral
images facilitate these summations, which can be computed in constant time regardless of the
neighborhood size. The following figure illustrates how the summation of a subregion of an image
can be computed from the corresponding region of its integral image. For example, in the input image below, the
summation of the shaded region becomes a simple calculation using four reference values of the
rectangular region in the corresponding integral image. The calculation becomes, 46 – 22 – 20 + 10 =
14. The calculation subtracts the regions above and to the left of the shaded region. The area of
overlap is added back to compensate for the double subtraction.

In this way, you can calculate summations in rectangular regions rapidly, irrespective of the filter size.
Use of integral images was popularized by the Viola-Jones algorithm. To see the full citation for this
algorithm and learn how to create an integral image, see integralImage.
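
The following sketch, added for illustration, builds an integral image with integralImage and applies the four-reference formula described above to an arbitrary subregion; the example image and subregion bounds are assumptions.

A = magic(5);                     % example 5-by-5 image
intA = integralImage(A);          % padded with a row and column of zeros on the top and left
r1 = 2; r2 = 4; c1 = 3; c2 = 5;   % subregion A(r1:r2,c1:c2), chosen arbitrarily
regionSum = intA(r2+1,c2+1) - intA(r1,c2+1) - intA(r2+1,c1) + intA(r1,c1);
isequal(regionSum,sum(A(r1:r2,c1:c2),"all"))   % returns logical 1 (true)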


See Also
integralImage | integralBoxFilter | integralBoxFilter3 | integralImage3

Related Examples
• “Apply Multiple Filters to Integral Image” on page 8-50


Apply Multiple Filters to Integral Image

This example shows how to apply multiple box filters of varying sizes to an image using integral
image filtering. Integral image is a useful image representation from which local image sums can be
computed rapidly. A box filter can be thought of as a local weighted sum at each pixel.

Read an image into the workspace and display it.


originalImage = imread('cameraman.tif');

figure
imshow(originalImage)
title('Original Image')

Define the sizes of the three box filters.


filterSizes = [7 7;11 11;15 15];

Pad the image to accommodate the size of the largest box filter. Pad each dimension by an amount
equal to half the size of the largest filter. Note the use of replicate-style padding to help reduce
boundary artifacts.
maxFilterSize = max(filterSizes);
padSize = (maxFilterSize - 1)/2;

paddedImage = padarray(originalImage,padSize,'replicate','both');

Compute the integral image representation of the padded image using the integralImage function
and display it. The integral image is monotonically non-decreasing from left to right and top to
bottom. Each pixel represents the sum of all pixel intensities to the top and left of the current pixel in
the image.


intImage = integralImage(paddedImage);

figure
imshow(intImage,[])
title('Integral Image Representation')

Apply three box filters of varying sizes to the integral image. The integralBoxFilter function can
be used to apply a 2-D box filter to the integral image representation of an image.

filteredImage1 = integralBoxFilter(intImage, filterSizes(1,:));


filteredImage2 = integralBoxFilter(intImage, filterSizes(2,:));
filteredImage3 = integralBoxFilter(intImage, filterSizes(3,:));

The integralBoxFilter function returns only parts of the filtering that are computed without
padding. Filtering the same integral image with different sized box filters results in different sized
outputs. This is similar to the 'valid' option in the conv2 function.

whos filteredImage*

Name Size Bytes Class Attributes

filteredImage1 264x264 557568 double


filteredImage2 260x260 540800 double
filteredImage3 256x256 524288 double

Because the image was padded to accommodate the largest box filter prior to computing the integral
image, no image content is lost. filteredImage1 and filteredImage2 have additional padding
that can be cropped.

extraPadding1 = (maxFilterSize - filterSizes(1,:))/2;


filteredImage1 = filteredImage1(1+extraPadding1(1):end-extraPadding1(1),...
1+extraPadding1(2):end-extraPadding1(2) );


extraPadding2 = (maxFilterSize - filterSizes(2,:))/2;


filteredImage2 = filteredImage2(1+extraPadding2(1):end-extraPadding2(1),...
1+extraPadding2(2):end-extraPadding2(2) );

figure
imshow(filteredImage1,[])
title('Image filtered with [7 7] box filter')

figure
imshow(filteredImage2,[])
title('Image filtered with [11 11] box filter')


figure
imshow(filteredImage3,[])
title('Image filtered with [15 15] box filter')

See Also
integralImage | integralBoxFilter | integralBoxFilter3 | integralImage3


More About
• “Integral Image” on page 8-48


Filter Images Using Predefined Filter

This example shows how to create a predefined Laplacian of Gaussian (LoG) filter using the
fspecial function and apply the filter to an image using the imfilter function. A LoG filter
highlights regions with rapidly varying intensities and reduces the impact of variations caused by
noise. The fspecial function produces several additional types of predefined filters in the form of
correlation kernels.

Read and display an image.


I = imread('moon.tif');
imshow(I)


Create a 7-by-7 LoG filter with a standard deviation of 0.4 using fspecial.

h = fspecial('log',7,0.4)

h = 7×7

0.1263 0.1263 0.1263 0.1263 0.1263 0.1263 0.1263
0.1263 0.1263 0.1263 0.1267 0.1263 0.1263 0.1263
0.1263 0.1263 0.2333 1.1124 0.2333 0.1263 0.1263
0.1263 0.1267 1.1124 -10.4357 1.1124 0.1267 0.1263
0.1263 0.1263 0.2333 1.1124 0.2333 0.1263 0.1263
0.1263 0.1263 0.1263 0.1267 0.1263 0.1263 0.1263
0.1263 0.1263 0.1263 0.1263 0.1263 0.1263 0.1263

Apply the filter to the image using imfilter.

I2 = imfilter(I,h);

Display the filtered image.

imshow(I2)


See Also
imfilter | fspecial

Related Examples
• “Design Linear Filters in the Frequency Domain” on page 8-107


Generate HDL Code for Image Sharpening

This example shows how to use Vision HDL Toolbox™ to implement an FPGA-based module for image
sharpening.

Vision HDL Toolbox provides image and video processing algorithms designed to generate readable,
synthesizable code in VHDL and Verilog (with HDL Coder™). The generated HDL code when run on
an FPGA (for example, Xilinx XC7Z045) can process 1920x1080 full-resolution images at 60 frames
per second.

This example shows how to use Vision HDL Toolbox to generate HDL code that sharpens a blurred
image. Since Vision HDL Toolbox algorithms are available as MATLAB® System objects™ and
Simulink® blocks, HDL code can be generated from MATLAB or Simulink. This example shows both
workflows.

The workflow for an FPGA-targeted design is:

1. Create a behavioral model to represent design goals.

2. Replicate the design using algorithms, interfaces, and data types appropriate for FPGAs and
supported for HDL code generation.

3. Simulate the two designs and compare the results to confirm that the HDL-optimized design meets
the goals.

4. Generate HDL code from the design created in Step 2.

For Steps 2 and 3 in MATLAB, you must have MATLAB, Vision HDL Toolbox, and Fixed-Point
Designer™. In Simulink, you need Simulink, Vision HDL Toolbox, and Fixed-Point Designer. In both
cases, you must have HDL Coder to generate HDL code.

Behavioral Model

The input image imgBlur is shown on the left in the diagram below. On the right, the image is
sharpened using the Image Processing Toolbox™ function imfilter.

Simulation time is printed as a benchmark for future comparison.

imgBlur = imread('riceblurred.png');
sharpCoeff = [0 0 0;0 1 0;0 0 0]-fspecial('laplacian',0.2);

f = @() imfilter(imgBlur,sharpCoeff,'symmetric');
fprintf('Elapsed time is %.6f seconds.\n',timeit(f));

imgSharp = imfilter(imgBlur,sharpCoeff,'symmetric');
figure
imshowpair(imgBlur,imgSharp,'montage')
title('Blurred Image and Sharpened Image')

Elapsed time is 0.000606 seconds.


HDL-Optimized Design Considerations

Three key changes need to be made to enable HDL code generation.

• Use HDL-friendly algorithms: The functions in Image Processing Toolbox do not support HDL
Code generation. Vision HDL Toolbox provides image and video processing algorithms designed
for efficient HDL implementations. You can generate HDL code from these algorithms using
“Functions” (Vision HDL Toolbox) and “Blocks” (Vision HDL Toolbox). Both workflows are
provided in this example. To design an FPGA-based module, replace the functions from Image
Processing Toolbox with their HDL-friendly counterparts from Vision HDL Toolbox. This example
replaces imfilter in the behavioral model with the visionhdl.ImageFilter System object in
MATLAB, or the Image Filter block in Simulink.

• Use streaming pixel interface: The functions from Image Processing Toolbox model at a high
level of abstraction. They perform full-frame processing, operating on one image frame at a time.
FPGA and ASIC implementations, however, perform pixel-stream processing, operating on one
image pixel at a time. Vision HDL Toolbox blocks and System objects use a streaming pixel
interface. Use visionhdl.FrameToPixels System object in MATLAB or Frame To Pixels
block in Simulink to convert a full frame image or video to a pixel stream. The streaming pixel
interface includes control signals that indicate each pixel's position in the frame. Algorithms that
operate on a pixel neighborhood use internal memory to store a minimum number of lines. Vision
HDL Toolbox provides the streaming pixel interface and automatic memory implementation to
address common design issues when targeting FPGAs and ASICs. For more information on the
streaming pixel protocol used by System objects from the Vision HDL Toolbox, see “Streaming
Pixel Interface” (Vision HDL Toolbox).

• Use fixed-point data representation: Functions from Image Processing Toolbox perform video
processing algorithms in the floating-point or integer domain. The System objects and blocks from
Vision HDL Toolbox require fixed-point data to generate HDL code to target FPGAs and ASICs.
Converting a design to fixed-point can introduce quantization error. Therefore, the HDL-friendly
model might generate an output slightly different from that obtained from the behavioral model.


For most applications, small quantization errors within a tolerance are acceptable. You can tune
the fixed-point settings to suit your requirements.

In this example, we use a static image as the source. This model is also able to process continuous
video input.

Generate HDL Code from MATLAB

To generate HDL from MATLAB, your code needs to be divided into two files: test bench and design.
The design file is used for implementing the algorithm in the FPGA or ASIC. The test bench file
provides the input data to the design file and receives the design output.

Step 1: Create Design File

The function ImageSharpeningHDLDesign.m accepts a pixel stream and a control structure
consisting of five control signals, and returns a modified pixel stream and control structure.

In this example, the design contains a System object visionhdl.ImageFilter. It is the HDL-
friendly counterpart of the imfilter function. Configure it with the same coefficients and padding
method as imfilter.

function [pixOut,ctrlOut] = ImageSharpeningHDLDesign(pixIn,ctrlIn)


% ImageSharpeningHDLDesign Implement algorithms using pixel-stream
% System objects from the Vision HDL Toolbox

% Copyright 2015-2022 The MathWorks, Inc.

%#codegen
persistent sharpeningFilter;
if isempty(sharpeningFilter)
sharpCoeff = [0 0 0;0 1 0;0 0 0]-fspecial('laplacian',0.2);
sharpeningFilter = visionhdl.ImageFilter(...
'Coefficients',sharpCoeff,...
'PaddingMethod','Symmetric',...
'CoefficientsDataType','Custom',...
'CustomCoefficientsDataType',numerictype(1,16,12));
end

[pixOut,ctrlOut] = sharpeningFilter(pixIn,ctrlIn);

Step 2: Create Test Bench File

The test bench ImageSharpeningHDLTestBench.m reads in the blurred image. The frm2pix object
converts the full image frame to a stream of pixels and control structures. The test bench calls the
design function ImageSharpeningHDLDesign to process one pixel at a time. After the entire pixel-
stream is processed, pix2frm converts the output pixel stream to a full-frame image. The test bench
compares the output image to the reference output imgSharp.
...
[pixInVec,ctrlInVec] = frm2pix(imgBlur);
for p = 1:numPixPerFrm
[pixOutVec(p),ctrlOutVec(p)] = ImageSharpeningHDLDesign(pixInVec(p),ctrlInVec(p));
end
imgOut = pix2frm(pixOutVec,ctrlOutVec);


% Compare the result


imgDiff = imabsdiff(imgSharp,imgOut);
fprintf('The maximum difference between corresponding pixels is %d.\n',max(imgDiff(:)));
fprintf('A total of %d pixels are different.\n',nnz(imgDiff));
...

Step 3: Simulate Design and Verify Result

Simulate the design with the test bench prior to HDL code generation to make sure there are no
runtime errors.

ImageSharpeningHDLTestBench

The maximum difference between corresponding pixels is 1.


A total of 41248 pixels are different.

Simulation took 664.096485 seconds to finish.

The test bench displays the comparison result and the time spent on simulation. Due to quantization
error and rounding error, out of a total of 256*256=65536 pixels, 41248 of imgOut are different from
imgSharp. However, the maximum difference in intensity is 1. On a 0 to 255 scale, this difference is
visually unnoticeable.

As we can see by comparing the simulation time in MATLAB with that of the behavioral model, the
pixel-streaming protocol introduces significant overhead. You can use MATLAB Coder™ to speed up
the pixel-streaming simulation in MATLAB. See “Accelerate Pixel-Streaming Designs Using MATLAB
Coder” (Vision HDL Toolbox).

Step 4: Generate HDL Code

Once you are satisfied with the results of the FPGA-targeted model, you can use HDL Coder to
generate HDL code from the design. You can run the generated HDL code in HDL simulators or load
it into an FPGA and run it in a physical system.

Make sure that the design and test bench files are located in the same writable directory. To generate
the HDL code, use the following command:

hdlcfg = coder.config('hdl');
hdlcfg.TestBenchName = 'ImageSharpeningHDLTestBench';
hdlcfg.TargetLanguage = 'Verilog';
hdlcfg.GenerateHDLTestBench = false;
codegen -config hdlcfg ImageSharpeningHDLDesign

For more detail on how to create and configure MATLAB to HDL projects, see the "Getting Started
with MATLAB to HDL Workflow" tutorial in the HDL Coder documentation.

Generate HDL Code from Simulink

Step 1: Create HDL-Optimized Model

The ImageSharpeningHDLModel model is shown below.

modelname = 'ImageSharpeningHDLModel';
open_system(modelname);
set_param(modelname,'Open','on');


The model reads in the blurred image. The Frame To Pixels block converts a full-frame image to a
pixel stream, and the Pixels To Frame block converts the pixel stream back to a full-frame image. The
Image Sharpening HDL System contains an Image Filter block, which is the HDL-friendly counterpart
in Vision HDL Toolbox of the imfilter function presented in the behavioral model.

set_param(modelname,'Open','off');
set_param([modelname '/Image Sharpening HDL System'],'Open','on');

Configure the Image Filter block with the same sharpening coefficients and padding method as in the
behavioral model, as shown on the masks below.


Step 2: Simulate Design and Verify Result


tic
sim(modelname);
toc


Elapsed time is 16.774461 seconds.

Simulink takes advantage of C code generation to speed up the simulation. Therefore, it is much
faster than MATLAB simulation, although still slower than the behavioral model.

The simulation creates a new variable called imgOut in the workspace. Use the following commands
to compare imgOut with imgSharp generated from the behavioral model.

imgDiff = imabsdiff(imgSharp,imgOut);
fprintf('The maximum difference between corresponding pixels is %d.\n',max(imgDiff(:)));
fprintf('A total of %d pixels are different.\n',nnz(imgDiff));

The maximum difference between corresponding pixels is 1.


A total of 41248 pixels are different.

Due to quantization error and rounding error, out of a total of 256*256=65536 pixels, 41248 of
imgOut are different from imgSharp. However, the maximum difference in intensity is 1. On a 0 to
255 scale, this difference is visually unnoticeable. (This reasoning is also presented in Step 3 in the
"Generate HDL Code from MATLAB" Section.)

Step 3: Generate HDL Code

Once you are satisfied with the results of the FPGA-targeted model, you can use HDL Coder to
generate HDL code from the design. You can run the generated HDL code in HDL simulators or load
it into an FPGA and run it in a physical system.

Generate HDL code from the Image Sharpening HDL System using the following command:

makehdl('ImageSharpeningHDLModel/Image Sharpening HDL System')

set_param([modelname '/Image Sharpening HDL System'],'Open','off');


close_system(modelname,0);
close all;


Adjust Image Intensity Values to Specified Range

This example shows how to increase the contrast in a low-contrast grayscale image by remapping the
data values to fill the entire available intensity range [0, 255].

Read image into the workspace.

I = imread('pout.tif');

Adjust the contrast of the image using imadjust.

J = imadjust(I);

Display the original image and the adjusted image, side-by-side. Note the increased contrast in the
adjusted image.

imshowpair(I,J,'montage')

Plot the histogram of the adjusted image. Note that the histogram of the adjusted image uses values
across the whole range.

figure
subplot(1,2,1)
imhist(I,64)
subplot(1,2,2)
imhist(J,64)


See Also
imadjust | imhist


Gamma Correction
When you map intensity values from one range to another, you can optionally perform a nonlinear
mapping using gamma correction. The gamma correction factor can be any value between 0 and
infinity.

• When gamma is less than 1, the mapping is weighted toward higher (brighter) output values.
• When gamma is greater than 1, the mapping is weighted toward lower (darker) output values.
• When gamma is exactly 1, the mapping is linear.

The figure illustrates this relationship. The three transformation curves show how values are mapped
when gamma is less than, equal to, and greater than 1. In each graph, the x-axis represents the
intensity values in the input image, and the y-axis represents the intensity values in the output image.

Plots Showing Three Different Gamma Correction Settings
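
You can reproduce curves like those in the figure with a short sketch such as the following; the specific gamma values 0.5 and 2 are illustrative choices, and the mapping raises the normalized input intensities to the power gamma, as imadjust does.

x = linspace(0,1,256);
plot(x,x.^0.5,x,x.^1,x,x.^2)
legend("gamma = 0.5 (less than 1)","gamma = 1","gamma = 2 (greater than 1)","Location","southeast")
xlabel("Input intensity")
ylabel("Output intensity")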

Specify Gamma when Adjusting Contrast

This example shows how to specify gamma when adjusting contrast with the imadjust function. By
default, imadjust uses a gamma value of 1, which means that it uses a linear mapping between
intensity values in the original image and the output image. A gamma value less than 1 weights the
mapping toward higher (brighter) output values. A gamma value of more than 1 weights output
values toward lower (darker) output values.

Read an image into the workspace. This example reads an indexed image and then converts it into a
grayscale image.

[X,map] = imread("forest.tif");
I = ind2gray(X,map);

Adjust the contrast, specifying a gamma value of less than 1 (0.5). Notice that in the call to
imadjust, the example specifies the data ranges of the input and output images as empty matrices.
When you specify an empty matrix, imadjust uses the default range of [0,1]. In the example, both
ranges are left empty. This means that gamma correction is applied without any other adjustment of
the data.

J = imadjust(I,[],[],0.5);

Display the original image with the contrast-adjusted image.

imshowpair(I,J,"montage")


See Also
imadjust | rgb2lin | lin2rgb

More About
• “Contrast Enhancement Techniques” on page 8-69


Contrast Enhancement Techniques

This example shows how to enhance the contrast of grayscale and color images using intensity value
mapping, histogram equalization, and contrast-limited adaptive histogram equalization.

Three functions are particularly suitable for contrast enhancement:

• imadjust increases the contrast of the image by mapping the values of the input intensity image
to new values such that, by default, 1% of the data is saturated at low and high intensities of the
input data.
• histeq performs histogram equalization. It enhances the contrast of images by transforming the
values in an intensity image so that the histogram of the output image approximately matches a
specified histogram (uniform distribution by default).
• adapthisteq performs contrast-limited adaptive histogram equalization. Unlike histeq, it
operates on small data regions (tiles) rather than the entire image. Each tile's contrast is
enhanced so that the histogram of each output region approximately matches the specified
histogram (uniform distribution by default). The contrast enhancement can be limited in order to
avoid amplifying the noise which might be present in the image.

Enhance Grayscale Images

Read a grayscale image with poor contrast into the workspace. Enhance the image using the three
contrast adjustment techniques with default settings.

pout = imread("pout.tif");
pout_imadjust = imadjust(pout);
pout_histeq = histeq(pout);
pout_adapthisteq = adapthisteq(pout);

Display the original image and the three contrast adjusted images as a montage.

montage({pout,pout_imadjust,pout_histeq,pout_adapthisteq},"Size",[1 4])
title("Original Image and Enhanced Images using imadjust, histeq, and adapthisteq")

Read a second grayscale image into the workspace and enhance the image using the three contrast
adjustment techniques.


tire = imread("tire.tif");
tire_imadjust = imadjust(tire);
tire_histeq = histeq(tire);
tire_adapthisteq = adapthisteq(tire);

Display the original image and the three contrast adjusted images as a montage.

montage({tire,tire_imadjust,tire_histeq,tire_adapthisteq},"Size",[1 4])
title("Original Image and Enhanced Images using " + ...
"imadjust, histeq, and adapthisteq")

Notice that imadjust had little effect on the image of the tire, but it caused a drastic change in the
case of pout. Plotting the histograms of pout.tif and tire.tif reveals that most of the pixels in
the first image are concentrated in the center of the histogram, while in the case of tire.tif, the
values are already spread out between the minimum of 0 and maximum of 255 thus preventing
imadjust from being effective in adjusting the contrast of the image.

figure
subplot(1,2,1)
imhist(pout)
title("Histogram of pout.tif")
subplot(1,2,2)
imhist(tire)
title("Histogram of tire.tif");


Histogram equalization, on the other hand, substantially changes both images. Many of the previously
hidden features are exposed, especially the debris particles on the tire. Unfortunately, at the same
time, the enhancement over-saturates several areas of both images. Notice how the center of the tire,
part of the child's face, and the jacket became washed out.

Concentrating on the image of the tire, it would be preferable for the center of the wheel to stay at
about the same brightness while enhancing the contrast in other areas of the image. In order for that
to happen, a different transformation would have to be applied to different portions of the image. The
Contrast-Limited Adaptive Histogram Equalization technique, implemented in adapthisteq, can
accomplish this. The algorithm analyzes portions of the image and computes the appropriate
transformations. A limit on the level of contrast enhancement can also be set, thus preventing the
over-saturation caused by the basic histogram equalization method of histeq. This is the most
sophisticated technique in this example.

Enhance Color Images

Contrast enhancement of color images is typically done by converting the image to a color space that
has image luminosity as one of its components, such as the L*a*b* color space. Contrast adjustment
is performed on the luminosity layer L* only, and then the image is converted back to the RGB color
space. Manipulating luminosity affects the intensity of the pixels, while preserving the original colors.

Read an image with poor contrast into the workspace. Then, convert the image from the RGB color
space to the L*a*b* color space.

shadow = imread("lowlight_1.jpg");
shadow_lab = rgb2lab(shadow);


The values of luminosity span a range from 0 to 100. Scale the values to the range [0 1], which is the
expected range of images with data type double.

max_luminosity = 100;
L = shadow_lab(:,:,1)/max_luminosity;

Perform the three types of contrast adjustment on the luminosity channel, and keep the a* and b*
channels unchanged. Convert the images back to the RGB color space.

shadow_imadjust = shadow_lab;
shadow_imadjust(:,:,1) = imadjust(L)*max_luminosity;
shadow_imadjust = lab2rgb(shadow_imadjust);

shadow_histeq = shadow_lab;
shadow_histeq(:,:,1) = histeq(L)*max_luminosity;
shadow_histeq = lab2rgb(shadow_histeq);

shadow_adapthisteq = shadow_lab;
shadow_adapthisteq(:,:,1) = adapthisteq(L)*max_luminosity;
shadow_adapthisteq = lab2rgb(shadow_adapthisteq);

Display the original image and the three contrast adjusted images as a montage.

figure
montage({shadow,shadow_imadjust,shadow_histeq,shadow_adapthisteq},"Size",[1 4])
title("Original Image and Enhanced Images using " + ...
"imadjust, histeq, and adapthisteq")

See Also
imadjust | histeq | adapthisteq

More About
• “Adjust Image Contrast Using Histogram Equalization” on page 8-75
• “Adaptive Histogram Equalization” on page 8-80
• “Specify Contrast Adjustment Limits” on page 8-73


Specify Contrast Adjustment Limits


You can optionally specify the range of the input values and the output values using imadjust. You
specify these ranges in two vectors that you pass to imadjust as arguments. The first vector
specifies the low- and high-intensity values that you want to map. The second vector specifies the
scale over which you want to map them.

Note You must specify the intensities as values between 0 and 1 regardless of the class of I. If I is
uint8, the values you supply are multiplied by 255 to determine the actual values to use; if I is
uint16, the values are multiplied by 65535. To learn about an alternative way to set these limits
automatically, see “Set Image Intensity Adjustment Limits Automatically” on page 8-74.

Specify Contrast Adjustment Limits as Range

This example shows how to specify contrast adjustment limits as a range using the imadjust
function. This example decreases the contrast of an image by narrowing the range of the data.

Read an image into the workspace.

I = imread('cameraman.tif');

Adjust the contrast of the image, specifying the range of values used in the output image. In the
example below, the man's coat is too dark to reveal any detail. imadjust maps the range [0,51] in
the uint8 input image to [128,255] in the output image. This brightens the image considerably,
and also widens the dynamic range of the dark portions of the original image, making it much easier
to see the details in the coat. Note, however, that because all values above 51 in the original image
are mapped to 255 (white) in the adjusted image, the adjusted image appears washed out.

J = imadjust(I,[0 0.2],[0.5 1]);

Display the original image and the contrast-adjusted image.

imshowpair(I,J,'montage')


Set Image Intensity Adjustment Limits Automatically


For a more convenient way to specify limits, use the stretchlim function. (The imadjust function
uses stretchlim for its simplest syntax, imadjust(I).)

This function calculates the histogram of the image and determines the adjustment limits
automatically. The stretchlim function returns these values as fractions in a vector that you can
pass as the [low_in high_in] argument to imadjust; for example:

I = imread("rice.png");
J = imadjust(I,stretchlim(I),[0 1]);

By default, stretchlim uses the intensity values that represent the bottom 1% (0.01) and the top
1% (0.99) of the range as the adjustment limits. By trimming the extremes at both ends of the
intensity range, stretchlim makes more room in the adjusted dynamic range for the remaining
intensities. But you can specify other range limits as an argument to stretchlim.
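
For instance, a minimal sketch that saturates 2% of the data at both the low and high ends (the tolerance values are illustrative):

J2 = imadjust(I,stretchlim(I,[0.02 0.98]),[]);  % saturate the bottom 2% and top 2% of intensities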

See Also
imadjust | stretchlim

More About
• “Contrast Enhancement Techniques” on page 8-69


Adjust Image Contrast Using Histogram Equalization

This example shows how to adjust the contrast of a grayscale image using histogram equalization.

Histogram equalization involves transforming the intensity values so that the histogram of the output
image approximately matches a specified histogram. By default, the histogram equalization function,
histeq, tries to match a flat histogram with 64 bins such that the output image has pixel values
evenly distributed throughout the range. You can also specify a different target histogram to match a
custom contrast.

Original Image Histogram

Read a grayscale image into the workspace.

I = imread("pout.tif");

Display the image and its histogram. The original image has low contrast, with most pixel values in
the middle of the intensity range.

figure
subplot(1,3,1)
imshow(I)
subplot(1,3,2:3)
imhist(I)


Adjust Contrast Using Default Equalization

Adjust the contrast using histogram equalization. Use the default behavior of the histogram
equalization function, histeq. The default target histogram is a flat histogram with 64 bins.

J = histeq(I);

Display the contrast-adjusted image and its new histogram.

figure
subplot(1,3,1)
imshow(J)
subplot(1,3,2:3)
imhist(J)

Adjust Contrast, Specifying Number of Bins

Adjust the contrast, specifying a different number of bins. With a small number of bins, there are
noticeably fewer gray levels in the contrast-adjusted image.

nbins = 10; % a small number of bins, chosen here for illustration
K = histeq(I,nbins);

Display the contrast-adjusted image and its new histogram.

figure
subplot(1,3,1)
imshow(K)
subplot(1,3,2:3)
imhist(K)

Adjust Contrast, Specifying Target Distribution

Adjust the contrast, specifying a nonflat target distribution. This example demonstrates a linearly
decreasing target histogram, which emphasizes small pixel values and causes shadows to appear
darker. Display the target histogram.

target = 256:-4:4;
figure
bar(4:4:256,target)


Adjust the histogram of the image to approximately match the target histogram.

L = histeq(I,target);

Display the contrast-adjusted image and its new histogram.

figure
subplot(1,3,1)
imshow(L)
subplot(1,3,2:3)
imhist(L)


See Also
histeq | adapthisteq

More About
• “Contrast Enhancement Techniques” on page 8-69
• “Adaptive Histogram Equalization” on page 8-80


Adaptive Histogram Equalization


As an alternative to using histeq, you can perform contrast-limited adaptive histogram equalization
(CLAHE) using the adapthisteq function. While histeq works on the entire image, adapthisteq
operates on small regions in the image, called tiles. adapthisteq enhances the contrast of each tile,
so that the histogram of the output region approximately matches a specified histogram. After
performing the equalization, adapthisteq combines neighboring tiles using bilinear interpolation to
eliminate artificially induced boundaries.

To avoid amplifying any noise that might be present in the image, you can use adapthisteq optional
parameters to limit the contrast, especially in homogeneous areas.
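
For example, a minimal sketch using two of these optional parameters (the tile layout and clip limit shown here are illustrative values, not recommendations):

I = imread("pout.tif");
J = adapthisteq(I,"NumTiles",[8 8],"ClipLimit",0.005); % a lower ClipLimit limits contrast and noise amplification
imshowpair(I,J,"montage")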

Adjust Contrast using Adaptive Histogram Equalization

This example shows how to adjust the contrast in an image using CLAHE.

Read image into the workspace.

I = imread('pout.tif');

View the original image and its histogram.

figure
subplot(1,2,1)
imshow(I)
subplot(1,2,2)
imhist(I,64)


Adjust the contrast of the image using adaptive histogram equalization.

J = adapthisteq(I);

Display the contrast-adjusted image with its histogram.

figure
subplot(1,2,1)
imshow(J)
subplot(1,2,2)
imhist(J,64)


See Also
histeq | adapthisteq

More About
• “Contrast Enhancement Techniques” on page 8-69
• “Adjust Image Contrast Using Histogram Equalization” on page 8-75


Enhance Color Separation Using Decorrelation Stretching


Decorrelation stretching enhances the color separation of an image with significant band-to-band
correlation. The exaggerated colors improve visual interpretation and make feature discrimination
easier. You apply decorrelation stretching with the decorrstretch function. See “Linear Contrast
Stretching” on page 8-87 on how to add an optional linear contrast stretch to the decorrelation
stretch.

The number of color bands, NBANDS, in the image is usually three. But you can apply decorrelation
stretching regardless of the number of color bands.

The original color values of the image are mapped to a new set of color values with a wider range.
The color intensities of each pixel are transformed into the color eigenspace of the NBANDS-by-
NBANDS covariance or correlation matrix, stretched to equalize the band variances, then
transformed back to the original color bands.

To define the band-wise statistics, you can use the entire original image or, with the subset option,
any selected subset of it.

Simple Decorrelation Stretching

This example shows how to perform decorrelation stretching to three color bands of an image. A
color band scatterplot of the images shows how the bands are decorrelated and equalized.

Perform Decorrelation Stretch

Read an image from the library of images available in the imdata folder. This example uses a
LANDSAT image of the Little Colorado River. The image has seven bands, but just read in the three
visible colors.

A = multibandread('littlecoriver.lan', [512, 512, 7], ...
    'uint8=>uint8', 128, 'bil', 'ieee-le', ...
    {'Band','Direct',[3 2 1]});

Perform the decorrelation stretch.

B = decorrstretch(A);

Display the original image and the processed image. Compare the two images. The original has a
strong violet (red-bluish) tint, while the transformed image has a somewhat expanded color range.

imshow(A)
title('Little Colorado River Image')


imshow(B)
title('Little Colorado River Image After Decorrelation Stretch')


Create a Color Band Scatterplot

First separate the three color channels of the original image.

[rA,gA,bA] = imsplit(A);

Separate the three color channels of the image after decorrelation stretching.

[rB,gB,bB] = imsplit(B);

Display the color scatterplot of the original image. Then display the color scatterplot of the image
after decorrelation stretching.

figure
plot3(rA(:),gA(:),bA(:),'.')
grid on
xlabel('Red (Band 3)')
ylabel('Green (Band 2)')
zlabel('Blue (Band 1)')
title('Color Scatterplot Before Decorrelation Stretch')

figure
plot3(rB(:),gB(:),bB(:),'.')
grid on
xlabel('Red (Band 3)')
ylabel('Green (Band 2)')
zlabel('Blue (Band 1)')
title('Color Scatterplot After Decorrelation Stretch')


Linear Contrast Stretching


Adding linear contrast stretch enhances the resulting image by further expanding the color range.
The following example uses the Tol option to saturate equal fractions of the image at high and low
intensities. Without the Tol option, decorrstretch applies no linear contrast stretch.

See the stretchlim function reference page for more about calculating saturation limits.

Note You can apply a linear contrast stretch as a separate operation after performing a decorrelation
stretch, using stretchlim and imadjust. This alternative, however, often gives inferior results for
uint8 and uint16 images, because the pixel values must be clamped to [0 255] (or [0 65535]). The
Tol option in decorrstretch circumvents this limitation.
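
For comparison, a minimal sketch of that separate two-step alternative, assuming the RGB image A read earlier in this example is in the workspace:

B2 = decorrstretch(A);                % decorrelation stretch only
C2 = imadjust(B2,stretchlim(B2),[]);  % followed by a separate linear contrast stretch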

Decorrelation Stretch with Linear Contrast Stretch

Read the three visible color channels of the LANDSAT image of the Little Colorado River.

A = multibandread('littlecoriver.lan', [512, 512, 7], ...
    'uint8=>uint8', 128, 'bil', 'ieee-le', ...
    {'Band','Direct',[3 2 1]});


Apply decorrelation stretching, specifying the linear contrast stretch. Setting the value 'Tol' to 0.01
maps the transformed color range within each band to a normalized interval between 0.01 and 0.99,
saturating 2%.

C = decorrstretch(A,'Tol',0.01);
imshow(C)
title(['Little Colorado River After Decorrelation Stretch and ',...
'Linear Contrast Stretch'])

See Also
decorrstretch | imadjust | stretchlim | imhist | multibandread | plot3


Related Examples
• “Enhance Multispectral Color Composite Images” on page 8-90


Enhance Multispectral Color Composite Images

This example shows some basic image composition and enhancement techniques for use with
multispectral data. It is often necessary to enhance multispectral radiance or reflectance data to
create an image that is suitable for visual interpretation. The example uses Landsat thematic mapper
imagery covering part of Paris, France. Seven spectral bands are stored in one file in the Erdas LAN
format. Concepts covered include:

• Reading multispectral data from Erdas LAN files


• Constructing color composites from different band combinations
• Enhancing imagery with a contrast stretch
• Enhancing imagery with a decorrelation stretch
• Using scatterplots

Create Truecolor Composite from Multispectral Image

The LAN file, paris.lan, contains a 7-band 512-by-512 Landsat image. A 128-byte header is
followed by the pixel values, which are band interleaved by line (BIL) in order of increasing band
number. They are stored as unsigned 8-bit integers, in little-endian byte order.

Read bands 3, 2, and 1 from the LAN file using the MATLAB® function multibandread. These
bands cover the visible part of the spectrum. When they are mapped to the red, green, and blue
planes, respectively, of an RGB image, the result is a standard truecolor composite. The final input
argument to multibandread specifies which bands to read, and in which order, so that you can
create an RGB composite in a single step.

truecolor = multibandread('paris.lan',[512, 512, 7],'uint8=>uint8', ...
    128,'bil','ieee-le',{'Band','Direct',[3 2 1]});

The truecolor composite has very little contrast and the colors are unbalanced.

imshow(truecolor)
title('Truecolor Composite (Un-Enhanced)')
text(size(truecolor,2),size(truecolor,1)+15,...
'Image courtesy of Space Imaging, LLC',...
'FontSize',7,'HorizontalAlignment','right')


Explore Histogram and Scatterplot of Un-Enhanced Truecolor Composite

By viewing a histogram of the red band, for example, you can see that the data is concentrated within
a small part of the available dynamic range. This is one reason why the truecolor composite appears
dull.

imhist(truecolor(:,:,1))
title('Histogram of the Red Band (Band 3)')


Another reason for the dull appearance of the composite is that the visible bands are highly
correlated with each other. Two- and three-band scatterplots are an excellent way to gauge the
degree of correlation among spectral bands. You can make them easily just by using plot. The linear
trend of the red-green-blue scatterplot indicates that the visible bands are highly correlated. This
helps explain the monochromatic look of the un-enhanced truecolor composite.

r = truecolor(:,:,1);
g = truecolor(:,:,2);
b = truecolor(:,:,3);

plot3(r(:),g(:),b(:),'.')
grid('on')
xlabel('Red (Band 3)')
ylabel('Green (Band 2)')
zlabel('Blue (Band 1)')
title('Scatterplot of the Visible Bands')

Enhance Truecolor Composite with Contrast Stretch

When you use imadjust to apply a linear contrast stretch to the truecolor composite image, the
surface features are easier to recognize.

stretched_truecolor = imadjust(truecolor,stretchlim(truecolor));
imshow(stretched_truecolor)
title('Truecolor Composite after Contrast Stretch')


Check Histogram Following the Contrast Stretch

A histogram of the red band after applying a contrast stretch shows that the data has been spread
over much more of the available dynamic range. Create a histogram of all red pixel values in the
image by using the imhist function.

imhist(stretched_truecolor(:,:,1))
title('Histogram of Red Band (Band 3) after Contrast Stretch')


Enhance Truecolor Composite with Decorrelation Stretch

Another way to enhance the truecolor composite is to use a decorrelation stretch, which enhances
color separation across highly correlated channels. Use decorrstretch to perform the
decorrelation stretch. Specify the optional name-value pair 'Tol',0.1 to perform a linear contrast
stretch after the decorrelation stretch. Again, surface features have become much more clearly
visible, but in a different way. The spectral differences across the scene have been exaggerated. A
noticeable example is the area of green on the left edge, which appears black in the contrast-
stretched composite. This green area is the Bois de Boulogne, a large park on the western edge of
Paris.

decorrstretched_truecolor = decorrstretch(truecolor,'Tol',0.01);
imshow(decorrstretched_truecolor)
title('Truecolor Composite after Decorrelation Stretch')


Check Correlation Following Decorrelation Stretch

As expected, a scatterplot following the decorrelation stretch shows a strong decrease in correlation.

r = decorrstretched_truecolor(:,:,1);
g = decorrstretched_truecolor(:,:,2);
b = decorrstretched_truecolor(:,:,3);

plot3(r(:),g(:),b(:),'.')
grid('on')
xlabel('Red (Band 3)')
ylabel('Green (Band 2)')
zlabel('Blue (Band 1)')
title('Scatterplot of the Visible Bands after Decorrelation Stretch')


Create and Enhance CIR Composite

Just as with the visible bands, information from Landsat bands covering non-visible portions of the
spectrum can be viewed by constructing and enhancing RGB composite images. The near infrared
(NIR) band (Band 4) is important because of the high reflectance of chlorophyll in this part of the
spectrum. It is even more useful when combined with visible red and green (Bands 3 and 2,
respectively) to form a color infrared (CIR) composite image. Color infrared (CIR) composites are
commonly used to identify vegetation or assess its state of growth and/or health.

Construct a CIR composite by reading from the original LAN file and composing an RGB image that
maps bands 4, 3, and 2 to red, green, and blue, respectively.

CIR = multibandread('paris.lan',[512, 512, 7],'uint8=>uint8', ...
    128,'bil','ieee-le',{'Band','Direct',[4 3 2]});


Even though the near infrared (NIR) band (Band 4) is less correlated with the visible bands than the
visible bands are with each other, a decorrelation stretch makes many features easier to see. A
property of color infrared composites is that they look red in areas with a high vegetation
(chlorophyll) density. Notice that the Bois de Boulogne park is red in the CIR composite, which is
consistent with its green appearance in the decorrelation-stretched truecolor composite.

stretched_CIR = decorrstretch(CIR,'Tol',0.01);
imshow(stretched_CIR)
title('CIR after Decorrelation Stretch')

See Also
decorrstretch | imadjust | stretchlim | imhist | multibandread | plot3


Related Examples
• “Enhance Color Separation Using Decorrelation Stretching” on page 8-83


Low-Light Image Enhancement

This example shows how to brighten dark regions of an image while preventing oversaturation of
bright regions.

Images can be highly degraded due to poor lighting conditions. These images can have low dynamic
ranges with high noise levels that affect the overall performance of computer vision algorithms. To
make computer vision algorithms robust in low-light conditions, use low-light image enhancement to
improve the visibility of an image.

Read and display an RGB image captured in low light.

A = imread("lowlight_1.jpg");
imshow(A)
title("Original Image")


Localized Brightening

Brighten the low-light image in proportion to the darkness of the local region, then display the
brightened image. Dark regions brighten significantly. Bright regions also have a small increase in
brightness, causing oversaturation. The image looks somewhat unnatural and is perhaps brightened
too much.


B = imlocalbrighten(A);
imshow(B)

Display a histogram of the pixel values for the original image and the brightened image. For the
original image, the histogram is skewed towards darker pixel values. For the brightened image, the
pixel values are more evenly distributed throughout the full range of pixel values.


figure
subplot(1,2,1)
imhist(A)
title("Original Image")
subplot(1,2,2)
imhist(B)
title("Brightened Image")

Brighten the original low-light image again and specify a smaller brightening amount.

amt = 0.5;
B2 = imlocalbrighten(A,amt);

Display the brightened image. The image looks more natural. The dark regions of the image are
enhanced, but the bright regions by the windows are still oversaturated.

figure
imshow(B2)
title("Image with Less Brightening")


To reduce oversaturation of bright regions, apply alpha blending when brightening the image. The
dark regions are brighter, and the bright pixels retain their original pixel values.

B3 = imlocalbrighten(A,amt,AlphaBlend=true);
imshow(B3)
title("Image with Alpha Blending")


For comparison, display the three enhanced images in a montage.

figure
montage({B,B2,B3},Size=[1 3],BorderSize=5,BackgroundColor="w")


References
[1] Dong, X., G. Wang, Y. Pang, W. Li, J. Wen, W. Meng, and Y. Lu. "Fast efficient algorithm for
enhancement of low lighting video." Proceedings of IEEE® International Conference on
Multimedia and Expo (ICME). 2011, pp. 1–6.

See Also
imlocalbrighten


Design Linear Filters in the Frequency Domain


In this section...
“Two-Dimensional Finite Impulse Response (FIR) Filters” on page 8-107
“Create 2-D Filter Using Frequency Transformation of 1-D Filter” on page 8-107
“Create Filter Using Frequency Sampling Method” on page 8-109
“Windowing Method” on page 8-111
“Creating the Desired Frequency Response Matrix” on page 8-112
“Computing the Frequency Response of a Filter” on page 8-113

This topic describes functions that perform filtering in the frequency domain. For information about
designing filters in the spatial domain, see “What Is Image Filtering in the Spatial Domain?” on page
8-2.

Two-Dimensional Finite Impulse Response (FIR) Filters


The Image Processing Toolbox software supports one class of linear filter: the two-dimensional finite
impulse response (FIR) filter. FIR filters have an impulse response of finite extent; that is, their
response to a single point, or impulse, is nonzero over only a finite region. All the Image Processing
Toolbox filter design functions return FIR filters.

FIR filters have several characteristics that make them ideal for image processing in the MATLAB
environment:

• FIR filters are easy to represent as matrices of coefficients.


• Two-dimensional FIR filters are natural extensions of one-dimensional FIR filters.
• There are several well-known, reliable methods for FIR filter design.
• FIR filters are easy to implement.
• FIR filters can be designed to have linear phase, which helps prevent distortion.

Another class of filter, the infinite impulse response (IIR) filter, is not as suitable for image processing
applications. It lacks the inherent stability and ease of design and implementation of the FIR filter.
Therefore, this toolbox does not provide IIR filter support.

Note Most of the design methods described in this section work by creating a two-dimensional filter
from a one-dimensional filter or window created using Signal Processing Toolbox™ functions.
Although this toolbox is not required, you might find it difficult to design filters if you do not have the
Signal Processing Toolbox software.

Create 2-D Filter Using Frequency Transformation of 1-D Filter


This example shows how to transform a one-dimensional FIR filter into a two-dimensional FIR filter
using the ftrans2 function. This function can be useful because it is easier to design a one-
dimensional filter with particular characteristics than a corresponding two-dimensional filter. The
frequency transformation method preserves most of the characteristics of the one-dimensional filter,
particularly the transition bandwidth and ripple characteristics. The shape of the one-dimensional
frequency response is clearly evident in the two-dimensional response.


This function uses a transformation matrix, a set of elements that defines the frequency
transformation. This function's default transformation matrix produces filters with nearly circular
symmetry. By defining your own transformation matrix, you can obtain different symmetries. (See Jae
S. Lim, Two-Dimensional Signal and Image Processing, 1990, for details.)

Create a 1-D FIR filter using the firpm function from the Signal Processing Toolbox.

b = firpm(10,[0 0.4 0.6 1],[1 1 0 0])

b =

Columns 1 through 9

0.0537 -0.0000 -0.0916 -0.0001 0.3131 0.4999 0.3131 -0.0001 -0.0916

Columns 10 through 11

-0.0000 0.0537

Transform the 1-D filter to a 2-D filter.


h = ftrans2(b);

h =

Columns 1 through 9

0.0001 0.0005 0.0024 0.0063 0.0110 0.0132 0.0110 0.0063 0.0024
0.0005 0.0031 0.0068 0.0042 -0.0074 -0.0147 -0.0074 0.0042 0.0068
0.0024 0.0068 -0.0001 -0.0191 -0.0251 -0.0213 -0.0251 -0.0191 -0.0001
0.0063 0.0042 -0.0191 -0.0172 0.0128 0.0259 0.0128 -0.0172 -0.0191
0.0110 -0.0074 -0.0251 0.0128 0.0924 0.1457 0.0924 0.0128 -0.0251
0.0132 -0.0147 -0.0213 0.0259 0.1457 0.2021 0.1457 0.0259 -0.0213
0.0110 -0.0074 -0.0251 0.0128 0.0924 0.1457 0.0924 0.0128 -0.0251
0.0063 0.0042 -0.0191 -0.0172 0.0128 0.0259 0.0128 -0.0172 -0.0191
0.0024 0.0068 -0.0001 -0.0191 -0.0251 -0.0213 -0.0251 -0.0191 -0.0001
0.0005 0.0031 0.0068 0.0042 -0.0074 -0.0147 -0.0074 0.0042 0.0068
0.0001 0.0005 0.0024 0.0063 0.0110 0.0132 0.0110 0.0063 0.0024

Columns 10 through 11

0.0005 0.0001
0.0031 0.0005
0.0068 0.0024
0.0042 0.0063
-0.0074 0.0110
-0.0147 0.0132
-0.0074 0.0110
0.0042 0.0063
0.0068 0.0024
0.0031 0.0005
0.0005 0.0001

View the frequency response of the filters.


[H,w] = freqz(b,1,64,"whole");
colormap(jet(64))
plot(w/pi-1,fftshift(abs(H)))
figure, freqz2(h,[32 32])


One-Dimensional Frequency Response

Corresponding Two-Dimensional Frequency Response
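
As noted above, you can pass your own transformation matrix as the second argument to ftrans2 to obtain other symmetries. A minimal sketch, reusing the 1-D filter b from this example (the matrix shown is the classic McClellan transformation matrix, given here only for illustration):

t = [1 2 1; 2 -4 2; 1 2 1]/8;  % McClellan transformation matrix (illustrative values)
h2 = ftrans2(b,t);             % 2-D filter with nearly circular symmetry
figure, freqz2(h2,[32 32])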

Create Filter Using Frequency Sampling Method

This example shows how to create a 2-D filter based on a desired frequency response using the
frequency sampling method.

Given a matrix of points that define the shape of the frequency response, the frequency sampling
method creates a filter whose frequency response passes through those points. Frequency sampling
places no constraints on the behavior of the frequency response between the given points. Usually,


the response ripples in these areas. (Ripples are oscillations around a constant value. The frequency
response of a practical filter often has ripples where the frequency response of an ideal filter is flat.)

Define and display the target 2-D frequency response of an 11-by-11 filter.

Hd = zeros(11,11);
Hd(4:8,4:8) = 1;
[f1,f2] = freqspace(11,"meshgrid");
mesh(f1,f2,Hd)
colormap(jet(64))
title("Desired Frequency Response")

Create the filter based on the target frequency response using the fsamp2 function. fsamp2 returns
the filter h with a frequency response that passes through the points in the input matrix Hd.

h = fsamp2(Hd);

Plot the frequency response of the filter.

freqz2(h,[32 32])
title("Actual Frequency Response")


Notice the ripples in the actual frequency response compared to the desired frequency response.
These ripples are a fundamental problem with the frequency sampling design method. They occur
wherever there are sharp transitions in the desired response.

You can reduce the spatial extent of the ripples by using a larger filter. However, a larger filter does
not reduce the height of the ripples, and requires more computation time for filtering. To achieve a
smoother approximation to the desired frequency response, consider using the frequency
transformation method or the windowing method.

Windowing Method

The windowing method involves multiplying the ideal impulse response with a window function to
generate a corresponding filter; the window tapers the ideal impulse response toward zero at the
edges of the filter. Like the frequency sampling method, the windowing method produces a filter whose
frequency response approximates a desired frequency response. The windowing method, however, tends
to produce better results than the frequency sampling method.

The toolbox provides two functions for window-based filter design, fwind1 and fwind2. fwind1
designs a two-dimensional filter by using a two-dimensional window that it creates from one or two
one-dimensional windows that you specify. fwind2 designs a two-dimensional filter by using a
specified two-dimensional window directly.

fwind1 supports two different methods for making the two-dimensional windows it uses:


• Transforming a single one-dimensional window to create a two-dimensional window that is nearly
circularly symmetric, by using a process similar to rotation
• Creating a rectangular, separable window from two one-dimensional windows, by computing their
outer product

The example below uses fwind1 to create an 11-by-11 filter from the desired frequency response Hd.
The example uses the Signal Processing Toolbox hamming function to create a one-dimensional
window, which fwind1 then extends to a two-dimensional window.

Hd = zeros(11,11); Hd(4:8,4:8) = 1;
[f1,f2] = freqspace(11,'meshgrid');
mesh(f1,f2,Hd), axis([-1 1 -1 1 0 1.2]), colormap(jet(64))
h = fwind1(Hd,hamming(11));
figure, freqz2(h,[32 32]), axis([-1 1 -1 1 0 1.2])

Desired Two-Dimensional Frequency Response (left) and Actual Two-Dimensional Frequency Response (right)
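
For comparison, a minimal sketch of the fwind2 approach, which takes a two-dimensional window directly. Here the window is formed as the outer product of two Hamming windows, reusing the desired response Hd defined above (an illustrative choice, and one that again requires the Signal Processing Toolbox hamming function):

win = hamming(11)*hamming(11)';  % separable 11-by-11 window (outer product)
h2 = fwind2(Hd,win);             % design the 2-D filter from the 2-D window
figure, freqz2(h2,[32 32])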

Creating the Desired Frequency Response Matrix


The filter design functions fsamp2, fwind1, and fwind2 all create filters based on a desired
frequency response magnitude matrix. Frequency response is a mathematical function describing the
gain of a filter in response to different input frequencies.

You can create an appropriate desired frequency response matrix using the freqspace function.
freqspace returns correct, evenly spaced frequency values for any size response. If you create a
desired frequency response matrix using frequency points other than those returned by freqspace,
you might get unexpected results, such as nonlinear phase.

For example, to create a circular ideal low-pass frequency response with cutoff at 0.5, use

[f1,f2] = freqspace(25,"meshgrid");
Hd = zeros(25,25); d = sqrt(f1.^2 + f2.^2) < 0.5;
Hd(d) = 1;
mesh(f1,f2,Hd)


Ideal Circular Low-pass Frequency Response

Note that for this frequency response, the filters produced by fsamp2, fwind1, and fwind2 are real.
This result is desirable for most image processing applications. To achieve this in general, the desired
frequency response should be symmetric about the frequency origin (f1 = 0, f2 = 0).

Computing the Frequency Response of a Filter


The freqz2 function computes the frequency response for a two-dimensional filter. With no output
arguments, freqz2 creates a mesh plot of the frequency response. For example, consider this FIR
filter,

h = [0.1667    0.6667    0.1667
     0.6667   -3.3333    0.6667
     0.1667    0.6667    0.1667];

This command computes and displays the 64-by-64 point frequency response of h.

freqz2(h)

Frequency Response of a Two-Dimensional Filter

To obtain the frequency response matrix H and the frequency point vectors f1 and f2, use output
arguments

[H,f1,f2] = freqz2(h);

freqz2 normalizes the frequencies f1 and f2 so that the value 1.0 corresponds to half the sampling
frequency, or π radians.


For a simple m-by-n response, as shown above, freqz2 uses the two-dimensional fast Fourier
transform function fft2. You can also specify vectors of arbitrary frequency points, but in this case
freqz2 uses a slower algorithm.
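
A minimal sketch of evaluating the response at arbitrary frequency points (the frequency vectors here are illustrative):

f1 = -1:0.05:1;        % normalized frequencies, where 1.0 corresponds to half the sampling frequency
f2 = -1:0.05:1;
H = freqz2(h,f1,f2);   % evaluate the response over the grid defined by f1 and f2 (slower algorithm)
mesh(f1,f2,abs(H))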

See “Fourier Transform” on page 10-2 for more information about the fast Fourier transform and
its application to linear filtering and filter design.

See Also
freqspace | freqz2 | fsamp2 | ftrans2 | fwind1 | fwind2

9 Image Deblurring

This chapter describes how to deblur an image using the toolbox deblurring functions.

• “Image Deblurring” on page 9-2


• “Deblur Images Using a Wiener Filter” on page 9-5
• “Deblur Images Using Regularized Filter” on page 9-12
• “Adapt the Lucy-Richardson Deconvolution for Various Image Distortions” on page 9-22
• “Deblurring Images Using the Lucy-Richardson Algorithm” on page 9-25
• “Adapt Blind Deconvolution for Various Image Distortions” on page 9-37
• “Deblurring Images Using the Blind Deconvolution Algorithm” on page 9-45
• “Create Your Own Deblurring Functions” on page 9-53
• “Avoid Ringing in Deblurred Images” on page 9-54

Image Deblurring
The blurring, or degradation, of an image can be caused by many factors:

• Movement during the image capture process, by the camera or, when long exposure times are
used, by the subject
• Out-of-focus optics, use of a wide-angle lens, atmospheric turbulence, or a short exposure time,
which reduces the number of photons captured
• Scattered light distortion in confocal microscopy

A blurred or degraded image can be approximately described by this equation g = Hf + n.

g The blurred image


H The distortion operator, also called the point spread function (PSF). In the spatial domain,
the PSF describes the degree to which an optical system blurs (spreads) a point of light. The
PSF is the inverse Fourier transform of the optical transfer function (OTF). In the frequency
domain, the OTF describes the response of a linear, position-invariant system to an impulse.
The OTF is the Fourier transform of the point spread function (PSF). The distortion operator,
when convolved with the image, creates the distortion. Distortion caused by a point spread
function is just one type of distortion.
f The original true image

Note The image f does not really exist. This image represents what you would have if you
had perfect image acquisition conditions.
n Additive noise, introduced during image acquisition, that corrupts the image

Based on this model, the fundamental task of deblurring is to deconvolve the blurred image with the
PSF that exactly describes the distortion. Deconvolution is the process of reversing the effect of
convolution.

Note The quality of the deblurred image is mainly determined by knowledge of the PSF.

To illustrate, this example takes a clear image and deliberately blurs it by convolving it with a PSF.
The example uses the fspecial function to create a PSF that simulates a motion blur, specifying the
length of the blur in pixels (LEN = 31) and the angle of the blur in degrees (THETA = 11). Once the PSF
is created, the example uses the imfilter function to convolve the PSF with the original image, I,
to create the blurred image, Blurred. To see how deblurring is the reverse of this process, using the
same images, see “Deblur Images Using a Wiener Filter” on page 9-5.

I = imread("peppers.png");
I = I(60+[1:256],222+[1:256],:); % crop the image
figure; imshow(I); title("Original Image");


LEN = 31;
THETA = 11;
PSF = fspecial("motion",LEN,THETA); % create PSF
Blurred = imfilter(I,PSF,"circular","conv");
figure; imshow(Blurred); title("Blurred Image");

Deblurring Functions
The toolbox includes four deblurring functions, listed here in order of complexity. All the functions
accept a PSF and the blurred image as their primary arguments.

deconvwnr Implements a least squares solution. You should provide some information about
the noise to reduce possible noise amplification during deblurring. See “Deblur
Images Using a Wiener Filter” on page 9-5 for more information.
deconvreg Implements a constrained least squares solution, where you can place
constraints on the output image (the smoothness requirement is the default).
You should provide some information about the noise to reduce possible noise
amplification during deblurring. See “Deblur Images Using Regularized Filter”
on page 9-12 for more information.
deconvlucy Implements an accelerated, damped Lucy-Richardson algorithm. This function
performs multiple iterations, using optimization techniques and Poisson
statistics. You do not need to provide information about the additive noise in the
corrupted image. See “Adapt the Lucy-Richardson Deconvolution for Various
Image Distortions” on page 9-22 for more information.


deconvblind Implements the blind deconvolution algorithm, which performs deblurring
without knowledge of the PSF. You pass as an argument your initial guess at the
PSF. The deconvblind function returns a restored PSF in addition to the
restored image. The implementation uses the same damping and iterative model
as the deconvlucy function. See “Adapt Blind Deconvolution for Various Image
Distortions” on page 9-37 for more information.

When using the deblurring functions, note the following:

• Deblurring is an iterative process. You might need to repeat the deblurring process multiple times,
varying the parameters you specify to the deblurring functions with each iteration, until you
achieve an image that, based on the limits of your information, is the best approximation of the
original scene. Along the way, you must make numerous judgments about whether newly
uncovered features in the image are features of the original scene or simply artifacts of the
deblurring process.
• To avoid "ringing" in a deblurred image, you can use the edgetaper function to preprocess your
image before passing it to the deblurring functions, as shown in the sketch after this list. See “Avoid
Ringing in Deblurred Images” on page 9-54 for more information.
• For information about creating your own deblurring functions, see “Create Your Own Deblurring
Functions” on page 9-53.
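
A minimal sketch of that edge-tapering step, reusing the Blurred image and PSF created above (the choice of deconvwnr as the deblurring function is only an example):

tapered = edgetaper(Blurred,PSF);  % taper the image borders to suppress ringing
J = deconvwnr(tapered,PSF);        % then deblur as usual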


Deblur Images Using a Wiener Filter

This example shows how to use Wiener deconvolution to deblur images. Wiener deconvolution can be
used effectively when the frequency characteristics of the image and additive noise are known, to at
least some degree.

Read Pristine Image

Read and display a pristine image that does not have blur or noise.

Ioriginal = imread('cameraman.tif');
imshow(Ioriginal)
title('Original Image')

Simulate and Restore Motion Blur Without Noise

Simulate a blurred image that might result from camera motion. First, create a point-spread function,
PSF, by using the fspecial function and specifying linear motion across 21 pixels at an angle of 11
degrees. Then, convolve the point-spread function with the image by using imfilter.

The original image has data type uint8. If you pass a uint8 image to imfilter, then the function
will quantize the output in order to return another uint8 image. To reduce quantization errors,
convert the image to double before calling imfilter.

PSF = fspecial('motion',21,11);
Idouble = im2double(Ioriginal);
blurred = imfilter(Idouble,PSF,'conv','circular');
imshow(blurred)
title('Blurred Image')


Restore the blurred image by using the deconvwnr function. The blurred image does not have noise
so you can omit the noise-to-signal (NSR) input argument.

wnr1 = deconvwnr(blurred,PSF);
imshow(wnr1)
title('Restored Blurred Image')


Simulate and Restore Motion Blur and Gaussian Noise

Add zero-mean Gaussian noise to the blurred image by using the imnoise function.

noise_mean = 0;
noise_var = 0.0001;
blurred_noisy = imnoise(blurred,'gaussian',noise_mean,noise_var);
imshow(blurred_noisy)
title('Blurred and Noisy Image')

Try to restore the blurred noisy image by using deconvwnr without providing a noise estimate. By
default, the Wiener restoration filter assumes the NSR is equal to 0. In this case, the Wiener
restoration filter is equivalent to an ideal inverse filter, which can be extremely sensitive to noise in
the input image.

In this example, the noise in this restoration is amplified to such a degree that the image content is
lost.

wnr2 = deconvwnr(blurred_noisy,PSF);
imshow(wnr2)
title('Restoration of Blurred Noisy Image (NSR = 0)')


Try to restore the blurred noisy image by using deconvwnr with a more realistic value of the
estimated noise.

signal_var = var(Idouble(:));
NSR = noise_var / signal_var;
wnr3 = deconvwnr(blurred_noisy,PSF,NSR);
imshow(wnr3)
title('Restoration of Blurred Noisy Image (Estimated NSR)')


Simulate and Restore Motion Blur and 8-Bit Quantization Noise

Even a visually imperceptible amount of noise can affect the result. One source of noise is
quantization errors from working with images in uint8 representation. Earlier, to avoid quantization
errors, this example simulated a blurred image from a pristine image in data type double. Now, to
explore the impact of quantization errors on the restoration, simulate a blurred image from the
pristine image in the original uint8 data type.

blurred_quantized = imfilter(Ioriginal,PSF,'conv','circular');
imshow(blurred_quantized)
title('Blurred Quantized Image')

Try to restore the blurred quantized image by using deconvwnr without providing a noise estimate.
Even though no additional noise was added, this restoration is degraded compared to the restoration
of the blurred image in data type double.

wnr4 = deconvwnr(blurred_quantized,PSF);
imshow(wnr4)
title('Restoration of Blurred Quantized Image (NSR = 0)');


Try to restore the blurred quantized image by using deconvwnr with a more realistic value of the
estimated noise.

uniform_quantization_var = (1/256)^2 / 12;


signal_var = var(Idouble(:));
NSR = uniform_quantization_var / signal_var;
wnr5 = deconvwnr(blurred_quantized,PSF,NSR);
imshow(wnr5)
title('Restoration of Blurred Quantized Image (Estimated NSR)');


See Also
deconvwnr | fspecial | imfilter | imnoise

More About
• “Image Deblurring” on page 9-2


Deblur Images Using Regularized Filter

This example shows how to use regularized deconvolution to deblur images. Regularized
deconvolution can be used effectively when limited information is known about the additive noise and
constraints (such as smoothness) are applied on the recovered image. The blurred noisy image is
restored by a constrained least square restoration algorithm that uses a regularized filter.

Simulate Gaussian Blur and Gaussian Noise

Read and display a pristine image that does not have blur or noise.
I = im2double(imread("tissue.png"));
imshow(I)
title("Original Image")
text(size(I,2),size(I,1)+15, ...
"Image courtesy of Alan Partin, Johns Hopkins University", ...
FontSize=7,HorizontalAlignment="right")

Simulate a blurred image that might result from an out-of-focus lens. First, create a point-spread
function, PSF, by using the fspecial function and specifying a Gaussian filter of size 11-by-11 and
standard deviation 5. Then, convolve the point-spread function with the image by using imfilter.
PSF = fspecial("gaussian",11,5);
blurred = imfilter(I,PSF,"conv");

Add zero-mean Gaussian noise to the blurred image by using the imnoise function.


noise_mean = 0;
noise_var = 0.02;
blurred_noisy = imnoise(blurred,"gaussian",noise_mean,noise_var);

Display the blurred noisy image.

imshow(blurred_noisy)
title("Blurred and Noisy Image")

Restore Image Using Estimated Noise Power

Restore the blurred image by using the deconvreg function, supplying the noise power (NP) as the
third input parameter. To illustrate how sensitive the algorithm is to the value of noise power, this
example performs three restorations.

For the first restoration, use the true NP. Note that the function call requests two output arguments here. The
first return value, reg1, is the restored image. The second return value, lagra, is a scalar Lagrange
multiplier on which the regularized deconvolution has converged. This value is used later in the
example.

NP = noise_var*numel(I);
[reg1,lagra] = deconvreg(blurred_noisy,PSF,NP);
imshow(reg1)
title("Restored with True NP")


For the second restoration, use a slightly overestimated noise power. The restoration has poor
resolution.

NP_scale1 = 2; % factor greater than 1 (overestimated noise power), chosen here for illustration
reg2 = deconvreg(blurred_noisy,PSF,NP*NP_scale1);
imshow(reg2)
title("Restored with Larger NP")


For the third restoration, use a slightly underestimated noise power. The smaller the noise power
multiplier, the stronger the noise amplification and ringing from the image borders.

NP_scale2 = 0.5; % factor less than 1 (underestimated noise power), chosen here for illustration
reg3 = deconvreg(blurred_noisy,PSF,NP*NP_scale2);
imshow(reg3)
title("Restored with Smaller NP")


Reduce Noise Amplification and Ringing

You can reduce the noise amplification and ringing along the boundary of the image by calling the
edgetaper function prior to deconvolution. The image restoration becomes less sensitive to the
noise power parameter.

tapered = edgetaper(blurred_noisy,PSF);
reg4 = deconvreg(tapered,PSF,NP*NP_scale2);
imshow(reg4)
title("Restored with Smaller NP and Edge Tapering")


Use Lagrange Multiplier

Restore the blurred and noisy image, assuming that the optimal solution is already found and the
corresponding Lagrange multiplier is known. In this case, any value passed for noise power, NP, is
ignored.

To illustrate how sensitive the algorithm is to the Lagrange multiplier, this example performs three
restorations. The first restoration uses the lagra output from the reg1 restoration performed earlier.

reg5 = deconvreg(tapered,PSF,[],lagra);
imshow(reg5)
title("Restored with LAGRA")


The second restoration uses a larger value than the lagra output from the reg1 restoration. A larger
value increases the significance of the constraint. By default, this leads to oversmoothing of the
image.

lagra_scale1 = 10; % factor greater than 1, chosen here for illustration
reg6 = deconvreg(tapered,PSF,[],lagra*lagra_scale1);
imshow(reg6)
title("Restored with Large LAGRA")


The third restoration uses a smaller value than the lagra output from the reg1 restoration. A small
value weakens the constraint (the smoothness requirement set for the image). It amplifies the noise.
For the extreme case when the Lagrange multiplier equals 0, the reconstruction is a pure inverse
filtering.

lagra_scale2 = 0.1; % factor less than 1, chosen here for illustration
reg7 = deconvreg(tapered,PSF,[],lagra*lagra_scale2);
imshow(reg7)
title("Restored with Small LAGRA")


Use Different Smoothness Constraint

Restore the blurred noisy image using a different constraint for the regularization operator. Instead
of using the default Laplacian constraint on image smoothness, constrain the image smoothness using
a 3-by-3 Laplacian of Gaussian (LoG) operator.

h = fspecial("log",3);
reg8 = deconvreg(blurred_noisy,PSF,[],lagra,h);
imshow(reg8)
title("Constrained by LoG Operator")


See Also
deconvreg | imnoise | fspecial | imfilter

More About
• “Image Deblurring” on page 9-2


Adapt the Lucy-Richardson Deconvolution for Various Image Distortions

In this section...
“Reduce the Effect of Noise Amplification” on page 9-22
“Account for Nonuniform Image Quality” on page 9-22
“Handle Camera Read-Out Noise” on page 9-23
“Handling Undersampled Images” on page 9-23
“Refine the Result” on page 9-23

Use the deconvlucy function to deblur an image using the accelerated, damped, Lucy-Richardson
algorithm. The algorithm maximizes the likelihood that the resulting image, when convolved with the
PSF, is an instance of the blurred image, assuming Poisson noise statistics. This function can be
effective when you know the PSF but know little about the additive noise in the image.

The deconvlucy function implements several adaptations to the original Lucy-Richardson maximum
likelihood algorithm that address complex image restoration tasks.

Reduce the Effect of Noise Amplification


Noise amplification is a common problem of maximum likelihood methods that attempt to fit data as
closely as possible. After many iterations, the restored image can have a speckled appearance,
especially for a smooth object observed at low signal-to-noise ratios. These speckles do not represent
any real structure in the image, but are artifacts of fitting the noise in the image too closely.

To control noise amplification, the deconvlucy function uses a damping parameter, DAMPAR. This
parameter specifies the threshold level for the deviation of the resulting image from the original
image, below which damping occurs. For pixels that deviate in the vicinity of their original values,
iterations are suppressed.

Damping is also used to reduce ringing, the appearance of high-frequency structures in a restored
image. Ringing is not necessarily the result of noise amplification. See “Avoid Ringing in Deblurred
Images” on page 9-54 for more information.
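
A minimal sketch of specifying the damping threshold, assuming a uint8 blurred, noisy image BlurredNoisy, its PSF, and an estimated noise variance V are in the workspace (the threshold of three standard deviations is an illustrative choice):

DAMPAR = im2uint8(3*sqrt(V));                % must have the same class as the input image
J = deconvlucy(BlurredNoisy,PSF,20,DAMPAR);  % deviations smaller than DAMPAR are damped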

Account for Nonuniform Image Quality


Another complication of real-life image restoration is that the data might include bad pixels, or that
the quality of the receiving pixels might vary with time and position. By specifying the WEIGHT
argument with the deconvlucy function, you can specify that certain pixels in the image be ignored.
To ignore a pixel, assign a weight of zero to the element in the WEIGHT array that corresponds to the
pixel in the image.

The algorithm converges on predicted values for the bad pixels based on the information from
neighborhood pixels. The variation in the detector response from pixel to pixel (the so-called flat-field
correction) can also be accommodated by the WEIGHT array. Instead of assigning a weight of 1.0 to
the good pixels, you can specify fractional values and weight the pixels according to the amount of
the flat-field correction.
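
For example, a minimal sketch that excludes a set of known bad pixels, assuming a grayscale image BlurredNoisy, its PSF, and a logical mask badPixelMask (a hypothetical variable) are in the workspace:

WEIGHT = ones(size(BlurredNoisy));  % weight of 1 for good pixels
WEIGHT(badPixelMask) = 0;           % weight of 0 excludes the bad pixels from the fit
J = deconvlucy(BlurredNoisy,PSF,20,0,WEIGHT);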


Handle Camera Read-Out Noise


Noise in charge coupled device (CCD) detectors has two primary components:

• Photon counting noise with a Poisson distribution


• Read-out noise with a Gaussian distribution

The Lucy-Richardson iterations intrinsically account for the first type of noise. You must account for
the second type of noise; otherwise, it can cause pixels with low levels of incident photons to have
negative values.

The deconvlucy function uses the READOUT input argument to handle camera read-out noise. The
value of this parameter is typically the sum of the read-out noise variance and the background noise,
such as the number of counts from the background radiation. The value of the READOUT argument
specifies an offset that ensures that all values are positive.
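
A minimal sketch of supplying the read-out offset, assuming the same BlurredNoisy and PSF as above (the variance and background values are assumptions for illustration):

readout_var = 2e-4;                 % assumed read-out noise variance
background = 1e-3;                  % assumed background (dark) counts
WEIGHT = ones(size(BlurredNoisy));
J = deconvlucy(BlurredNoisy,PSF,20,0,WEIGHT,readout_var + background);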

Handling Undersampled Images


The restoration of undersampled data can be improved significantly if it is done on a finer grid. The
deconvlucy function uses the SUBSMPL parameter to specify the subsampling rate, if the PSF is
known to have a higher resolution.

If the undersampled data is the result of camera pixel binning during image acquisition, the PSF
observed at each pixel rate can serve as a finer grid PSF. Otherwise, the PSF can be obtained via
observations taken at subpixel offsets or via optical modeling techniques. This method is especially
effective for images of stars (high signal-to-noise ratio), because the stars are effectively forced to be
in the center of a pixel. If a star is centered between pixels, it is restored as a combination of the
neighboring pixels. A finer grid redirects the consequent spreading of the star flux back to the center
of the star's image.
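
A minimal sketch of restoring on a twice-finer grid, assuming an undersampled image Undersampled and a PSF FinePSF measured or modeled at twice the image resolution (both hypothetical variables):

SUBSMPL = 2;  % the PSF grid is finer than the image grid by this factor
J = deconvlucy(Undersampled,FinePSF,20,0,ones(size(Undersampled)),0,SUBSMPL);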

Refine the Result


The deconvlucy function, by default, performs multiple iterations of the deblurring process. You can
stop the processing after a certain number of iterations to check the result, and then restart the
iterations from the point where processing stopped. To do this, pass in the input image as a cell array,
for example, {BlurredNoisy}. The deconvlucy function returns the output image as a cell array
that you can then pass as an input argument to deconvlucy to restart the deconvolution.

The output cell array contains these four elements:

Element Description
output{1} Original input image
output{2} Image produced by the last iteration
output{3} Image produced by the next to last iteration
output{4} Internal information used by deconvlucy to know where to restart the
process

See Also
deconvlucy | deconvblind


Related Examples
• “Deblurring Images Using the Lucy-Richardson Algorithm” on page 9-25

More About
• “Image Deblurring” on page 9-2


Deblurring Images Using the Lucy-Richardson Algorithm

This example shows how to use the Lucy-Richardson algorithm to deblur images. It can be used
effectively when the point-spread function PSF (blurring operator) is known, but little or no
information is available for the noise. The blurred and noisy image is restored by the iterative,
accelerated, damped Lucy-Richardson algorithm. You can use characteristics of the optical system as
input parameters to improve the quality of the image restoration.

Step 1: Read Image

The example reads in an RGB image and crops it to be 256-by-256-by-3. The deconvlucy function
can handle arrays of any dimension.
I = imread("board.tif");
I = I(50+(1:256),2+(1:256),:);
figure;
imshow(I);
title("Original Image");
text(size(I,2),size(I,1)+15, ...
"Image courtesy of Alexander V. Panasyuk, Ph.D.", ...
"FontSize",7,"HorizontalAlignment","right");
text(size(I,2),size(I,1)+25, ...
"Harvard-Smithsonian Center for Astrophysics", ...
"FontSize",7,"HorizontalAlignment","right");

Step 2: Simulate a Blur and Noise

Simulate a real-life image that could be blurred due to camera motion or lack of focus. The image
could also be noisy due to random disturbances. The example simulates the blur by convolving a
Gaussian filter with the true image (using imfilter). The Gaussian filter then represents a point-
spread function, PSF.


PSF = fspecial("gaussian",5,5);
Blurred = imfilter(I,PSF,"symmetric","conv");
figure;
imshow(Blurred);
title("Blurred");

The example simulates the noise by adding Gaussian noise of variance V to the blurred image (using
imnoise). The noise variance V is used later to define a damping parameter of the algorithm.

V = .002;
BlurredNoisy = imnoise(Blurred,"gaussian",0,V);
figure;
imshow(BlurredNoisy);
title("Blurred & Noisy");


Step 3: Restore the Blurred and Noisy Image

Restore the blurred and noisy image providing the PSF and using only 5 iterations (default is 10). The
output is an array of the same type as the input image.

luc1 = deconvlucy(BlurredNoisy,PSF,5);
figure;
imshow(luc1);
title("Restored Image, NUMIT = 5");


Step 4: Iterate to Explore the Restoration

The resulting image changes with each iteration. To investigate the evolution of the image
restoration, you can do the deconvolution in steps: do a set of iterations, see the result, and then
resume the iterations from where they were stopped. To do so, the input image has to be passed as a
part of a cell array. For example, start the first set of iterations by passing in {BlurredNoisy}
instead of BlurredNoisy as the input image parameter.

luc1_cell = deconvlucy({BlurredNoisy},PSF,5);

In that case the output, luc1_cell, becomes a cell array. The cell output consists of four numeric
arrays, where the first is the BlurredNoisy image, the second is the restored image of class
double, the third array is the result of the one-before-last iteration, and the fourth array is an
internal parameter of the iterated set. The second numeric array of the output cell-array, image
luc1_cell{2}, is identical to the output array of the Step 3 image, luc1, with a possible exception
of their class (the cell output always gives the restored image of class double).

To resume the iterations, take the output from the previous function call, the cell-array luc1_cell,
and pass it into the deconvlucy function. Use the default number of iterations (NUMIT = 10). The
restored image is the result of a total of 15 iterations.

luc2_cell = deconvlucy(luc1_cell,PSF);
luc2 = im2uint8(luc2_cell{2});
figure;
imshow(luc2);
title("Restored Image, NUMIT = 15");


Step 5: Control Noise Amplification by Damping

The latest image, luc2, is the result of 15 iterations. Although it is sharper than the earlier result
from 5 iterations, the image develops a "speckled" appearance. The speckles do not correspond to
any real structures (compare it to the true image), but instead are the result of fitting the noise in the
data too closely.

To control the noise amplification, use the damping option by specifying the DAMPAR parameter.
DAMPAR has to be of the same class as the input image. The algorithm dampens changes in the model
in regions where the differences are small compared with the noise. The DAMPAR used here equals 3
standard deviations of the noise. Notice that the image is smoother.

DAMPAR = im2uint8(3*sqrt(V));
luc3 = deconvlucy(BlurredNoisy,PSF,15,DAMPAR);
figure;
imshow(luc3);
title("Restored Image with Damping, NUMIT = 15");

The next part of this example explores the weight and subsample input parameters of the
deconvlucy function, using a simulated star image (for simplicity and speed).

Step 6: Create Sample Image

The example creates a black/white image of four stars.

I = zeros(32);
I(5,5) = 1;
I(10,3) = 1;
I(27,26) = 1;
I(29,25) = 1;
figure;
imshow(1-I,[],"InitialMagnification","fit");
ax = gca;
ax.Visible = "on";
ax.XTickLabel = [];
ax.YTickLabel = [];
ax.XTick = [7 24];
ax.XGrid = "on";
ax.YTick = [5 28];
ax.YGrid = "on";
title("Data");

Step 7: Simulate a Blur

The example simulates a blur of the image of the stars by creating a Gaussian filter, PSF, and
convolving it with the true image.

PSF = fspecial("gaussian",15,3);
Blurred = imfilter(I,PSF,"conv","sym");

Now simulate a camera that can only observe part of the stars' images (only the blur is seen). Create
a weighting function array, WT, that consists of ones in the central part of the Blurred image ("good"
pixels, located within the dashed lines) and zeros at the edges ("bad" pixels - those that do not receive
the signal).

WT = zeros(32);
WT(6:27,8:23) = 1;
CutImage = Blurred.*WT;

To reduce the ringing associated with borders, apply the edgetaper function with the given PSF.

CutEdged = edgetaper(CutImage,PSF);
figure;
imshow(1-CutEdged,[],"InitialMagnification","fit");
ax = gca;
ax.Visible = "on";
ax.XTickLabel = [];
ax.YTickLabel = [];
ax.XTick = [7 24];
ax.XGrid = "on";
ax.YTick = [5 28];
ax.YGrid = "on";
title("Observed");

Step 8: Provide the WEIGHT Array

The algorithm weights each pixel value according to the weight array while restoring the image. In
our example, only the values of the central pixels are used (where WT = 1), while the "bad" pixel
values are excluded from the optimization. However, the algorithm can place the signal power into
the location of these "bad" pixels, beyond the edge of the camera's view. Notice the accuracy of the
resolved star positions.

luc4 = deconvlucy(CutEdged,PSF,300,0,WT);
figure;
imshow(1-luc4,[],"InitialMagnification","fit");
ax = gca;
ax.Visible = "on";
ax.XTickLabel = [];
ax.YTickLabel = [];
ax.XTick = [7 24];
ax.XGrid = "on";
ax.YTick = [5 28];
ax.YGrid = "on";
title("Restored");

Step 9: Provide a Finer-Sampled PSF

deconvlucy can restore an undersampled image, given a PSF sampled on a grid that is finer by
subsample times. To simulate a poorly resolved image and PSF, the example bins the Blurred image
and the original PSF, two pixels into one, in each dimension.

Binned = squeeze(sum(reshape(Blurred,[2 16 2 16])));
BinnedImage = squeeze(sum(Binned,2));
Binned = squeeze(sum(reshape(PSF(1:14,1:14),[2 7 2 7])));
BinnedPSF = squeeze(sum(Binned,2));
figure;
imshow(1-BinnedImage,[],"InitialMagnification","fit");
ax = gca;
ax.Visible = "on";
ax.XTick = [];
ax.YTick = [];
title("Binned Observed");

Restore the undersampled image, BinnedImage, using the undersampled PSF, BinnedPSF. Notice
that the luc5 image distinguishes only 3 stars.

luc5 = deconvlucy(BinnedImage,BinnedPSF,100);
figure;
imshow(1-luc5,[],"InitialMagnification","fit");
ax = gca;
ax.Visible = "on";
ax.XTick = [];
ax.YTick = [];
title("Poor PSF");

The next example restores the undersampled image (BinnedImage), this time using the finer PSF
(defined on a subsample-times finer grid). The reconstructed image (luc6) resolves the position of
the stars more accurately. Note how it distributes power between the two stars in the lower right
corner of the image. This hints at the existence of two bright objects, instead of one, as in the
previous restoration.

luc6 = deconvlucy(BinnedImage,PSF,100,[],[],[],2);
figure;
imshow(1-luc6,[],"InitialMagnification","fit");
ax = gca;
ax.Visible = "on";
ax.XTick = [];
ax.YTick = [];
title("Fine PSF");

See Also
deconvwnr | deconvreg | deconvlucy | deconvblind

More About
• “Image Deblurring” on page 9-2
• “Adapt the Lucy-Richardson Deconvolution for Various Image Distortions” on page 9-22

Adapt Blind Deconvolution for Various Image Distortions


Use the deconvblind function to deblur an image using the blind deconvolution algorithm. The
algorithm maximizes the likelihood that the resulting image, when convolved with the resulting PSF,
is an instance of the blurred image, assuming Poisson noise statistics. The blind deconvolution
algorithm can be used effectively when no information about the distortion (blurring and noise) is
known. The deconvblind function restores the image and the PSF simultaneously, using an iterative
process similar to the accelerated, damped Lucy-Richardson algorithm.

The deconvblind function, just like the deconvlucy function, implements several adaptations to
the original Lucy-Richardson maximum likelihood algorithm that address complex image restoration
tasks. Using these adaptations, you can

• Reduce the effect of noise on the restoration
• Account for nonuniform image quality (such as bad pixels)
• Handle camera read-out noise

For more information about these adaptations, see “Adapt the Lucy-Richardson Deconvolution for
Various Image Distortions” on page 9-22. The deconvblind function also supports PSF constraints
that you can provide through a user-specified function.

Deblur Images Using Blind Deconvolution

This example shows how to deblur an image using blind deconvolution. The example illustrates the
iterative nature of this operation, making two passes at deblurring the image using optional
parameters.

Read an image into the workspace and display it.

I = imread('cameraman.tif');
figure
imshow(I)
title('Original Image')

Create a point spread function (PSF). A PSF describes the degree to which an optical system blurs
(spreads) a point of light.

PSF = fspecial('motion',13,45);
figure
imshow(PSF,[],'InitialMagnification','fit')
title('Original PSF')

Create a simulated blur in the image, using the PSF, and display the blurred image.

Blurred = imfilter(I,PSF,'circ','conv');
figure
imshow(Blurred)
title('Blurred Image')

Deblur the image using the deconvblind function. You must make an initial guess at the PSF. To
determine the size of the PSF, examine the blurred image and measure the width of a blur (in pixels)
around an obviously sharp object. Because the size of the PSF is more important than the values it
contains, you can typically specify an array of 1's as the initial PSF.

INITPSF = ones(size(PSF));
[J,P] = deconvblind(Blurred,INITPSF,30);
figure
imshow(J)
title('Restored Image')

In this initial restoration, deconvblind was able to deblur the image to a great extent. Note,
however, the ringing around the sharp intensity contrast areas in the restored image. (The example
eliminated edge-related ringing by using the 'circular' option with imfilter when creating the
simulated blurred image.) To achieve a more satisfactory result, rerun the operation, experimenting
with PSFs of different sizes. The restored PSF returned by each deconvolution can also provide
valuable hints at the optimal PSF size.

figure
imshow(P,[],'InitialMagnification','fit')
title('Restored PSF')

One way to improve the result is to create a weight array to exclude areas of high contrast from the
deblurring operation. This can reduce contrast-related ringing in the result.

To create a weight array, create an array the same size as the image, and assign the value 0 to the
pixels in the array that correspond to pixels in the original image that you want to exclude from
processing. The example uses a combination of edge detection and morphological processing to
detect high-contrast areas in the image. Because the blur in the image is linear, the example dilates
the image twice. To exclude the image boundary pixels (a high-contrast area) from processing, the
example uses padarray to assign the value 0 to all border pixels.
WEIGHT = edge(I,'sobel',.28);
se1 = strel('disk',1);
se2 = strel('line',13,45);
WEIGHT = ~imdilate(WEIGHT,[se1 se2]);
WEIGHT = padarray(WEIGHT(2:end-1,2:end-1),[1 1]);
figure
imshow(WEIGHT)
title('Weight Array')

Refine the guess at the PSF. The reconstructed PSF returned by the first pass at deconvolution, P,
shows a clear linearity. For this second pass, the example uses a new PSF, which is the same as the
returned PSF but with the small-amplitude pixels set to 0.

P1 = P;
P1(P1 < 0.01) = 0;

Run the deconvolution again, this time specifying the weight array and the modified PSF. Note how
the restored image has much less ringing around the sharp intensity areas than the result of the first
pass.
[J2,P2] = deconvblind(Blurred,P1,50,[],double(WEIGHT));
figure, imshow(J2)
title('Newly Deblurred Image');

figure, imshow(P2,[],'InitialMagnification','fit')
title('Newly Reconstructed PSF')

Refining the Result


The deconvblind function, by default, performs multiple iterations of the deblurring process. You
can stop the processing after a certain number of iterations to check the result, and then restart the
iterations from the point where processing stopped. To use this feature, you must pass in both the
blurred image and the PSF as cell arrays, for example, {Blurred} and {INITPSF}.

The deconvblind function returns the output image and the restored PSF as cell arrays. The output
image cell array contains these four elements:

Element      Description
output{1}    Original input image
output{2}    Image produced by the last iteration
output{3}    Image produced by the next-to-last iteration
output{4}    Internal information used by deconvblind to know where to restart the process

The PSF output cell array contains similar elements.
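
For example, a minimal sketch of this stop-and-resume pattern (assuming Blurred and INITPSF from
the example above are still in the workspace) might look like this:

% Run the first 10 iterations, passing the image and the PSF as cell arrays.
[Jcell,Pcell] = deconvblind({Blurred},{INITPSF},10);

% Jcell{2} and Pcell{2} hold the current image and PSF estimates.
figure, imshow(Jcell{2},[])
title('After 10 Iterations')

% Resume from where the iterations stopped by passing the cell arrays back in.
[Jcell,Pcell] = deconvblind(Jcell,Pcell,10);
J20 = Jcell{2};    % restored image after a total of 20 iterations
P20 = Pcell{2};    % restored PSF after a total of 20 iterations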

See Also
deconvlucy | deconvblind

Related Examples
• “Deblurring Images Using the Blind Deconvolution Algorithm” on page 9-45

More About
• “Image Deblurring” on page 9-2

Deblurring Images Using the Blind Deconvolution Algorithm

This example shows how to use blind deconvolution to deblur images. The blind deconvolution
algorithm can be used effectively when no information about the distortion (blurring and noise) is
known. The algorithm restores the image and the point-spread function (PSF) simultaneously. The
accelerated, damped Richardson-Lucy algorithm is used in each iteration. Additional characteristics
of the optical system (for example, the camera) can be used as input parameters to improve the
quality of the image restoration. PSF constraints can be imposed through a user-specified function.

Step 1: Read Image

Read a grayscale image into the workspace. The deconvblind function can handle arrays of any
dimension.

I = imread("cameraman.tif");
figure;imshow(I);title("Original Image");
text(size(I,2),size(I,1)+15, ...
"Image courtesy of Massachusetts Institute of Technology", ...
"FontSize",7,"HorizontalAlignment","right");

Step 2: Simulate a Blur

Simulate a real-life image that could be blurred (e.g., due to camera motion or lack of focus). The
example simulates the blur by convolving a Gaussian filter with the true image (using imfilter).
The Gaussian filter then represents a point-spread function, PSF.

PSF = fspecial("gaussian",7,10);
Blurred = imfilter(I,PSF,"symmetric","conv");
imshow(Blurred)
title("Blurred Image")

Step 3: Restore the Blurred Image Using PSFs of Various Sizes

To illustrate the importance of knowing the size of the true PSF, this example performs three
restorations. Each time the PSF reconstruction starts from a uniform array (an array of ones).

The first restoration, described by J1 and P1, uses an undersized array, UNDERPSF, for an initial
guess of the PSF. The size of the UNDERPSF array is 4 pixels shorter in each dimension than the true
PSF.

UNDERPSF = ones(size(PSF)-4);
[J1,P1] = deconvblind(Blurred,UNDERPSF);
imshow(J1)
title("Deblurring with Undersized PSF")

The second restoration, described by J2 and P2, uses an array of ones, OVERPSF, for an initial PSF
that is 4 pixels longer in each dimension than the true PSF.

OVERPSF = padarray(UNDERPSF,[4 4],"replicate","both");


[J2,P2] = deconvblind(Blurred,OVERPSF);
imshow(J2)
title("Deblurring with Oversized PSF")

The third restoration, described by J3 and P3, uses an array of ones, INITPSF, for an initial PSF that
is exactly of the same size as the true PSF.

INITPSF = padarray(UNDERPSF,[2 2],"replicate","both");


[J3,P3] = deconvblind(Blurred,INITPSF);
imshow(J3)
title("Deblurring with INITPSF")

Step 4: Analyzing the Restored PSF

All three restorations also produce a PSF. The following pictures show how the analysis of the
reconstructed PSF might help in guessing the right size for the initial PSF. In the true PSF, a Gaussian
filter, the maximum values are at the center (white) and diminish at the borders (black).

figure;
subplot(2,2,1)
imshow(PSF,[],"InitialMagnification","fit")
title("True PSF")
subplot(2,2,2)
imshow(P1,[],"InitialMagnification","fit")
title("Reconstructed Undersized PSF")
subplot(2,2,3)
imshow(P2,[],"InitialMagnification","fit")
title("Reconstructed Oversized PSF")
subplot(2,2,4)
imshow(P3,[],"InitialMagnification","fit")
title("Reconstructed true PSF")

The PSF reconstructed in the first restoration, P1, obviously does not fit into the constrained size. It
has a strong signal variation at the borders. The corresponding image, J1, does not show any
improved clarity compared with the blurred image, Blurred.

The PSF reconstructed in the second restoration, P2, is very smooth at the edges. This implies that
the restoration can handle a PSF of a smaller size. The corresponding image, J2, shows some
deblurring but it is strongly corrupted by the ringing.

Finally, the PSF reconstructed in the third restoration, P3, is intermediate between P1 and P2. The
array, P3, resembles the true PSF very well. The corresponding image, J3, shows significant
improvement; however, it is still corrupted by ringing.

Step 5: Improving the Restoration

The ringing in the restored image, J3, occurs along the areas of sharp intensity contrast and along
the image borders. This example shows how to reduce the ringing effect by specifying a weighting
function. The algorithm weights each pixel according to the WEIGHT array while restoring the image
and the PSF. In our example, we start by finding the "sharp" pixels using the edge function. By trial
and error, we determine that a desirable threshold level is 0.08.

WEIGHT = edge(Blurred,"sobel",.08);

To widen the area, we use imdilate and pass in a structuring element, se.

se = strel("disk",2);
WEIGHT = 1-double(imdilate(WEIGHT,se));

The pixels close to the borders are also assigned the value 0.

WEIGHT([1:3 end-(0:2)],:) = 0;
WEIGHT(:,[1:3 end-(0:2)]) = 0;
figure
imshow(WEIGHT)
title("Weight Array")

The image is restored by calling deconvblind with the WEIGHT array and an increased number of
iterations (30). Almost all the ringing is suppressed.

[J,P] = deconvblind(Blurred,INITPSF,30,[],WEIGHT);
imshow(J)
title("Deblurred Image")

Step 6: Using Additional Constraints on the PSF Restoration

The example shows how you can specify additional constraints on the PSF. The function FUN, defined
below, returns a modified PSF array that deconvblind uses for the next iteration.

In this example, FUN modifies the PSF by cropping it by P1 and P2 pixels in each
dimension, and then padding the array back to its original size with zeros. This operation does not
change the values in the center of the PSF, but effectively reduces the PSF size by 2*P1 and 2*P2
pixels.

P1 = 2;
P2 = 2;
FUN = @(PSF) padarray(PSF(P1+1:end-P1,P2+1:end-P2),[P1 P2]);

The anonymous function, FUN, is passed into deconvblind last. See the section “Parameterizing
Functions”, in the MATLAB Mathematics documentation, for information about providing additional
parameters to the function FUN.

In this example, the size of the initial PSF, OVERPSF, is 4 pixels larger than the true PSF. Setting P1 =
2 and P2 = 2 as parameters in FUN effectively makes the valuable space in OVERPSF the same size
as the true PSF. Therefore, the outcome, JF and PF, is similar to the result of deconvolution with the
right sized PSF and no FUN call, J and P, from step 4.

[JF,PF] = deconvblind(Blurred,OVERPSF,30,[],WEIGHT,FUN);
imshow(JF)
title("Deblurred Image")

If we had used the oversized initial PSF, OVERPSF, without the constraining function, FUN, the
resulting image would be similar to the unsatisfactory result, J2, achieved in Step 3.

Note that any unspecified parameters before FUN can be omitted, such as DAMPAR and READOUT in
this example, without requiring a placeholder, ([]).

See Also
deconvlucy | deconvblind

More About
• “Image Deblurring” on page 9-2
• “Adapt Blind Deconvolution for Various Image Distortions” on page 9-37

Create Your Own Deblurring Functions


All the toolbox deblurring functions perform deconvolution in the frequency domain, where the
process becomes a simple matrix multiplication. To work in the frequency domain, the deblurring
functions must convert the PSF you provide into an optical transfer function (OTF), using the
psf2otf function. The toolbox also provides a function to convert an OTF into a PSF, otf2psf. The
toolbox makes these functions available in case you want to create your own deblurring functions.

To aid this conversion between PSFs and OTFs, use the padding function padarray.
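
For illustration only, here is a minimal sketch of frequency-domain deblurring built on psf2otf. It
uses a naive inverse filter, which is not one of the toolbox deblurring algorithms and amplifies noise
in practice, but it shows how a PSF becomes an OTF of the required size:

I = im2double(imread("cameraman.tif"));
PSF = fspecial("gaussian",7,2);
Blurred = imfilter(I,PSF,"circular","conv");  % circular boundary matches the DFT model

OTF = psf2otf(PSF,size(Blurred));             % pad the PSF and convert it to an OTF
Restored = real(ifft2(fft2(Blurred)./OTF));   % divide in the frequency domain

imshowpair(Blurred,Restored,"montage")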

See Also
deconvblind

More About
• “Image Deblurring” on page 9-2

Avoid Ringing in Deblurred Images


The discrete Fourier transform (DFT), used by the deblurring functions, assumes that the frequency
pattern of an image is periodic. This assumption creates a high-frequency drop-off at the edges of
images. In the figure, the shaded area represents the actual extent of the image; the unshaded area
represents the assumed periodicity.

This high-frequency drop-off can create an effect called boundary-related ringing in deblurred
images. In this figure, note the horizontal and vertical patterns in the image.

To avoid ringing, use the edgetaper function to preprocess your images before passing them to the
deblurring functions. The edgetaper function removes the high-frequency drop-off at the edge of an
image by blurring the entire image and then replacing the center pixels of the blurred image with the
original image. In this way, the edges of the image taper off to a lower frequency.
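
As a brief sketch, assuming a blurred image and its PSF such as BlurredNoisy and PSF from the
Lucy-Richardson example earlier in this chapter:

% Taper the image edges with the same PSF before deconvolution to reduce
% boundary-related ringing.
Tapered = edgetaper(BlurredNoisy,PSF);
Restored = deconvlucy(Tapered,PSF,15);
figure, imshowpair(BlurredNoisy,Restored,"montage")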

See Also
deconvwnr | deconvreg | deconvlucy | deconvblind

More About
• “Image Deblurring” on page 9-2
• “Fourier Transform” on page 10-2

10 Transforms

The usual mathematical representation of an image is a function of two spatial variables: f(x,y). The
value of the function at a particular location (x,y) represents the intensity of the image at that point.
This is called the spatial domain. The term transform refers to an alternative mathematical
representation of an image. For example, the Fourier transform is a representation of an image as a
sum of complex exponentials of varying magnitudes, frequencies, and phases. This is called the
frequency domain. Transforms are useful for a wide range of purposes, including convolution,
enhancement, feature detection, and compression.

This chapter defines several important transforms and shows examples of their application to image
processing.

• “Fourier Transform” on page 10-2
• “Discrete Cosine Transform” on page 10-12
• “Hough Transform” on page 10-16
• “Radon Transform” on page 10-21
• “Detect Lines Using Radon Transform” on page 10-27
• “The Inverse Radon Transformation” on page 10-32
• “Fan-Beam Projection” on page 10-37
• “Reconstructing an Image from Projection Data” on page 10-44

Fourier Transform
In this section...
“Definition of Fourier Transform” on page 10-2
“Discrete Fourier Transform” on page 10-5
“Applications of the Fourier Transform” on page 10-8

Definition of Fourier Transform


The Fourier transform is a representation of an image as a sum of complex exponentials of varying
magnitudes, frequencies, and phases. The Fourier transform plays a critical role in a broad range of
image processing applications, including enhancement, analysis, restoration, and compression.

If f(m,n) is a function of two discrete spatial variables m and n, then the two-dimensional Fourier
transform of f(m,n) is defined by the relationship
$$F(\omega_1,\omega_2) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} f(m,n)\, e^{-j\omega_1 m}\, e^{-j\omega_2 n}$$

The variables ω1 and ω2 are frequency variables; their units are radians per sample. F(ω1,ω2) is often
called the frequency-domain representation of f(m,n). F(ω1,ω2) is a complex-valued function that is
periodic both in ω1 and ω2, with period 2π. Because of the periodicity, usually only the range
−π ≤ ω1, ω2 ≤ π is displayed. Note that F(0,0) is the sum of all the values of f(m,n). For this reason,
F(0,0) is often called the constant component or DC component of the Fourier transform. (DC stands
for direct current; it is an electrical engineering term that refers to a constant-voltage power source,
as opposed to a power source whose voltage varies sinusoidally.)
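
As an illustrative numeric check of this property, using the discrete transform computed by fft2
(described later in this chapter):

% The zero-frequency (DC) coefficient equals the sum of all values of f.
% In MATLAB, the matrix element F(1,1) corresponds to F(0,0).
f = rand(8);
F = fft2(f);
[real(F(1,1)) sum(f(:))]   % the two values agree up to roundoff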

The inverse of a transform is an operation that when performed on a transformed image produces the
original image. The inverse two-dimensional Fourier transform is given by

$$f(m,n) = \frac{1}{4\pi^2} \int_{\omega_1=-\pi}^{\pi} \int_{\omega_2=-\pi}^{\pi} F(\omega_1,\omega_2)\, e^{j\omega_1 m}\, e^{j\omega_2 n}\, d\omega_1\, d\omega_2$$

Roughly speaking, this equation means that f(m,n) can be represented as a sum of an infinite number
of complex exponentials (sinusoids) with different frequencies. The magnitude and phase of the
contribution at the frequencies (ω1,ω2) are given by F(ω1,ω2).

Visualizing the Fourier Transform

To illustrate, consider a function f(m,n) that equals 1 within a rectangular region and 0 everywhere
else. To simplify the diagram, f(m,n) is shown as a continuous function, even though the variables m
and n are discrete.

Rectangular Function

The following figure shows, as a mesh plot, the magnitude of the Fourier transform, |F(ω1,ω2)|, of the
rectangular function shown in the preceding figure. The mesh plot of the magnitude is a common way
to visualize the Fourier transform.

Magnitude Image of a Rectangular Function

The peak at the center of the plot is F(0,0), which is the sum of all the values in f(m,n). The plot also
shows that F(ω1,ω2) has more energy at high horizontal frequencies than at high vertical frequencies.
This reflects the fact that horizontal cross sections of f(m,n) are narrow pulses, while vertical cross
sections are broad pulses. Narrow pulses have more high-frequency content than broad pulses.

Another common way to visualize the Fourier transform is to display log|F(ω1,ω2)| as an image, as
shown.

Log of the Fourier Transform of a Rectangular Function

Using the logarithm helps to bring out details of the Fourier transform in regions where F(ω1,ω2) is
very close to 0.

Examples of the Fourier transform for other simple shapes are shown below.

Fourier Transforms of Some Simple Shapes

Discrete Fourier Transform


Working with the Fourier transform on a computer usually involves a form of the transform known as
the discrete Fourier transform (DFT). A discrete transform is a transform whose input and output
values are discrete samples, making it convenient for computer manipulation. There are two principal
reasons for using this form of the transform:

• The input and output of the DFT are both discrete, which makes it convenient for computer
manipulations.
• There is a fast algorithm for computing the DFT known as the fast Fourier transform (FFT).

The DFT is usually defined for a discrete function f(m,n) that is nonzero only over the finite region
0 ≤ m ≤ M − 1 and 0 ≤ n ≤ N − 1. The two-dimensional M-by-N DFT and inverse M-by-N DFT
relationships are given by

$$F(p,q) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n)\, e^{-j2\pi pm/M}\, e^{-j2\pi qn/N}, \qquad p = 0,1,\ldots,M-1, \quad q = 0,1,\ldots,N-1$$

and

$$f(m,n) = \frac{1}{MN} \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} F(p,q)\, e^{j2\pi pm/M}\, e^{j2\pi qn/N}, \qquad m = 0,1,\ldots,M-1, \quad n = 0,1,\ldots,N-1$$

The values F(p,q) are the DFT coefficients of f(m,n). The zero-frequency coefficient, F(0,0), is often
called the "DC component." DC is an electrical engineering term that stands for direct current. (Note
that matrix indices in MATLAB always start at 1 rather than 0; therefore, the matrix elements f(1,1)
and F(1,1) correspond to the mathematical quantities f(0,0) and F(0,0), respectively.)

The MATLAB functions fft, fft2, and fftn implement the fast Fourier transform algorithm for
computing the one-dimensional DFT, two-dimensional DFT, and N-dimensional DFT, respectively. The
functions ifft, ifft2, and ifftn compute the inverse DFT.
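
As an illustrative check of the DFT definition against fft2 (a direct evaluation of the double sum for
a small matrix):

% Evaluate the two-dimensional DFT sum directly and compare with fft2.
f = rand(4,5);
[M,N] = size(f);
[m,n] = ndgrid(0:M-1,0:N-1);
Fdirect = zeros(M,N);
for p = 0:M-1
    for q = 0:N-1
        Fdirect(p+1,q+1) = sum(f.*exp(-1j*2*pi*(p*m/M + q*n/N)),"all");
    end
end
maxErr = max(abs(Fdirect - fft2(f)),[],"all")   % agreement up to roundoff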

Relationship to the Fourier Transform

The DFT coefficients F(p,q) are samples of the Fourier transform F(ω1,ω2).

$$F(p,q) = F(\omega_1,\omega_2)\Big|_{\omega_1 = 2\pi p/M,\ \omega_2 = 2\pi q/N}, \qquad p = 0,1,\ldots,M-1, \quad q = 0,1,\ldots,N-1$$

Visualizing the Discrete Fourier Transform

1 Construct a matrix f that is similar to the function f(m,n) in the example in “Definition of Fourier
Transform” on page 10-2. Remember that f(m,n) is equal to 1 within the rectangular region and 0
elsewhere. Use a binary image to represent f(m,n).

f = zeros(30,30);
f(5:24,13:17) = 1;
imshow(f,"InitialMagnification","fit")

2 Compute and visualize the 30-by-30 DFT of f with these commands.

F = fft2(f);
F2 = log(abs(F));
imshow(F2,[-1 5],"InitialMagnification","fit");
colormap(jet); colorbar

Discrete Fourier Transform Computed Without Padding

This plot differs from the Fourier transform displayed in “Visualizing the Fourier Transform” on
page 10-2. First, the sampling of the Fourier transform is much coarser. Second, the zero-
frequency coefficient is displayed in the upper left corner instead of the traditional location in the
center.
3 To obtain a finer sampling of the Fourier transform, add zero padding to f when computing its
DFT. The zero padding and DFT computation can be performed in a single step with this
command.

F = fft2(f,256,256);

This command zero-pads f to be 256-by-256 before computing the DFT.

imshow(log(abs(F)),[-1 5]); colormap(jet); colorbar

Discrete Fourier Transform Computed with Padding


4 The zero-frequency coefficient, however, is still displayed in the upper left corner rather than the
center. You can fix this problem by using the function fftshift, which swaps the quadrants of F
so that the zero-frequency coefficient is in the center.

F = fft2(f,256,256);
F2 = fftshift(F);
imshow(log(abs(F2)),[-1 5]); colormap(jet); colorbar

The resulting plot is identical to the one shown in “Visualizing the Fourier Transform” on page
10-2.

Applications of the Fourier Transform


This section presents a few of the many image processing-related applications of the Fourier
transform.

Frequency Response of Linear Filters

The Fourier transform of the impulse response of a linear filter gives the frequency response of the
filter. The function freqz2 computes and displays a filter's frequency response. The frequency
response of the Gaussian convolution kernel shows that this filter passes low frequencies and
attenuates high frequencies.

h = fspecial("gaussian");
freqz2(h)

Frequency Response of a Gaussian Filter

See “Design Linear Filters in the Frequency Domain” on page 8-107 for more information about
linear filtering, filter design, and frequency responses.

Perform Fast Convolution Using the Fourier Transform

This example shows how to perform fast convolution of two matrices using the Fourier transform. A
key property of the Fourier transform is that the multiplication of two Fourier transforms corresponds
to the convolution of the associated spatial functions. This property, together with the fast Fourier
transform, forms the basis for a fast convolution algorithm.

Note: The FFT-based convolution method is most often used for large inputs. For small inputs it is
generally faster to use the imfilter function.

Create two simple matrices, A and B. A is an M-by-N matrix and B is a P-by-Q matrix.

A = magic(3);
B = ones(3);

Zero-pad A and B so that they are at least (M+P-1)-by-(N+Q-1). (Often A and B are zero-padded to a
size that is a power of 2 because fft2 is fastest for these sizes.) The example pads the matrices to be
8-by-8.

A(8,8) = 0;
B(8,8) = 0;

Compute the two-dimensional DFT of A and B using the fft2 function. Multiply the two DFTs
together and compute the inverse two-dimensional DFT of the result using the ifft2 function.

C = ifft2(fft2(A).*fft2(B));

Extract the nonzero portion of the result and remove the imaginary part caused by roundoff error.

C = C(1:5,1:5);
C = real(C)

C = 5×5

    8.0000    9.0000   15.0000    7.0000    6.0000
   11.0000   17.0000   30.0000   19.0000   13.0000
   15.0000   30.0000   45.0000   30.0000   15.0000
    7.0000   21.0000   30.0000   23.0000    9.0000
    4.0000   13.0000   15.0000   11.0000    2.0000

Perform FFT-Based Correlation to Locate Image Features

This example shows how to use the Fourier transform to perform correlation, which is closely related
to convolution. Correlation can be used to locate features within an image. In this context, correlation
is often called template matching.

Read a sample image into the workspace.

bw = imread('text.png');

Create a template for matching by extracting the letter "a" from the image. Note that you can also
create the template by using the interactive syntax of the imcrop function.

a = bw(32:45,88:98);

Compute the correlation of the template image with the original image by rotating the template
image by 180 degrees and then using the FFT-based convolution technique. (Convolution is
equivalent to correlation if you rotate the convolution kernel by 180 degrees.) To match the template
to the image, use the fft2 and ifft2 functions. In the resulting image, bright peaks correspond to
occurrences of the letter.

C = real(ifft2(fft2(bw) .* fft2(rot90(a,2),256,256)));
figure
imshow(C,[]) % Scale image to appropriate display range.

To view the locations of the template in the image, find the maximum pixel value and then define a
threshold value that is less than this maximum. The thresholded image shows the locations of these
peaks as white spots in the thresholded correlation image. (To make the locations easier to see in this
figure, the example dilates the thresholded image to enlarge the size of the points.)

max(C(:))

ans = 68.0000

thresh = 60; % Use a threshold that's a little less than max.


D = C > thresh;
se = strel('disk',5);
E = imdilate(D,se);
figure
imshow(E) % Display pixels with values over the threshold.

See Also
fft2 | ifft2 | freqz2

Related Examples
• “Design Linear Filters in the Frequency Domain” on page 8-107

Discrete Cosine Transform

In this section...
“DCT Definition” on page 10-12
“The DCT Transform Matrix” on page 10-13
“Image Compression with the Discrete Cosine Transform” on page 10-13

DCT Definition
The discrete cosine transform (DCT) represents an image as a sum of sinusoids of varying
magnitudes and frequencies. The dct2 function computes the two-dimensional discrete cosine
transform (DCT) of an image. The DCT has the property that, for a typical image, most of the visually
significant information about the image is concentrated in just a few coefficients of the DCT. For this
reason, the DCT is often used in image compression applications. For example, the DCT is at the
heart of the international standard lossy image compression algorithm known as JPEG. (The name
comes from the working group that developed the standard: the Joint Photographic Experts Group.)

The two-dimensional DCT of an M-by-N matrix A is defined as follows.

$$B_{pq} = \alpha_p \alpha_q \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} A_{mn} \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad 0 \le p \le M-1, \quad 0 \le q \le N-1$$

$$\alpha_p = \begin{cases} 1/\sqrt{M}, & p = 0 \\ \sqrt{2/M}, & 1 \le p \le M-1 \end{cases} \qquad \alpha_q = \begin{cases} 1/\sqrt{N}, & q = 0 \\ \sqrt{2/N}, & 1 \le q \le N-1 \end{cases}$$

The values Bpq are called the DCT coefficients of A. (Note that matrix indices in MATLAB always start
at 1 rather than 0; therefore, the MATLAB matrix elements A(1,1) and B(1,1) correspond to the
mathematical quantities A00 and B00, respectively.)

The DCT is an invertible transform, and its inverse is given by

$$A_{mn} = \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} \alpha_p \alpha_q B_{pq} \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad 0 \le m \le M-1, \quad 0 \le n \le N-1$$

$$\alpha_p = \begin{cases} 1/\sqrt{M}, & p = 0 \\ \sqrt{2/M}, & 1 \le p \le M-1 \end{cases} \qquad \alpha_q = \begin{cases} 1/\sqrt{N}, & q = 0 \\ \sqrt{2/N}, & 1 \le q \le N-1 \end{cases}$$

The inverse DCT equation can be interpreted as meaning that any M-by-N matrix A can be written as
a sum of MN functions of the form

$$\alpha_p \alpha_q \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad 0 \le p \le M-1, \quad 0 \le q \le N-1$$

These functions are called the basis functions of the DCT. The DCT coefficients Bpq, then, can be
regarded as the weights applied to each basis function. For 8-by-8 matrices, the 64 basis functions
are illustrated by this image.

The 64 Basis Functions of an 8-by-8 Matrix

Horizontal frequencies increase from left to right, and vertical frequencies increase from top to
bottom. The constant-valued basis function at the upper left is often called the DC basis function, and
the corresponding DCT coefficient B00 is often called the DC coefficient.
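
A short sketch that generates a basis-function image like the one above, using the dctmtx function
described in the next section (the block size and display choices here are illustrative):

% Each 8-by-8 basis function is the outer product of one row of the DCT
% transform matrix with another.
T = dctmtx(8);
basisImage = zeros(64);
for p = 1:8
    for q = 1:8
        basisImage((p-1)*8+(1:8),(q-1)*8+(1:8)) = T(p,:)'*T(q,:);
    end
end
figure
imshow(basisImage,[],"InitialMagnification","fit")
title("DCT Basis Functions of an 8-by-8 Matrix")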

The DCT Transform Matrix


There are two ways to compute the DCT using Image Processing Toolbox software. The first method
is to use the dct2 function. dct2 uses an FFT-based algorithm for speedy computation with large
inputs. The second method is to use the DCT transform matrix, which is returned by the function
dctmtx and might be more efficient for small square inputs, such as 8-by-8 or 16-by-16. The M-by-M
transform matrix T is given by

$$T_{pq} = \begin{cases} \dfrac{1}{\sqrt{M}}, & p = 0,\ 0 \le q \le M-1 \\[1ex] \sqrt{\dfrac{2}{M}} \cos\dfrac{\pi(2q+1)p}{2M}, & 1 \le p \le M-1,\ 0 \le q \le M-1 \end{cases}$$

For an M-by-M matrix A, T*A is an M-by-M matrix whose columns contain the one-dimensional DCT of
the columns of A. The two-dimensional DCT of A can be computed as B=T*A*T'. Since T is a real
orthonormal matrix, its inverse is the same as its transpose. Therefore, the inverse two-dimensional
DCT of B is given by T'*B*T.
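
For example, you can verify the equivalence of the two methods on a small matrix (an illustrative
check):

% Compare dct2 with the transform-matrix method for an 8-by-8 matrix.
A = magic(8);
T = dctmtx(8);
B = T*A*T';                                 % two-dimensional DCT of A
maxErr = max(abs(B - dct2(A)),[],"all")     % agreement up to roundoff
Arecovered = T'*B*T;                        % inverse DCT recovers A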

Image Compression with the Discrete Cosine Transform

This example shows how to compress an image using the Discrete Cosine Transform (DCT). The
example computes the two-dimensional DCT of 8-by-8 blocks in an input image, discards (sets to zero)
all but 10 of the 64 DCT coefficients in each block, and then reconstructs the image using the two-
dimensional inverse DCT of each block. The example uses the transform matrix computation method.

DCT is used in the JPEG image compression algorithm. The input image is divided into 8-by-8 or 16-
by-16 blocks, and the two-dimensional DCT is computed for each block. The DCT coefficients are then
quantized, coded, and transmitted. The JPEG receiver (or JPEG file reader) decodes the quantized
DCT coefficients, computes the inverse two-dimensional DCT of each block, and then puts the blocks

back together into a single image. For typical images, many of the DCT coefficients have values close
to zero. These coefficients can be discarded without seriously affecting the quality of the
reconstructed image.

Read an image into the workspace and convert it to class double.

I = imread('cameraman.tif');
I = im2double(I);

Compute the two-dimensional DCT of 8-by-8 blocks in the image. The function dctmtx returns the N-
by-N DCT transform matrix.

T = dctmtx(8);
dct = @(block_struct) T * block_struct.data * T';
B = blockproc(I,[8 8],dct);

Discard all but 10 of the 64 DCT coefficients in each block.

mask = [1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0];
B2 = blockproc(B,[8 8],@(block_struct) mask .* block_struct.data);

Reconstruct the image using the two-dimensional inverse DCT of each block.

invdct = @(block_struct) T' * block_struct.data * T;
I2 = blockproc(B2,[8 8],invdct);

Display the original image and the reconstructed image, side-by-side. Although there is some loss of
quality in the reconstructed image, it is clearly recognizable, even though almost 85% of the DCT
coefficients were discarded.

imshow(I)

figure
imshow(I2)

Hough Transform
The Image Processing Toolbox supports functions that enable you to use the Hough transform to
detect lines in an image.

The hough function implements the Standard Hough Transform (SHT). The Hough transform is
designed to detect lines, using the parametric representation of a line:

rho = x*cos(theta) + y*sin(theta)

The variable rho is the distance from the origin to the line along a vector perpendicular to the line.
theta is the angle between the x-axis and this vector. The hough function generates a parameter
space matrix whose rows and columns correspond to these rho and theta values, respectively.

After you compute the Hough transform, you can use the houghpeaks function to find peak values in
the parameter space. These peaks represent potential lines in the input image.

After you identify the peaks in the Hough transform, you can use the houghlines function to find the
endpoints of the line segments corresponding to peaks in the Hough transform. This function
automatically fills in small gaps in the line segments.

Detect Lines in Images Using Hough

This example shows how to detect lines in an image using the Hough transform.

Read an image into the workspace and, to make this example more illustrative, rotate the image.
Display the image.

I = imread('circuit.tif');
rotI = imrotate(I,33,'crop');
imshow(rotI)

Find the edges in the image using the edge function.

BW = edge(rotI,'canny');
imshow(BW);

Compute the Hough transform of the binary image returned by edge.

[H,theta,rho] = hough(BW);

Display the transform, H, returned by the hough function.

figure
imshow(imadjust(rescale(H)),[],...
'XData',theta,...
'YData',rho,...
'InitialMagnification','fit');
xlabel('\theta (degrees)')
ylabel('\rho')
axis on
axis normal
hold on
colormap(gca,hot)

Find the peaks in the Hough transform matrix, H, using the houghpeaks function.

P = houghpeaks(H,5,'threshold',ceil(0.3*max(H(:))));

Superimpose a plot on the image of the transform that identifies the peaks.

x = theta(P(:,2));
y = rho(P(:,1));
plot(x,y,'s','color','black');

Find lines in the image using the houghlines function.

lines = houghlines(BW,theta,rho,P,'FillGap',5,'MinLength',7);

Create a plot that displays the original image with the lines superimposed on it.

figure, imshow(rotI), hold on

max_len = 0;
for k = 1:length(lines)
   xy = [lines(k).point1; lines(k).point2];
   plot(xy(:,1),xy(:,2),'LineWidth',2,'Color','green');

   % Plot beginnings and ends of lines
   plot(xy(1,1),xy(1,2),'x','LineWidth',2,'Color','yellow');
   plot(xy(2,1),xy(2,2),'x','LineWidth',2,'Color','red');

   % Determine the endpoints of the longest line segment
   len = norm(lines(k).point1 - lines(k).point2);
   if (len > max_len)
      max_len = len;
      xy_long = xy;
   end
end

% Highlight the longest line segment
plot(xy_long(:,1),xy_long(:,2),'LineWidth',2,'Color','red');

Radon Transform

Note For information about creating projection data from line integrals along paths that radiate from
a single source, called fan-beam projections, see “Fan-Beam Projection” on page 10-37. To convert
parallel-beam projection data to fan-beam projection data, use the para2fan function.

The radon function computes projections of an image matrix along specified directions.

A projection of a two-dimensional function f(x,y) is a set of line integrals. The radon function
computes the line integrals from multiple sources along parallel paths, or beams, in a certain
direction. The beams are spaced 1 pixel unit apart. To represent an image, the radon function takes
multiple, parallel-beam projections of the image from different angles by rotating the source around
the center of the image. The following figure shows a single projection at a specified rotation angle.

Parallel-Beam Projection at Rotation Angle Theta

For example, the line integral of f(x,y) in the vertical direction is the projection of f(x,y) onto the x-
axis; the line integral in the horizontal direction is the projection of f(x,y) onto the y-axis. The
following figure shows horizontal and vertical projections for a simple two-dimensional function.

Horizontal and Vertical Projections of a Simple Function

Projections can be computed along any angle theta (θ). In general, the Radon transform of f(x,y) is
the line integral of f parallel to the y´-axis



$$R_\theta(x') = \int_{-\infty}^{\infty} f\left(x'\cos\theta - y'\sin\theta,\; x'\sin\theta + y'\cos\theta\right)\, dy'$$

where

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

The following figure illustrates the geometry of the Radon transform.

Geometry of the Radon Transform

Plot the Radon Transform of an Image

This example shows how to compute the Radon transform of an image for a specific set of rotation
angles using the radon function.

Create a small sample image that consists of a single square object, and display the image.

I = zeros(100,100);
I(25:75,25:75) = 1;
imshow(I)

Calculate the Radon transform of the image for the angles 0° and 30°. The function returns, R, in
which the columns contain the Radon transform for each angle in theta. The function also returns
the vector, xp, which contains the corresponding coordinates along the x-axis. The center pixel of I is
defined to be floor((size(I)+1)/2), which is the pixel on the x-axis corresponding to x' = 0.

theta = [0 30];
[R,xp] = radon(I,theta);

Plot the transform at 0°.

figure
plot(xp,R(:,1))
title("Radon Transform of Square Function at 0 Degrees")

Plot the transform at 30°.

plot(xp,R(:,2));
title("Radon Transform of Square Function at 30 Degrees")

The Radon transform is often calculated for a large number of angles and displayed as an image.
Calculate the Radon transform for the square image at angles from 0° to 180°, in 1° increments.

theta = 0:180;
[R,xp] = radon(I,theta);

Display the 2-D Radon transform as a sinogram.

figure
imagesc(theta,xp,R)
title("R_{\theta} (X\prime)")
xlabel("\theta (degrees)")
ylabel("X\prime")
set(gca,"XTick",0:20:180)
colormap(hot)
colorbar

Detect Lines Using Radon Transform

This example shows how to use the Radon transform to detect lines in an image. The Radon
transform is closely related to a common computer vision operation known as the Hough transform.
You can use the radon function to implement a form of the Hough transform used to detect straight
lines.

Compute the Radon Transform of an Image

Read an image into the workspace. Convert it into a grayscale image.

I = fitsread("solarspectra.fts");
I = rescale(I);

Display the original image.

figure
imshow(I)
title("Original Image")

Compute a binary edge image using the edge function. Display the binary image returned by the
edge function.

BW = edge(I);
figure
imshow(BW)
title("Edges of Original Image")

Calculate the Radon transform of the image, using the radon function. The locations of peaks in the
transform correspond to the locations of straight lines in the original image.

theta = 0:179;
[R,xp] = radon(BW,theta);

Display the result of the Radon transform.

figure
imagesc(theta,xp,R)
colormap(hot)
xlabel("\theta (degrees)")
ylabel("x^{\prime} (pixels from center)")
title("R_{\theta} (x^{\prime})")
colorbar

Interpret the Peaks of the Radon Transform

Calculate the θ and x' offset values for the five largest peaks. The xp_peak_offset values represent
the offset of the peak from the center of the image, in pixels.

R_sort = sort(unique(R),"descend");

[row_peak,col_peak] = find(ismember(R,R_sort(1:5)));
xp_peak_offset = xp(row_peak);
theta_peak = theta(col_peak);

Add an x marker at the center of the original image. The row indices of an image map to the y-
direction, and the columns map to the x-direction, so calculate centerX as half the number of
columns and centerY as half the number of rows in the image I.

centerX = ceil(size(I,2)/2);
centerY = ceil(size(I,1)/2);

figure
imshow(I)
hold on
scatter(centerX,centerY,50,"bx",LineWidth=2)

There are three strong peaks with θ = 1 degree and offsets of –80, –84, and –87 pixels from center.
Plot the radial line that passes through the center at an angle of θ = 1 degree as a red dashed line.
Represent the Radon peaks as solid red lines that are perpendicular to the dashed line, and shifted –80, –84,
and –87 pixels from the center, which positions them to the left.

[x1,y1] = pol2cart(deg2rad(1),5000);
plot([centerX-x1 centerX+x1],[centerY+y1 centerY-y1],"r--",LineWidth=2)

[x91,y91] = pol2cart(deg2rad(91),100);
for i=1:3
plot([centerX-x91+xp_peak_offset(i) centerX+x91+xp_peak_offset(i)], ...
[centerY+y91 centerY-y91], ...
"r",LineWidth=2)
end

There are also two strong peaks at θ = 91 degrees, with offsets of –8 and –44 pixels from center. Plot
the radial line that passes through the center at an angle of θ = 91 degrees as a green dashed line.
Represent the Radon peaks as solid green lines that are perpendicular to the dashed line, and shifted
–8 and –44 pixels from center, which positions them down.

plot([centerX-x91 centerX+x91],[centerY+y91 centerY-y91],"g--",LineWidth=2)

for i=4:5
plot([centerX-x1 centerX+x1], ...
[centerY+y1-xp_peak_offset(i) centerY-y1-xp_peak_offset(i)], ...
"g",LineWidth=2)
end

See Also
radon | edge | pol2cart

More About
• “Radon Transform” on page 10-21

The Inverse Radon Transformation


In this section...
“Inverse Radon Transform Definition” on page 10-32
“Reconstruct an Image from Parallel Projection Data” on page 10-34

Inverse Radon Transform Definition


The iradon function inverts the Radon transform and can therefore be used to reconstruct images.

As described in “Radon Transform” on page 10-21, given an image I and a set of angles theta, the
radon function can be used to calculate the Radon transform.

R = radon(I,theta);

The function iradon can then be called to reconstruct the image I from projection data.

IR = iradon(R,theta);

In the example above, projections are calculated from the original image I.

Note, however, that in most application areas, there is no original image from which projections are
formed. For example, the inverse Radon transform is commonly used in tomography applications. In
X-ray absorption tomography, projections are formed by measuring the attenuation of radiation that
passes through a physical specimen at different angles. The original image can be thought of as a
cross section through the specimen, in which intensity values represent the density of the specimen.
Projections are collected using special purpose hardware, and then an internal image of the specimen
is reconstructed by iradon. This allows for noninvasive imaging of the inside of a living body or
another opaque object.

iradon reconstructs an image from parallel-beam projections. In parallel-beam geometry, each
projection is formed by combining a set of line integrals through an image at a specific angle.

The following figure illustrates how parallel-beam geometry is applied in X-ray absorption
tomography. Note that there is an equal number of n emitters and n sensors. Each sensor measures
the radiation emitted from its corresponding emitter, and the attenuation in the radiation gives a
measure of the integrated density, or mass, of the object. This corresponds to the line integral that is
calculated in the Radon transform.

The parallel-beam geometry used in the figure is the same as the geometry that was described in
“Radon Transform” on page 10-21. f(x,y) denotes the brightness of the image and Rθ(x′) is the
projection at angle theta.

Parallel-Beam Projections Through an Object

Another geometry that is commonly used is fan-beam geometry, in which there is one source and n
sensors. For more information, see “Fan-Beam Projection” on page 10-37. To convert parallel-beam
projection data into fan-beam projection data, use the para2fan function.

Improving the Results

iradon uses the filtered back projection algorithm to compute the inverse Radon transform. This
algorithm forms an approximation of the image I based on the projections in the columns of R. A
more accurate result can be obtained by using more projections in the reconstruction. As the number
of projections (the length of theta) increases, the reconstructed image IR more accurately
approximates the original image I. The vector theta must contain monotonically increasing angular
values with a constant incremental angle Dtheta. When the scalar Dtheta is known, it can be
passed to iradon instead of the array of theta values. Here is an example.
IR = iradon(R,Dtheta);

The filtered back projection algorithm filters the projections in R and then reconstructs the image
using the filtered projections. In some cases, noise can be present in the projections. To remove high
frequency noise, apply a window to the filter to attenuate the noise. Many such windowed filters are
available in iradon. The example call to iradon below applies a Hamming window to the filter. See
the iradon reference page for more information. To get unfiltered back projection data, specify
"none" for the filter parameter.
IR = iradon(R,theta,"Hamming");

iradon also enables you to specify a normalized frequency, D, above which the filter has zero
response. D must be a scalar in the range [0,1]. With this option, the frequency axis is rescaled so that
the whole filter is compressed to fit into the frequency range [0,D]. This can be useful in cases
where the projections contain little high-frequency information but there is high-frequency noise. In
this case, the noise can be completely suppressed without compromising the reconstruction. The
following call to iradon sets a normalized frequency value of 0.85.
IR = iradon(R,theta,0.85);

Reconstruct an Image from Parallel Projection Data

This example shows how to reconstruct an image from parallel projection data, and how more
projection angles can improve the quality of the reconstructed image.

Create a Shepp-Logan head phantom image using the phantom function. The phantom image
illustrates many of the qualities that are found in real-world tomographic imaging of human heads.
The bright elliptical shell along the exterior is analogous to a skull, and the many ellipses inside are
analogous to brain features.
P = phantom(256);
imshow(P)

Compute the Radon transform of the phantom brain for three different sets of theta values. R1 has 18
projections, R2 has 36 projections, and R3 has 90 projections.
theta1 = 0:10:170; [R1,xp] = radon(P,theta1);
theta2 = 0:5:175; [R2,xp] = radon(P,theta2);
theta3 = 0:2:178; [R3,xp] = radon(P,theta3);

Plot the Radon transforms of the Shepp-Logan head phantom with 90 projections. Note how some of
the features of the input image appear in this image of the transform. The first column in the Radon

transform corresponds to a projection at 0° that is integrating in the vertical direction. The
centermost column corresponds to a projection at 90°, which is integrating in the horizontal
direction. The projection at 90° has a wider profile than the projection at 0° due to the larger vertical
semi-axis of the outermost ellipse of the phantom.

figure
imagesc(theta3,xp,R3)
colormap(hot); colorbar
xlabel('\theta'); ylabel('x\prime')
title("Radon Transform of Head Phantom Using 90 Projections")

Reconstruct three head phantom images from the three sets of projection data.

I1 = iradon(R1,10);
I2 = iradon(R2,5);
I3 = iradon(R3,2);

Display the results in a montage. Notice that image I1 on the left, which was reconstructed from only
18 projections, is the least accurate reconstruction. Image I2 in the center, which was reconstructed
from 36 projections, is better, but it is still not clear enough to discern clearly the small ellipses in the
lower portion of the image. Image I3 on the right, which was reconstructed using 90 projections,
most closely resembles the original image. Notice that when the number of projections is relatively
small (as in I1 and I2), the reconstruction can include some artifacts from the back projection.

montage({I1,I2,I3},Size=[1 3])

Fan-Beam Projection

In this section...
“Image Reconstruction from Fan-Beam Projection Data” on page 10-39
“Reconstruct Image using Inverse Fanbeam Projection” on page 10-40

Note For information about creating projection data from line integrals along parallel paths, see
“Radon Transform” on page 10-21. To convert fan-beam projection data to parallel-beam projection
data, use the fan2para function.

The fanbeam function computes projections of an image matrix along specified directions. A
projection of a two-dimensional function f(x,y) is a set of line integrals. The fanbeam function
computes the line integrals along paths that radiate from a single source, forming a fan shape. To
represent an image, the fanbeam function takes multiple projections of the image from different
angles by rotating the source around the center of the image. The following figure shows a single fan-
beam projection at a specified rotation angle.

Fan-Beam Projection at Rotation Angle Theta

When you compute fan-beam projection data using the fanbeam function, you specify as arguments
an image and the distance between the vertex of the fan-beam projections and the center of rotation
(the center pixel in the image). The fanbeam function determines the number of beams, based on the
size of the image and the values of specified name-value arguments.

The FanSensorGeometry name-value argument specifies how sensors are aligned: "arc" or
"line".

Fan Sensor Geometry    Description

"arc"     fanbeam positions the sensors along an arc, spacing the sensors at 1 degree
          intervals. Use the FanSensorSpacing parameter to control the distance between
          sensors by specifying the angle between each beam. This is the default fan
          sensor geometry.

"line"    fanbeam positions sensors along a straight line, rather than an arc. Use the
          FanSensorSpacing parameter to specify the distance between the sensors, in
          pixels, along the x´ axis.

The FanRotationIncrement parameter specifies the rotation angle increment. By default, fanbeam
takes projections at different angles by rotating the source around the center pixel at 1 degree
intervals.
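
As a brief sketch that combines these name-value arguments (the distance and spacing values here
are assumptions chosen for illustration):

% Arc geometry with 0.5 degree sensor spacing and a projection every
% 2 degrees of rotation. D is the distance, in pixels, from the fan vertex
% to the center of rotation.
P = phantom(128);
D = 180;
[F,sensorPos,rotAngles] = fanbeam(P,D, ...
    "FanSensorGeometry","arc", ...
    "FanSensorSpacing",0.5, ...
    "FanRotationIncrement",2);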

The following figures illustrate both these geometries. The first figure illustrates geometry used by
the fanbeam function when FanSensorGeometry is set to "arc" (the default). Note how you specify
the distance between sensors by specifying the angular spacing of the beams.

Fan-Beam Projection with Arc Geometry

The following figure illustrates the geometry used by the fanbeam function when
FanSensorGeometry is set to "line". In this figure, note how you specify the position of the
sensors by specifying the distance between them in pixels along the x´ axis.

Fan-Beam Projection with Line Geometry

Image Reconstruction from Fan-Beam Projection Data


To reconstruct an image from fan-beam projection data, use the ifanbeam function. With this
function, you specify as arguments the projection data and the distance between the vertex of the
fan-beam projections and the center of rotation when the projection data was created. For example,
this code recreates the image I from the projection data P and distance D.

I = ifanbeam(P,D);

By default, the ifanbeam function assumes that the fan-beam projection data was created using the
arc fan sensor geometry, with beams spaced at 1 degree angles and projections taken at 1 degree
increments over a full 360 degree range. As with the fanbeam function, you can use ifanbeam
name-value arguments to specify other values for these characteristics of the projection data. Use the
same values for these name-value arguments that were used when the projection data was created.
For more information about these parameters, see ifanbeam.
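
For example, a minimal sketch (reusing the hypothetical line-geometry values F and D from the earlier fanbeam sketch) passes the same settings back to ifanbeam when reconstructing:

Irec = ifanbeam(F,D, ...
    "FanSensorGeometry","line", ...
    "FanSensorSpacing",2, ...
    "FanRotationIncrement",5);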


The ifanbeam function converts the fan-beam projection data to parallel-beam projection data with
the fan2para function, and then calls the iradon function to perform the image reconstruction. For
this reason, the ifanbeam function supports certain iradon parameters, which it passes to the
iradon function. See “The Inverse Radon Transformation” on page 10-32 for more information about
the iradon function.
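
As a rough sketch of this pipeline (assuming projection data P created with the default arc geometry and vertex-to-center distance D, as in the example above), you could perform the two steps yourself:

[Ppar,paraSensorPos,paraRotAngles] = fan2para(P,D);   % convert fan-beam data to parallel-beam data
dtheta = paraRotAngles(2) - paraRotAngles(1);          % parallel rotation increment, in degrees
Irec = iradon(Ppar,dtheta);                            % reconstruct with the inverse Radon transform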

Reconstruct Image using Inverse Fanbeam Projection

This example shows how to use fanbeam and ifanbeam to form projections from a sample image and
then reconstruct the image from the projections.

Create a test image of the Shepp-Logan head phantom using the phantom function. The phantom
image illustrates many of the qualities that are found in real-world tomographic imaging of human
heads.

P = phantom(256);
imshow(P)

Compute fan-beam projection data of the test image, using the FanSensorSpacing name-value
argument to vary the sensor spacing. The example uses the fanbeam arc geometry, so you specify the
spacing between sensors by specifying the angular spacing of the beams. The first call spaces the
beams at 2 degrees; the second at 1 degree; and the third at 0.25 degrees. In each call, the distance
between the center of rotation and vertex of the projections is constant at 250 pixels. In addition,
fanbeam rotates the projection around the center pixel at 1 degree increments.

D = 250;

dsensor1 = 2;
F1 = fanbeam(P,D,"FanSensorSpacing",dsensor1);


dsensor2 = 1;
F2 = fanbeam(P,D,"FanSensorSpacing",dsensor2);

dsensor3 = 0.25;
[F3,sensor_pos3,rot_angles3] = fanbeam(P,D,"FanSensorSpacing",dsensor3);

Plot the projection data F3. Because fanbeam calculates projection data at rotation angles from 0 to
360 degrees, the same patterns occur at an offset of 180 degrees. The same features are being
sampled from both sides.
figure
imagesc(rot_angles3,sensor_pos3,F3)
colormap(hot); colorbar
xlabel("Fan Rotation Angle (degrees)")
ylabel("Fan Sensor Position (degrees)")

Reconstruct the image from the fan-beam projection data using ifanbeam. In each reconstruction,
match the fan sensor spacing with the spacing used when the projection data was created previously.
The example uses the OutputSize name-value argument to constrain the output size of each
reconstruction to be the same as the size of the original image P. In the output, note how the quality
of the reconstruction gets better as the number of beams in the projection increases. The first image,
Ifan1, was created using 2 degree spacing of the beams; the second image, Ifan2, was created
using 1 degree spacing of the beams; and the third image, Ifan3, was created using 0.25 degree
spacing of the beams.
output_size = max(size(P));


Ifan1 = ifanbeam(F1,D, ...
    "FanSensorSpacing",dsensor1,"OutputSize",output_size);
figure, imshow(Ifan1)
title("Ifan1")

Ifan2 = ifanbeam(F2,D, ...
    "FanSensorSpacing",dsensor2,"OutputSize",output_size);
figure, imshow(Ifan2)
title("Ifan2")


Ifan3 = ifanbeam(F3,D, ...
    "FanSensorSpacing",dsensor3,"OutputSize",output_size);
figure, imshow(Ifan3)
title("Ifan3")


Reconstructing an Image from Projection Data

This example shows how to form parallel-beam and fan-beam projections from a head phantom
image, and how to reconstruct the image from those projections using the inverse Radon and inverse
fan-beam transforms.

The radon and iradon functions use a parallel-beam geometry for the projections, whereas the
fanbeam and ifanbeam functions use a fan-beam geometry. To compare parallel-beam and fan-beam
geometries, the examples below create synthetic projections for each geometry and then use those
synthetic projections to reconstruct the original image.

A real-world application that requires image reconstruction is X-ray absorption tomography where
projections are formed by measuring the attenuation of radiation that passes through a physical
specimen at different angles. The original image can be thought of as a cross section through the
specimen in which intensity values represent the density of the specimen. Projections are collected by
special medical imaging devices and then an internal image of the specimen is reconstructed using
iradon or ifanbeam.

The function iradon reconstructs an image from parallel-beam projections. In parallel-beam
geometry, each projection is formed by combining a set of line integrals through an image at a
specific angle. The function ifanbeam reconstructs an image from fan-beam projections, which have
one emitter and multiple sensors.

Create Head Phantom

The test image is the Shepp-Logan head phantom which can be generated using the function
phantom. The phantom image illustrates many qualities that are found in real-world tomographic
imaging of human heads. The bright elliptical shell along the exterior is analogous to a skull and the
many ellipses inside are analogous to brain features or tumors.

P = phantom(256);
imshow(P)


Parallel Beam - Calculate Synthetic Projections

Calculate synthetic projections using parallel-beam geometry and vary the number of projection
angles. For each of these calls to radon, the output is a matrix in which each column is the Radon
transform for one of the angles in the corresponding theta.
theta1 = 0:10:170;
[R1,~] = radon(P,theta1);
num_angles_R1 = size(R1,2)

num_angles_R1 = 18

theta2 = 0:5:175;
[R2,~] = radon(P,theta2);
num_angles_R2 = size(R2,2)

num_angles_R2 = 36

theta3 = 0:2:178;
[R3,xp] = radon(P,theta3);
num_angles_R3 = size(R3,2)

num_angles_R3 = 90

Note that for each angle, the projection is computed at N points along the xp-axis, where N is a
constant that depends on the diagonal distance of the image such that every pixel will be projected
for all possible projection angles.
N_R1 = size(R1,1)

N_R1 = 367

N_R2 = size(R2,1)

N_R2 = 367


N_R3 = size(R3,1)

N_R3 = 367

So, if you use a smaller head phantom, the projection needs to be computed at fewer points along the
xp-axis.

P_128 = phantom(128);
[R_128,xp_128] = radon(P_128,theta1);
N_128 = size(R_128,1)

N_128 = 185

Display the projection data R3. Some of the features of the original phantom image are visible in the
image of R3. The first column of R3 corresponds to a projection at 0 degrees, which is integrating in
the vertical direction. The centermost column corresponds to a projection at 90 degrees, which is
integrating in the horizontal direction. The projection at 90 degrees has a wider profile than the
projection at 0 degrees due to the large vertical semi-axis of the outermost ellipse of the phantom.

imagesc(theta3,xp,R3)
colormap(hot)
colorbar
xlabel('Parallel Rotation Angle - \theta (degrees)');
ylabel('Parallel Sensor Position - x\prime (pixels)');

Parallel Beam - Reconstruct Head Phantom from Projection Data

Match the parallel rotation-increment, dtheta, in each reconstruction with that used above to create
the corresponding synthetic projections. In a real-world case, you would know the geometry of your
transmitters and sensors, but not the source image, P.

The following three reconstructions (I1, I2, and I3) show the effect of varying the number of angles
at which projections are made. For I1 and I2 some features that were visible in the original phantom
are not clear. Specifically, look at the three ellipses at the bottom of each image. The result in I3
closely resembles the original image, P.

Notice the significant artifacts present in I1 and I2. To avoid these artifacts, use a larger number of
angles.

% Constrain the output size of each reconstruction to be
% the same as the size of the original image, P.
output_size = max(size(P));

dtheta1 = theta1(2) - theta1(1);
I1 = iradon(R1,dtheta1,output_size);

dtheta2 = theta2(2) - theta2(1);
I2 = iradon(R2,dtheta2,output_size);

dtheta3 = theta3(2) - theta3(1);
I3 = iradon(R3,dtheta3,output_size);

figure
montage({I1,I2,I3},'Size',[1 3])
title(['Reconstruction from Parallel Beam Projection ' ...
'with 18, 36, and 90 Projection Angles'])

Fan Beam - Calculate Synthetic Projections

Calculate synthetic projections using fan-beam geometry and vary the 'FanSensorSpacing'.

D = 250;
dsensor1 = 2;
F1 = fanbeam(P,D,'FanSensorSpacing',dsensor1);

dsensor2 = 1;
F2 = fanbeam(P,D,'FanSensorSpacing',dsensor2);

dsensor3 = 0.25;
[F3,sensor_pos3,fan_rot_angles3] = fanbeam(P,D, ...
'FanSensorSpacing',dsensor3);


Display the projection data F3. Notice that the fan rotation angles range from 0 to 360 degrees and
the same patterns occur at an offset of 180 degrees because the same features are being sampled
from both sides. You can correlate features in this image of fan-beam projections with the same
features in the image of parallel-beam projections, above.

imagesc(fan_rot_angles3,sensor_pos3,F3)
colormap(hot)
colorbar
xlabel('Fan Rotation Angle (degrees)')
ylabel('Fan Sensor Position (degrees)')

Fan Beam - Reconstruct Head Phantom from Projection Data

Match the fan-sensor-spacing in each reconstruction with that used to create each of the synthetic
projections. In a real-world case, you would know the geometry of your transmitters and sensors, but
not the source image, P.

Changing the value of the 'FanSensorSpacing' effectively changes the number of sensors used at
each rotation angle. For each of these fan-beam reconstructions, the same rotation angles are used.
This is in contrast to the parallel-beam reconstructions which each used different rotation angles.

Note that 'FanSensorSpacing' is only one parameter of several that you can control when using
fanbeam and ifanbeam. You can also convert back and forth between parallel- and fan-beam
projection data using the functions fan2para and para2fan.

Ifan1 = ifanbeam(F1,D,'FanSensorSpacing',dsensor1, ...
    'OutputSize',output_size);
Ifan2 = ifanbeam(F2,D,'FanSensorSpacing',dsensor2, ...
    'OutputSize',output_size);
Ifan3 = ifanbeam(F3,D,'FanSensorSpacing',dsensor3, ...
    'OutputSize',output_size);

montage({Ifan1,Ifan2,Ifan3},'Size',[1 3])
title(['Reconstruction from Fan Beam Projection with '...
'2, 1, and 0.25 Degree Sensor Spacing'])

11 Morphological Operations

This chapter describes the Image Processing Toolbox morphological functions. You can use these
functions to perform common image processing tasks, such as contrast enhancement, noise removal,
thinning, skeletonization, filling, and segmentation.

• “Types of Morphological Operations” on page 11-2
• “Structuring Elements” on page 11-9
• “Border Padding for Morphology” on page 11-13
• “Morphological Reconstruction” on page 11-14
• “Find Image Peaks and Valleys” on page 11-21
• “Pixel Connectivity” on page 11-27
• “Lookup Table Operations” on page 11-30
• “Dilate an Image to Enlarge a Shape” on page 11-32
• “Remove Thin Lines Using Erosion” on page 11-36
• “Use Morphological Opening to Extract Large Image Features” on page 11-38
• “Flood-Fill Operations” on page 11-42
• “Detect Cell Using Edge Detection and Morphology” on page 11-45
• “Granulometry of Snowflakes” on page 11-50
• “Distance Transform of a Binary Image” on page 11-55
• “Label and Measure Connected Components in a Binary Image” on page 11-57

Types of Morphological Operations


In this section...
“Morphological Dilation and Erosion” on page 11-2
“Operations Based on Dilation and Erosion” on page 11-4

Morphology is a broad set of image processing operations that process images based on shapes.
Morphological operations apply a structuring element to an input image, creating an output image of
the same size. In a morphological operation, the value of each pixel in the output image is based on a
comparison of the corresponding pixel in the input image with its neighbors.

Morphological Dilation and Erosion


The most basic morphological operations are dilation and erosion. Dilation adds pixels to the
boundaries of objects in an image, while erosion removes pixels on object boundaries. The number of
pixels added or removed from the objects in an image depends on the size and shape of the
structuring element used to process the image. In the morphological dilation and erosion operations,
the state of any given pixel in the output image is determined by applying a rule to the corresponding
pixel and its neighbors in the input image. The rule used to process the pixels defines the operation
as a dilation or an erosion. This table lists the rules for both dilation and erosion.


Rules for Dilation and Erosion

Operation     Rule

Dilation      The value of the output pixel is the maximum value of all pixels in the
              neighborhood. In a binary image, a pixel is set to 1 if any of the neighboring
              pixels have the value 1. Morphological dilation makes objects more visible and
              fills in small holes in objects. Lines appear thicker, and filled shapes appear
              larger.

Erosion       The value of the output pixel is the minimum value of all pixels in the
              neighborhood. In a binary image, a pixel is set to 0 if any of the neighboring
              pixels have the value 0. Morphological erosion removes floating pixels and thin
              lines so that only substantive objects remain. Remaining lines appear thinner and
              shapes appear smaller.
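
A minimal sketch of these rules (values chosen only for illustration):

BW = zeros(7,7);
BW(3:5,3:5) = 1;            % a 3-by-3 square of foreground pixels
SE = strel("square",3);
BWdil = imdilate(BW,SE);    % the square grows to a 5-by-5 block of 1s
BWero = imerode(BW,SE);     % only the center pixel of the square remains set to 1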

The following figure illustrates the dilation of a binary image. The structuring element defines the
neighborhood of the pixel of interest, which is circled. The dilation function applies the appropriate
rule to the pixels in the neighborhood and assigns a value to the corresponding pixel in the output
image. In the figure, the morphological dilation function sets the value of the output pixel to 1
because one of the elements in the neighborhood defined by the structuring element is on. For more
information, see “Structuring Elements” on page 11-9.


Morphological Dilation of a Binary Image

The following figure illustrates this processing for a grayscale image. The dilation function applies
the rule to the neighborhood of the circled pixel of interest. The value of the corresponding pixel in
the output image is assigned as the highest value among all neighborhood pixels. In the figure, the
value of the output pixel is 16 because it is the highest value in the neighborhood defined by the
structuring element.

Morphological Dilation of a Grayscale Image

Operations Based on Dilation and Erosion


Dilation and erosion are often used in combination to implement image processing operations. For
example, the definition of a morphological opening of an image is an erosion followed by a dilation,
using the same structuring element for both operations. You can combine dilation and erosion to
remove small objects from an image and smooth the border of large objects.
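
For example, this sketch (using the sample image circbw.tif and an arbitrary structuring element) shows that opening an image gives the same result as eroding and then dilating it with the same structuring element:

BW = imread("circbw.tif");
SE = strel("disk",5);
opened = imopen(BW,SE);
erodedThenDilated = imdilate(imerode(BW,SE),SE);
isequal(opened,erodedThenDilated)   % returns true: the two results match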

This table lists functions in the toolbox that perform common morphological operations that are
based on dilation and erosion.


Function      Morphological Definition

imopen        Perform morphological opening. The opening operation erodes an image and then
              dilates the eroded image, using the same structuring element for both operations.
              Morphological opening is useful for removing small objects and thin lines from an
              image while preserving the shape and size of larger objects in the image. For an
              example, see “Use Morphological Opening to Extract Large Image Features” on page
              11-38.

imclose       Perform morphological closing. The closing operation dilates an image and then
              erodes the dilated image, using the same structuring element for both operations.
              Morphological closing is useful for filling small holes in an image while
              preserving the shape and size of large holes and objects in the image.

bwskel        Skeletonize objects in a binary image. The process of skeletonization erodes all
              objects to centerlines without changing the essential structure of the objects,
              such as the existence of holes and branches.

bwperim       Find the perimeter of objects in a binary image. A pixel is part of the perimeter
              if it is nonzero and it is connected to at least one zero-valued pixel. Therefore,
              edges of interior holes are considered part of the object perimeter.

bwhitmiss     Perform the binary hit-miss transform. The hit-miss transform preserves pixels in
              a binary image whose neighborhoods match the shape of one structuring element and
              do not match the shape of a second, disjoint structuring element. The hit-miss
              transform can be used to detect patterns in an image. (In the example image, one
              structuring element has a neighborhood above and to the right of center, and a
              second structuring element has a neighborhood below and to the left of center.
              The transform preserves pixels with neighbors only to the top and right.)

imtophat      Perform a morphological top-hat transform. The top-hat transform opens an image,
              then subtracts the opened image from the original image. The top-hat transform can
              be used to enhance contrast in a grayscale image with nonuniform illumination. The
              transform can also isolate small bright objects in an image.

imbothat      Perform a morphological bottom-hat transform. The bottom-hat transform closes an
              image, then subtracts the original image from the closed image. The bottom-hat
              transform isolates pixels that are darker than other pixels in their neighborhood.
              Therefore, the transform can be used to find intensity troughs in a grayscale
              image.

See Also
imclose | imopen | imdilate | imerode

More About
• “Structuring Elements” on page 11-9
• “Pixel Connectivity” on page 11-27
• “Border Padding for Morphology” on page 11-13

External Websites
• Binary Morphology in Image Processing (MathWorks Teaching Resources)


Structuring Elements
An essential part of the morphological dilation and erosion operations is the structuring element used
to probe the input image. A structuring element is a matrix that identifies the pixel in the image being
processed and defines the neighborhood used in the processing of each pixel. You typically choose a
structuring element the same size and shape as the objects you want to process in the input image.
For example, to find lines in an image, create a linear structuring element.

There are two types of structuring elements: flat and nonflat. A flat structuring element is a binary
valued neighborhood, either 2-D or multidimensional, in which the true pixels are included in the
morphological computation, and the false pixels are not. The center pixel of the structuring element,
called the origin, identifies the pixel in the image being processed. Use the strel function to create
a flat structuring element. You can use flat structuring elements with both binary and grayscale
images. The following figure illustrates a flat structuring element.

A nonflat structuring element is a matrix of type double that identifies the pixel in the image being
processed and defines the neighborhood used in the processing of that pixel. A nonflat structuring
element contains finite values used as additive offsets in the morphological computation. The center
pixel of the matrix, called the origin, identifies the pixel in the image that is being processed. Pixels in
the neighborhood with the value -Inf are not used in the computation. Use the offsetstrel
function to create a nonflat structuring element. You can use nonflat structuring elements only with
grayscale images.
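
For example, this sketch creates one structuring element of each type (the radius and height values are arbitrary):

SEflat = strel("disk",5);             % flat; for binary or grayscale images
SEnonflat = offsetstrel("ball",5,3);  % nonflat; for grayscale images only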

Determine the Origin of a Structuring Element


The morphological functions use this code to get the coordinates of the origin of structuring elements
of any size and dimension:

origin = floor((size(nhood)+1)/2)

where nhood is the neighborhood defining the structuring element. To see the neighborhood of a flat
structuring element, view the Neighborhood property of the strel object. To see the neighborhood
of a nonflat structuring element, view the Offset property of the offsetstrel object.

For example, the following illustrates the origin of a flat, diamond-shaped structuring element.
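
A minimal sketch of this computation, using a flat, diamond-shaped structuring element:

SE = strel("diamond",4);
nhood = SE.Neighborhood;            % 9-by-9 logical matrix
origin = floor((size(nhood)+1)/2)   % returns [5 5], the center element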


Structuring Element Decomposition


To enhance performance, the strel and offsetstrel functions might break structuring elements
into smaller pieces, a technique known as structuring element decomposition.

For example, dilation by an 11-by-11 square structuring element can be accomplished by dilating first
with a 1-by-11 structuring element, and then with an 11-by-1 structuring element. This results in a
theoretical speed improvement of a factor of 5.5, although in practice the actual speed improvement
is somewhat less.

Structuring element decompositions used for the "disk" and "ball" shapes are approximations; all
other decompositions are exact. Decomposition is not used with an arbitrary structuring element
unless it is a flat structuring element whose neighborhood matrix is all 1's.

To see the sequence of structuring elements used in a decomposition, use the decompose method.
Both strel objects and offsetstrel objects support decompose methods. The decompose method
returns an array of the structuring elements that form the decomposition. For example, here are the
structuring elements created in the decomposition of a diamond-shaped structuring element.

SE = strel("diamond",4)

SE =

strel is a diamond shaped structuring element with properties:

Neighborhood: [9x9 logical]


Dimensionality: 2

Call the decompose method. The method returns an array of structuring elements.

decompose(SE)

ans =

3x1 strel array with properties:

Neighborhood
Dimensionality

See Also
strel | offsetstrel


More About
• “Types of Morphological Operations” on page 11-2
• “Border Padding for Morphology” on page 11-13

External Websites
• Binary Morphology in Image Processing (MathWorks Teaching Resources)


Border Padding for Morphology


Morphological functions position the origin of the structuring element, its center element, over the
pixel of interest in the input image. For pixels at the edge of an image, parts of the neighborhood
defined by the structuring element can extend past the border of the image.

To process border pixels, the morphological functions assign a value to these undefined pixels, as if
the functions had padded the image with additional rows and columns. The value of these padding
pixels varies for dilation and erosion operations. The table describes the padding rules for dilation
and erosion for both binary and grayscale images.

Rules for Padding Images

Operation     Rule

Dilation      Pixels beyond the image border are assigned the minimum value afforded by the
              data type. For binary images, these pixels are assumed to be set to 0. For
              grayscale images, the minimum value for uint8 images is 0.

Erosion       Pixels beyond the image border are assigned the maximum value afforded by the
              data type. For binary images, these pixels are assumed to be set to 1. For
              grayscale images, the maximum value for uint8 images is 255.

Note By using the minimum value for dilation operations and the maximum value for erosion
operations, the toolbox avoids border effects, where regions near the borders of the output image do
not appear to be homogeneous with the rest of the image. For example, if the erosion operation
padded beyond the border with the minimum value instead, eroding an image would result in a black
border around the edge of the output image.
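
For example, this sketch (with an arbitrary 5-by-5 image) shows that dilation treats pixels beyond the border as 0, so a foreground pixel in a corner simply grows inward:

BW = false(5);
BW(1,1) = true;
BWdil = imdilate(BW,strel("square",3))   % only BW(1:2,1:2) become true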

See Also
strel | offsetstrel | imdilate | imerode | imclose | imopen

More About
• “Types of Morphological Operations” on page 11-2
• “Structuring Elements” on page 11-9


Morphological Reconstruction
You can use morphological reconstruction to extract or enhance marked objects from an image. The
image you want to enhance is the mask image. A second image, the marker image, is used to mark
the regions to extract or emphasize. The peaks of the marker image act as seed pixels that spread out
to fill in the mask image. Conceptually, you can think of this process as repeated dilations of the
marker image.

Note To learn how morphological reconstruction is implemented in the toolbox, see


imreconstruct.

This figure illustrates the concept in 1-D. Each successive dilation is constrained to lie underneath the
mask. The final dilation is the reconstructed image.

Repeated Dilations of Marker Image, Constrained by Mask

If you change the marker image, the reconstructed image is different. In this figure, the right-most
peak is suppressed in the marker image, and is therefore absent from the reconstructed image.

Morphological Reconstruction with Modified Marker Image

Morphological reconstruction is based on morphological dilation, but has these distinguishing


characteristics:

• Processing uses two images, a marker and a mask, rather than one image and a structuring
element.
• Because morphological reconstruction uses pixel connectivity, rather than a structuring element
with a specific shape and size, the process maintains the shape and size of objects from the mask
image. For more information on pixel connectivity, see “Pixel Connectivity” on page 11-27.


• Processing continues until the image values stop changing.

You can control the result of a morphological reconstruction operation by modifying the marker
image and the pixel connectivity.

Marker Image and Mask Image


Consider this 2-D grayscale mask image. It contains two primary regions, represented by the blocks
of pixels containing intensity values of 14 and 18. Most of the background pixels have intensity
values of 10, while some have values of 11.

To morphologically reconstruct this image, perform these steps:

1 Create a marker image. Much like a structuring element in dilation and erosion, the
characteristics of the marker image determine the processing performed in morphological
reconstruction. The peaks in the marker image should identify the locations of objects in the
mask image that you want to emphasize.

One way to create a marker image is to subtract a constant value from each pixel of the mask
image. In this example, the marker image is created by subtracting 2 from each pixel of the mask
image.


2 Use the imreconstruct function to morphologically reconstruct the image.

This animation shows the incremental changes in image values. At each step, the marker image
is dilated, but the value of each pixel cannot exceed the corresponding value in the mask image.

3 In the final reconstructed image, smaller intensity fluctuations have been removed. Only peaks
whose values are greater than surrounding pixels by at least 2 remain. This difference threshold
corresponds to the value subtracted from the mask image to create the marker image. For more
information about using morphological reconstruction to modify peaks in grayscale images, see
“Suppress Minima and Maxima” on page 11-23.
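
A minimal sketch of steps 1 and 2, using a small, made-up grayscale mask image A in place of the image shown in the figures:

A = 10*ones(8,8);
A(2:4,2:4) = 14;                   % first peak
A(6:8,5:7) = 18;                   % second, brighter peak
marker = A - 2;                    % step 1: subtract a constant to create the marker
recon = imreconstruct(marker,A);   % step 2: reconstruct the marker under the mask A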

If you modify the marker image, the reconstructed image changes. You can remove one of the peaks
from the marker image by changing the values of the pixels in the top-left peak from 12 to 8, to match
the background. The new reconstructed image has only one peak, at the location of the
corresponding peak in the marker image.

This animation shows the incremental changes in image values with the modified marker image.


Influence of Pixel Connectivity


During morphological reconstruction, peaks in the marker image incrementally spread to neighboring
pixels. The specified pixel connectivity determines the boundaries of the neighborhood.

Consider this binary image of two diagonally adjacent squares. Based on an 8-connected
neighborhood (the default value for the imreconstruct function), the squares are a single
foreground object. Based on a 4-connected neighborhood, the squares are two separate foreground
objects.


This example performs morphological reconstruction using a marker image with one single-pixel peak
in the lower-right square. This is equivalent to the operation performed by the bwselect function.

If you specify an 8-connected neighborhood, at each step the marker image peak pixels dilate to 8-
connected pixels, limited to the values of the mask image. The image stops changing after three
steps. The final reconstructed image includes both squares from the mask image.

If you perform morphological reconstruction with a 4-connected neighborhood, at each step the
marker image peak pixels dilate to 4-connected pixels, limited to the values of the mask image. The
image stops changing after two steps. The final reconstructed image includes only the lower-right
square from the mask image.
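
A minimal sketch of this comparison, using a small made-up version of the two diagonally adjacent squares:

mask = false(6);
mask(2:3,2:3) = true;                       % upper-left square
mask(4:5,4:5) = true;                       % lower-right square, diagonally adjacent
marker = false(6);
marker(5,5) = true;                         % single-pixel peak in the lower-right square
both = imreconstruct(marker,mask,8);        % recovers both squares
lowerRight = imreconstruct(marker,mask,4);  % recovers only the lower-right square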

This animation shows the incremental changes in the binary image with a pixel connectivity of eight
versus a connectivity of four.


Applications of Morphological Reconstruction


You can use the imreconstruct function to perform general morphological reconstruction to
extract, emphasize, or suppress regions in a mask image. Image Processing Toolbox also contains
functions that use specialized morphological reconstruction algorithms to accomplish specific image
processing tasks.

• The imfill function suppresses dark regions to fill holes and perform flood-fill operations. For
more information, see “Flood-Fill Operations” on page 11-42.
• The imclearborder function suppresses light areas connected to the image border, which is
useful for removing objects that are partially cut off at the edges of an image. Analogously, the
imkeepborder function retains light areas connected to the image border and suppresses the
rest.
• The imextendedmax, imextendedmin, imhmax, imhmin, and imimposemin functions
emphasize or suppress certain regional minima and maxima. For more information, see “Find
Image Peaks and Valleys” on page 11-21.

See Also
imreconstruct

More About
• “Pixel Connectivity” on page 11-27
• “Flood-Fill Operations” on page 11-42
• “Find Image Peaks and Valleys” on page 11-21


Find Image Peaks and Valleys


You can think of a grayscale image as a three-dimensional object, with the x- and y-axes representing
pixel positions and the z-axis representing the intensity of each pixel. In this interpretation, the
intensity values represent elevations, as in a topographical map. The areas of high intensity and low
intensity in an image, respectively peaks and valleys in topographical terms, can be important
morphological features because they often mark relevant image objects.

For example, in an image of several spherical objects, points of high intensity can represent the tops
of the objects. Using morphological processing, you can use these maxima to identify objects in an
image.

Global and Regional Minima and Maxima


This table defines the terminology used to describe peaks and valleys in images.

Term               Definition

Global Maximum     Highest regional maximum in the image.

Global Minimum     Lowest regional minimum in the image.

Regional Maxima    Each of the regional maxima in the image is a connected set of pixels in which
                   all pixels have the same intensity value, t, surrounded by pixels that all
                   have an intensity value less than t.

Regional Minima    Each of the regional minima in the image is a connected set of pixels in which
                   all pixels have the same intensity value, t, surrounded by pixels that all
                   have an intensity value greater than t.

An image can have multiple regional minima or maxima, but typically has only one global minimum
and maximum. However, if multiple regional minima or maxima share the same extreme value, then
an image can have multiple global minima or maxima.

This figure illustrates the concept of global and regional minima and maxima in 1-D.

Find Areas of High or Low Intensity


The toolbox includes functions that you can use to find areas of high or low intensity in an image:


• The imregionalmax and imregionalmin functions identify all regional minima or maxima.
• The imextendedmax and imextendedmin functions identify regional minima or maxima that are
greater than or less than a specified threshold.

The functions accept a grayscale image as input and return a binary image as output. In the output
binary image, the regional minima or maxima are set to 1, while all other pixels are set to 0.

For example, this image A contains two primary regional maxima, the blocks of pixels containing the
values 14 and 18. It also contains several smaller maxima, with values of 11.
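
The original matrix for A is shown only in the figure, but a sketch of a similar matrix (values chosen to match the description, not the original figure) is:

A = 10*ones(10,10);
A(2:4,2:4) = 14;    % first primary regional maximum
A(6:8,7:9) = 18;    % second primary regional maximum
A(2,8) = 11;        % a smaller regional maximum
A(8,2) = 11;        % another smaller regional maximum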

The binary image returned by imregionalmax pinpoints all these regional maxima.

B = imregionalmax(A)

If you want to identify only the areas of the image where the change in intensity is greater than or
less than a certain threshold, use the imextendedmax and imextendedmin functions. For example,
to find only those regional maxima in the sample image A that are at least two units higher than their
neighbors, enter this command.

B = imextendedmax(A,2)


Suppress Minima and Maxima


In an image, every small fluctuation in intensity represents a regional minimum or maximum. If you
are interested in only significant minima or maxima, and not in these smaller minima and maxima
caused by background texture, you can remove only the less significant minima and maxima but
retain the significant minima and maxima by using the imhmax and imhmin functions. Use these
functions to specify a threshold level, h, that suppresses all maxima with height less than h or minima
with height greater than h.

Note The imregionalmin, imregionalmax, imextendedmin, and imextendedmax functions


return a binary image that indicates the locations of the regional minima and maxima in an image.
The imhmax and imhmin functions return an altered image of the same size and data type as the
input image.

For example, consider again image A.

To eliminate all regional maxima except the two significant maxima, use imhmax with a threshold
value of 2. Note that imhmax affects only the maxima without changing any of the other pixel values.
The two significant maxima remain, but their heights are reduced.

B = imhmax(A,2)


Notice that, in the original image, the second row has one significant regional maximum and two
smaller regional maxima. The imhmax function reduces the value of each maximum by 2, retaining
only the maximum with an adjusted value greater than that of the surrounding pixels. This figure
illustrates the process in 1-D.

Impose a Minimum
You can emphasize specific minima in an image by using the imimposemin function. The
imimposemin function uses morphological reconstruction to eliminate all minima from the image
except the minima you specify.

Consider an image that contains two primary regional minima and several other regional minima.

To obtain an image that emphasizes the two deepest minima and removes all others, create a marker
image that pinpoints the two minima of interest. You can create the marker image by explicitly
setting certain pixels to specific values, or by using other morphological functions to extract the
features you want to emphasize in the mask image.

This example uses imextendedmin to get a binary image that shows the locations of the two deepest
minima.


marker = imextendedmin(mask,1)

Use imimposemin to create new minima in the mask image at the points specified by the marker
image. Note how imimposemin sets the values of pixels specified by the marker image to the lowest
value supported by the data type (0 for uint8 values). imimposemin also changes the values of all
the other pixels in the image to eliminate the other minima.

I = imimposemin(mask,marker)

I =
11 11 11 11 11 11 11 11 11 11
11 8 8 8 11 11 11 11 11 11
11 8 0 8 11 11 11 11 11 11
11 8 8 8 11 11 11 11 11 11
11 11 11 11 11 11 11 11 11 11
11 11 11 11 11 0 0 0 11 11
11 11 11 11 11 0 0 0 11 11
11 11 11 11 11 0 0 0 11 11
11 11 11 11 11 11 11 11 11 11
11 11 11 11 11 11 11 11 11 11

This figure illustrates in 1-D how imimposemin changes the profile of the third row of the image.
After minima imposition, the final image profile has one minimum at the location of the marker image
profile minimum.


See Also
imregionalmin | imregionalmax | imextendedmax | imextendedmin | imhmax | imhmin |
imimposemin

More About
• “Morphological Reconstruction” on page 11-14


Pixel Connectivity
Pixel connectivity defines which other pixels each pixel is connected to. A connected group of
foreground pixels in a binary image is called an object or a connected component.

This table lists all the standard two- and three-dimensional connectivities supported by the toolbox.

Value   Meaning

Two-Dimensional Connectivities

4       Pixels are connected if their edges touch. Two adjoining pixels are part of the same
        object if they are both on and are connected along the horizontal or vertical
        direction. (In the illustration, the current pixel is shown in gray.)

8       Pixels are connected if their edges or corners touch. Two adjoining pixels are part of
        the same object if they are both on and are connected along the horizontal, vertical,
        or diagonal direction. (In the illustration, the current pixel is shown in gray.)

Three-Dimensional Connectivities

6       Pixels are connected if their faces touch. Two adjoining pixels are part of the same
        object if they are both on and are connected in:
        • One of these directions: in, out, left, right, up, and down
        (In the illustration, the current pixel is the center of the cube.)

18      Pixels are connected if their faces or edges touch. Two adjoining pixels are part of
        the same object if they are both on and are connected in:
        • One of these directions: in, out, left, right, up, and down
        • A combination of two directions, such as right-down or in-up
        (In the illustration, the current pixel is the center of the cube.)

26      Pixels are connected if their faces, edges, or corners touch. Two adjoining pixels are
        part of the same object if they are both on and are connected in:
        • One of these directions: in, out, left, right, up, and down
        • A combination of two directions, such as right-down or in-up
        • A combination of three directions, such as in-right-up or in-left-down
        (In the illustration, the current pixel is the center of the cube.)

Choosing a Connectivity
The type of neighborhood you choose affects the number of objects found in an image and the
boundaries of those objects. For this reason, the results of many morphology operations can differ
depending upon the type of connectivity you specify.

For example, if you specify a 4-connected neighborhood, this binary image contains two objects. If
you specify an 8-connected neighborhood, the image contains only one object.

0 0 0 0 0 0
0 1 1 0 0 0
0 1 1 0 0 0
0 0 0 1 1 0
0 0 0 1 1 0
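
A minimal sketch that counts the objects in this matrix under each connectivity:

BW = logical([0 0 0 0 0 0
              0 1 1 0 0 0
              0 1 1 0 0 0
              0 0 0 1 1 0
              0 0 0 1 1 0]);
cc4 = bwconncomp(BW,4);   % cc4.NumObjects is 2
cc8 = bwconncomp(BW,8);   % cc8.NumObjects is 1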

Specifying Custom Connectivities


You can also define custom neighborhoods by specifying a 3-by-3-by-...-by-3 array of 0s and 1s. The 1-
valued elements define the connectivity of the neighborhood relative to the center element.

For example, this array defines a North/South connectivity, which you can use to break up an image
into independent columns.

CONN = [ 0 1 0; 0 1 0; 0 1 0 ]

CONN =
0 1 0
0 1 0
0 1 0
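
A minimal sketch that uses this North/South connectivity on a small, made-up binary image; only vertically adjacent pixels join the same object:

BW = logical([1 0 1
              1 0 1
              0 0 1]);
cc = bwconncomp(BW,CONN);   % cc.NumObjects is 2: the two vertical runs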

Note Connectivity arrays must be symmetric about their center element. Also, you can use a 2-D
connectivity array with a 3-D image; the connectivity affects each plane in the 3-D image.

See Also
conndef | iptcheckconn | bwconncomp | imfill | bwareaopen | boundarymask


More About
• “Morphological Reconstruction” on page 11-14


Lookup Table Operations


In this section...
“Creating a Lookup Table” on page 11-30
“Using a Lookup Table” on page 11-30

Creating a Lookup Table


Certain binary image operations can be implemented most easily through lookup tables. A lookup
table is a column vector in which each element represents the value to return for one possible
combination of pixels in a neighborhood. To create lookup tables for various operations, use the
makelut function. makelut creates lookup tables for 2-by-2 and 3-by-3 neighborhoods. The following
figure illustrates these types of neighborhoods. Each neighborhood pixel is indicated by an x, and the
center pixel is the one with a circle.

For a 2-by-2 neighborhood, there are 16 possible permutations of the pixels in the neighborhood.
Therefore, the lookup table for this operation is a 16-element vector. For a 3-by-3 neighborhood, there
are 512 permutations, so the lookup table is a 512-element vector.

Note makelut and applylut support only 2-by-2 and 3-by-3 neighborhoods. Lookup tables larger
than 3-by-3 neighborhoods are not practical. For example, a lookup table for a 4-by-4 neighborhood
would have 65,536 entries.

Using a Lookup Table


Once you create a lookup table, you can use it to perform the desired operation by using the
applylut function.

The example below illustrates using lookup table operations to modify an image containing text. The
example creates an anonymous function that returns 1 if three or more pixels in the 3-by-3
neighborhood are 1; otherwise, it returns 0. The example then calls makelut, passing in this function
as the first argument, and using the second argument to specify a 3-by-3 lookup table.

f = @(x) sum(x(:)) >= 3;


lut = makelut(f,3);

lut is returned as a 512-element vector of 1's and 0's. Each value is the output from the function for
one of the 512 possible permutations.

You then perform the operation using applylut.

BW1 = imread("text.png");
BW2 = applylut(BW1,lut);


figure
montage({BW1,BW2})

Image Before and After Applying Lookup Table Operation

For information about how applylut maps pixel combinations in the image to entries in the lookup
table, see the reference page for applylut.


Dilate an Image to Enlarge a Shape

This example shows how to dilate an image using the imdilate function. The morphological dilation
operation expands or thickens foreground objects in an image.

Create a simple sample binary image containing one foreground object: the square region of 1's in
the middle of the image.
BW = zeros(9,10);
BW(4:6,4:7) = 1

BW = 9×10

0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

imshow(imresize(BW,40,'nearest'))


Create a structuring element to use with imdilate. To dilate a geometric object, you typically create
a structuring element that is the same shape as the object.

SE = strel('square',3)

SE =
strel is a square shaped structuring element with properties:

Neighborhood: [3x3 logical]


Dimensionality: 2

Dilate the image, passing the input image and the structuring element to imdilate. Note how
dilation adds a rank of 1's to all sides of the foreground object.

BW2 = imdilate(BW,SE)

BW2 = 9×10

0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

imshow(imresize(BW2,40,'nearest'))


For comparison, create a structuring element that is a different shape. Dilate the original image using
the new structuring element.

SE2 = strel('diamond',1);
BW3 = imdilate(BW,SE2);
imshow(imresize(BW3,40,'nearest'))


Remove Thin Lines Using Erosion

This example shows how to remove thin lines in a binary image by using morphological erosion with a
neighborhood larger than the width of the lines.

Read and display a binary image. The white lines that represent wires are approximately four or five
pixels wide. In some places, the wires are touching and the overall width is closer to ten or eleven
pixels.
BW1 = imread('circbw.tif');
imshow(BW1)

Define a neighborhood larger than the width of the lines. This example uses a disk-shaped structuring
element with a radius of 7 pixels so that the overall neighborhood size is 13-by-13 pixels.
SE = strel("disk",7)

SE =
strel is a disk shaped structuring element with properties:

Neighborhood: [13x13 logical]


Dimensionality: 2

Erode the image, specifying the input image and the structuring element as arguments to the
imerode function.
BW2 = imerode(BW1,SE);

Display the eroded image.


imshow(BW2)


Use Morphological Opening to Extract Large Image Features

You can use morphological opening to remove small objects from an image while preserving the
shape and size of larger objects in the image.

In this example, you use morphological opening on an image of a circuit board to remove all the
circuit lines from the image. The output image contains only the rectangular shapes of the
microchips.

Open an Image In One Step

You can use the imopen function to perform erosion and dilation in one step.

Read the image into the workspace, and display it.

BW1 = imread('circbw.tif');
figure
imshow(BW1)

Create a structuring element. The structuring element should be large enough to remove the lines
when you erode the image, but not large enough to remove the rectangles. It should consist of all 1s,
so it removes everything but large contiguous patches of foreground pixels.

SE = strel('rectangle',[40 30]);

Open the image.

BW2 = imopen(BW1, SE);


imshow(BW2);


Open an Image By Performing Erosion Then Dilation

You can also perform erosion and dilation sequentially.

Erode the image with the structuring element. This removes all the lines, but also shrinks the
rectangles.

BW3 = imerode(BW1,SE);
imshow(BW3)


To restore the rectangles to their original sizes, dilate the eroded image using the same structuring
element, SE.
BW4 = imdilate(BW3,SE);
imshow(BW4)


By performing the operations sequentially, you have the flexibility to change the structuring element.
Create a different structuring element, and dilate the eroded image using the new structuring
element.

SE = strel('diamond',15);
BW5 = imdilate(BW3,SE);
imshow(BW5)

See Also
strel | imopen | imerode | imdilate | imclose

More About
• “Types of Morphological Operations” on page 11-2


Flood-Fill Operations
The imfill function performs a flood-fill operation on binary and grayscale images. This operation
can be useful in removing irrelevant artifacts from images.

• For binary images, imfill changes connected background pixels (0s) to foreground pixels (1s),
stopping when it reaches object boundaries.
• For grayscale images, imfill brings the intensity values of dark areas that are surrounded by
lighter areas up to the same intensity level as surrounding pixels. In effect, imfill removes
regional minima that are not connected to the image border. For more information, see “Find
Image Peaks and Valleys” on page 11-21.

Specifying Connectivity
For both binary and grayscale images, the boundary of the fill operation is determined by the pixel
connectivity on page 11-27 that you specify.

Note imfill differs from the other object-based operations in that it operates on background pixels.
When you specify connectivity with imfill, you are specifying the connectivity of the background,
not the foreground.

The implications of connectivity can be illustrated with this matrix.

BW = logical([0 0 0 0 0 0 0 0;
0 1 1 1 1 1 0 0;
0 1 0 0 0 1 0 0;
0 1 0 0 0 1 0 0;
0 1 0 0 0 1 0 0;
0 1 1 1 1 0 0 0;
0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0]);

If the background is 4-connected, this binary image contains two separate background elements (the
part inside the loop and the part outside). If the background is 8-connected, the pixels connect
diagonally, and there is only one background element.

Specifying the Starting Point


For binary images, you can specify the starting point of the fill operation by passing in the location
subscript or by using imfill in interactive mode, selecting starting pixels with a mouse.

For example, if you call imfill, specifying the pixel BW(4,3) as the starting point, imfill only fills
the inside of the loop because, by default, the background is 4-connected.

imfill(BW,[4 3])

ans =
0 0 0 0 0 0 0 0
0 1 1 1 1 1 0 0
0 1 1 1 1 1 0 0
0 1 1 1 1 1 0 0
0 1 1 1 1 1 0 0
0 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

If you specify the same starting point, but use an 8-connected background connectivity, imfill fills
the entire image.

imfill(BW,[4 3],8)

ans =
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

Filling Holes
A common use of the flood-fill operation is to fill holes in images. For example, suppose you have an
image, binary or grayscale, in which the foreground objects represent spheres. In the image, these
objects should appear as disks, but instead are doughnut shaped because of reflections in the original
photograph. Before doing any further processing of the image, you might want to first fill in the
"doughnut holes" using imfill.

Because the use of flood-fill to fill holes is so common, imfill includes a special syntax to support it
for both binary and grayscale images. In this syntax, you just specify the argument 'holes'; you do
not have to specify starting locations in each hole.

To illustrate, this example fills holes in a grayscale image of a spinal column.

[X,map] = imread('spine.tif');
I = ind2gray(X,map);
Ifill = imfill(I,'holes');
figure
montage({I,Ifill})


See Also
imfill

Related Examples
• “Detect and Measure Circular Objects in an Image” on page 13-36
• “Detect Cell Using Edge Detection and Morphology” on page 11-45

More About
• “Morphological Reconstruction” on page 11-14


Detect Cell Using Edge Detection and Morphology

This example shows how to detect a cell using edge detection and basic morphology. An object can be
easily detected in an image if the object has sufficient contrast from the background.

Step 1: Read Image

Read in the cell.tif image, which is an image of a prostate cancer cell. Two cells are present in
this image, but only one cell can be seen in its entirety. The goal is to detect, or segment, the cell that
is completely visible.
I = imread('cell.tif');
imshow(I)
title('Original Image');
text(size(I,2),size(I,1)+15, ...
'Image courtesy of Alan Partin', ...
'FontSize',7,'HorizontalAlignment','right');
text(size(I,2),size(I,1)+25, ....
'Johns Hopkins University', ...
'FontSize',7,'HorizontalAlignment','right');

Step 2: Detect Entire Cell

The object to be segmented differs greatly in contrast from the background image. Changes in
contrast can be detected by operators that calculate the gradient of an image. To create a binary
mask containing the segmented cell, calculate the gradient image and apply a threshold.

Use edge and the Sobel operator to calculate the threshold value. Tune the threshold value and use
edge again to obtain a binary mask that contains the segmented cell.
[~,threshold] = edge(I,'sobel');
fudgeFactor = 0.5;
BWs = edge(I,'sobel',threshold * fudgeFactor);

Display the resulting binary gradient mask.


imshow(BWs)
title('Binary Gradient Mask')


Step 3: Dilate the Image

The binary gradient mask shows lines of high contrast in the image. These lines do not quite
delineate the outline of the object of interest. Compared to the original image, there are gaps in the
lines surrounding the object in the gradient mask. These linear gaps will disappear if the Sobel image
is dilated using linear structuring elements. Create two perpendicular linear structuring elements by
using strel function.
se90 = strel('line',3,90);
se0 = strel('line',3,0);

Dilate the binary gradient mask using the vertical structuring element followed by the horizontal
structuring element. The imdilate function dilates the image.
BWsdil = imdilate(BWs,[se90 se0]);
imshow(BWsdil)
title('Dilated Gradient Mask')

Step 4: Fill Interior Gaps

The dilated gradient mask shows the outline of the cell quite nicely, but there are still holes in the
interior of the cell. To fill these holes, use the imfill function.


BWdfill = imfill(BWsdil,'holes');
imshow(BWdfill)
title('Binary Image with Filled Holes')

Step 5: Remove Connected Objects on Border

The cell of interest has been successfully segmented, but it is not the only object that has been found.
Any objects that are connected to the border of the image can be removed using the imclearborder
function. To remove diagonal connections, set the connectivity in the imclearborder function to 4.
BWnobord = imclearborder(BWdfill,4);
imshow(BWnobord)
title('Cleared Border Image')

Step 6: Smooth the Object

Finally, in order to make the segmented object look natural, smooth the object by eroding the image
twice with a diamond structuring element. Create the diamond structuring element using the strel
function.
seD = strel('diamond',1);
BWfinal = imerode(BWnobord,seD);


BWfinal = imerode(BWfinal,seD);
imshow(BWfinal)
title('Segmented Image');

Step 7: Visualize the Segmentation

You can use the labeloverlay function to display the mask over the original image.

imshow(labeloverlay(I,BWfinal))
title('Mask Over Original Image')

An alternate method to display the segmented object is to draw an outline around the segmented cell.
Draw an outline by using the bwperim function.

BWoutline = bwperim(BWfinal);
Segout = I;
Segout(BWoutline) = 255;
imshow(Segout)
title('Outlined Original Image')


See Also
imfill | imclearborder | edge | imdilate | imerode | bwperim | strel

More About
• “Types of Morphological Operations” on page 11-2


Granulometry of Snowflakes

This example shows how to calculate the size distribution of snowflakes in an image by using
granulometry. Granulometry determines the size distribution of objects in an image without explicitly
segmenting (detecting) each object first.

Read Image

Read in the snowflakes.png image, which is a photograph of snowflakes.


I = imread("snowflakes.png");
imshow(I)

Enhance Contrast

Your first step is to maximize the intensity contrast in the image. You can do this using the
adapthisteq function, which performs contrast-limited adaptive histogram equalization. Rescale the
image intensity using the imadjust function so that it fills the data type's entire dynamic range.
claheI = adapthisteq(I,"NumTiles",[10 10]);
claheI = imadjust(claheI);
imshow(claheI)

Determine Volume Under Image Surface in Enhanced Image

Granulometry is analogous to sifting stones through screens of increasing size and collecting what
remains after each pass. Granulometry applies a series of morphological openings of increasing size,
and the sum of image pixel values is computed after each opening. For grayscale images, this sum
corresponds to the volume under the surface whose height is determined by the pixel intensity
values. The volume is reduced after each opening as larger objects are eliminated by increasingly
large structuring elements.

Optionally, you can plot the intensity surface of the snowflake image to conceptualize the surface
volume. Adjust the vertical aspect ratio to make the whole surface visible. Surface peaks correspond
to bright areas in the original 2-D image.

figure
surf(claheI, EdgeColor="none")
colormap("gray")
title("Snowflakes Pixel Intensity Surface")
daspect([1 1 15]);

Choose a counter limit so that the intensity surface volume goes to zero as you increase the size of
your structuring element. The first entry of the intensity volume array corresponds to a structuring
element of radius 0, which leaves the image unchanged.

radius_range = 0:22;
intensity_volume = zeros(size(radius_range));
for counter = radius_range
remain = imopen(claheI, strel("disk", counter));
intensity_volume(counter + 1) = sum(remain(:));
end
figure
plot(intensity_volume, "m - *")
grid on
title("Sum of Pixel Values in Opened Image Versus Radius")
xlabel("Radius of Opening Structuring Element (pixels)")
ylabel("Sum of Pixel Values in Opened Image (intensity)")

Estimate Derivative of Distribution

A significant drop in intensity surface volume between two consecutive openings indicates that the
image contains objects of comparable size to the smaller opening. This is equivalent to the derivative
of the intensity surface volume with respect to radius, which contains the size distribution of the
snowflakes in the image. Estimate the derivative by using the diff function.

intensity_volume_prime = diff(intensity_volume);
figure
plot(intensity_volume_prime, "m - *")
grid on
title("Granulometry (Size Distribution) of Snowflakes")
ax = gca;
ax.XTick = [0 2 4 6 8 10 12 14 16 18 20 22];
xlabel("Radius of Snowflakes (pixels)")
ylabel("Derivative of Sum of Pixel Values (intensity/pixel)")

Emphasize Snowflakes Having a Particular Radius

Notice the minima and the radii where they occur in the graph. The minima tell you that snowflakes
in the image have those radii. The more negative the minimum point, the higher the snowflakes'
cumulative intensity at that radius. For example, the most negative minimum point occurs at the 5
pixel radius mark. You can emphasize the snowflakes having a 5 pixel radius with the following steps.

open5 = imopen(claheI,strel("disk",5));
open6 = imopen(claheI,strel("disk",6));
rad5 = open5 - open6;
imshow(rad5,[])

See Also
adapthisteq | imadjust | imopen | strel

Distance Transform of a Binary Image


The distance transform provides a metric or measure of the separation of points in the image. The
bwdist function calculates the distance between each pixel that is set to off (0) and the nearest
nonzero pixel for binary images.

The bwdist function supports several distance metrics.

Distance Metrics

• Euclidean: The Euclidean distance is the straight-line distance between two pixels.
• City Block: The city block distance metric measures the path between the pixels based on a
  4-connected neighborhood. Pixels whose edges touch are 1 unit apart; pixels diagonally touching
  are 2 units apart.
• Chessboard: The chessboard distance metric measures the path between the pixels based on an
  8-connected neighborhood. Pixels whose edges or corners touch are 1 unit apart.
• Quasi-Euclidean: The quasi-Euclidean metric measures the total Euclidean distance along a set of
  horizontal, vertical, and diagonal line segments.
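
To choose a metric, pass its name as the second argument to bwdist. The following is a small sketch
(the variable names are illustrative) that compares the four metrics for a single nonzero pixel at the
center of a 5-by-5 image:

bw0 = false(5);
bw0(3,3) = true;                        % single nonzero pixel in the center
Deuclid = bwdist(bw0);                  % Euclidean (the default)
Dcity = bwdist(bw0,"cityblock");
Dchess = bwdist(bw0,"chessboard");
Dquasi = bwdist(bw0,"quasi-euclidean");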

This example creates a binary image containing two intersecting circular objects.

center1 = -10;
center2 = -center1;
dist = sqrt(2*(2*center1)^2);
radius = dist/2 * 1.4;
lims = [floor(center1-1.2*radius) ceil(center2+1.2*radius)];
[x,y] = meshgrid(lims(1):lims(2));
bw1 = sqrt((x-center1).^2 + (y-center1).^2) <= radius;
bw2 = sqrt((x-center2).^2 + (y-center2).^2) <= radius;
bw = bw1 | bw2;
figure
imshow(bw)

To compute the distance transform of the complement of the binary image, use the bwdist function.
In the image of the distance transform, note how the centers of the two circular areas are white.

D = bwdist(~bw);
figure
imshow(D,[])

See Also
bwdist

Related Examples
• “Marker-Controlled Watershed Segmentation” on page 12-27

Label and Measure Connected Components in a Binary Image


In this section...
“Detect Connected Components” on page 11-57
“Label Connected Components” on page 11-58
“Select Objects in a Binary Image” on page 11-59
“Measure Properties of Connected Components” on page 11-59

Detect Connected Components


A connected component, or an object, in a binary image is a set of adjacent pixels. Determining which
pixels are adjacent depends on how pixel connectivity is defined. For a two-dimensional image, there
are two standard connectivities:

• 4-connectivity — Pixels are connected if their edges touch. Two adjoining pixels are part of the
same object if they are both on and are connected along the horizontal or vertical direction.
• 8-connectivity — Pixels are connected if their edges or corners touch. Two adjoining pixels are
part of the same object if they are both on and are connected along the horizontal, vertical, or
diagonal direction.

You can also define nonstandard connectivity, or connectivities for higher dimensional images. For
more information, see “Pixel Connectivity” on page 11-27.
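
As a small sketch of a nonstandard specification, you can pass a 3-by-3 matrix of 0s and 1s instead of
the scalar 4 or 8. This particular matrix reproduces 4-connectivity (it assumes a binary image BW
already exists in the workspace):

conn = [0 1 0; 1 1 1; 0 1 0];   % connect only along the horizontal and vertical directions
cc = bwconncomp(BW,conn);       % equivalent to bwconncomp(BW,4)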

This figure shows two identical matrices that represent a binary image. On each matrix is an overlay
highlighting the connected components using 4-connectivity and 8-connectivity, respectively. There
are three connected components using 4-connectivity, but only two connected components using 8-
connectivity.

You can calculate connected components by using the bwconncomp function. In this sample code, BW
is the binary matrix shown in the above figure. For this example, specify a connectivity of 4 so that
two adjoining pixels are part of the same object if they are both on and are connected along the
horizontal or vertical direction. The PixelIdxList field identifies the list of pixels belonging to each
connected component.

BW = zeros(8,8);
BW(2:4,2:3) = 1;
BW(5:7,4:5) = 1;
BW(2,6:8) = 1;
BW(3,7:8) = 1;
BW

BW =

0 0 0 0 0 0 0 0
0 1 1 0 0 1 1 1
0 1 1 0 0 0 1 1
0 1 1 0 0 0 0 0
0 0 0 1 1 0 0 0
0 0 0 1 1 0 0 0
0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0

cc4 = bwconncomp(BW,4)

cc4 =

Connectivity: 4
ImageSize: [8 8]
NumObjects: 3
PixelIdxList: {[6x1 double] [6x1 double] [5x1 double]}

For comparison, calculate the connected components of the same binary image using the default
connectivity, 8.
cc8 = bwconncomp(BW)

cc8 =

Connectivity: 8
ImageSize: [8 8]
NumObjects: 2
PixelIdxList: {[12x1 double] [5x1 double]}

Label Connected Components


Labeling connected components is the process of identifying the connected components in an image
and assigning each one a unique label. The resulting matrix is called a label matrix.

This figure shows the two label matrices that label the connected components using 4-connectivity
and 8-connectivity, respectively.

Create a label matrix by using the labelmatrix function. This sample code continues with the
connected component structure, cc4, defined in the preceding section.
L4 = labelmatrix(cc4)

L4 =

8×8 uint8 matrix

0 0 0 0 0 0 0 0
0 1 1 0 0 3 3 3
0 1 1 0 0 0 3 3
0 1 1 0 0 0 0 0
0 0 0 2 2 0 0 0
0 0 0 2 2 0 0 0
0 0 0 2 2 0 0 0
0 0 0 0 0 0 0 0

To visualize connected components, display the label matrix as a pseudo-color image by using the
label2rgb function. The label identifying each object in the label matrix maps to a different color in
the associated colormap. You can specify the colormap, background color, and how objects in the
label matrix map to colors in the colormap.

RGB_label = label2rgb(L4,@copper,"c","shuffle");
imshow(RGB_label)

Select Objects in a Binary Image


You can use the bwselect function to select individual objects in a binary image. Specify pixels in
the input image programmatically or interactively with a mouse. bwselect returns a binary image
that includes only those objects from the input image that contain one of the specified pixels.

For example, use this command to select objects in the image displayed in the current axes.

BW = bwselect;

The cursor changes to cross-hairs when it is over the image. Click the objects you want to select;
bwselect displays a small star over each pixel you click. When you are done, press Return.
bwselect returns a binary image consisting of the objects you selected, and removes the stars.
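
You can also call bwselect programmatically by passing the column and row coordinates of the pixels
to select, along with an optional connectivity. A minimal sketch with illustrative coordinates, using the
text.png sample image:

BW = imread("text.png");     % binary sample image shipped with the toolbox
c = [126 187];               % column (x) coordinates of pixels to select
r = [34 172];                % row (y) coordinates of pixels to select
BW2 = bwselect(BW,c,r,4);    % keep only the objects that contain those pixels
imshow(BW2)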

Measure Properties of Connected Components


The regionprops function can return measurements for several properties of connected
components. Other functions measure a single property. For example, the bwarea function returns
the area of a binary image.
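
For instance, regionprops accepts the connected components structure returned by bwconncomp, so
you can measure the objects found earlier in this section without recomputing them. A minimal
sketch that reuses cc4:

stats = regionprops(cc4,"Area","Centroid");
[stats.Area]              % pixel count of each 4-connected object
cat(1,stats.Centroid)     % x- and y-coordinates of each object centroid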

This example uses bwarea to determine the percentage area increase in circbw.tif that results
from a dilation operation. The area is a measure of the size of the foreground of the image and is
roughly equal to the number of on pixels in the image. However, bwarea does not simply count the
number of pixels set to on. Rather, bwarea weights different pixel patterns unequally when
computing the area. This weighting compensates for the distortion that is inherent in representing a
continuous image with discrete pixels. For example, a diagonal line of 50 pixels is longer than a
horizontal line of 50 pixels. As a result of the weighting bwarea uses, the horizontal line has an area
of 50, but the diagonal line has an area of 62.5.

BW = imread('circbw.tif');
SE = ones(5);
BW2 = imdilate(BW,SE);
increase = (bwarea(BW2) - bwarea(BW))/bwarea(BW)

increase =

0.3456

See Also
bwconncomp | labelmatrix | label2rgb | bwselect | regionprops

Related Examples
• “Calculate Properties of Image Regions Using Image Region Analyzer” on page 15-33
• “Correct Nonuniform Illumination and Analyze Foreground Objects” on page 1-9

More About
• “Pixel Connectivity” on page 11-27

12 Image Segmentation

This topic describes a range of techniques and apps that are used to segment images.

• “Color-Based Segmentation Using the L*a*b* Color Space” on page 12-2


• “Color-Based Segmentation Using K-Means Clustering” on page 12-7
• “Plot Land Classification with Color Features and Superpixels” on page 12-13
• “Compute 3-D Superpixels of Input Volumetric Intensity Image” on page 12-16
• “Segment Lungs from 3-D Chest Scan” on page 12-19
• “Marker-Controlled Watershed Segmentation” on page 12-27
• “Segment Image and Create Mask Using Color Thresholder” on page 12-43
• “Acquire Live Images in Color Thresholder” on page 12-55
• “Getting Started with Image Segmenter” on page 12-59
• “Segment Image Using Thresholding in Image Segmenter” on page 12-62
• “Segment Image by Drawing Regions Using Image Segmenter” on page 12-68
• “Segment Image Using Active Contours in Image Segmenter” on page 12-74
• “Refine Segmentation Using Morphology in Image Segmenter” on page 12-80
• “Segment Image Using Graph Cut in Image Segmenter” on page 12-85
• “Segment Image Using Local Graph Cut (Grabcut) in Image Segmenter” on page 12-94
• “Segment Image Using Find Circles in Image Segmenter” on page 12-102
• “Segment Image Using Auto Cluster in Image Segmenter” on page 12-109
• “Use Texture Filtering in Image Segmenter” on page 12-115
• “Create Binary Mask Using Volume Segmenter” on page 12-118
• “Create Semantic Segmentation Using Volume Segmenter” on page 12-130
• “Work with Blocked Images Using Volume Segmenter” on page 12-143
• “Install Sample Data Using Add-On Explorer” on page 12-158
• “Texture Segmentation Using Gabor Filters” on page 12-160
• “Texture Segmentation Using Texture Filters” on page 12-165

Color-Based Segmentation Using the L*a*b* Color Space

This example shows how to identify different colors in fabric by analyzing the L*a*b* colorspace.

Step 1: Acquire Image

Read in the fabric.png image, which is an image of colorful fabric.

fabric = imread("fabric.png");
imshow(fabric)
title("Fabric")

Step 2: Calculate Sample Colors in L*a*b* Color Space for Each Region

You can see six major colors in the image: the background color, red, green, purple, yellow, and
magenta. Notice how easily you can visually distinguish these colors from one another. The L*a*b*
colorspace (also known as CIELAB or CIE L*a*b*) enables you to quantify these visual differences.

The L*a*b* color space is derived from the CIE XYZ tristimulus values. The L*a*b* space consists of a
luminosity 'L*' or brightness layer, chromaticity layer 'a*' indicating where color falls along the red-
green axis, and chromaticity layer 'b*' indicating where the color falls along the blue-yellow axis.

Your approach is to choose a small sample region for each color and to calculate each sample region's
average color in 'a*b*' space. You will use these color markers to classify each pixel.

To simplify this example, load the region coordinates that are stored in a MAT-file.
load regioncoordinates;

nColors = 6;
sample_regions = false([size(fabric,1) size(fabric,2) nColors]);

for count = 1:nColors
    sample_regions(:,:,count) = roipoly(fabric,region_coordinates(:,1,count), ...
        region_coordinates(:,2,count));
end

imshow(sample_regions(:,:,2))
title("Sample Region for Red")

Convert your fabric RGB image into an L*a*b* image using the rgb2lab function.
lab_fabric = rgb2lab(fabric);

Calculate the mean 'a*' and 'b*' value for each area that you extracted with roipoly. These values
serve as your color markers in 'a*b*' space.

a = lab_fabric(:,:,2);
b = lab_fabric(:,:,3);
color_markers = zeros([nColors, 2]);

for count = 1:nColors
    color_markers(count,1) = mean2(a(sample_regions(:,:,count)));
    color_markers(count,2) = mean2(b(sample_regions(:,:,count)));
end

For example, the average color of the red sample region in a*b* space is

disp([color_markers(2,1), color_markers(2,2)]);

69.8278 20.1056

Step 3: Classify Each Pixel Using the Nearest Neighbor Rule

Each color marker now has an a* and a b* value. You can classify each pixel in the lab_fabric
image by calculating the Euclidean distance between that pixel and each color marker. The smallest
distance will tell you that the pixel most closely matches that color marker. For example, if the
distance between a pixel and the red color marker is the smallest, then the pixel would be labeled as
a red pixel.

Create an array that contains your color labels: 0 = background, 1 = red, 2 = green, 3 = purple, 4 =
magenta, and 5 = yellow.

color_labels = 0:nColors-1;

Initialize matrices to be used in the nearest neighbor classification.

a = double(a);
b = double(b);
distance = zeros([size(a), nColors]);

Perform classification

for count = 1:nColors
    distance(:,:,count) = ( (a - color_markers(count,1)).^2 + ...
        (b - color_markers(count,2)).^2 ).^0.5;
end

[~,label] = min(distance,[],3);
label = color_labels(label);
clear distance;

Step 4: Display Results of Nearest Neighbor Classification

The label matrix contains a color label for each pixel in the fabric image. Use the label matrix to
separate objects in the original fabric image by color.

rgb_label = repmat(label,[1 1 3]);


segmented_images = zeros([size(fabric), nColors],"uint8");

for count = 1:nColors
    color = fabric;
    color(rgb_label ~= color_labels(count)) = 0;
    segmented_images(:,:,:,count) = color;
end

Display the five segmented colors as a montage. Also display the background pixels in the image that
are not classified as a color.

montage({segmented_images(:,:,:,2),segmented_images(:,:,:,3) ...
segmented_images(:,:,:,4),segmented_images(:,:,:,5) ...
segmented_images(:,:,:,6),segmented_images(:,:,:,1)});
title("Montage of Red, Green, Purple, Magenta, and Yellow Objects, and Background")

Step 5: Display a* and b* Values of Labeled Colors

You can see how well the nearest neighbor classification separated the different color populations by
plotting the a* and b* values of pixels that were classified into separate colors. For display purposes,
label each point with its color label. Purple is not a named color value, so specify the color purple
using a string with a hexadecimal color code.

purple = "#774998";
plot_labels = ["k", "r", "g", purple, "m", "y"];

figure
for count = 1:nColors
plot_label = plot_labels(count);
plot(a(label==count-1),b(label==count-1),".", ...
MarkerEdgeColor=plot_label,MarkerFaceColor=plot_label);
hold on
end

title("Scatterplot of Segmented Pixels in a*b* Space");


xlabel("a* Values");
ylabel("b* Values");

Color-Based Segmentation Using K-Means Clustering

This example shows how to segment colors in an automated fashion using k-means clustering.

Clustering is a way to separate groups of objects. K-means clustering treats each object as having a
location in space. It finds partitions such that objects within each cluster are as close to each other as
possible, and as far from objects in other clusters as possible. You can use the imsegkmeans function
to separate image pixels by value into clusters within a color space. This example performs k-means
clustering of an image in the RGB and L*a*b* color spaces to show how using different color spaces
can improve segmentation results.

Step 1: Read Image

Read in hestain.png, which is an image of tissue stained with hematoxylin and eosin (H&E). This
staining method helps pathologists distinguish between tissue types that are stained blue-purple and
pink.
he = imread("hestain.png");
imshow(he)
title("H&E Image")
text(size(he,2),size(he,1)+15, ...
"Image courtesy of Alan Partin, Johns Hopkins University", ...
FontSize=7,HorizontalAlignment="right")

Step 2: Classify Colors in RGB Color Space Using K-Means Clustering

Segment the image into three regions using k-means clustering in the RGB color space. For each
pixel in the input image, the imsegkmeans function returns a label corresponding to a cluster.

Display the label image as an overlay on the original image. The label image incorrectly groups white,
light blue-purple, and light pink regions together. Because the RGB color space combines brightness
and color information within each channel (red, green, blue), lighter versions of two different colors
are closer together and more challenging to segment than darker versions of the same two colors.

numColors = 3;
L = imsegkmeans(he,numColors);
B = labeloverlay(he,L);
imshow(B)
title("Labeled Image RGB")

Step 3: Convert Image from RGB Color Space to L*a*b* Color Space

The L*a*b* color space separates image luminosity and color. This makes it easier to segment regions
by color, independent of lightness. The color space is also more consistent with human visual
perception of the distinct white, blue-purple, and pink regions in the image.

The L*a*b* color space is derived from the CIE XYZ tristimulus values. The L*a*b* space consists of
the luminosity layer L*, the chromaticity layer a* that indicates where a color falls along the red-
green axis, and the chromaticity layer b* that indicates where a color falls along the blue-yellow axis.
All of the color information is in the a* and b* layers.

Convert the image to the L*a*b* color space by using the rgb2lab function.
lab_he = rgb2lab(he);

Step 4: Classify Colors in a*b* Space Using K-Means Clustering

To segment the image using only color information, limit the image to the a* and b* values in lab_he.
Convert the image to data type single for use with the imsegkmeans function. Use the
imsegkmeans function to separate the image pixels into three clusters. Set the value of the
NumAttempts name-value argument to repeat clustering three times with different initial cluster
centroid positions to avoid fitting to a local minimum.
ab = lab_he(:,:,2:3);
ab = im2single(ab);
pixel_labels = imsegkmeans(ab,numColors,NumAttempts=3);

Display the label image as an overlay on the original image. The new label image more clearly
separates the white, blue-purple, and pink stained tissue regions.

B2 = labeloverlay(he,pixel_labels);
imshow(B2)
title("Labeled Image a*b*")

Step 5: Create Images that Segment H&E Image by Color

Using pixel_labels, you can separate objects in the original image hestain.png by color,
resulting in three masked images.
mask1 = pixel_labels == 1;
cluster1 = he.*uint8(mask1);
imshow(cluster1)
title("Objects in Cluster 1");

mask2 = pixel_labels == 2;
cluster2 = he.*uint8(mask2);
imshow(cluster2)
title("Objects in Cluster 2");

mask3 = pixel_labels == 3;
cluster3 = he.*uint8(mask3);
imshow(cluster3)
title("Objects in Cluster 3");

Step 6: Segment Nuclei

Cluster 3 contains only the blue objects. Notice that there are dark and light blue objects. You can
separate dark blue from light blue using the L* layer in the L*a*b* color space. The cell nuclei are
dark blue.

The L* layer contains the brightness value of each pixel. Extract the brightness values of the pixels in
this cluster and threshold them with a global threshold by using the imbinarize function. The mask
idx_light_blue gives the indices of light blue pixels.

L = lab_he(:,:,1);
L_blue = L.*double(mask3);
L_blue = rescale(L_blue);
idx_light_blue = imbinarize(nonzeros(L_blue));

Copy the mask of blue objects, mask3, and then remove the light blue pixels from the mask. Apply the
new mask to the original image and display the result. Only dark blue cell nuclei are visible.

blue_idx = find(mask3);
mask_dark_blue = mask3;
mask_dark_blue(blue_idx(idx_light_blue)) = 0;

blue_nuclei = he.*uint8(mask_dark_blue);
imshow(blue_nuclei)
title("Blue Nuclei")

See Also
imsegkmeans

Related Examples
• “Understanding Color Spaces and Color Space Conversion” on page 16-15

• “Color-Based Segmentation Using the L*a*b* Color Space” on page 12-2

Plot Land Classification with Color Features and Superpixels

This example shows how to perform land type classification based on color features using K-means
clustering and superpixels. Superpixels can be a very useful technique when performing
segmentation and classification, especially when working with large images. Superpixels enable you
to break an image into a set of structurally meaningful regions, where the boundaries of each region
take into account edge information in the original image. Once you break an image into superpixel
regions, classification algorithms can be used to classify each region, rather than having to solve the
classification problem over the full original image grid. The use of superpixels can provide large
performance advantages in solving image classification problems while also providing a high quality
segmentation result.

Read an image into the workspace. For better performance, this example reduces the size of the
image by half. Visually, there are four types of land that are distinguishable in the blue marble image
based only on color features: forested regions, dry/desert regions, ice covered regions, and water.

url = "https://eoimages.gsfc.nasa.gov/images/imagerecords/74000/74192/" ...


+ "world.200411.3x5400x2700.jpg";
A = imread(url);
A = imresize(A,0.5);
imshow(A)

Convert the image to the L*a*b* color space.

Alab = rgb2lab(A);

Compute the superpixel oversegmentation of the original image and display it.

[L,N] = superpixels(Alab,20000,isInputLab=true);
BW = boundarymask(L);
imshow(imoverlay(A,BW,"cyan"))

Create a cell array of the set of pixels in each region.

pixelIdxList = label2idx(L);

Determine the mean color of each superpixel region in the L*a*b* color space.

[m,n] = size(L);
meanColor = zeros(m,n,3,"single");
for i = 1:N
meanColor(pixelIdxList{i}) = mean(Alab(pixelIdxList{i}));
meanColor(pixelIdxList{i}+m*n) = mean(Alab(pixelIdxList{i}+m*n));
meanColor(pixelIdxList{i}+2*m*n) = mean(Alab(pixelIdxList{i}+2*m*n));
end

Cluster the color feature of each superpixel by using the imsegkmeans function.

numColors = 4;
[Lout,cmap] = imsegkmeans(meanColor,numColors,numAttempts=2);
cmap = lab2rgb(cmap);
imshow(label2rgb(Lout))

Use cluster centers as the colormap for a thematic map. The mean colors found during K-means
clustering can be used directly as a colormap to give a more natural visual interpretation of the land
classification assignments of forest, ice, dry land, and water.

imshow(double(Lout),cmap)

Compute 3-D Superpixels of Input Volumetric Intensity Image

Load 3-D MRI data, remove any singleton dimensions, and convert the data into a grayscale intensity
image.

load mri;
D = squeeze(D);
A = ind2gray(D,map);

Calculate the 3-D superpixels. Form an output image where each pixel is set to the mean color of its
corresponding superpixel region.

[L,N] = superpixels3(A,34);

Show all xy-planes progressively with superpixel boundaries.

imSize = size(A);

Create a stack of RGB images to display the boundaries in color.

imPlusBoundaries = zeros(imSize(1),imSize(2),3,imSize(3),'uint8');
for plane = 1:imSize(3)
BW = boundarymask(L(:, :, plane));
% Create an RGB representation of this plane with boundary shown
% in cyan.
imPlusBoundaries(:, :, :, plane) = imoverlay(A(:, :, plane), BW, 'cyan');
end

implay(imPlusBoundaries,5)

Set the color of each pixel in output image to the mean intensity of the superpixel region. Show the
mean image next to the original. If you run this code, you can use implay to view each slice of the
MRI data.

pixelIdxList = label2idx(L);
meanA = zeros(size(A),'like',D);
for superpixel = 1:N
memberPixelIdx = pixelIdxList{superpixel};
meanA(memberPixelIdx) = mean(A(memberPixelIdx));
end
implay([A meanA],5);

Segment Lungs from 3-D Chest Scan

This example shows how to perform a 3-D segmentation using active contours (snakes) and view the
results using the Volume Viewer app.

Prepare the Data

Load the human chest CT scan data into the workspace. To run this example, you must download the
sample data from MathWorks™ using the Add-On Explorer. See “Install Sample Data Using Add-On
Explorer” on page 12-158.

load chestVolume
whos

Name Size Bytes Class Attributes

V 512x512x318 166723584 int16

Convert the CT scan data from int16 to single to normalize the values to the range [0, 1].

V = im2single(V);

View the chest scans using the Volume Viewer app.

volumeViewer(V)

Volume Viewer has preset alphamaps that are intended to provide the best view of certain types of
data. To get the best view of the chest scans, select the ct-bone preset.

Segment the Lungs

Segment the lungs in the CT scan data using the active contour technique. Active contours is a region
growing algorithm which requires initial seed points. The example uses the Image Segmenter app to
create this seed mask by segmenting two orthogonal 2-D slices, one in the XY plane and the other in
the XZ plane. The example then inserts these two segmentations into a 3-D mask. The example passes
this mask to the activecontour function to create a 3-D segmentation of the lungs in the chest
cavity. (This example uses the active contour method but you could use other segmentation
techniques to accomplish the same goal, such as flood-fill.)

Extract the center slice in both the XY and XZ dimensions.


XY = V(:,:,160);
XZ = squeeze(V(256,:,:));

View the 2-D slices using the imshow function.


figure
imshow(XY,[],"Border","tight")
imshow(XZ,[],"Border","tight")

You can perform the segmentation in the Image Segmenter app. Open the app using the
imageSegmenter command, specifying a 2-D slice as the input argument.
imageSegmenter(XY)

To start the segmentation process, click Threshold to open the lung slice in the Threshold tab. On
the Threshold tab, select the Manual Threshold option and move the Threshold slider to specify a
threshold value that achieves a good segmentation of the lungs. Click Create Mask to accept the
thresholding and return the Segmentation tab.

The app executes the following code to threshold the image.

BW = XY > 0.5098;

After this initial lung segmentation, clean up the mask using options on the Refine Mask menu.

In the app, you can click each option to invert the mask image so that the lungs are in the foreground
(Invert Mask), remove other segmented elements besides the lungs (Clear Borders), and fill holes
inside the lung segmentation (Fill Holes). Finally, use the Morphology option to smooth the edges of
the lung segmentation. On the Morphology tab, select the Erode Mask operation. After performing
these steps, select Show Binary and save the mask image to the workspace.

The app executes the following code to refine the mask.

BW = imcomplement(BW);
BW = imclearborder(BW);
BW = imfill(BW, "holes");
radius = 3;
decomposition = 0;
se = strel("disk",radius,decomposition);
BW = imerode(BW, se);
maskedImageXY = XY;
maskedImageXY(~BW) = 0;
imshow(maskedImageXY)

Perform the same operation on the XZ slice. Using Load Image, select the XZ variable. Use
thresholding to perform the initial segmentation of the lungs. For the XZ slice, the Global Threshold
option creates an adequate segmentation (the call to imbinarize in the following code). As with the
XY slice, use options on the Refine Mask menu to create a polished segmentation of the lungs. In the
erosion operation on the Morphology tab, specify a radius of 13 to remove small extraneous objects.

To segment the XZ slice and polish the result, the app executes the following code.
BW = imbinarize(XZ);
BW = imcomplement(BW);
BW = imclearborder(BW);
BW = imfill(BW,"holes");
radius = 13;
decomposition = 0;
se = strel("disk",radius,decomposition);
BW = imerode(BW, se);
maskedImageXZ = XZ;
maskedImageXZ(~BW) = 0;
imshow(maskedImageXZ)

Create Seed Mask and Segment Lungs Using activecontour

Create the 3-D seed mask that you can use with the activecontour function to segment the lungs.

Create a logical 3-D volume the same size as the input volume and insert the XY and XZ
segmentations (maskedImageXY and maskedImageXZ) at the appropriate spatial locations.

mask = false(size(V));
mask(:,:,160) = maskedImageXY;
mask(256,:,:) = mask(256,:,:)|reshape(maskedImageXZ,[1,512,318]);

Using this 3-D seed mask, segment the lungs in the 3-D volume using the active contour method. This
operation can take a few minutes. To get a quality segmentation, use histeq to spread voxel values
over the available range.

V = histeq(V);

BW = activecontour(V,mask,100,"Chan-Vese");

segmentedImage = V.*single(BW);

View the segmented lungs in the Volume Viewer app.

volumeViewer(segmentedImage)

By manipulating the alphamap settings in the Rendering Editor, you can get a good view of just the
lungs.

Compute the Volume of the Segmented Lungs

Use the regionprops3 function with the "volume" option to calculate the volume of the lungs.

volLungsPixels = regionprops3(logical(BW),"volume");

Specify the spacing of the voxels in the x, y, and z dimensions, which was gathered from the original
file metadata. The metadata is not included with the image data that you download from the Add-On
Explorer.

spacingx = 0.76;
spacingy = 0.76;
spacingz = 1.26*1e-6;
unitvol = spacingx*spacingy*spacingz;

volLungs1 = volLungsPixels.Volume(1)*unitvol;
volLungs2 = volLungsPixels.Volume(2)*unitvol;
volLungsLiters = volLungs1 + volLungs2

volLungsLiters = 5.7726

See Also
regionprops3 | activecontour | histeq

Marker-Controlled Watershed Segmentation

This example shows how to use watershed segmentation to separate touching objects in an image.
The watershed transform finds "catchment basins" and "watershed ridge lines" in an image by
treating it as a surface where light pixels are high and dark pixels are low.

Segmentation using the watershed transform works better if you can identify, or "mark," foreground
objects and background locations. Marker-controlled watershed segmentation follows this basic
procedure:

1 Compute a segmentation function. This is an image whose dark regions are the objects you are
trying to segment.
2 Compute foreground markers. These are connected blobs of pixels within each of the objects.
3 Compute background markers. These are pixels that are not part of any object.
4 Modify the segmentation function so that it only has minima at the foreground and background
marker locations.
5 Compute the watershed transform of the modified segmentation function.

Step 1: Read in the Color Image and Convert it to Grayscale

rgb = imread("pears.png");
I = im2gray(rgb);
imshow(I)

text(732,501,"Image courtesy of Corel(R)", ...
    "FontSize",7,"HorizontalAlignment","right")

Step 2: Use the Gradient Magnitude as the Segmentation Function

Compute the gradient magnitude. The gradient is high at the borders of the objects and low (mostly)
inside the objects.

gmag = imgradient(I);
imshow(gmag,[])
title("Gradient Magnitude")

Can you segment the image by using the watershed transform directly on the gradient magnitude?

L = watershed(gmag);
Lrgb = label2rgb(L);
imshow(Lrgb)
title("Watershed Transform of Gradient Magnitude")

No. Without additional preprocessing such as the marker computations below, using the watershed
transform directly often results in "oversegmentation."

Step 3: Mark the Foreground Objects

A variety of procedures could be applied here to find the foreground markers, which must be
connected blobs of pixels inside each of the foreground objects. In this example, you use
morphological techniques called "opening-by-reconstruction" and "closing-by-reconstruction" to
"clean" up the image. These operations will create flat maxima inside each object that can be located
using imregionalmax.

Opening is an erosion followed by a dilation, while opening-by-reconstruction is an erosion followed
by a morphological reconstruction. Let's compare the two. First, compute the opening using imopen.

se = strel("disk",20);
Io = imopen(I,se);
imshow(Io)
title("Opening")

Next compute the opening-by-reconstruction using imerode and imreconstruct.

Ie = imerode(I,se);
Iobr = imreconstruct(Ie,I);
imshow(Iobr)
title("Opening-by-Reconstruction")

Following the opening with a closing can remove the dark spots and stem marks. Compare a regular
morphological closing with a closing-by-reconstruction. First try imclose:

Ioc = imclose(Io,se);
imshow(Ioc)
title("Opening-Closing")

Now use imdilate followed by imreconstruct. Notice you must complement the image inputs and
output of imreconstruct.

Iobrd = imdilate(Iobr,se);
Iobrcbr = imreconstruct(imcomplement(Iobrd),imcomplement(Iobr));
Iobrcbr = imcomplement(Iobrcbr);
imshow(Iobrcbr)
title("Opening-Closing by Reconstruction")

As you can see by comparing Iobrcbr with Ioc, reconstruction-based opening and closing are more
effective than standard opening and closing at removing small blemishes without affecting the overall
shapes of the objects. Calculate the regional maxima of Iobrcbr to obtain good foreground markers.

fgm = imregionalmax(Iobrcbr);
imshow(fgm)
title("Regional Maxima of Opening-Closing by Reconstruction")

To help interpret the result, superimpose the foreground marker image on the original image.

I2 = labeloverlay(I,fgm);
imshow(I2)
title("Regional Maxima Superimposed on Original Image")

Notice that some of the mostly-occluded and shadowed objects are not marked, which means that
these objects will not be segmented properly in the end result. Also, the foreground markers in some
objects go right up to the objects' edge. That means you should clean the edges of the marker blobs
and then shrink them a bit. You can do this by a closing followed by an erosion.

se2 = strel(ones(5,5));
fgm2 = imclose(fgm,se2);
fgm3 = imerode(fgm2,se2);

This procedure tends to leave some stray isolated pixels that must be removed. You can do this using
bwareaopen, which removes all blobs that have fewer than a certain number of pixels.

fgm4 = bwareaopen(fgm3,20);
I3 = labeloverlay(I,fgm4);
imshow(I3)
title("Modified Regional Maxima Superimposed on Original Image")

Step 4: Compute Background Markers

Now you need to mark the background. In the cleaned-up image, Iobrcbr, the dark pixels belong to
the background, so you could start with a thresholding operation.

bw = imbinarize(Iobrcbr);
imshow(bw)
title("Thresholded Opening-Closing by Reconstruction")

The background pixels are in black, but ideally we don't want the background markers to be too close
to the edges of the objects we are trying to segment. We'll "thin" the background by computing the
"skeleton by influence zones", or SKIZ, of the foreground of bw. This can be done by computing the
watershed transform of the distance transform of bw, and then looking for the watershed ridge lines
(DL == 0) of the result.

D = bwdist(bw);
DL = watershed(D);
bgm = DL == 0;
imshow(bgm)
title("Watershed Ridge Lines")

Step 5: Compute the Watershed Transform of the Segmentation Function

The function imimposemin can be used to modify an image so that it has regional minima only in
certain desired locations. Here you can use imimposemin to modify the gradient magnitude image so
that its only regional minima occur at foreground and background marker pixels.

gmag2 = imimposemin(gmag, bgm | fgm4);

Finally, compute the watershed-based segmentation.

L = watershed(gmag2);

Step 6: Visualize the Result

One visualization technique is to superimpose the foreground markers, background markers, and
segmented object boundaries on the original image. You can use dilation as needed to make certain
aspects, such as the object boundaries, more visible. Object boundaries are located where L == 0.
The binary foreground and background markers are scaled to different integer values so that they are
assigned different labels.

labels = imdilate(L==0,ones(3,3)) + 2*bgm + 3*fgm4;


I4 = labeloverlay(I,labels);
imshow(I4)
title("Markers and Object Boundaries Superimposed on Original Image")

This visualization illustrates how the locations of the foreground and background markers affect the
result. In a couple of locations, partially occluded darker objects were merged with their brighter
neighbor objects because the occluded objects did not have foreground markers.

Another useful visualization technique is to display the label matrix as a color image. Label matrices,
such as those produced by watershed and bwlabel, can be converted to truecolor images for
visualization purposes by using label2rgb.

Lrgb = label2rgb(L,"jet","w","shuffle");
imshow(Lrgb)
title("Colored Watershed Label Matrix")

You can use transparency to superimpose this pseudo-color label matrix on top of the original
intensity image.

figure
imshow(I)
hold on
himage = imshow(Lrgb);
himage.AlphaData = 0.3;
title("Colored Labels Superimposed Transparently on Original Image")

See Also
watershed | imopen | imreconstruct | imclose | imdilate | imregionalmax | imerode |
bwareaopen | bwdist | label2rgb | imcomplement | labeloverlay | imgradient

Segment Image and Create Mask Using Color Thresholder

This example shows how to segment an image and create a binary mask image using the Color
Thresholder app. The example segments the foreground (the peppers) from the background (the
purple cloth) based on color values.

Image segmentation by color thresholding can be an iterative process. For example, you can try
segmenting the image in different color spaces because one color space might isolate a particular
color better than another. In any of the supported color spaces, you can initially perform an automatic
segmentation by selecting a region in the foreground or background. Then, you can refine the
segmentation by using color component controls provided by the app.

This example also shows how to create a binary mask image, save the results of your work, and
export MATLAB® code that enables you to reproduce the segmentation.

Open Image in Color Thresholder

Read a color image into the workspace.

im = imread("peppers.png");

Open Color Thresholder from the MATLAB toolstrip. On the Apps tab, in the Image Processing and
Computer Vision section, click Color Thresholder.

Load the image into the Color Thresholder app. Click Load Image, and then select Load Image
from Workspace. In the Import From Workspace dialog box, select the image from the workspace,
and then click OK.

You can also open the app from the command line by using the colorThresholder function,
specifying the variable name of the image:

colorThresholder(im)

Select Color Space

Color Thresholder displays the image in the Choose a Color Space tab, with point clouds
representing the image in these color spaces: RGB, HSV, YCbCr, and L*a*b*. For color-based
segmentation, select the color space that provides the best color separation. Using the mouse, rotate
the point cloud representations to see how they isolate individual colors. For this example, start by
selecting the YCbCr color space.

Segment Image

When you choose a color space, Color Thresholder opens a new tab, displaying the image along with
a set of controls for each color component and the point cloud representation. The color controls vary
depending on the color space. For the YCbCr color space, the app displays three histograms
representing the three color components: the Y component represents brightness, the Cb component
represents the blue-yellow spectrum, and the Cr component represents the red-green spectrum.

To access the pan and zoom controls, move the cursor over the image.

Automatic Thresholding

First, segment the image using automatic thresholding. Because the background (purple cloth) is
close to a uniform color, segment it rather than the foreground objects (the peppers). You can invert
the mask later using the Invert Mask option.

Define a region using the freehand ROI tool. Click the lasso icon in the axes toolbar at the top-
right corner of the image and draw an ROI on the background. You can draw multiple regions. If you
want to delete a region you drew and start over, right-click anywhere in the region and select Delete
Freehand.

After drawing the region, Color Thresholder automatically thresholds the image based on the colors
you selected in the region you drew. The Y, Cb, and Cr color controls change to reflect the
segmentation. This automatic thresholding does not create a clean segmentation of the background
and foreground, especially at the lower border between the foreground and background. For this
example, the background color is lighter near the bottom of the image.

Refine Automatic Thresholding Using Color Controls

To fine tune the automatic thresholding, use the color controls. For each Y, Cb, and Cr color control,
you can set the range of values by dragging the lower and upper bounds in that histogram. Using
these color controls, you can significantly improve the segmentation of the foreground.

Threshold Image Color Values Using Point Cloud

Another approach to selecting a range of colors is to draw an ROI on the point cloud.

On the app toolstrip, click Reset Thresholds to revert back to the original image. In the bottom-right
pane of the app, click and drag the point cloud to rotate until you isolate the view of the color you are

interested in thresholding. Hover over the point cloud and click the ROI button in the top left
corner of the point cloud. Color Thresholder converts the 3-D point cloud into a 2-D representation
and activates the polygon ROI tool. Draw an ROI around the color you want to segment (purple). This
method can create a better segmentation than the initial automatic thresholding approach.

Segment Image in Another Color Space

To segment the image in another color space, click New Color Space in the app toolstrip. In the
Choose a Color Space tab, choose the HSV color space.

Color Thresholder creates a new tab that displays the image and the color component controls for the
HSV color space. In this color space, H represents hue, S represents saturation, and V represents
value. The HSV color space uses a dual-direction knob for the H component and two histogram
sliders for the S and V components. The tab also contains the point cloud representation of the colors
in the image.

As in the previous iteration, you can use all of the same techniques: automatic thresholding and
interactive use of the color component controls, including the point cloud. When you use the color
controls, you can see the segmentation in progress. In the pane with the H control, change the range
of the hue by clicking and dragging one arrow at a time. Experiment with the controls until you have
a clean separation of the background from the foreground. You can clean up small imperfections after
you create the mask image using toolbox functions, such as morphological operators.

Create Mask Image

Because the example segmented the background (the purple cloth) rather than the foreground
objects (the peppers), swap the foreground and background by clicking Invert Mask.

View the binary mask image by clicking Show Binary on the app toolstrip.

Export Results

Save the mask in the workspace. On the toolstrip, click Export and select Export Images.

In the Export To Workspace dialog box, specify variable names for the binary mask image. You can
also save the original input RGB image and the segmented version of the original image.

To save the MATLAB code required to recreate the segmentation, click Export and select Export
Function. Color Thresholder opens the MATLAB Editor with the code that creates the segmentation.
To save the code, click Save on the MATLAB Editor toolstrip. You can run this code, passing it an RGB
image, to create the same mask image programmatically.

function [BW,maskedRGBImage] = createMask(RGB)


%createMask Threshold RGB image using auto-generated code from colorThresholder app.
% [BW,MASKEDRGBIMAGE] = createMask(RGB) thresholds image RGB using
% auto-generated code from the colorThresholder app. The colorspace and
% range for each channel of the colorspace were set within the app. The
% segmentation mask is returned in BW, and a composite of the mask and
% original RGB images is returned in maskedRGBImage.

% Auto-generated by colorThresholder app on 01-Jan-2023


%------------------------------------------------------

% Convert RGB image to chosen color space


I = rgb2hsv(RGB);

% Define thresholds for channel 1 based on histogram settings


channel1Min = 0.734;
channel1Max = 0.921;

% Define thresholds for channel 2 based on histogram settings


channel2Min = 0.334;
channel2Max = 1.000;

% Define thresholds for channel 3 based on histogram settings


channel3Min = 0.000;
channel3Max = 0.868;

% Create mask based on chosen histogram thresholds


sliderBW = (I(:,:,1) >= channel1Min ) & (I(:,:,1) <= channel1Max) & ...
(I(:,:,2) >= channel2Min ) & (I(:,:,2) <= channel2Max) & ...
(I(:,:,3) >= channel3Min ) & (I(:,:,3) <= channel3Max);
BW = sliderBW;

% Invert mask
BW = ~BW;

% Initialize output masked image based on input image.


maskedRGBImage = RGB;

% Set background pixels where BW is false to zero.


maskedRGBImage(repmat(~BW,[1 1 3])) = 0;

end
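
For example, assuming you save the generated function as createMask.m on the MATLAB path, you
can reapply the segmentation from the command line (a minimal sketch):

im = imread("peppers.png");               % original RGB image
[BW,maskedRGBImage] = createMask(im);     % reapply the exported segmentation
imshow(maskedRGBImage)
title("Mask Applied Programmatically")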

See Also
Color Thresholder

More About
• “Acquire Live Images in Color Thresholder” on page 12-55

Acquire Live Images in Color Thresholder


You can color threshold an image acquired from a webcam using the Color Thresholder app. The
Image Capture tab enables you to bring a live image from USB webcams into the app.

To begin color thresholding, add images that you acquire live from a webcam using the MATLAB
Webcam support. Install the MATLAB Support Package for USB Webcams to use this feature. See
“Install the MATLAB Support Package for USB Webcams” (Image Acquisition Toolbox) for information
on installing the support package.
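
Before opening the app, you can optionally confirm that MATLAB detects your camera from the
command line. This minimal sketch uses the webcam interface provided by the same support package
and assumes one connected camera:

webcamlist                 % list the detected cameras
cam = webcam;              % connect to the first available camera
img = snapshot(cam);       % capture one RGB frame
imshow(img)
clear cam                  % release the camera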

1 Open the app using the colorThresholder function.


2 To add a live image from your webcam, select Load Image > Acquire Image From Camera.

3 The Import from Camera dialog box opens and displays camera properties and a live preview
of the webcam. If you have only one webcam connected to your system, then it is selected by
default. If you have multiple cameras connected and want to use a different one, select the
camera in the Camera list.

4 Use the sliders or drop-downs to change property settings. The list of camera properties varies,
depending on your device. The live preview updates dynamically when you change a setting.

After you adjust the camera settings, click Capture to capture a static image from the webcam.
If you want to use this static image for color thresholding, then click Accept. Otherwise, you can
click Retake to capture a new static image.
5 The Choose a Color Space tab opens and displays the four color space options: RGB, HSV,
YCbCr, and L*a*b*. Choose a color space by clicking the button of your choice.

6 The Color Thresholder app creates a new tab displaying the image and the color component
controls for the selected color space. You can now perform color thresholding on the image. See
“Segment Image and Create Mask Using Color Thresholder” on page 12-43 for information about
processing the image.

7 If you want to save the image that you captured, click Export and select Export Images.

In the Export To Workspace dialog box, you can save the binary mask, segmented image, and the
captured image. The Input RGB Image option saves the RGB image captured from the webcam.

See Also
Color Thresholder

More About
• “Segment Image and Create Mask Using Color Thresholder” on page 12-43

Getting Started with Image Segmenter


The Image Segmenter app provides access to many different ways to segment an image. Performing
segmentation using Image Segmenter can be an iterative process where you might try several of the
segmentation options. Some segmentation techniques might work better with certain types of images
than others. After segmenting an image, you can save the binary mask. You can also retrieve the code
that Image Segmenter used to create the mask.

Open Image Segmenter App and Load Data


Open the app and load an image to be segmented. Image Segmenter can open any file that can be
read by imread.

You can open the Image Segmenter from the command line. Specify an image in the workspace or the
name of a file.

I = imread("coins.png");
imageSegmenter(I)

Alternatively, open the app from the Apps tab, under Image Processing and Computer Vision.
Then, from the Load menu, choose the name of a workspace variable or the name of the file
containing the image.

After you load an image, you can optionally load an existing binary mask. For example, you might
have previously created a mask of an RGB image in the Color Thresholder app and you want to refine
the segmentation. To load an existing mask, click Load Mask. The segmentation mask image must be
a logical image of the same size as the image you are segmenting.

Create and Add Regions to Segmented Mask


To create an initial mask, use any of the tools in the Create Mask and Add to Mask menus. If you
want to start a new segmentation after creating a mask, click New Segmentation. You can perform
multiple segmentations using the app. Each segmentation appears, with a thumbnail, in the Data
Browser.

To add segmented regions to an existing mask, use tools in the Add to Mask menu. The app displays
the steps you take while creating the segmentation in the History panel of the Data Browser.

• Threshold: An automatic technique where you specify an intensity value that you want to isolate.
  This technique can be useful if the objects you want to segment in the image have similar pixel
  intensity values and these values are easily distinguished from other areas of the image, such as
  the background. For more information, see "Segment Image Using Thresholding in Image
  Segmenter" on page 12-62.
• Graph Cut: A semi-automatic technique that can segment foreground and background. This
  technique does not require careful placement of seed points and you can refine the segmentation
  interactively. For more information, see "Segment Image Using Graph Cut in Image Segmenter"
  on page 12-85.
• Auto Cluster: An automatic technique where the app groups image features into a binary
  segmentation. This option is only available if you have Statistics and Machine Learning Toolbox™.
  For more information, see "Segment Image Using Auto Cluster in Image Segmenter" on page
  12-109.
• Find Circles: An automatic technique where you specify the minimum and maximum diameter of
  the circular objects you want to detect. For more information, see "Segment Image Using Find
  Circles in Image Segmenter" on page 12-102.
• Local Graph Cut (grabcut): A semi-automatic technique, similar to the Graph Cut method, that can
  segment foreground and background. With Local Graph Cut (grabcut), you first define an ROI that
  encompasses the object in the image that you want to segment. The Image Segmenter
  automatically segments the object in the ROI. You can refine the segmentation by drawing lines on
  the image to identify the foreground and the background within the ROI. Everything outside the
  ROI is considered background. For more information, see "Segment Image Using Local Graph Cut
  (Grabcut) in Image Segmenter" on page 12-94.
• Flood Fill: An automatic technique where you specify starting points and the method segments
  areas with similar intensity values.
• Draw ROI: A manual technique where you draw shapes that outline the regions containing the
  objects you want to segment. Using the mouse, you can draw rectangles, ellipses, polygons, or
  freehand shapes. For more information, see "Segment Image by Drawing Regions Using Image
  Segmenter" on page 12-68.

When using the Auto Cluster, Graph Cut, and Flood Fill segmentation tools, you can also include
texture as an additional consideration in your segmentation. Texture filtering can help distinguish
foreground from background. To turn the texture option on and off, click Include Texture Features.
When enabled, Image Segmenter uses Gabor filters to analyze the texture of the image as a
preprocessing step in the segmentation. For more information, see “Use Texture Filtering in Image
Segmenter” on page 12-115. For more information about Gabor filters, see “Texture Segmentation
Using Gabor Filters” on page 12-160.
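
Outside the app, a comparable texture preprocessing step might look like the following sketch, which
builds a small Gabor filter bank and computes the response magnitudes. The wavelengths and
orientations are illustrative, not the values the app uses:

Igray = im2gray(imread("coins.png"));   % example grayscale image
g = gabor([4 8],[0 45 90 135]);         % small bank of wavelengths (pixels) and orientations (degrees)
gabormag = imgaborfilt(Igray,g);        % one response plane per filter
imshow(rescale(gabormag(:,:,1)))        % view the first filter response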

Refine Segmented Mask


Image Segmenter provides access to several tools that you can use to refine the mask you created.

Tool Description
Morphology Many morphological techniques, such as dilation and erosion. For an
example, view “Refine Segmentation Using Morphology in Image
Segmenter” on page 12-80.
Active contours (also An iterative method that grows or shrinks regions in an image. You identify
known as snakes) the regions with seed points. For an example, view “Segment Image Using
Active Contours in Image Segmenter” on page 12-74.
Clear borders A fast way to remove small regions on the edge of the image.

12-60
Getting Started with Image Segmenter

Tool Description
Fill holes A fast way to fill small holes in foreground regions. For an example, view
“Refine Segmentation Using Morphology in Image Segmenter” on page 12-
80.
Invert mask Sometimes the segmentation is easier to evaluate if you invert the
foreground and background. For an example, view “Segment Image Using
Auto Cluster in Image Segmenter” on page 12-109.
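
If you prefer to work at the command line, the following sketch shows toolbox functions that are roughly
comparable to the refinement tools in this table. It assumes a logical mask BW, such as one exported from
the app; the disk-shaped structuring element and its size are only illustrative choices.

se = strel("disk",3);      % structuring element for the morphological operations
BW = imdilate(BW,se);      % Morphology: dilate the mask (use imerode to shrink regions instead)
BW = imclearborder(BW);    % Clear borders: remove regions touching the image edge
BW = imfill(BW,"holes");   % Fill holes: fill small holes in foreground regions
BW = imcomplement(BW);     % Invert mask: swap foreground and background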

Export Segmentation Results


When you find an acceptable segmentation, you can export to the workspace the final segmentation
mask image and the segmented version of the original image. To export the mask and segmentation
to the workspace, click Export and select Export Images.
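
As a rough sketch of what you can do with the exported results at the command line, the commands below
assume you exported the mask as BW and the segmented image as maskedImage, and that I is the image you
originally loaded; the variable names are only examples, since you choose them when you export.

imshow(imoverlay(I,BW,"cyan"))        % view the mask burned into the original image
figure
imshowpair(I,maskedImage,"montage")   % compare the original and segmented images side by side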

You can also generate the code used to perform the segmentation (requires Statistics and Machine
Learning Toolbox). Use the code to apply the same segmentation algorithm to similar images. To get
the code, click Export and select Generate Function. The app opens the MATLAB editor containing
a function with the autogenerated code. To save the code, click Save in the MATLAB editor.

See Also
Image Segmenter


Segment Image Using Thresholding in Image Segmenter

This example shows how to segment an image in the Image Segmenter app by using thresholding.
The Image Segmenter app supports three different types of thresholding: Global, Manual, and
Adaptive.

The Image Segmenter app supports many different segmentation methods and using the app can be
an iterative process. You might try several different methods until you achieve the results you want.

Load Image into the Image Segmenter

Open the Image Segmenter app and load an image to be segmented. The app can open any file that
can be read by imread.

For this example, first read an image into the workspace. This example uses an MRI image of a knee.
The goal is to create a mask image that segments the bone from the soft tissue in the image.

I = dicomread('knee1');
knee = mat2gray(I);

From the MATLAB® toolstrip, open the Image Segmenter app. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Segmenter.

On the app toolstrip, click Load, and then select Load Image from Workspace. In the Import from
Workspace dialog box, select the image you read into the workspace. The Image Segmenter app
displays the image you selected.


You can also open the image in the Image Segmenter app using the imageSegmenter command, as
follows:

imageSegmenter(knee);

After you load an image, you can optionally load an existing binary mask. For example, you might
have previously created a mask of an RGB image in the Color Thresholder app and you want to
refine the segmentation. To load an existing mask, click Load Mask. The segmentation mask image
must be a logical image of the same size as the image you are segmenting.

Use Thresholding to Segment Image

Click Threshold in the Create Mask section of the Image Segmenter app toolstrip. The app displays
the thresholded image in the Threshold tab. By default, the app uses global thresholding.


You can also choose Manual or Adaptive thresholding. Each thresholding option supports controls
that you can use to fine-tune the thresholding. For example, with Manual thresholding, you can
choose the threshold value using the slider. With Adaptive thresholding, you can choose the
sensitivity using the slider. Try each option to see which thresholding method performs the best
segmentation.
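
For reference, the imbinarize function provides rough command-line counterparts of these three options,
shown here for the knee image from this example. The 0.5 values are only illustrative stand-ins for the
values you would choose interactively with the sliders.

bwGlobal = imbinarize(knee);                                % global (Otsu) threshold
bwManual = imbinarize(knee,0.5);                            % manual threshold value
bwAdaptive = imbinarize(knee,"adaptive","Sensitivity",0.5); % adaptive thresholding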

The following figure shows the results of using Manual thresholding.


The knee image does not have well-defined pixel intensity differences between foreground and
background and thresholding does not seem like the best choice to segment this image.


To save the segmentation, click Create Mask. If you want to try another segmentation method in the
Image Segmenter app, click Cancel to return to the main segmentation app window.

See Also
Image Segmenter

Related Examples
• “Getting Started with Image Segmenter” on page 12-59


Segment Image by Drawing Regions Using Image Segmenter

This example shows how to segment an image in the Image Segmenter app by drawing regions of
interest. Image Segmenter offers many different ROI shapes including polygons, rectangles,
ellipses, and circles. In addition, you can draw freehand shapes or assisted freehand shapes that help
you by following the underlying shape of the objects in the image.

Image Segmenter offers many different segmentation methods and using the app can be an
iterative process. You might try several different methods until you achieve the results you want.

Load an Image in Image Segmenter

Open the Image Segmenter app and load an image to be segmented. The app can open any file that
can be read by imread.

For this example, read an image into the workspace. This example uses an MRI image of a knee. The
goal is to create a mask image that segments the bone from the soft tissue in the image.

I = dicomread("knee1");
knee = mat2gray(I);

Open the Image Segmenter app from the MATLAB® toolstrip. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Segmenter.

On the app toolstrip, click Load, and then select Load image from Workspace. In the Import from
Workspace dialog box, select the image you read into the workspace. Image Segmenter displays the
image you selected.


You can also open the image in the Image Segmenter app using the imageSegmenter command, as
follows:

imageSegmenter(knee);

After you load an image into the app, you can optionally load an existing binary mask. For example,
you might have previously created a mask of an RGB image in the Color Thresholder app and you
want to refine the segmentation. To load an existing mask, click Load Mask. The segmentation mask
image must be a logical image of the same size as the image you are segmenting.


Use ROI Tools to Draw Regions for Segmentation

Expand the Add to Mask group and click Draw ROIs. The app opens the ROI tab.

Select the type of ROI you want to draw. For this example, choose Assisted Freehand. As you move
the cursor over the image, it changes to the crosshairs shape. Press the mouse button, and begin
drawing a freehand shape over the area of the image that you want to segment. With the Assisted
Freehand ROI option, which is preselected, you can draw a freehand shape that automatically follows
edges in the underlying image to help you draw a more accurate ROI. As you draw, click the mouse to
create waypoints. Waypoints can help you make fine adjustments to the shape after you finish
drawing. To add additional waypoints after you finish drawing, double-click on the ROI edge.


Continue drawing shapes until all the areas you want to segment are identified. To save the regions
you have drawn, click Apply (their color changes to yellow). To return to the Segmentation tab, click
Close ROI.


To view the mask image, click Show Binary on the Segmentation tab. To refine the mask image, use
the tools in the Refine Mask section of Image Segmenter app toolstrip, such as Clear Borders or
Fill Holes. When you are done, click Export to save the mask image to the workspace.


See Also
Image Segmenter

Related Examples
• “Getting Started with Image Segmenter” on page 12-59


Segment Image Using Active Contours in Image Segmenter

This example shows how to segment an image in the Image Segmenter app by using active contours
(also called snakes). Active contours is an automatic, iterative method where you mark locations in
the image by drawing regions (called a seed mask). Active contours grows (or shrinks) these seed
shapes to fill the borders of the region in the image. The accuracy of this initial seed mask can impact
the final result. You can also use the Include Texture Features option with active contours.

The Image Segmenter app offers many different segmentation methods and using the app can be an
iterative process. You might try several different methods until you achieve the results you want.

Load an Image in Image Segmenter

Open the Image Segmenter app and load an image to be segmented. The app can open any file that
can be read by imread.

For this example, read an image into the workspace. This example uses an MRI image of a knee. The
goal is to create a mask image that segments the bone from the soft tissue in the image.

I = dicomread("knee1");
knee = mat2gray(I);

From the MATLAB® toolstrip, open the Image Segmenter app. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Segmenter.

On the app toolstrip, click Load, and then select Load Image from Workspace. In the Import from
Workspace dialog box, select the image you read into the workspace. The Image Segmenter app
displays the image you selected.


You can also open the image in the Image Segmenter app using the imageSegmenter command, as
follows:

imageSegmenter(knee);

After you load an image, you can optionally load an existing binary mask. For example, you might
have previously created a mask of an RGB image in the Color Thresholder app and you want to refine
the segmentation. To load an existing mask, click Load Mask. The segmentation mask image must be
a logical image of the same size as the image you are segmenting.


Use Active Contours to Segment Image

To segment an image using active contours, you must first create a rough estimation of the
segmentation. For example, you can use the ROI tools to create a rough segmentation of the image
(see “Segment Image by Drawing Regions Using Image Segmenter” on page 12-68). You could also
load an existing binary mask image.

For this example, use the ROI tools to create seed shapes in the areas you want to segment. When
you are done drawing the regions, click Apply and then click Close ROI to return to the
Segmentation tab.

On the Segmentation tab, in the Refine Mask section of the toolstrip, click Active Contours.
The Image Segmenter app opens the Active Contours tab.


To use active contours, click Evolve. The app starts performing iterations to grow the seed masks to
fill the objects to their borders. Initially, use the default active contours method (Region-based) and
the default number of iterations (100). The Image Segmenter displays the progress of the processing
in the lower right corner. Looking at the results, you can see that this approach worked well for two
of the three objects but the segmentation bled into the background for one of the objects. The object
boundary is not as well-defined in this area.

One way to get a better segmentation is to repeat active contours, reducing the number of iterations.
Change the number of iterations in the iterations box, specifying 35, and click Evolve again. This
time, the segmentation does not bleed into the background.
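
Outside the app, the activecontour function implements the same idea. The sketch below assumes seedMask
is a rough logical mask, for example one created by drawing ROIs; "Chan-Vese" corresponds to the
region-based method, and 35 iterations mirrors the value used above.

bw = activecontour(knee,seedMask,35,"Chan-Vese");   % grow the seed mask toward object boundaries
imshow(bw)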


To save the segmentation, click Apply. To return to the Segmentation tab, click Close Active
Contours.

To view the mask image, click Show Binary on the Segmentation tab. You can use other tools in the
Image Segmenter app to refine the mask image, such as Clear Borders or Fill Holes. To save the
mask image to the workspace, click Export.


See Also
Image Segmenter | activecontour

Related Examples
• “Getting Started with Image Segmenter” on page 12-59


Refine Segmentation Using Morphology in Image Segmenter

This example shows how to use the capabilities of the Image Segmenter app to polish the
appearance of the mask image you created with the app. The Image Segmenter app includes several
morphological operations that you can use to fix small imperfections in the mask image.

This example creates a mask image using hand-drawn ROIs and active contours (see “Segment Image
Using Active Contours in Image Segmenter” on page 12-74).

Load an Image in the Image Segmenter

Open the Image Segmenter app and load an image to be segmented. The Image Segmenter can
open any file that can be read by imread.

For this example, first read an image into the workspace. This example uses an MRI image of a knee.
The goal is to create a mask image that segments the bone from the soft tissue in the image.

I = dicomread('knee1');
knee = mat2gray(I);

From the MATLAB® toolstrip, open the Image Segmenter app. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Segmenter.

On the app toolstrip, click Load, and then select Load image from Workspace. In the Import from
Workspace dialog box, select the image you read into the workspace. The Image Segmenter app
displays the image you selected.


You can also open the image in the Image Segmenter app using the imageSegmenter command, as
follows:

imageSegmenter(knee);

After you load an image, you can optionally load an existing binary mask. For example, you might
have previously created a mask by drawing ROIs. To load an existing mask, click Load Mask. The
segmentation mask image must be a logical image of the same size as the image you are segmenting.

Create the Mask Image

Create a rough segmentation of the image using ROI drawing tools. Use active contours to finish the
segmentation. For more details on this process, see “Segment Image Using Active Contours in Image
Segmenter” on page 12-74.

After finishing the segmentation, click Show Binary on the Segmentation tab to view the mask
image. Upon close examination, you can see several small holes in the mask image.


The Image Segmenter includes morphological tools to refine the binary mask. Expand the Refine
Mask section of the app toolstrip and click Fill Holes.


This removes the holes in the binary mask.


To save the binary mask, click Export and select Export Images.

See Also
Image Segmenter

Related Examples
• “Getting Started with Image Segmenter” on page 12-59


Segment Image Using Graph Cut in Image Segmenter

This example shows how to use the Graph Cut option in the Image Segmenter app to segment an
image. Graph cut is a semiautomatic segmentation technique that you can use to segment an image
into foreground and background elements. Graph cut segmentation does not require good
initialization. You draw lines on the image, called scribbles, to identify what you want in the
foreground and what you want in the background. The Image Segmenter segments the image
automatically based on your scribbles and displays the segmented image. You can refine the
segmentation by drawing more scribbles on the image until you are satisfied with the result.

The Graph Cut technique applies graph theory to image processing to achieve fast segmentation. The
technique creates a graph of the image where each pixel is a node connected by weighted edges. The
higher the probability that pixels are related, the higher the weight. The algorithm cuts along weak
edges, achieving the segmentation of objects in the image. The Image Segmenter uses a particular
variety of the Graph Cut algorithm called lazysnapping. For information about another segmentation
technique that is related to graph cut, see “Segment Image Using Local Graph Cut (Grabcut) in
Image Segmenter” on page 12-94.
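
For reference, a minimal command-line sketch of the same approach with the lazysnapping function follows.
It assumes foreMask and backMask are logical images that mark a few foreground and background pixels,
standing in for the scribbles you draw in the app.

A = imread("baby.jpg");                     % sample image used in this example
L = superpixels(A,500);                     % oversegment the image into superpixels
bw = lazysnapping(A,L,foreMask,backMask);   % cut the superpixel graph using the scribbles
imshow(imoverlay(A,bw))                     % view the foreground mask over the image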

Load Image into the Image Segmenter App

Read an image into the workspace. For this example, read the sample image baby.jpg into the
workspace.

b = imread('baby.jpg');

From the MATLAB® toolstrip, open the Image Segmenter app. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Segmenter.


On the app toolstrip, click Load Image, and then select Load Image from Workspace. In the
Import from Workspace dialog box, select the image you read into the workspace. The Image
Segmenter app displays the image you selected.


You can also open the app using the imageSegmenter command, specifying the image:

imageSegmenter(b);

Use Graph Cut to Segment Image

On the Image Segmenter app toolstrip, select Graph Cut.


The Image Segmenter opens a new tab for Graph Cut segmentation. As a first step in Graph Cut
segmentation, mark the elements of the image that you want to be in the foreground. When the
Image Segmenter opens the Graph Cut tab, it preselects the Mark Foreground option. To mark an
object as foreground, draw a line (also called a scribble) over the object. When you draw a line, try to
include all the different values in the object you want to segment. You can draw as many separate
lines as you like. If you are not satisfied with the lines you draw, you can always edit them. Click
Erase and move the cursor over any part of the line you want to remove. If you want to start over, click
Clear Markings.


Next, click Mark Background and draw scribbles to mark the elements of the image you want to be
the background. When you finish drawing the lines, the Image Segmenter immediately performs the
segmentation (shown in blue).


To refine the segmentation, continue drawing foreground and background lines. For example, there
are several areas near the bottom of the image that need to be removed from the foreground. To fix
these problems, draw additional background lines on these parts of the image.


To get a better look at the segmentation, click Show Binary.


When you are satisfied with the segmentation, click Create Mask in the toolstrip on the Graph Cut
tab. The app closes the Graph Cut tab and returns you to the Segmentation tab.

Save the Mask Image to the Workspace

When you return to the main Segmentation tab, you can use tools to refine the mask image, such as
Morphology and Active Contours. To save the mask image, click Export. You can also use the Export
option to obtain the code the Image Segmenter app used to create the segmentation.


See Also
Image Segmenter | lazysnapping

Related Examples
• “Getting Started with Image Segmenter” on page 12-59


Segment Image Using Local Graph Cut (Grabcut) in Image Segmenter

This example shows how to segment an image using Local Graph Cut (also known as grabcut) in the
Image Segmenter app. Like Graph Cut, Local Graph Cut is a semiautomatic segmentation technique
that you can use to segment an image into foreground and background elements. With Local Graph
Cut, you first draw a region-of-interest around the object you want to segment. The Image
Segmenter app segments the image automatically based on the contents of the ROI.

Then, as with Graph Cut, you refine the automatic segmentation by drawing lines, called scribbles, on
the image inside the ROI. The lines you draw identify what you want in the foreground and what you
want in the background. The Local Graph Cut option only segments elements within the boundaries
of the ROI.

The Local Graph Cut technique, similar to the Graph Cut technique, applies graph theory to image
processing to achieve fast segmentation. The algorithm creates a graph of the image where each
pixel is a node connected by weighted edges. The higher the probability that pixels are related, the
higher the weight. The algorithm cuts along weak edges, achieving the segmentation of objects in the
image. For information about the Graph Cut technique, see “Segment Image Using Graph Cut in
Image Segmenter” on page 12-85.
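
For reference, the grabcut function provides a command-line version of this workflow. In the sketch below,
the rectangular ROI mask is only an illustrative stand-in for the ROI you draw in the app.

car = imread("car2.jpg");                % sample image used in this example
L = superpixels(car,500);                % label matrix of superpixels
roi = false(size(car,1),size(car,2));
roi(50:end-50,50:end-50) = true;         % rough rectangle around the object (illustrative)
bw = grabcut(car,L,roi);                 % segment the object inside the ROI
imshow(imoverlay(car,bw))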

Load Image into the Image Segmenter App

Read an image into the workspace.

car = imread('car2.jpg');

From the MATLAB® Toolstrip, open the Image Segmenter app. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Segmenter.


On the app toolstrip, click Load Image, and then select Load Image from Workspace. In the
Import from Workspace dialog box, select the image you read into the workspace. The Image
Segmenter app displays the image you selected.


You can also open the app using the imageSegmenter command, specifying the name of the image
variable.

imageSegmenter(car);

Use Local Graph Cut (Grabcut) to Segment Image

On the Image Segmenter app toolstrip, select Local Graph Cut.


The Image Segmenter app opens a new tab for Local Graph Cut segmentation. As a first step in
Local Graph Cut segmentation, draw an ROI around the object in the image that you want to
segment. When the Image Segmenter app opens the Local Graph Cut tab, it preselects the Draw
ROI button. Position the cursor over the image and draw an ROI that encompasses the entire object
you want to segment. To get a good initial segmentation, make sure the ROI you draw completely
surrounds the object, leaving a small amount of space between the object and the ROI boundary.
Make sure the object you want to segment is completely inside the ROI.

You can choose to draw a rectangular or polygonal ROI. Use the ROI Style menu to choose. To draw
a rectangle, position the cursor over the image and then click and drag. To draw a polygon, click and
drag the mouse, creating a vertex at each click. Double-click to finish the polygon. If you are not
satisfied with the shape you drew, you can always edit it. Right-click the ROI and choose Delete.

When you finish the ROI, the Image Segmenter app automatically segments the object in the ROI.
The blue shading indicates the segmented area.

To refine the automatic segmentation, draw lines (scribbles) to mark any parts of the foreground that
weren't included in the automatic segmentation. After you draw the ROI, the Image Segmenter
selects the Mark Foreground button automatically.


To remove areas from the segmentation that are not part of the foreground, mark those areas as
background. Select the Mark Background option and draw lines inside the ROI to identify parts of
the segmentation that should be in the background.


When you are satisfied with the segmentation, click Apply. The Image Segmenter app changes the
color of the segmented part of the image to yellow.


View Binary Image and Save Mask

To view the mask image, click Show Binary. You can also view the binary mask image in the main
Segmentation tab. To return to the main Image Segmenter app, click Close Local Graph Cut.


When you are done segmenting the image, you can save the binary mask, using the Export option.
You can also obtain the code used for the segmentation.

See Also
Image Segmenter | grabcut

Related Examples
• “Getting Started with Image Segmenter” on page 12-59


Segment Image Using Find Circles in Image Segmenter

This example shows how to use the Find Circles option in the Image Segmenter app to segment an
image. The Find Circles option is an automatic segmentation technique that you can use to segment
an image into foreground and background elements. The Find Circles option does not require
initialization.

Load Image into the Image Segmenter App

Read an image into the workspace.

coins = imread('coins.png');

From the MATLAB® Toolstrip, open the Image Segmenter app. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Segmenter.


On the app toolstrip, click Load Image, and then select Load Image from Workspace. In the
Import from Workspace dialog box, select the image you read into the workspace. The Image
Segmenter app displays the image you selected.

You can also open the app using the imageSegmenter command, specifying the image:

imageSegmenter(coins);

Use Find Circles to Segment the Image

On the Image Segmenter app toolstrip, expand the Create Mask section and select Find Circles.

12-103
12 Image Segmentation

The Image Segmenter app opens a new tab for the Find Circles segmentation option.

In the Find Circles tab, first click Ruler and measure the diameters of some representative circles in
the image to determine the range of sizes. To find circles, you must specify the lower and upper
bounds on the diameters. Set the values in the Min. Diameter and Max. Diameter fields to values
that you think include all the objects. For this image, use 50 and 150.
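
For comparison, the imfindcircles function performs the same search at the command line. It takes a radius
range rather than diameters, so the 50 to 150 diameter range used here corresponds roughly to radii of
[25 75]; the bright object polarity matches the bright coins on a dark background.

[centers,radii] = imfindcircles(coins,[25 75],"ObjectPolarity","bright");
imshow(coins)
viscircles(centers,radii);               % draw the detected circles on the image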


On the Find Circles tab, click Find Circles. The Image Segmenter app fills the circles it finds.
However, Find Circles does not find two of the circles. Examining the image more closely, you discover
that the diameters of these coins are slightly smaller than the specified minimum diameter.


Change the minimum value to accommodate the sizes of the objects that were not segmented and run
the Find Circles segmentation operation again. This time, Find Circles segments all the objects in the
image.


Save the Mask Image to the Workspace

When you are satisfied with the segmentation, click Create Mask on the Find Circles tab toolstrip
and create the mask image. The Image Segmenter app closes the Find Circles tab and returns to
the Segmentation tab. The color of the segmented circles changes to yellow. To view the mask image,
click Show Binary.

When you are done segmenting the image, save the mask image by using the Export option. You can
also obtain the code used for the segmentation.


See Also
Image Segmenter | imfindcircles

Related Examples
• “Getting Started with Image Segmenter” on page 12-59


Segment Image Using Auto Cluster in Image Segmenter

This example shows how to use the Auto Cluster option in the Image Segmenter app to segment
an image. The Auto Cluster option is an automatic segmentation technique that you can use to
segment an image into foreground and background elements. The Auto Cluster option does not
require initialization.

Load Image into the Image Segmenter App

Read an image into the workspace.

coins = imread('coins.png');

From the MATLAB® Toolstrip, open the Image Segmenter app. On the Apps tab, in the Image
Processing and Computer Vision section, click Image Segmenter.


On the app toolstrip, click Load Image, and then select Load Image from Workspace. In the
Import from Workspace dialog box, select the image you read into the workspace. Image Segmenter
displays the image you selected.


You can also open the app using the imageSegmenter command, specifying the image:

imageSegmenter(coins);

Use Auto Cluster to Segment Image

On the Image Segmenter toolstrip, expand the Create Mask section and choose Auto Cluster.


Image Segmenter automatically segments the image, displaying the result. The Auto Cluster
option has correctly segmented all the circles. However, some of the circles have holes.

Clean up the holes in the segmented image using the Fill Holes option in the Refine Mask toolstrip
group.
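
A command-line segmentation in the same spirit can be sketched with imsegkmeans followed by imfill,
although the app may use a different clustering internally. Which cluster index corresponds to the coins
is not guaranteed, so the index picked below is an assumption you may need to swap.

labels = imsegkmeans(coins,2);   % cluster the pixels into two groups
bw = labels == 2;                % assume cluster 2 is the foreground (the coins)
bw = imfill(bw,"holes");         % fill holes, like the Fill Holes tool
imshow(bw)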


Save the Mask Image to the Workspace

When you are satisfied with the segmentation, click Show Binary to view the mask image. To save
the binary mask, use the Export option. You can also obtain the code used for the segmentation.


See Also
Image Segmenter | imsegkmeans

Related Examples
• “Getting Started with Image Segmenter” on page 12-59


Use Texture Filtering in Image Segmenter

This example shows how to use the Include Texture Features option in the Image Segmenter app to
segment an image.

When using the Auto Cluster, Graph Cut, and Flood Fill segmentation tools, you can also include
texture as an additional consideration in your segmentation by selecting the Include Texture
Features option. Texture filtering can help distinguish the foreground in an image from the
background. When you select Include Texture Features, Image Segmenter uses Gabor filters to
analyze the texture of the image as a preprocessing step for segmentation. For more information
about Gabor filters, see “Texture Segmentation Using Gabor Filters” on page 12-160.
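
As a rough illustration of Gabor-based texture features at the command line, the sketch below builds a
small filter bank and looks at one magnitude response. The wavelengths and orientations are illustrative
choices; the app selects its own filter bank internally.

img = imread("kobi.png");                  % sample image used later in this example
gray = im2gray(img);                       % Gabor filtering operates on grayscale images
gaborBank = gabor([4 8 16],[0 45 90 135]); % wavelengths (pixels) and orientations (degrees)
gaborMag = imgaborfilt(gray,gaborBank);    % magnitude response of each filter
imshow(rescale(gaborMag(:,:,1)))           % view one of the texture responses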

Load Image into Image Segmenter App

Read an image into the workspace.


img = imread("kobi.png");

Open the Image Segmenter app, and load the image into the app. On the Apps tab of the MATLAB®
Toolstrip, in the Apps section, select Image Segmenter. Then, on the app toolstrip, select Load
Image, and then select Load Image From Workspace. In the Import From Workspace dialog box,
select the image img you read into the workspace. Image Segmenter displays the selected image.

You can also open the app and immediately load the image into it by entering this command in the
Command Window:

imageSegmenter(img);

Use Auto Cluster to Segment Image

On the Image Segmenter toolstrip, expand the Segmentation Tools section and select Auto Cluster.
Image Segmenter automatically segments the image, displaying the result. The Auto Cluster option
correctly segments the body of the dog as the foreground, but incorrectly segments the eyes and part
of the nose of the dog as parts of the background.

Include Texture Features in Segmentation

On the Image Segmenter toolstrip, select New Segmentation. Select Include Texture Features
and, once Image Segmenter finishes applying the Gabor filters, select Auto Cluster again. By
including the texture features, Image Segmenter correctly segments the entire dog as the
foreground.


See Also
Image Segmenter

Related Examples
• “Getting Started with Image Segmenter” on page 12-59


Create Binary Mask Using Volume Segmenter

This example shows how to segment a volume in the Volume Segmenter app. The Volume
Segmenter app offers many ways to explore a volume and segment objects in the volume. For
example, you can view the volume slice-by-slice or as a 3-D representation. To segment an object, you
can draw a region of interest (ROI) using ROI drawing tools or a paint brush tool. This example
creates a binary mask that isolates a small region of the brain.

Load Volumetric Data into the Workspace

Load a volume into the workspace. This example uses a stack of MRI brain images, stored in the MAT-
file vol_001.mat. The MRI data is a modified subset of the BraTS data set [1 on page 12-129].

load(fullfile(toolboxdir('images'),'imdata',...
'BrainMRILabeled','images','vol_001.mat'));

This command loads a 240-by-240-by-155 volume named vol into the workspace.

whos vol

Name Size Bytes Class Attributes

vol 240x240x155 17856000 uint16

Open the Volume Segmenter

Open the Volume Segmenter app. Click the Apps tab on the MATLAB® toolstrip. In the Image
Processing and Computer Vision section, click Volume Segmenter.


Load the Volume into the Volume Segmenter

To load the volume in the Volume Segmenter app, click Open Volume in the app toolstrip. For this
example, select Open from Workspace. In the Import Volume dialog box, select the volume you
loaded into the workspace, vol, and click OK. (You can also specify a volume when you open the app
by using the volumeSegmenter command: volumeSegmenter(vol).)

The Volume Segmenter app displays a 3-D representation of the volume in the 3-D Display pane
and displays individual slices of the data set in the Slice pane.


By default, the Slice pane displays the first slice of your data. The app displays the number of the
slice displayed at the top of the image, for example, 1/155. In this dataset, the first few slices do not
contain images of the brain.

The app also automatically creates a label for the segmentation in the Labels pane, using the default
name Label1. You can define multiple labels in the Labels pane. However, to create a binary mask,
you must use only one label.

To change the name of the label, double-click the label name. To change the color associated with the
label, double-click the color square displayed in the Labels pane. You can optionally load an existing
set of labels into the app using the Open Labels button.

Explore the Volume

To determine what you want to segment, explore the volume using the 3-D Display pane and the
Slice pane.

In the 3-D Display pane, you can rotate the volume to examine the data from every angle, using the
mouse. You can also customize the display of the volume in the 3-D Display tab in the app toolstrip.
For example, if you have metadata that describes the relative size of the voxels, you can specify it in
the Spatial Referencing part of the 3-D Display tab in the app toolstrip. To improve your view of
the data, you can change the background color used in the 3-D display, modify the threshold and
opacity of the display, and include orientation axes with the display, as shown in the figure below.
With the brain MRI data, you can see the tumor in the temporal lobe that you want to segment.


You can also view each slice of the volume in the Slice pane. Use the slider at the bottom of the pane
to move from slice to slice. You can see the tumor on slice 35 through slice 88. By default, the Slice
pane displays the volume oriented along the X-Y axis, but you can change this using buttons in the
Orientation section of the toolstrip on the Segmenter tab. The Slice pane is also where you use
drawing tools to define the mask.


Use Drawing Tools to Define the Mask

Once you have identified the object you want to segment, you can use the tools on the Draw tab in
the app toolstrip to define the region. Select the drawing tool you want to use from the ROI tools
(Freehand, Assisted Freehand, or Polygon) or the Paint Brush tool.

In the Slice pane, navigate to the slice where the object first appears, slice 35, and draw an outline
around the object. For this example, use the Polygon drawing tool. Click to create a vertex, then
move the cursor and click again to create a second vertex with a straight line connecting them.
Continue this process to create a connected line. To add additional vertices after you finish drawing,
double-click on the ROI edge.


Using Interpolation to Speed Object ROI Creation

You can move through the volume, slice-by-slice, and draw an ROI on each slice where the object
appears. However, the Volume Segmenter app provides several automated interpolation tools that
can help with segmenting an object across slices.

To use interpolation, you must first manually define the region on two slices. You have already defined
the region on the first slice where the object appears, slice 35. Use the same process to define the
region on the last slice where it appears, slice 88. The app places two bars on top of the slider, using
the color associated with the label, to indicate the slices with ROIs.


With the object defined on two slices, click Auto Interpolate. The app automatically defines the ROI
on all the intervening slices. The app uses blue bars to indicate all the slices that have ROIs, which
now appear as a solid bar from slice 35 to slice 88.

Alternatively, after defining an ROI on two slices, you can click Manually Interpolate. With this
option, the app opens the Manually Interpolate dialog box. You select the two regions from which you
want to interpolate, Region One and Region Two. To select the first region, use the slider at the
bottom of the dialog box to navigate to the first slice with an ROI, slice 35, and then click inside the
ROI displayed. To select the second region, click Region Two, navigate to slice 88, and click inside the
ROI displayed. After selecting both regions, click Run to interpolate the ROI on all intervening slices.


Refine the Interpolated ROIs

After using interpolation, check the individual slices to see if the interpolation created satisfactory
ROIs. Note that the ROI on slice 71 does not fill the entire object that you want to segment. You can
manually adjust the ROI using the Paint Brush tool. Alternatively, you can use one of the tools on
Automate tab. For example, you can use Active Contours to grow the ROIs on the slices where it
does not fill the full size of the tumor. You can also use the Add Algorithm option to specify your own
algorithm to operate on the ROIs.


Perform Custom Processing

You can add your own algorithm to operate on the ROIs. On the Automate tab, click Add Algorithm.
Choose whether you want your processing to operate on each 2-D slice (Slice-based) or on the entire
3-D volume (Volume-based).


For this example, under Slice-Based, select the New option and click Function Template to create a
new function that operates on each 2-D slice. The app opens the template in the MATLAB editor.
Replace the sample code in the template with code that you want to use. Your function must accept
two arguments: each slice as a separate image and a mask. Your function must also return a mask
image.
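
As a minimal sketch, a slice-based function saved in its own file has roughly this shape. The function name
and the active-contour step are only illustrations of something you might do on each slice, not the exact
template that the app generates.

function BW = refineSliceMask(I,BW)
    % I is one 2-D slice of the volume; BW is the current mask for that slice.
    if any(BW(:))
        % Grow the existing ROI toward the object boundary on this slice.
        BW = activecontour(I,BW,25,"Chan-Vese");
    end
end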

When you are done editing the template, save the file. The Volume Segmenter app automatically
creates a button in the Automate tab toolstrip for your function. To test your function on one slice,
click Run. By default, the app applies the function to only the current slice.

After testing your function on a single slice, you can run it on all of the slices or a subset of the slices.
You can run it from the current slice to the end (the highest numbered slice) or from the current slice
back to the beginning (slice 1). You can also specify a range of slices by specifying the starting slice
and the ending slice.


When you choose one of the directional options, the app updates the slice numbers in the display. You
can use this display to view the progress of processing.

Create the Binary Mask Volume

To create the binary mask volume, click Save Labels on the Segmenter tab. You can save the mask
to a MAT-file or to a workspace variable. For this example, click Save As Workspace Variable. In the
Save to Workspace dialog box, specify whether you want to save the segmentation as a logical or
categorical mask. Choose logical (the default when there is only one label), give the variable a name,
my_mask_volume, and click OK. The app creates a 3-D volume of class logical with the same
dimensions as the original volume.

To view the mask, use the volshow function: volshow(my_mask_volume);.
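
A couple of quick command-line checks on the exported mask are sketched below; they assume the vol and
my_mask_volume variables from this example.

numVoxels = nnz(my_mask_volume)            % number of segmented voxels
volshow(vol,OverlayData=my_mask_volume);   % view the mask overlaid on the original volume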


References

[1] Medical Segmentation Decathlon. "Brain Tumours." Tasks. Accessed May 10, 2018. http://
medicaldecathlon.com/.

The BraTS data set is provided by Medical Segmentation Decathlon under the CC-BY-SA 4.0 license.
All warranties and representations are disclaimed. See the license for details. MathWorks® has
modified the subset of data used in this example. This example uses the MRI data of one scan from
the original data set, saved to a MAT file.

See Also
Volume Segmenter

Related Examples
• “Create Semantic Segmentation Using Volume Segmenter” on page 12-130


Create Semantic Segmentation Using Volume Segmenter

This example shows how to create a semantic segmentation of a volume using the Volume
Segmenter app. The Volume Segmenter app offers many ways to explore a volume and segment
objects in the volume. For example, you can view the volume slice-by-slice or as a 3-D representation.
To segment an object, you can draw a region of interest (ROI) using ROI drawing tools or a paint
brush tool. This example segments a stack of MRI images to label the brain and tumor regions. The
example also labels the background.

Load Volumetric Data into the Workspace

Load a volume into the workspace. This example uses a stack of MRI brain images, stored in the MAT-
file vol_001.mat. The MRI data is a modified subset of the BraTS data set [1 on page 12-142].

load(fullfile(toolboxdir("images"),"imdata", ...
"BrainMRILabeled","images","vol_001.mat"));
whos vol

Name Size Bytes Class Attributes

vol 240x240x155 17856000 uint16

Open the Volume Segmenter

Open the Volume Segmenter app. Click the Apps tab on the MATLAB® toolstrip. In the Image
Processing and Computer Vision section, click Volume Segmenter.


Load the Volume into the Volume Segmenter

To load the volume in the Volume Segmenter app, click Open Volume in the app toolstrip. For this
example, select Open from Workspace. In the Import Volume dialog box, select the volume you
loaded into the workspace, vol, and click OK. Alternatively, you can specify a volume when you open
the app by using the volumeSegmenter command:

volumeSegmenter(vol)

The Volume Segmenter app displays a 3-D representation of the volume in the 3-D Display pane
and displays individual slices of the data set in the Slice pane.

By default, the Slice pane displays the first slice of your data. The app displays the number of the
slice displayed at the top of the image, for example, 1/155. In this data set, the first few slices do not
contain images of the brain.

The app also automatically creates a label for the segmentation in the Labels pane, using the default
name Label1. You can define multiple labels in the Labels pane. However, to create a binary mask,
you must use only one label.

To change the name of the label, double-click the label name. To change the color associated with the
label, double-click the color square displayed in the Labels pane. You can optionally load an existing
set of labels into the app using the Open Labels button.

Explore the Volume

To determine what you want to segment, explore the volume using the 3-D Display pane and the
Slice pane.


In the 3-D Display pane, you can rotate the volume to examine the data from every angle, using the
mouse. You can also customize the display of the volume in the 3-D Display tab in the app toolstrip.
For example, if you have metadata that describes the relative size of the voxels, you can specify it in
the Spatial Referencing part of the 3-D Display tab in the app toolstrip. To improve your view of
the data, you can change the background color used in the 3-D display, modify the threshold and
opacity of the display, and include orientation axes with the display, as shown in the figure below.
With the brain MRI data, you can see the tumor in the temporal lobe that you want to segment.

You can also view each slice of the volume in the Slice pane. Use the slider at the bottom of the pane
to move from slice to slice. You can see the tumor on slice 35 through slice 88. By default, the Slice
pane displays the volume oriented along the X-Y axis, but you can change this using buttons in the
Orientation section of the toolstrip on the Segmenter tab. The Slice pane is also where you use
drawing tools to define the mask.


Use Drawing Tools to Label Regions in Volume

Once you have identified the object you want to segment, you can use the tools on the Draw tab in
the app toolstrip to define the region. Select the drawing tool you want to use from the ROI tools
(Freehand, Assisted Freehand, or Polygon) or the Paint Brush tool.

Start by labeling the brain. When one object is nested in another object, as the tumor appears over
the brain on slices, label the larger region first. The first step is to create a label in the Labels pane.
The app provides one label by default, named Label1. To change the name of the label to be more
descriptive for your application, double-click on the label and type in the new name. To change the
default color associated with the label, double-click on the colored square in the label identifier and
select a color from the Color dialog box.

12-133
12 Image Segmentation

In the Slice pane, navigate to the slice where the object first appears and use a drawing tool to label
the object. In the following figure, this example uses the Paint Brush tool to label the brain, but you
can use any of the drawing tools.

Using Interpolation to Speed Object ROI Creation

You can move through the volume, slice-by-slice, and draw an ROI on each slice where the object
appears. However, the Volume Segmenter app provides several automated interpolation tools that
can help with segmenting an object across slices.

To use interpolation, you must first manually define the region on two slices. You have already defined
the region on the first slice where the object appears, slice 35. Use the same process to define the
region on the last slice where it appears, slice 88. The app places two bars on the slider, using the
color associated with the label, to indicate the slices with ROIs.


With the ROI defined on two slices, click Auto Interpolate. The app automatically defines the ROI on
all the intervening slices. The app uses blue bars to indicate all the slices that have ROIs, which now
appear as a solid bar from slice 35 to slice 88.

Alternatively, after defining an ROI on two slices, you can click Manually Interpolate. With this
option, the app opens the Manually Interpolate dialog box. You select the two regions from which you
want to interpolate, Region One and Region Two. To select the first region, use the slider at the
bottom of the dialog box to navigate to the first slice with an ROI, slice 35, and then click inside the
ROI displayed. To select the second region, click Region Two, navigate to slice 88, and click inside the
ROI displayed. After selecting both regions, click Run to interpolate the ROI on all intervening slices.


Refine the Interpolated Labels

After using interpolation, check the individual slices to see if the interpolation created satisfactory
ROIs. Note that the ROI on slice 71 does not fill the entire object that you want to segment. You can
manually adjust the ROI using the Paint Brush tool. Alternatively, you can use one of the tools in the
Automate tab. For example, you can use Active Contours to grow the ROIs on the slices where it
does not fill the full size of the tumor.


Perform Custom Processing

You can also add your own algorithm to operate on the ROIs. On the Automate tab, click Add
Algorithm. Choose whether you want your processing function to operate on each 2-D slice (Slice-
based) or on the entire 3-D volume (Volume-based).


For this example, under Slice-Based, select the New option and click Function Template to create a
new function that operates on each 2-D slice. The app opens the template in the MATLAB editor.
Replace the sample code in the template with code that you want to use. Your function must accept
two arguments: each slice as a separate image and a mask. Your function must also return a mask
image.

When you are done editing the template, save the file. The Volume Segmenter app automatically
creates a button in the Automate tab toolstrip for your function. To test your function on one slice,
click Run. By default, the app applies the function to only the current slice.

After testing your function on a single slice, you can run it on all of the slices or a subset of the slices.
You can run it from the current slice to the end (the highest numbered slice) or from the current slice
back to the beginning (slice 1). You can also specify a range of slices by specifying the starting slice
and the ending slice.


When you choose one of the directional options, the app updates the slice numbers in the display. You
can use this display to view the progress of processing.

Create Additional Labels

After labeling the brain on each slice, label the tumor wherever it appears on a slice, repeating the
process described previously.

First, define a new label in the Labels pane. Click the Plus sign in the Labels pane to create a new
label.

In the Slice pane, navigate to the slice where the object first appears and start labeling the object on
each slice using a drawing tool. In the following figure, this example uses the Paint Brush tool to label
the tumor. As previously, you can draw the object on each slice where it appears or use the
interpolation tools to draw on multiple slices automatically. After interpolation, you can use drawing
tools, such as the Eraser, to modify the automated segmentation on each slice.

12-139
12 Image Segmentation

Make the Background a Separately Labeled Region

When you define multiple labels, the app sets the data type of the label data as categorical. The
default value for unlabeled categorical voxels is <undefined>. To label the background voxels so
that they have a recognizable categorization, follow a similar process to that previously described:

1 Define a new label in the Labels pane, give the label a descriptive name, and select the color you
want for the background.
2 Label the background on each slice. Navigate to a slice, select Fill Region on the Draw tab, and
click anywhere in the background. Repeat this process on each slice.


When you add a background, it can obscure the other labels in the visualization of the volume in the
3-D Display pane. To view the other labeled regions in the 3-D Display pane, disable the visibility of
the background label. Click Show Labels in the 3-D Display tab, click Customize, and deselect the
visibility of the background label.

Save the Segmentation

When you complete labeling the brain and the tumor in the volume, save the segmentation. Click
Save Labels on the Segmenter tab and choose from several options. You can save the labeled MRI
data as a MAT file or as a variable in the workspace. For this example, choose a workspace variable
and name the variable brain_labels.

After you save the segmentation, you can optionally turn on Autosave, which periodically saves the
segmentation automatically.
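
To inspect the exported categorical label volume at the command line, you can use something like the sketch
below. The label name "tumor" is only an example; use whatever label names you assigned in the app.

summary(brain_labels(:))                      % voxel counts per category, including <undefined>
tumorVoxels = nnz(brain_labels == "tumor");   % count the voxels assigned a particular label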


View the Labeled Volume

To view the labeled volume, use the volshow function. These commands demonstrate how to display the labels
within the volume by adjusting the volume overlay properties:

viewer = viewer3d(BackgroundColor="white",BackgroundGradient="off",CameraZoom=1.5);
volDisp = volshow(vol,OverlayData=brain_labels,Parent=viewer, ...
RenderingStyle="GradientOpacity",GradientOpacityValue=0.8, ...
Alphamap=linspace(0,0.2,256),OverlayAlphamap=0.8);

References

[1] Medical Segmentation Decathlon. "Brain Tumours." Tasks. Accessed May 10, 2018. http://
medicaldecathlon.com/.

The BraTS data set is provided by Medical Segmentation Decathlon under the CC-BY-SA 4.0 license.
All warranties and representations are disclaimed. See the license for details. MathWorks® has
modified the subset of data used in this example. This example uses the MRI data of one scan from
the original data set, saved to a MAT file.

See Also
Volume Segmenter

Related Examples
• “Create Binary Mask Using Volume Segmenter” on page 12-118
• “Work with Blocked Images Using Volume Segmenter” on page 12-143


Work with Blocked Images Using Volume Segmenter

This example shows how to work with a single-resolution blocked image in the Volume Segmenter
app.

Use blocked images when the original volume is too large to fit into memory. By using blocked
images, you can segment volumes without running out of memory.

To use the Volume Segmenter app with a blocked image, you must create a blocked image from the
original volume and open the blocked image in the app. Once in the app, working with the blocked
image is very similar to working with any volume.

• Explore the blocked image just as you would any volume, by viewing each slice individually or
manipulating the 3-D representation of the volume. However, with a blocked image, you view the
volume one block at a time. The app includes navigation aids you can use to view each block in the
blocked image.
• Segment the blocked image just as you would any volume, drawing labels on areas of the volume.
However, with a blocked image, you draw labels on the volume a block at a time. To label the
blocked image, use the drawing tools in the app to create ROIs. You can also use interpolation to
automatically label intermediate slices in a block. As you view each block, you segment the part of
the object you find in that block. You can also use automated methods to segment a blocked
image. When using automation, you can process all blocks at the same time.

When working with blocked images in the Volume Segmenter app, create all the labels you want to
use and then save the segmentation. This is more efficient than adding or removing labels
individually. Also, as you finish processing a block, before you begin processing the next block, you
must save the processed block in a file. When you are done, the blockedImage object combines the
individually processed block files into one volume

Create Blocked Image

If you want to segment a volume that does not fit into memory, create a blockedImage object to
represent the volume. This example uses a stack of MRI brain images as a volume, stored in the MAT-
file vol_001.mat. The MRI data is a modified subset of the BraTS data set [1 on page 12-157]. In
this MRI data, you can see the tumor that you want to segment in the temporal lobe.

load(fullfile(toolboxdir('images'),'imdata','BrainMRILabeled','images','vol_001.mat'));

Reading the file loads a 240-by-240-by-155 volume named vol into the workspace.

whos vol

Name Size Bytes Class Attributes

vol 240x240x155 17856000 uint16

Create a blocked image from the volume, specifying the size of the blocks. (If you have a volume that
does not fit in memory, you can specify the file name to blockedImage.)

bim = blockedImage(vol,'BlockSize',[120 120 120])

bim =
  blockedImage with properties:

   Read only properties
             Source: [240x240x155 uint16]
            Adapter: [1x1 images.blocked.InMemory]
               Size: [240 240 155]
       SizeInBlocks: [2 2 2]
    ClassUnderlying: "uint16"

   Settable properties
          BlockSize: [120 120 120]

Given the specified block size, the blocked image creates two blocks in each dimension.
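
For reference, you can also process a blocked image block by block outside the app by using its apply
method. The thresholding below is only an illustration; apply calls the function once per block and
assembles the results into a new blocked image.

bproc = apply(bim,@(bs) imbinarize(mat2gray(bs.Data)));   % bs.Data holds one block of voxels
maskVol = gather(bproc);                                  % assemble the result as an in-memory array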

Open Volume Segmenter

Open the Volume Segmenter app. Select the Apps tab on the MATLAB® toolstrip. In the Image
Processing and Computer Vision section, select Volume Segmenter.


Load the Blocked Image into the Volume Segmenter

To load the blocked image into the Volume Segmenter app, select Open Volume on the app
toolstrip. For this example, select Open Blocked Image from Workspace. In the Import Volume
dialog box, select the blocked image you created in the workspace, bim, and click OK. Alternatively,
you can specify a blocked image when you open the app by using the volumeSegmenter command:
volumeSegmenter(bim).


The app loads the volume and displays its content. When working with a blocked image, the app
displays the contents of one block at a time. The Overview tab indicates which block you are
currently viewing in the context of the entire volume.


Explore the Blocked Image

Using the Volume Segmenter app, explore the volume to determine what you want to segment. With
a blocked image, the app includes several navigational aids that help explore each block.

Current Block -- View a 3-D representation of the block contents in the Current Block tab. To add
orientation axes and a wireframe to the display, go to the 3-D Display tab in the app toolstrip. To view
the block from different angles, use the mouse to rotate the display.


Overview -- Shows the location of the current block in relation to the other blocks in the blocked
image. To add orientation axes and a wireframe to the display, use the options on the 3-D Display tab
of the app toolstrip. To view the block from different angles, use the mouse to rotate the display. As
you explore blocks, the display updates to show which block you currently have selected, as well as
which you have visited and which you have marked as done. The current block is shown in red.
Visited blocks or processed blocks are yellow. Blocks that you mark as done are green.

You can also customize the display of the volume in the 3-D Display tab in the app toolstrip. For
example, if you have metadata that describes the relative size of the voxels, you can specify it in the
Spatial Referencing part of the 3-D Display tab. To improve your view of the data, you can change
the background color used in the 3-D display, modify the threshold and opacity of the display, and
include orientation axes with the display.


Blocked Image tab -- For blocked images, the app adds a Blocked Image tab to the app toolstrip.
This tab contains navigation aids that help you move among the blocks in the blocked image. For
example, to move to the next unprocessed block, click Next Block. You can also move to a particular
block by specifying block coordinates along the X-, Y-, and Z-axes. To indicate that you are done
processing a block, click Mark Block Complete. When you mark a block complete, the app
calculates the percentage of the entire volume that is complete.

Slice pane -- View each slice of the volume in the Slice pane. Use the slider at the bottom of the tab
to move from slice to slice. By default, the Slice pane displays the volume oriented along the X-Y axis,
but you can change this using buttons in the Orientation section of the toolstrip on the Segmenter
tab. The Slice pane is also where you use drawing tools to define the ROIs. With blocked images, the
view of the slice shows only the current block. The object you want to segment may span several
blocks. The app displays the number of the current slice, out of the total number of slices, at the top
of the pane. For example, 50/120.


Use Drawing Tools to Label Regions in Blocked Images

Once you have identified the object that you want to segment, use the tools on the Draw tab in the
app toolstrip to label the object in each block where it appears. You can use any of the drawing tools with
blocked images: the Paint Brush tool, the Fill Region tool, the Eraser tool, and the Freehand,
Assisted Freehand, and Polygon region-of-interest (ROI) shapes.

As with any volume, to start labeling the brain, first create all the labels you want to use in the
segmentation. In the Labels pane, the app provides one label by default, named Label1. To change
the name of the label to be more descriptive for your application, double-click the label and type in
the new name. To change the default color associated with the label, double-click the colored square
associated with the label and select a color from the Color dialog box. When one object is nested in
another object, as the tumor appears within the brain on these slices, label the larger region first. Click the
plus button to create additional labels.


In the Slice pane, navigate to a slice where the object appears in the block and use a drawing tool to
label the object. This figure shows the Paint Brush tool, but you can use any of the drawing tools.

Use Interpolation to Speed Object ROI Creation

You could move through a block, slice-by-slice, and draw an ROI on each slice where the object
appears. However, the Volume Segmenter app provides several automated tools that can help with
segmenting an object across slices. These automated options process only the slices within a block.

To use interpolation to speed up labeling, you must first manually label the region on two slices. For
example, create a label on one slice and use the same process to define the label on another slice.
The app places two bars on the slider, using the color associated with the label, to indicate the slices
with defined ROIs.


With the object defined on two slices, click Auto Interpolate. The app automatically defines the ROI
on all the intervening slices. The app uses a solid blue bar to indicate that all the slices have ROIs.

Alternatively, after defining an ROI on two slices, click Manually Interpolate. With this option, the
app opens the Manually Interpolate dialog box. You select the two regions from which you want to
interpolate, Region One and Region Two. By default, the dialog box opens on a slice on which you
have defined a region. To select the first region, click Region One. Navigate to the other slice on
which you have defined a region, using the slider or by clicking the blue indicator above the slider. To
select the second region, click Region Two. After selecting both regions, click Run to interpolate the
ROI on all intervening slices.


Use Automation to Refine Labels and Perform Custom Processing

You can use an algorithm to refine label definitions and perform other processing of blocked images
automatically. The app includes several slice-based and volume-based algorithms on the Automate
tab. First, select the algorithm. For example, select the volume-based algorithm Otsu's Threshold in
the Algorithm section of the Automate tab toolstrip. Once you select the algorithm, select
Algorithm Parameters to specify values for any algorithm-specific parameters that might be
associated with the algorithm. Because Otsu's threshold algorithm does not support any parameters,
this option is not enabled. For slice-based algorithms, you can specify which slices you want to
process: the current slice only, the slices from the current slice back to the beginning, or the slices
from the current slice to the end. After selecting the algorithm, specifying any algorithm-specific
parameters, and choosing the slices to operate on, click Run.


Process All Blocks and Review Results

When working with blocked images, you have several other options for automated processing. For
blocked images, by default, automation algorithms operate on the slices in the current block.
However, to perform automated processing on all the blocks in the blocked image at one time, click
Automate On All Blocks. If you have already marked some blocks completed, make sure Skip
Completed is not enabled. To enable parallel processing of blocks, click Use Parallel.

To review the results of the processing and accept or reject each block, click Review Results to
select this option before clicking Run. The app displays the Review and accept automation results
dialog box. Select the check box for each block you accept and click Accept Selected to finish.


Add Custom Automation Algorithms

You can also add your own algorithm to operate on the ROIs. On the Automate tab, click Add
Algorithm. Choose whether you want your processing to operate on each 2-D slice (Slice-based) or
on the entire 3-D volume (Volume-based).


For this example, under Slice-Based, select the New option and click Function Template to create a
new function that operates on each 2-D slice. The app opens the template in the MATLAB editor.
Replace the sample code in the template with code that you want to use. Your function must accept
two arguments: each slice as a separate image and a mask. Your function must also return a mask
image.
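
As an illustration, the following is a minimal sketch of a function with that signature. The function
name and the processing steps are placeholders, and the template that the app generates may differ
in its details:

function BW = segmentSliceExample(I,BW)
% I is one 2-D slice of the volume; BW is the current mask for that slice.
% The returned mask replaces the labels for the slice.
    BW = imbinarize(im2gray(I));   % global Otsu threshold on the slice
    BW = bwareaopen(BW,50);        % discard small spurious regions (illustrative cleanup)
end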

When you are done editing the template, save the file. The Volume Segmenter app automatically
creates a button in the Automate tab toolstrip for your function. To test your function on one slice,
click Run. By default, the app applies the function to only the current slice.

Save the Segmentation

When you finish labeling the brain and the tumor in the volume, save the segmentation. Save the
labels to a new empty folder by selecting Save Labels > Save As from the Segmenter tab. The
Volume Segmenter app saves each block of labels as a separate H5 file.


If you continue to modify the labels in the app, then you can overwrite the old label files by selecting
Save Labels > Save. After you save the segmentation, you can optionally turn on Autosave, which
periodically saves the segmentation automatically.

View the Labeled Volume

To view the mask, use the bigimageshow function.

References

[1] Medical Segmentation Decathlon. "Brain Tumours." Tasks. Accessed May 10, 2018. http://medicaldecathlon.com/.

The BraTS data set is provided by Medical Segmentation Decathlon under the CC-BY-SA 4.0 license.
All warranties and representations are disclaimed. See the license for details. MathWorks® has
modified the subset of data used in this example. This example uses the MRI data of one scan from
the original data set, saved to a MAT file.

See Also
Volume Segmenter | blockedImage | bigimageshow

Related Examples
• “Create Binary Mask Using Volume Segmenter” on page 12-118
• “Create Semantic Segmentation Using Volume Segmenter” on page 12-130


Install Sample Data Using Add-On Explorer


Image Processing Toolbox provides a sample 3-D MRI chest scan data set as an optional feature. This
volumetric data set is used by the example “Segment Lungs from 3-D Chest Scan” on page 12-19.

To obtain this data set, download it using the Add-On Explorer.

1 Select Get Add-ons from the Add-ons drop-down menu on the MATLAB desktop. The Add-on
Explorer opens.

2 In the Add-On Explorer, search for the data package Image Processing Toolbox Image Data.
The data package is a MathWorks Optional Feature.


3 Click the data package in the search results. On the data package page, click Install. Follow the
instructions presented by the installer.

See Also

Related Examples
• “Segment Lungs from 3-D Chest Scan” on page 12-19


Texture Segmentation Using Gabor Filters

This example shows how to use texture segmentation to identify regions based on their texture. The
goal is to segment the dog from the bathroom floor. The segmentation is visually obvious because of
the difference in texture between the regular, periodic pattern of the bathroom floor, and the regular,
smooth texture of the dog's fur.

From experimentation, it is known that Gabor filters are a reasonable model of simple cells in the
mammalian vision system. Because of this, Gabor filters are thought to be a good model of how
humans distinguish texture, and are therefore a useful model to use when designing algorithms to
recognize texture. This example uses the basic approach described in (A. K. Jain and F. Farrokhnia,
"Unsupervised Texture Segmentation Using Gabor Filters",1991) to perform texture segmentation.

Read and display input image

Read and display the input image. This example shrinks the image to make the example run more
quickly.

A = imread("kobi.png");
A = imresize(A,0.25);
Agray = im2gray(A);
imshow(A)

Design Array of Gabor Filters

Design an array of Gabor Filters which are tuned to different frequencies and orientations. The set of
frequencies and orientations is designed to localize different, roughly orthogonal, subsets of


frequency and orientation information in the input image. Regularly sample orientations between
[0,135] degrees in steps of 45 degrees. Sample wavelengths in increasing powers of two starting from
4/sqrt(2) up to the hypotenuse length of the input image. These combinations of frequency and
orientation are taken from [Jain,1991] cited in the introduction.
imageSize = size(A);
numRows = imageSize(1);
numCols = imageSize(2);

wavelengthMin = 4/sqrt(2);
wavelengthMax = hypot(numRows,numCols);
n = floor(log2(wavelengthMax/wavelengthMin));
wavelength = 2.^(0:(n-2)) * wavelengthMin;

deltaTheta = 45;
orientation = 0:deltaTheta:(180-deltaTheta);

g = gabor(wavelength,orientation);

Extract Gabor magnitude features from source image. When working with Gabor filters, it is common
to work with the magnitude response of each filter. Gabor magnitude response is also sometimes
referred to as "Gabor Energy". Each MxN Gabor magnitude output image in gabormag(:,:,ind) is
the output of the corresponding Gabor filter g(ind).
gabormag = imgaborfilt(Agray,g);
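
To get a feel for the filter bank output before any post-processing, you can display a few of the
magnitude responses; a short sketch (the choice of which responses to show is arbitrary):

figure
tiledlayout(2,2)
for ind = round(linspace(1,numel(g),4))   % four filters spread across the bank
    nexttile
    imshow(gabormag(:,:,ind),[])
    title(sprintf("Wavelength %.1f, Orientation %d", ...
        g(ind).Wavelength,g(ind).Orientation))
end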

Post-process the Gabor Magnitude Images into Gabor Features.

To use Gabor magnitude responses as features for use in classification, some post-processing is
required. This post processing includes Gaussian smoothing, adding additional spatial information to
the feature set, reshaping our feature set to the form expected by the pca and kmeans functions, and
normalizing the feature information to a common variance and mean.

Each Gabor magnitude image contains some local variations, even within well segmented regions of
constant texture. These local variations will throw off the segmentation. We can compensate for these
variations using simple Gaussian low-pass filtering to smooth the Gabor magnitude information. We
choose a sigma that is matched to the Gabor filter that extracted each feature. We introduce a
smoothing term K that controls how much smoothing is applied to the Gabor magnitude responses.
for i = 1:length(g)
    sigma = 0.5*g(i).Wavelength;
    K = 3;
    gabormag(:,:,i) = imgaussfilt(gabormag(:,:,i),K*sigma);
end

When constructing Gabor feature sets for classification, it is useful to add a map of spatial location
information in both X and Y. This additional information allows the classifier to prefer groupings
which are close together spatially.
X = 1:numCols;
Y = 1:numRows;
[X,Y] = meshgrid(X,Y);
featureSet = cat(3,gabormag,X);
featureSet = cat(3,featureSet,Y);

Reshape data into a matrix X of the form expected by the kmeans function. Each pixel in the image
grid is a separate datapoint, and each plane in the variable featureSet is a separate feature. In this


example, there is a separate feature for each filter in the Gabor filter bank, plus two additional
features from the spatial information that was added in the previous step. In total, there are 24 Gabor
features and 2 spatial features for each pixel in the input image.
numPoints = numRows*numCols;
X = reshape(featureSet,numRows*numCols,[]);

Normalize the features to be zero mean, unit variance.


X = bsxfun(@minus, X, mean(X));
X = bsxfun(@rdivide,X,std(X));
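
On recent MATLAB releases, implicit expansion lets you write the same normalization more compactly;
this is an equivalent sketch:

X = (X - mean(X)) ./ std(X);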

Visualize the feature set. To get a sense of what the Gabor magnitude features look like, Principal
Component Analysis can be used to move from a 26-D representation of each pixel in the input image
into a 1-D intensity value for each pixel.
coeff = pca(X);
feature2DImage = reshape(X*coeff(:,1),numRows,numCols);
imshow(feature2DImage,[])

It is apparent in this visualization that there is sufficient variance in the Gabor feature information to
obtain a good segmentation for this image. The dog is very dark compared to the floor because of the
texture differences between the dog and the floor.

Classify Gabor Texture Features using kmeans

Repeat k-means clustering five times to avoid local minima when searching for means that minimize
the objective function. The only prior information assumed in this example is how many distinct regions
of texture are present in the image being segmented. There are two distinct regions in this case. This
part of the example requires the Statistics and Machine Learning Toolbox™.


L = kmeans(X,2,"Replicates",5);

Visualize segmentation using label2rgb.

L = reshape(L,[numRows numCols]);
imshow(label2rgb(L))

Visualize the segmentation. Examine the foreground and background images that result from the
mask BW that is associated with the label matrix L.

Aseg1 = zeros(size(A),"like",A);
Aseg2 = zeros(size(A),"like",A);
BW = L == 2;
BW = repmat(BW,[1 1 3]);
Aseg1(BW) = A(BW);
Aseg2(~BW) = A(~BW);
montage({Aseg1,Aseg2});


References
[1] Jain, Anil K., and Farshid Farrokhnia. "Unsupervised Texture Segmentation Using Gabor Filters."
Pattern Recognition 24, no. 12 (January 1991): 1167–86. https://doi.org/
10.1016/0031-3203(91)90143-S.

See Also
gabor | imgaborfilt


Texture Segmentation Using Texture Filters

This example shows how to identify and segment regions based on their texture.

Read Image

Read and display a grayscale image of textured patterns on a bag.

I = imread('bag.png');
imshow(I)
title('Original Image')

Create Texture Image

Use entropyfilt to create a texture image. The function entropyfilt returns an array where
each output pixel contains the entropy value of the 9-by-9 neighborhood around the corresponding
pixel in the input image I. Entropy is a statistical measure of randomness.

You can also use stdfilt and rangefilt to achieve similar segmentation results. For comparison
to the texture image of local entropy, create texture images S and R showing the local standard
deviation and local range, respectively.

E = entropyfilt(I);
S = stdfilt(I,ones(9));
R = rangefilt(I,ones(9));

Use rescale to rescale the texture images E and S so that pixel values are in the range [0, 1] as
expected of images of data type double.

Eim = rescale(E);
Sim = rescale(S);


Display the three texture images in a montage.

montage({Eim,Sim,R},'Size',[1 3],'BackgroundColor','w',"BorderSize",20)
title('Texture Images Showing Local Entropy, Local Standard Deviation, and Local Range')

Create Mask for Bottom Texture

This example continues by processing the entropy texture image Eim. You can repeat a similar
process for the other two types of texture images with other morphological functions to achieve
similar segmentation results.

Threshold the rescaled image Eim to segment the textures. A threshold value of 0.8 is selected
because it is roughly the intensity value of pixels along the boundary between the textures.

BW1 = imbinarize(Eim,0.8);
imshow(BW1)
title('Thresholded Texture Image')


The segmented objects in the binary image BW1 are white. If you compare BW1 to I, you notice the
top texture is overly segmented (multiple white objects) and the bottom texture is segmented almost
in its entirety. Remove the objects in the top texture by using bwareaopen.

BWao = bwareaopen(BW1,2000);
imshow(BWao)
title('Area-Opened Texture Image')


Use imclose to smooth the edges and to close any open holes in the object in BWao. Specify the
same 9-by-9 neighborhood that was used by entropyfilt.

nhood = ones(9);
closeBWao = imclose(BWao,nhood);
imshow(closeBWao)
title('Closed Texture Image')

Use imfill to fill holes in the object in closeBWao. The mask for the bottom texture is not perfect
because the mask does not extend to the bottom of the image. However, you can use the mask to
segment the textures.

mask = imfill(closeBWao,'holes');
imshow(mask);
title('Mask of Bottom Texture')


Use Mask to Segment Textures

Separate the textures into two different images.

textureTop = I;
textureTop(mask) = 0;
textureBottom = I;
textureBottom(~mask) = 0;
montage({textureTop,textureBottom},'Size',[1 2],'BackgroundColor','w',"BorderSize",20)
title('Segmented Top Texture (Left) and Segmented Bottom Texture (Right)')


Display Segmentation Results

Create a label matrix that has the label 1 where the mask is false and the label 2 where the mask is
true. Overlay the label matrix on the original image.

L = mask+1;
imshow(labeloverlay(I,L))
title('Labeled Segmentation Regions')


Outline the boundary between the two textures in cyan.

boundary = bwperim(mask);
imshow(labeloverlay(I,boundary,"Colormap",[0 1 1]))
title('Boundary Between Textures')

See Also
entropyfilt | bwareaopen | imclose | imbinarize | imfill | bwperim | rangefilt

13 Analyze Images

This topic describes functions that support a range of standard image processing operations for
analyzing images and objects within images.

• “Pixel Values” on page 13-2
• “Intensity Profile of Images” on page 13-4
• “Create Contour Plot of Grayscale Image” on page 13-7
• “Measuring Regions in Grayscale Images” on page 13-11
• “Find the Length of a Pendulum in Motion” on page 13-17
• “Create Image Histogram” on page 13-22
• “Image Mean, Standard Deviation, and Correlation Coefficient” on page 13-24
• “Edge Detection” on page 13-25
• “Boundary Tracing in Images” on page 13-28
• “Quadtree Decomposition” on page 13-33
• “Detect and Measure Circular Objects in an Image” on page 13-36
• “Identifying Round Objects” on page 13-48
• “Measuring Angle of Intersection” on page 13-56
• “Measuring the Radius of a Roll of Tape” on page 13-62
• “Calculate Statistical Measures of Texture” on page 13-65
• “Texture Analysis Using the Gray-Level Co-Occurrence Matrix (GLCM)” on page 13-67
• “Create a Gray-Level Co-Occurrence Matrix” on page 13-68
• “Specify Offset Used in GLCM Calculation” on page 13-70
• “Derive Statistics from GLCM and Plot Correlation” on page 13-71

Pixel Values
To determine the values of one or more pixels in an image and return the values in a variable, use the
impixel function. You can specify the pixels by passing their coordinates as input arguments or you
can select the pixels interactively using a mouse. impixel returns the value of specified pixels.

Note You can also get pixel value information interactively using the Image Viewer app. For details,
see “Get Pixel Information in Image Viewer App” on page 4-33.

Determine Values of Individual Pixels in Images


This example shows how to use impixel interactively to get pixel values.

Display an image.

imshow("canoe.tif")

Call impixel. When called with no input arguments, impixel associates itself with the image in the
current axes.

pixel_values = impixel

Select the points you want to examine in the image by clicking the mouse. impixel places a star at
each point you select.

When you are finished selecting points, press Return. impixel returns the pixel values in an n-by-3
array, where n is the number of points you selected. impixel removes the stars used to indicate
selected points.

pixel_values =

0.1294 0.1294 0.1294


0.5176 0 0
0.7765 0.6118 0.4196
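
You can also call impixel non-interactively by passing pixel coordinates as input arguments. This is a
minimal sketch using a different sample image; the coordinates are illustrative:

RGB = imread("peppers.png");
c = [100 200 300];              % column (x) coordinates
r = [50 150 250];               % row (y) coordinates
pixel_values = impixel(RGB,c,r)

Each row of the returned array contains the values of one queried pixel.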

See Also
impixel

Related Examples
• “Get Pixel Information in Image Viewer App” on page 4-33


Intensity Profile of Images


The intensity profile of an image is the set of intensity values taken from regularly spaced points
along a line segment or multi-line path in an image. To create an intensity profile, use the improfile
function. This function calculates and plots the intensity values along a line segment or a multi-line
path in an image. You define the line segment (or segments) by specifying their coordinates as input
arguments or interactively using a mouse. For points that do not fall on the center of a pixel, the
intensity values are interpolated. By default, improfile uses nearest-neighbor interpolation, but you
can specify a different method. (For more information about specifying the interpolation method, see
“Resize an Image” on page 6-2.) improfile works best with grayscale and truecolor images.
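
If you already know the endpoints of the path, you can call improfile non-interactively; a minimal
sketch (the image and the endpoint coordinates are illustrative):

I = imread("liftingbody.png");
x = [19 427 416 77];          % x-coordinates of the path vertices
y = [96 462 37 33];           % y-coordinates of the path vertices
improfile(I,x,y)
grid on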

Create an Intensity Profile of an Image


This example shows how to create an intensity profile for an image interactively using improfile.

Read an image and display it.

I = fitsread("solarspectra.fts");
imshow(I,[]);

Create the intensity profile. Call improfile with no arguments. The cursor changes to cross-hairs
when you move it over the displayed image. Using the mouse, specify line segments by clicking the
endpoints. improfile draws a line between the endpoints. When you finish specifying the path,
press Return. In the following figure, the line is shown in red.

improfile

After you finish drawing the line over the image, improfile displays a plot of the data along the line.
Notice how the peaks and valleys in the plot correspond to the light and dark bands in the image.


Create Intensity Profile of an RGB Image


This example shows how to plot the intensity values in an RGB image. For a single line segment,
improfile plots the intensity values in a two-dimensional view. For a multi-line path, improfile
plots the intensity values in a three-dimensional view.

Display an RGB image using imshow.

imshow("peppers.png")

Call improfile without any arguments and trace a line segment in the image interactively. In the
figure, the black line indicates a line segment drawn from top to bottom. Double-click to end the line
segment.

improfile


The improfile function displays a plot of the intensity values along the line segment. The plot
includes separate lines for the red, green, and blue intensities. In the plot, notice how low the blue
values are at the beginning of the plot where the line traverses the orange pepper.

Intensity Values Along a Line Segment in an RGB Image

See Also
improfile | impixel | imcontour


Create Contour Plot of Grayscale Image

This example shows how to create a contour plot of an image. A contour is a path in an image along
which the image intensity values are equal to a constant. You can create a contour plot of the data in
a grayscale image using the imcontour function. This function is similar to the contour function in
MATLAB®, but it automatically sets up the axes so their orientation and aspect ratio match the
image.

Read and display a grayscale image.


I = imread("moon.tif");
imshow(I)


Create a contour plot of the image with ten contour levels using imcontour.

[C,h] = imcontour(I,10);

Display the levels that the imcontour function selected.

h.LevelList

ans = 1×10

23 46 69 92 115 138 161 184 207 230

To label the levels of the contours, use the clabel function. Zoom in to see the contour details.


clabel(C,h)
xlim([160 200])
ylim([360 400])

Display a single contour at level 128.

imcontour(I,[128 128]);


See Also
imcontour


Measuring Regions in Grayscale Images

This example shows how to measure properties of objects in a grayscale image. To accomplish this,
first segment the grayscale image to get a binary image of objects. Then, use the regionprops
function to analyze the original grayscale pixel values corresponding to each object in the binary
image.

Step 1: Create Synthetic Image

Use a helper function, propsSynthesizeImage, to create a grayscale image that contains five
distinct regions.

I = propsSynthesizeImage;
imshow(I)
title('Synthetic Image')

Step 2: Create a Binary Image

Segment the grayscale image by creating a binary image containing the objects in the original image.

BW = I > 0;
imshow(BW)
title('Binary Image')


Step 3: Calculate Object Properties Using Pixel Values of Grayscale Image

The regionprops function supports several properties that can be used with grayscale images,
including 'WeightedCentroid', 'MeanIntensity', 'MinIntensity', and 'MaxIntensity'.
These properties use the original pixel values of the objects for their calculations.

For example, you can use regionprops to calculate both the centroid and weighted centroid of
objects in the image. Note how you pass in the binary image (BW) containing your objects and the
original grayscale image (I) as arguments into regionprops.

s = regionprops(BW,I,{'Centroid','WeightedCentroid'});

To compare the weighted centroid locations with the unweighted centroid locations, display the
original image and then, using the hold and plot functions, superimpose the centroids on the
image.

imshow(I)
title('Weighted (red) and Unweighted (blue) Centroids');
hold on
numObj = numel(s);
for k = 1 : numObj
    plot(s(k).WeightedCentroid(1), s(k).WeightedCentroid(2), 'r*')
    plot(s(k).Centroid(1), s(k).Centroid(2), 'bo')
end
hold off


Step 4: Calculate Custom Pixel Value-Based Properties

You can use the 'PixelValues' property to perform custom calculations based on the pixel values
of the original grayscale image. The 'PixelValues' property returns a vector containing the
grayscale values of pixels in a region.

As an example, calculate the standard deviation of each region.

s = regionprops(BW,I,{'Centroid','PixelValues','BoundingBox'});
imshow(I)
title('Standard Deviation of Regions')
hold on
for k = 1:numObj
    s(k).StandardDeviation = std(double(s(k).PixelValues));
    text(s(k).Centroid(1),s(k).Centroid(2), ...
        sprintf('%2.1f', s(k).StandardDeviation), ...
        'EdgeColor','b','Color','r');
end
hold off


This figure shows the standard deviation measurement superimposed on each object in the image.
You can also view the results in other ways, for example as a bar plot showing the standard deviation
by label number.

figure
bar(1:numObj,[s.StandardDeviation])
xlabel('Region Label Number')
ylabel('Standard Deviation')


You can use the plot to determine how to partition the data. For example, the following code identifies
objects with a standard deviation lower than 50.

sStd = [s.StandardDeviation];
lowStd = find(sStd < 50);

imshow(I)
title('Objects Having Standard Deviation < 50')
hold on
for k = 1:length(lowStd)
    rectangle('Position',s(lowStd(k)).BoundingBox,'EdgeColor','y');
end
hold off


See Also
regionprops | regionprops3 | Image Region Analyzer

More About
• “Label and Measure Connected Components in a Binary Image” on page 11-57


Find the Length of a Pendulum in Motion

This example shows you how to calculate the length of a pendulum in motion by segmenting the
pendulum and calculating the center of the pendulum across many image frames, then fitting the
center coordinates to the equation of a circle.

Step 1: Load Images

Load the pendulum data from a MAT file. The file contains two variables. The frames variable
contains the video as a 4-D numeric array. The rect variable contains a region of interest that
bounds the extent of the pendulum.
load pendulum;

Preview the video using implay. You can see that the pendulum is swinging in the upper half of each
frame.
implay(frames);

Step 2: Select Region where Pendulum is Swinging

To improve the efficiency of the segmentation, create a new series of frames that contains only the
region where the pendulum is swinging.

First perform imcrop on one frame, specifying the cropping region as the preloaded rect variable.
Next, initialize an array to store all of the cropped frames, based on the size of the first cropped


frame. Finally, loop through all of the frames, crop each one using the same cropping region, and
save the result to the initialized array.

first_frame = frames(:,:,:,1);
first_region = imcrop(first_frame,rect);
nFrames = size(frames,4);
frame_regions = repmat(uint8(0), [size(first_region) nFrames]);
for count = 1:nFrames
    frame_regions(:,:,:,count) = imcrop(frames(:,:,:,count),rect);
end
imshow(frame_regions(:,:,:,1))

Step 3: Segment the Pendulum in Each Frame

Notice that the pendulum is much darker than the background. You can segment the pendulum in
each frame by converting the frame to grayscale, thresholding it using imbinarize, and removing
background structures using imopen and imclearborder.

Initialize a binary array to contain the segmented pendulum frames.

seg_pend = false([size(first_region,1) size(first_region,2) nFrames]);


centroids = zeros(nFrames,2);
se_disk = strel("disk",3);

Loop through all of the frames and perform the segmentation and postprocessing. In a montage,
display the original frame and the segmentation result.

for count = 1:nFrames

    fr = frame_regions(:,:,:,count);

    gfr = im2gray(fr);
    gfr = imcomplement(gfr);

    bw = imbinarize(gfr,0.7); % Threshold is determined experimentally

    bw = imopen(bw,se_disk);
    bw = imclearborder(bw);
    seg_pend(:,:,count) = bw;

    montage({fr,labeloverlay(gfr,bw)});
    pause(0.2)

end


Step 4: Find the Center of the Segmented Pendulum in Each Frame

You can see that the shape of the pendulum varied in different frames. This is not a serious issue
because you just need its center. You will use the pendulum centers to find the length of the
pendulum.

Use regionprops to calculate the center of the pendulum.

pend_centers = zeros(nFrames,2);
for count = 1:nFrames
    property = regionprops(seg_pend(:,:,count),"Centroid");
    pend_centers(count,:) = property.Centroid;
end

Display the pendulum centers using plot.

x = pend_centers(:,1);
y = pend_centers(:,2);
figure
plot(x,y,"m.")
axis ij
axis equal
hold on;
xlabel("x");
ylabel("y");
title("Pendulum Centers");


Step 5: Calculate Radius by Fitting a Circle Through Pendulum Centers

Rewrite the basic equation of a circle:

(x-xc)^2 + (y-yc)^2 = radius^2

where (xc,yc) is the center, in terms of parameters a, b, c as

x^2 + y^2 + a*x + b*y + c = 0

where a = -2*xc, b = -2*yc, and c = xc^2 + yc^2 - radius^2.

You can solve for parameters a, b, and c using the least squares method. Rewrite the above equation
as

a*x + b*y + c = -(x^2 + y^2)

which can also be rewritten as

[x y 1] * [a;b;c] = -x^2 - y^2.

Solve this equation using the backslash (\) operator.

The circle radius is the length of the pendulum in pixels.

abc = [x y ones(length(x),1)] \ -(x.^2 + y.^2);


a = abc(1);


b = abc(2);
c = abc(3);
xc = -a/2;
yc = -b/2;
circle_radius = sqrt((xc^2 + yc^2) - c);
pendulum_length = round(circle_radius)

pendulum_length =

253

Superimpose the circle and circle center on the plot of pendulum centers.

circle_theta = pi/3:0.01:pi*2/3;
x_fit = circle_radius*cos(circle_theta)+xc;
y_fit = circle_radius*sin(circle_theta)+yc;

plot(x_fit,y_fit,"b-");
plot(xc,yc,"bx","LineWidth",2);
plot([xc x(1)],[yc y(1)],"b-");
text(xc-110,yc+100,sprintf("Pendulum length = %d pixels",pendulum_length));

See Also
regionprops | imclearborder | imopen | imbinarize | imcomplement | labeloverlay


Create Image Histogram

This example shows how to create a histogram for an image using the imhist function. An image
histogram is a chart that shows the distribution of intensities in an indexed or grayscale image. The
imhist function creates a histogram plot by defining n equally spaced bins, each representing a
range of data values, and then calculating the number of pixels within each range. You can use the
information in a histogram to choose an appropriate enhancement operation. For example, if an
image histogram shows that the range of intensity values is small, you can use an intensity
adjustment function to spread the values across a wider range.

Read an image into the workspace and display it.

I = imread('rice.png');
imshow(I)

Create the histogram. For the example image, showing grains of rice, imhist creates a histogram
with 64 bins. The imhist function displays the histogram, by default. The histogram shows a peak at
around 100, corresponding to the dark gray background in the image.

figure;
imhist(I);
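
If a histogram like this shows that the pixel values occupy only part of the available range, you can
apply a contrast stretch and compare the before-and-after histograms; a minimal sketch using
imadjust with its default limits:

J = imadjust(I);   % saturate the bottom and top 1% of values and stretch the rest
figure
imshow(J)
figure
imhist(J)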



Image Mean, Standard Deviation, and Correlation Coefficient


You can compute standard statistics of an image using the mean2, std2, and corr2 functions. mean2
and std2 compute the mean and standard deviation of the elements of a matrix. corr2 computes the
correlation coefficient between two matrices of the same size.

These functions are two-dimensional versions of the mean, std, and corrcoef functions described in
the MATLAB Function Reference.
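
A minimal sketch computing these statistics for a grayscale image and a second image of the same
size (the filter used to create the second image is just for illustration):

I = imread("pout.tif");
m = mean2(I)       % mean of all pixel values
s = std2(I)        % standard deviation of all pixel values
J = medfilt2(I);   % create a second image of the same size
r = corr2(I,J)     % correlation coefficient between the two images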


Edge Detection
In an image, an edge is a curve that follows a path of rapid change in image intensity. Edges are often
associated with the boundaries of objects in a scene. Edge detection is used to identify the edges in
an image.

To find edges, you can use the edge function. This function looks for places in the image where the
intensity changes rapidly, using one of these two criteria:

• Places where the first derivative of the intensity is larger in magnitude than some threshold
• Places where the second derivative of the intensity has a zero crossing

edge provides several derivative estimators, each of which implements one of these definitions. For
some of these estimators, you can specify whether the operation should be sensitive to horizontal
edges, vertical edges, or both. edge returns a binary image containing 1's where edges are found and
0's elsewhere.
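
For example, this minimal sketch restricts the Sobel estimator to one edge direction; passing an empty
threshold lets edge choose the threshold automatically:

I = imread("coins.png");
BWh = edge(I,"sobel",[],"horizontal");   % respond to horizontal edges only
BWv = edge(I,"sobel",[],"vertical");     % respond to vertical edges only
imshowpair(BWh,BWv,"montage")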

The most powerful edge-detection method that edge provides is the Canny method. The Canny
method differs from the other edge-detection methods in that it uses two different thresholds (to
detect strong and weak edges), and includes the weak edges in the output only if they are connected
to strong edges. This method is therefore less likely than the others to be affected by noise, and more
likely to detect true weak edges.
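
You can pass the two Canny thresholds explicitly as a two-element vector [low high]; the values in this
sketch are illustrative:

I = imread("coins.png");
BW = edge(I,"canny",[0.04 0.10]);   % weak (low) and strong (high) thresholds
imshow(BW)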

Detect Edges in Images

This example shows how to detect edges in an image using both the Canny edge detector and the
Sobel edge detector.

Read the image into the workspace and display it.

I = imread('coins.png');
imshow(I)


Apply the Sobel edge detector to the unfiltered input image. Then, apply the Canny edge detector to
the unfiltered input image.

BW1 = edge(I,'sobel');
BW2 = edge(I,'canny');

Display the filtered images side-by-side for comparison.

tiledlayout(1,2)

nexttile
imshow(BW1)
title('Sobel Filter')

nexttile
imshow(BW2)
title('Canny Filter')



Boundary Tracing in Images

Trace Boundaries of Objects in Images

This example shows how to trace the boundary of a single object and of all objects in a binary image.

Read and display an image.

I = imread("coins.png");
imshow(I)

Convert the image to a binary image. The bwtraceboundary and bwboundaries functions work only
with binary images.

BW = imbinarize(I);
imshow(BW)


Boundary of Single Object

To trace the boundary of a single object in the binary image, first determine the row and column
coordinates of a pixel on the border of the object. For this example, select a column coordinate. The
example then calculates the row coordinate of the topmost object in that column.

numCols = size(BW,2);

col = 137; % example value; the live version of this example sets the column with an interactive control
row = find(BW(:,col),1)

row = 27

To trace the boundary from the specified point, use the bwtraceboundary function. As required
arguments, you must specify a binary image, the row and column coordinates of the starting point,
and the direction of the first step. The example specifies north ("N").

boundary = bwtraceboundary(BW,[row, col],"N");

Plot the border over the original grayscale image.

imshow(I)
hold on
plot(boundary(:,2),boundary(:,1),"g",LineWidth=3);


Boundary of All Objects

In the binary image used in this example, some of the coins contain black areas that the
bwboundaries function interprets as separate objects. To ensure that bwboundaries traces only
the exterior of the coins, fill the area inside each coin by using the imfill function.

BW_filled = imfill(BW,"holes");

Trace the boundaries of all coins in the image by using the bwboundaries function. bwboundaries
returns a cell array, where each cell contains the row and column coordinates for an object in the
image.

boundaries = bwboundaries(BW_filled);

Plot the borders of all of the coins over the original grayscale image.

for k=1:10
b = boundaries{k};
plot(b(:,2),b(:,1),"g",LineWidth=3);
end


Select First Step Direction for Tracing


For certain objects, you must take care when selecting the border pixel you choose as the starting
point and the direction you choose for the first step (such as north or south).

For example, if an object contains a hole and you select a pixel on a thin part of the object as the
starting pixel, you can trace the outside border of the object or the inside border of the hole,
depending on the direction you choose for the first step. For filled objects, the direction you select for
the first step parameter is not as important.

To illustrate, this figure shows the pixels traced when the starting pixel is on a thin part of the object
and the first step is set to north and south. The connectivity is the default value of 8.


Impact of First Step Direction on Boundary Tracing

See Also
bwboundaries | bwtraceboundary | visboundaries | edge

Related Examples
• “Edge Detection” on page 13-25


Quadtree Decomposition
Quadtree decomposition is an analysis technique that involves subdividing an image into blocks that
are more homogeneous than the image itself. This technique reveals information about the structure
of the image. It is also useful as the first step in adaptive compression algorithms.

You can perform quadtree decomposition using the qtdecomp function. This function works by
dividing a square image into four equal-sized square blocks, and then testing each block to see if it
meets some criterion of homogeneity (for example, if all the pixels in the block are within a specific
dynamic range). If a block meets the criterion, it is not divided any further. If it does not meet the
criterion, it is subdivided again into four blocks, and the test criterion is applied to those blocks. This
process is repeated iteratively until each block meets the criterion. The result might have blocks of
several different sizes. Blocks can be as small as 1-by-1, unless you specify otherwise.

qtdecomp returns the quadtree decomposition as a sparse matrix, the same size as I. The nonzero
elements represent the upper left corners of the blocks. The value of each nonzero element indicates
the block size.

Perform Quadtree Decomposition on an Image

This example shows how to perform quadtree decomposition on a 512-by-512 grayscale image.

Read the grayscale image into the workspace.


I = imread('liftingbody.png');

Perform the quadtree decomposition by calling the qtdecomp function, specifying as arguments the
image and the test criteria used to determine the homogeneity of each block in the decomposition.
For example, the criterion might be a threshold calculation such as max(block(:)) -
min(block(:)) >= 0.27. You can also supply qtdecomp with a function (rather than a threshold
value) for deciding whether to split blocks. For example, you might base the decision on the variance
of the block.
S = qtdecomp(I,0.27);
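
To see how many blocks of each size the decomposition produced, you can inspect the nonzero values
of S directly; a short sketch:

blockSizes = unique(full(S(S > 0)));
for k = 1:numel(blockSizes)
    fprintf("%d blocks of size %d-by-%d\n", ...
        nnz(S == blockSizes(k)),blockSizes(k),blockSizes(k))
end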

View a block representation of the quadtree decomposition. Each black square represents a
homogeneous block, and the white lines represent the boundaries between blocks. Notice how the
blocks are smaller in areas corresponding to large changes in intensity in the image.
blocks = repmat(uint8(0),size(S));

for dim = [512 256 128 64 32 16 8 4 2 1]
    numblocks = length(find(S==dim));
    if (numblocks > 0)
        values = repmat(uint8(1),[dim dim numblocks]);
        values(2:dim,2:dim,:) = 0;
        blocks = qtsetblk(blocks,S,dim,values);
    end
end

blocks(end,1:end) = 1;
blocks(1:end,end) = 1;

imshow(I), figure, imshow(blocks,[])




Detect and Measure Circular Objects in an Image

This example shows how to automatically detect circles or circular objects in an image and visualize
the detected circles.

Step 1: Load Image

Read and display an image of round plastic chips of various colors. Besides having plenty of circles to
detect, there are a few interesting things going on in this image from a circle detection point-of-view:

1 There are chips of different colors, which have different contrasts with respect to the
background. On one end, the blue and red ones have strong contrast on this background. On the
other end, some of the yellow chips do not contrast well with the background.
2 Notice how some chips are on top of each other and some others are close together, almost
touching each other. Overlapping object boundaries and object occlusion are usually
challenging scenarios for object detection.

rgb = imread("coloredChips.png");
imshow(rgb)


Step 2: Determine Radius Range for Searching Circles

Find the appropriate radius range of the circles using the drawline function. Draw a line over the
approximate diameter of a chip.

d = drawline;

The length of the line ROI is the diameter of the chip. Typical chips have diameters in the range 40 to
50 pixels.

pos = d.Position;
diffPos = diff(pos);
diameter = hypot(diffPos(1),diffPos(2))

diameter = 45.4533

Step 3: Initial Attempt to Find Circles

The imfindcircles function searches for circles with a range of radii. Search for circles with radii
in the range of 20 to 25 pixels. Before that, it is a good practice to ask whether the objects are
brighter or darker than the background. To answer that question, look at the grayscale version of this
image.


gray_image = im2gray(rgb);
imshow(gray_image)

The background is quite bright and most of the chips are darker than the background. But, by
default, imfindcircles finds circular objects that are brighter than the background. So, set the
parameter "ObjectPolarity" to "dark" in imfindcircles to search for dark circles.
[centers,radii] = imfindcircles(rgb,[20 25],"ObjectPolarity","dark")

centers =

[]

radii =

[]

Note that the outputs centers and radii are empty, which means that no circles were found. This
happens frequently because imfindcircles is a circle detector, and similar to most detectors,
imfindcircles has an internal detection threshold that determines its sensitivity. In simple terms it
means that the detector's confidence in a certain (circle) detection has to be greater than a certain
level before it is considered a valid detection. imfindcircles has a parameter "Sensitivity" which


can be used to control this internal threshold, and consequently, the sensitivity of the algorithm. A
higher "Sensitivity" value sets the detection threshold lower and leads to detecting more circles. This
is similar to the sensitivity control on the motion detectors used in home security systems.

Step 4: Increase Detection Sensitivity

Coming back to the chip image, it is possible that at the default sensitivity level all the circles are
lower than the internal threshold, which is why no circles were detected. By default, "Sensitivity",
which is a number between 0 and 1, is set to 0.85. Increase "Sensitivity" to 0.9.

[centers,radii] = imfindcircles(rgb,[20 25],"ObjectPolarity","dark", ...
    "Sensitivity",0.9)

centers = 8×2

146.1895 198.5824
328.8132 135.5883
130.3134 43.8039
175.2698 297.0583
312.2831 192.3709
327.1316 297.0077
243.9893 166.4538
271.5873 280.8920

radii = 8×1

23.1604
22.5710
22.9576
23.7356
22.9551
22.9995
22.9055
23.0298

This time imfindcircles found some circles - eight to be precise. centers contains the locations
of circle centers and radii contains the estimated radii of those circles.

Step 5: Draw the Circles on the Image

The function viscircles can be used to draw circles on the image. Output variables centers and
radii from imfindcircles can be passed directly to viscircles.

imshow(rgb)
h = viscircles(centers,radii);


The circle centers seem correctly positioned and their corresponding radii seem to match well to the
actual chips. But still quite a few chips were missed. Try increasing the "Sensitivity" even more, to
0.92.

[centers,radii] = imfindcircles(rgb,[20 25],"ObjectPolarity","dark", ...
    "Sensitivity",0.92);

length(centers)

ans = 16

So increasing "Sensitivity" gets us even more circles. Plot these circles on the image again.

delete(h) % Delete previously drawn circles


h = viscircles(centers,radii);


Step 6: Use the Second Method (Two-stage) for Finding Circles

This result looks better. imfindcircles has two different methods for finding circles. So far the
default method, called the phase coding method, was used for detecting circles. There's another
method, popularly called the two-stage method, that is available in imfindcircles. Use the two-
stage method and show the results.

[centers,radii] = imfindcircles(rgb,[20 25],"ObjectPolarity","dark", ...
    "Sensitivity",0.92,"Method","twostage");

delete(h)
h = viscircles(centers,radii);


The two-stage method detects more circles at the Sensitivity level of 0.92. In general, these two
methods are complementary in that they have different strengths. The phase coding method is
typically faster and slightly more robust to noise than the two-stage method. But it may also need
higher "Sensitivity" levels to get the same number of detections as the two-stage method. For
example, the phase coding method also finds the same chips if the "Sensitivity" level is raised higher,
say to 0.95.

[centers,radii] = imfindcircles(rgb,[20 25],"ObjectPolarity","dark", ...
    "Sensitivity",0.95);

delete(h)
viscircles(centers,radii);


Note that both the methods in imfindcircles find the centers and radii of the partially visible
(occluded) chips accurately.

Step 7: Why are Some Circles Still Getting Missed?

Looking at the last result, it is curious that imfindcircles does not find the yellow chips in the
image. The yellow chips do not have strong contrast with the background. In fact they seem to have
very similar intensities as the background. Is it possible that the yellow chips are not really "darker"
than the background as was assumed? To confirm, show the grayscale version of this image again.

imshow(gray_image)


Step 8: Find "Bright" Circles in the Image

The yellow chips are almost the same intensity as the background, and maybe even brighter.
Therefore, to detect the yellow chips, change "ObjectPolarity" to "bright".

[centersBright,radiiBright] = imfindcircles(rgb,[20 25], ...
    "ObjectPolarity","bright","Sensitivity",0.92);

Step 9: Draw "Bright" Circles with Different Color

Draw the bright circles in a different color, by changing the "Color" parameter in viscircles.

imshow(rgb)

hBright = viscircles(centersBright, radiiBright,"Color","b");


Note that three of the missing yellow chips were found, but one yellow chip is still missing. These
yellow chips are hard to find because they don't stand out as well as others on this background.

Step 10: Lower the Value of "EdgeThreshold"

There is another parameter in imfindcircles which may be useful here, namely "EdgeThreshold".
To find circles, imfindcircles uses only the edge pixels in the image. These edge pixels are
essentially pixels with high gradient value. The "EdgeThreshold" parameter controls how high the
gradient value at a pixel has to be before it is considered an edge pixel and included in computation.
A high value (closer to 1) for this parameter will allow only the strong edges (higher gradient values)
to be included, whereas a low value (closer to 0) is more permissive and includes even the weaker
edges (lower gradient values) in computation. In case of the missing yellow chip, since the contrast is
low, some of the boundary pixels (on the circumference of the chip) are expected to have low gradient
values. Therefore, lower the "EdgeThreshold" parameter to ensure that the most of the edge pixels
for the yellow chip are included in computation.

[centersBright,radiiBright,metricBright] = imfindcircles(rgb,[20 25], ...
    "ObjectPolarity","bright","Sensitivity",0.92,"EdgeThreshold",0.1);

delete(hBright)
hBright = viscircles(centersBright, radiiBright,"Color","b");


Step 11: Draw "Dark" and "Bright" Circles Together

Now imfindcircles finds all of the yellow ones, and a green one too. Draw these chips in blue,
together with the other chips that were found earlier (with "ObjectPolarity" set to "dark"), in red.

h = viscircles(centers,radii);


All the circles are detected. A final word - it should be noted that changing the parameters to be more
aggressive in detection may find more circles, but it also increases the likelihood of detecting false
circles. There is a trade-off between the number of true circles that can be found (detection rate) and
the number of false circles that are found with them (false alarm rate).

Happy circle hunting!

See Also
imfindcircles | viscircles

Related Examples
• “Identifying Round Objects” on page 13-48
• “Measuring the Radius of a Roll of Tape” on page 13-62


Identifying Round Objects

This example shows how to classify objects based on their roundness using bwboundaries, a
boundary tracing routine.

Step 1: Read an Image

Read in pillsetc.png.

RGB = imread("pillsetc.png");
imshow(RGB)

Step 2: Threshold the Image

Convert the image to black and white in order to prepare for boundary tracing using bwboundaries.

I = im2gray(RGB);
bw = imbinarize(I);
imshow(bw)


Step 3: Preprocess the Image

Using morphology functions, remove pixels which do not belong to the objects of interest.

Remove all objects containing fewer than 30 pixels.

minSize = 30;
bw = bwareaopen(bw,minSize);
imshow(bw)


Fill a gap in the pen's cap.

se = strel("disk",2);
bw = imclose(bw,se);
imshow(bw)


Fill any holes, so that regionprops can be used to estimate the area enclosed by each of the
boundaries.

bw = imfill(bw,"holes");
imshow(bw)


Step 4: Find the Boundaries

Concentrate only on the exterior boundaries. Specifying the "noholes" option will accelerate the
processing by preventing bwboundaries from searching for inner contours.

[B,L] = bwboundaries(bw,"noholes");

Display the label matrix and draw each boundary.

imshow(label2rgb(L,@jet,[.5 .5 .5]))
hold on
for k = 1:length(B)
    boundary = B{k};
    plot(boundary(:,2),boundary(:,1),"w",LineWidth=2)
end
title("Objects with Boundaries in White")


Step 5: Determine which Objects are Round

Estimate the circularity and centroid of all of the objects using the regionprops function. The
circularity metric is equal to 1 for an ideal circle and it is less than 1 for other shapes.

stats = regionprops(L,"Circularity","Centroid");

The classification process can be controlled by setting an appropriate threshold. In this example, use
a threshold of 0.94 so that only the pills will be classified as round.

threshold = 0.94;

Loop over the detected boundaries. For each object:

• Obtain the (x,y) boundary coordinates and the circularity measurement


• Compare the circularity measurement to the threshold. If the circularity exceeds the threshold,
calculate the position of the centroid and display the centroid as a black circle.
• Display the circularity measurement in yellow text over the object.

for k = 1:length(B)

    % Obtain (X,Y) boundary coordinates corresponding to label "k"
    boundary = B{k};

    % Obtain the circularity corresponding to label "k"
    circ_value = stats(k).Circularity;

    % Display the results
    circ_string = sprintf("%2.2f",circ_value);

    % Mark objects above the threshold with a black circle
    if circ_value > threshold
        centroid = stats(k).Centroid;
        plot(centroid(1),centroid(2),"ko");
    end

    text(boundary(1,2)-35,boundary(1,1)+13,circ_string,Color="y",...
        FontSize=14,FontWeight="bold")

end
title("Centroids of Circular Objects and Circularity Values")

See Also
bwboundaries | imbinarize | bwareaopen | imclose | strel | imfill | label2rgb |
regionprops


Related Examples
• “Detect and Measure Circular Objects in an Image” on page 13-36
• “Measuring the Radius of a Roll of Tape” on page 13-62


Measuring Angle of Intersection

This example shows how to measure the angle and point of intersection between two beams using
bwtraceboundary, which is a boundary tracing routine. A common task in machine vision
applications is hands-free measurement using image acquisition and image processing techniques.

Step 1: Load Image

Read in gantrycrane.png and draw arrows pointing to two beams of interest. It is an image of a
gantry crane used to assemble a bridge.

RGB = imread("gantrycrane.png");
imshow(RGB)

text(size(RGB,2),size(RGB,1)+15,"Image courtesy of Jeff Mather", ...
    FontSize=7,HorizontalAlignment="right");

line([300 328],[85 103],Color=[1 1 0]);
line([268 255],[85 140],Color=[1 1 0]);

text(150,72,"Measure the angle between these beams",Color="y", ...
    FontWeight="bold");

Step 2: Extract the Region of Interest

Crop the image to obtain only the beams of the gantry crane chosen earlier. This step will make it
easier to extract the edges of the two metal beams.

You can obtain the coordinates of the rectangular region using pixel information displayed by
imtool.


start_row = 34;
start_col = 208;

cropRGB = RGB(start_row:163,start_col:400,:);
imshow(cropRGB)

Store (x,y) offsets for later use; subtract 1 so that each offset will correspond to the last pixel before
the region of interest.

offsetX = start_col-1;
offsetY = start_row-1;

Step 3: Threshold the Image

The bwtraceboundary function expects objects of interest to be white in a binary image, so convert
the image to black and white and take the image complement.

I = im2gray(cropRGB);
BW = imbinarize(I);
BW = ~BW;
imshow(BW)

Step 4: Find Initial Point on Each Boundary

The bwtraceboundary function requires that you specify a single point on a boundary. This point is
used as the starting location for the boundary tracing process.


To extract the edge of the lower beam, pick a column in the image and inspect it until a transition
from a background pixel to an object pixel occurs. Store this location for later use in the
bwtraceboundary routine. Repeat this procedure for the other beam, but this time scanning
horizontally along a row.

dim = size(BW);

% Horizontal beam
col1 = 4;
row1 = find(BW(:,col1), 1);

% Angled beam
row2 = 12;
col2 = find(BW(row2,:), 1);

Step 5: Trace the Boundaries

The bwtraceboundary routine is used to extract (X, Y) locations of the boundary points. In order to
maximize the accuracy of the angle and point of intersection calculations, it is important to extract as
many points belonging to the beam edges as possible. You should determine the number of points
experimentally. Since the initial point for the horizontal bar was obtained by scanning from north to
south, it is safest to set the initial search step to point towards the outside of the object, i.e. "North".

boundary1 = bwtraceboundary(BW,[row1, col1],"N",8,70);

% Set the search direction to counterclockwise, in order to trace downward
boundary2 = bwtraceboundary(BW,[row2, col2],"E",8,90,"counter");

imshow(RGB)
hold on

% Apply offsets in order to draw in the original image
plot(offsetX+boundary1(:,2),offsetY+boundary1(:,1),"g",LineWidth=2);
plot(offsetX+boundary2(:,2),offsetY+boundary2(:,1),"g",LineWidth=2);


Step 6: Fit Lines to the Boundaries

Although (X,Y) coordinates pairs were obtained in the previous step, not all of the points lie exactly
on a line. Which ones should be used to compute the angle and point of intersection? Assuming that
all of the acquired points are equally important, fit lines to the boundary pixel locations.

The equation for a line is y = [x 1]*[a; b]. You can solve for parameters 'a' and 'b' in the least-squares
sense by using polyfit.

ab1 = polyfit(boundary1(:,2),boundary1(:,1),1);
ab2 = polyfit(boundary2(:,2),boundary2(:,1),1);

Step 7: Find the Angle of Intersection

Use the dot product to find the angle.

vect1 = [1 ab1(1)]; % Create a vector based on the line equation
vect2 = [1 ab2(1)];
dp = dot(vect1,vect2);

Compute the vector lengths.

length1 = sqrt(sum(vect1.^2));
length2 = sqrt(sum(vect2.^2));

Obtain the larger angle of intersection in degrees.

angle = 180-acos(dp/(length1*length2))*180/pi

angle = 129.4971
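As an optional cross-check (not part of the original steps), you could derive the same angle from the two fitted slopes; this sketch assumes the variables ab1 and ab2 computed above.

% Cross-check: angle between the fitted lines, computed from their slopes
theta1 = atand(ab1(1));          % orientation of the first line, in degrees
theta2 = atand(ab2(1));          % orientation of the second line, in degrees
d = abs(theta1 - theta2);
acuteAngle = min(d,180-d);       % acute angle between the two lines
largerAngle = 180 - acuteAngle   % should agree with the dot-product result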

Step 8: Find the Point of Intersection

Solve the system of two equations in order to obtain (X,Y) coordinates of the intersection point.


intersection = [1 ,-ab1(1); 1, -ab2(1)] \ [ab1(2); ab2(2)];

Apply offsets in order to compute the location in the original, uncropped image.

intersection = intersection + [offsetY; offsetX]

intersection = 2×1

143.0917
295.7494

Step 9: Plot the Results

Draw an "X" at the point of intersection


inter_x = intersection(2);
inter_y = intersection(1);
plot(inter_x,inter_y,"yx","LineWidth",2);

Annotate the image with the angle between the beams and the (x,y) coordinates of the intersection
point.
angleString = [sprintf("%1.3f",angle)+"{\circ}"];
text(inter_x-80,inter_y-25,angleString, ...
Color="y",FontSize=14,FontWeight="bold");

intersectionString = sprintf("(%2.1f,%2.1f)",inter_x,inter_y);
text(inter_x-10,inter_y+20,intersectionString,...
Color="y",FontSize=14,FontWeight="bold");

See Also
bwboundaries | imbinarize | bwtraceboundary | polyfit


Related Examples
• “Detect Lines Using Radon Transform” on page 10-27

More About
• “Hough Transform” on page 10-16
• “Radon Transform” on page 10-21


Measuring the Radius of a Roll of Tape

This example shows how to measure the radius of a roll of tape, which is partially obscured by the
tape dispenser. Utilize imfindcircles to accomplish this task.

Step 1: Read Image

Read in tape.png.

RGB = imread('tape.png');
imshow(RGB);

hTxt = text(15,15,'Estimate radius of the roll of tape',...
    'FontWeight','bold','Color','y');

Step 2: Find the Circle

Find the center and the radius of the circle in the image using imfindcircles.

Rmin = 60;
Rmax = 100;
[center, radius] = imfindcircles(RGB,[Rmin Rmax],'Sensitivity',0.9)


center = 1×2

236.9291 172.4747

radius = 79.5305

Step 3: Highlight the Circle Outline and Center

% Display the circle
viscircles(center,radius);

% Display the calculated center
hold on;
plot(center(:,1),center(:,2),'yx','LineWidth',2);
hold off;

delete(hTxt);
message = sprintf('The estimated radius is %2.1f pixels', radius);
text(15,15,message,'Color','y','FontWeight','bold');

See Also
imfindcircles | viscircles


Related Examples
• “Identifying Round Objects” on page 13-48
• “Detect and Measure Circular Objects in an Image” on page 13-36


Calculate Statistical Measures of Texture


The toolbox includes several texture analysis functions that filter an image using standard statistical
measures. These statistics can characterize the texture of an image because they provide information
about the local variability of the intensity values of pixels in an image. For example, in areas with
smooth texture, the range of values in the neighborhood around a pixel is small; in areas of
rough texture, the range is larger. Similarly, calculating the standard deviation of pixels in a
neighborhood can indicate the degree of variability of pixel values in that region. The table lists these
functions.

Function Description
rangefilt Calculates the local range of pixel intensities of an image.
stdfilt Calculates the local standard deviation of an image.
entropyfilt Calculates the local entropy of a grayscale image. Entropy is a statistical measure of randomness.

The functions all operate in a similar way: they define a neighborhood around the pixel of interest,
calculate the statistic for that neighborhood, and use that value as the value of the pixel of interest in
the output image.

This example shows how the rangefilt function operates on a simple array.

A = [ 1 2 3 4 5; 6 7 8 9 10; 11 12 13 14 15; 16 17 18 19 20 ]

A =

1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20

B = rangefilt(A)

B =

6 7 7 7 6
11 12 12 12 11
11 12 12 12 11
6 7 7 7 6

The following figure shows how the value of element B(2,4) was calculated from A(2,4). By
default, the rangefilt function uses a 3-by-3 neighborhood but you can specify neighborhoods of
different shapes and sizes.


Determining Pixel Values in Range Filtered Output Image

The stdfilt and entropyfilt functions operate similarly, defining a neighborhood around the
pixel of interest and calculating the statistic for the neighborhood to determine the pixel value in the
output image. The stdfilt function calculates the standard deviation of all the values in the
neighborhood.

The entropyfilt function calculates the entropy of the neighborhood and assigns that value to the
output pixel. By default, the entropyfilt function defines a 9-by-9 neighborhood around the pixel of
interest. To calculate the entropy of an entire image, use the entropy function.
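For instance, a brief sketch of applying all three filters to a sample grayscale image, including a custom neighborhood size for entropyfilt; the image and neighborhood size here are chosen only for illustration.

I = imread("pout.tif");        % any grayscale image
R = rangefilt(I);              % local range, 3-by-3 neighborhood by default
S = stdfilt(I);                % local standard deviation, 3-by-3 by default
E = entropyfilt(I,true(7));    % local entropy over a 7-by-7 neighborhood
montage({mat2gray(R),mat2gray(S),mat2gray(E)})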

See Also
rangefilt | stdfilt | entropyfilt

More About
• “Texture Segmentation Using Texture Filters” on page 12-165


Texture Analysis Using the Gray-Level Co-Occurrence Matrix (GLCM)

A statistical method of examining texture that considers the spatial relationship of pixels is the gray-
level co-occurrence matrix (GLCM), also known as the gray-level spatial dependence matrix. The
GLCM functions characterize the texture of an image by calculating how often pairs of pixels with
specific values and in a specified spatial relationship occur in an image, creating a GLCM, and then
extracting statistical measures from this matrix. (The texture filter functions, described in “Calculate
Statistical Measures of Texture” on page 13-65, cannot provide information about shape, that is, the
spatial relationships of pixels in an image.)

After you create the GLCMs using graycomatrix, you can derive several statistics from them using
graycoprops. These statistics provide information about the texture of an image. The following
table lists the statistics.

Statistic Description
Contrast Measures the local variations in the gray-level co-occurrence matrix.
Correlation Measures the joint probability occurrence of the specified pixel pairs.
Energy Provides the sum of squared elements in the GLCM. Also known as uniformity or the angular second moment.
Homogeneity Measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal.
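A minimal sketch of this workflow, using a sample grayscale image and the default offset:

I = imread("circuit.tif");     % sample grayscale image
glcm = graycomatrix(I);        % default: 8 gray levels, offset [0 1]
props = graycoprops(glcm,["Contrast" "Correlation" "Energy" "Homogeneity"])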

See Also

Related Examples
• “Derive Statistics from GLCM and Plot Correlation” on page 13-71

More About
• “Create a Gray-Level Co-Occurrence Matrix” on page 13-68


Create a Gray-Level Co-Occurrence Matrix


To create a GLCM, use the graycomatrix function. The function creates a gray-level co-occurrence
matrix (GLCM) by calculating how often a pixel with the intensity (gray-level) value i occurs in a
specific spatial relationship to a pixel with the value j. By default, the spatial relationship is defined as
the pixel of interest and the pixel to its immediate right (horizontally adjacent), but you can specify
other spatial relationships between the two pixels. Each element (i, j) in the resultant GLCM is simply
the sum of the number of times that the pixel with value i occurred in the specified spatial
relationship to a pixel with value j in the input image.

The number of gray levels in the image determines the size of the GLCM. By default, graycomatrix
uses scaling to reduce the number of intensity values in an image to eight, but you can use the
NumLevels and the GrayLimits parameters to control this scaling of gray levels.
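For example, a brief sketch of controlling the quantization; the level count and limits here are arbitrary choices for illustration.

I = imread("circuit.tif");
% Quantize to 16 gray levels spanning the full uint8 range before counting pairs
glcm16 = graycomatrix(I,"NumLevels",16,"GrayLimits",[0 255]);
size(glcm16)    % 16-by-16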

The gray-level co-occurrence matrix can reveal certain properties about the spatial distribution of the
gray levels in the texture image. For example, if most of the entries in the GLCM are concentrated
along the diagonal, the texture is coarse with respect to the specified offset. You can also derive
several statistical measures from the GLCM. See “Derive Statistics from GLCM and Plot Correlation”
on page 13-71 for more information.

To illustrate, the following figure shows how graycomatrix calculates the first three values in a
GLCM. In the output GLCM, element (1, 1) contains the value 1 because there is only one instance in
the input image where two horizontally adjacent pixels have the values 1 and 1, respectively. Element
(1, 2) contains the value 2 because there are two instances where two horizontally adjacent pixels
have the values 1 and 2. Element (1, 3) has the value 0 because there are no instances of two
horizontally adjacent pixels with the values 1 and 3. graycomatrix continues processing the input
image, scanning the image for other pixel pairs (i, j) and recording the sums in the corresponding
elements of the GLCM.

Process Used to Create the GLCM
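To make the counting concrete, here is a small sketch with a made-up 3-by-3 matrix; the values are chosen only for illustration.

I = [1 1 2; 1 2 3; 2 3 3];
glcm = graycomatrix(I,"NumLevels",3,"GrayLimits",[1 3])
% glcm(1,1) is 1: one horizontally adjacent pair with values 1 and 1.
% glcm(1,2) is 2: two horizontally adjacent pairs with values 1 and 2.
% glcm(2,3) is 2, glcm(3,3) is 1, and all other elements are 0.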

See Also
graycomatrix

Related Examples
• “Specify Offset Used in GLCM Calculation” on page 13-70


• “Derive Statistics from GLCM and Plot Correlation” on page 13-71


• “Texture Analysis Using the Gray-Level Co-Occurrence Matrix (GLCM)” on page 13-67


Specify Offset Used in GLCM Calculation


By default, the graycomatrix function creates a single GLCM, with the spatial relationship, or
offset, defined as two horizontally adjacent pixels. However, a single GLCM might not be enough to
describe the textural features of the input image. For example, a single horizontal offset might not be
sensitive to texture with a vertical orientation. For this reason, graycomatrix can create multiple
GLCMs for a single input image.

To create multiple GLCMs, specify an array of offsets to the graycomatrix function. These offsets
define pixel relationships of varying direction and distance. For example, you can define an array of
offsets that specify four directions (horizontal, vertical, and two diagonals) and four distances. In this
case, the input image is represented by 16 GLCMs. When you calculate statistics from these GLCMs,
you can take the average.

You specify these offsets as a p-by-2 array of integers. Each row in the array is a two-element vector,
[row_offset, col_offset], that specifies one offset. row_offset is the number of rows
between the pixel of interest and its neighbor. col_offset is the number of columns between the
pixel of interest and its neighbor. This example creates an offset that specifies four directions and
four distances for each direction.

offsets = [ 0  1;  0  2;  0  3;  0  4; ...
           -1  1; -2  2; -3  3; -4  4; ...
           -1  0; -2  0; -3  0; -4  0; ...
           -1 -1; -2 -2; -3 -3; -4 -4];

The figure illustrates the spatial relationships of pixels that are defined by this array of offsets, where
D represents the distance from the pixel of interest.
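Passing this offset array to graycomatrix returns one GLCM per offset. A brief sketch, assuming a grayscale image and the offsets variable defined above:

I = imread("circuit.tif");                  % sample grayscale image
glcms = graycomatrix(I,"Offset",offsets);
size(glcms)    % 8-by-8-by-16 with the default eight gray levels: one GLCM per offset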

See Also
graycomatrix

Related Examples
• “Create a Gray-Level Co-Occurrence Matrix” on page 13-68
• “Texture Analysis Using the Gray-Level Co-Occurrence Matrix (GLCM)” on page 13-67
• “Derive Statistics from GLCM and Plot Correlation” on page 13-71


Derive Statistics from GLCM and Plot Correlation

This example shows how to create a set of Gray-Level Co-Occurrence Matrices (GLCMs) and derive
statistics from them. The example also illustrates how the statistics returned by graycoprops have a
direct relationship to the original input image.

Read an image into the workspace and display it. The example converts the truecolor image to a
grayscale image and then, for this example, rotates it 90 degrees.

circuitBoard = rot90(im2gray(imread("board.tif")));
imshow(circuitBoard)

Define offsets of varying direction and distance. Because the image contains objects of a variety of
shapes and sizes that are arranged in horizontal and vertical directions, the example specifies a set of
horizontal offsets that only vary in distance.

offsets0 = [zeros(40,1) (1:40)'];

Create the GLCMs. Call the graycomatrix function specifying the offsets.

glcms = graycomatrix(circuitBoard,"Offset",offsets0);

Derive statistics from the GLCMs using the graycoprops function. The example calculates the
contrast and correlation.

stats = graycoprops(glcms,["Contrast" "Correlation"]);

Plot correlation as a function of offset.

figure, plot([stats.Correlation]);
title("Texture Correlation as a function of offset");


xlabel("Horizontal Offset")
ylabel("Correlation")

The plot contains peaks at offsets 7, 15, 23, and 30. If you examine the input image closely, you can
see that certain vertical elements in the image have a periodic pattern that repeats every seven
pixels.
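Because the example also requested the contrast statistic, you can plot it the same way. This optional step reuses the stats structure computed above.

figure
plot([stats.Contrast])
title("Texture Contrast as a function of offset")
xlabel("Horizontal Offset")
ylabel("Contrast")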

See Also
graycomatrix

Related Examples
• “Create a Gray-Level Co-Occurrence Matrix” on page 13-68

14

Image Quality Metrics

This topic describes functions that enable measuring image quality and detecting regions of interest
in test charts.

• “Image Quality Metrics” on page 14-2


• “Train and Use No-Reference Quality Assessment Model” on page 14-4
• “Compare No Reference Image Quality Metrics” on page 14-8
• “Obtain Local Structural Similarity Index” on page 14-15
• “Compare Image Quality at Various Compression Levels” on page 14-17
• “Anatomy of the Imatest Extended eSFR Chart” on page 14-19
• “Evaluate Quality Metrics on eSFR Test Chart” on page 14-23
• “Correct Colors Using Color Correction Matrix” on page 14-35

Image Quality Metrics


Image quality can degrade due to distortions during image acquisition and processing. Examples of
distortion include noise, blurring, ringing, and compression artifacts.

Efforts have been made to create objective measures of quality. For many applications, a valuable
quality metric correlates well with the subjective perception of quality by a human observer. Quality
metrics can also track unperceived errors as they propagate through an image processing pipeline,
and can be used to compare image processing algorithms.

If an image without distortion is available, you can use it as a reference to measure the quality of
other images. For example, when evaluating the quality of compressed images, an uncompressed
version of the image provides a useful reference. In these cases, you can use full-reference quality
metrics to directly compare the target image and the reference image.

If a reference image without distortion is not available, you can use a no-reference image quality
metric instead. These metrics compute quality scores based on expected image statistics.

Full-Reference Quality Metrics


Full-reference algorithms compare the input image against a pristine reference image with no
distortion.

Metric Description
immse Mean-squared error (MSE). MSE measures the average squared difference
between actual and ideal pixel values. This metric is simple to calculate but
might not align well with the human perception of quality.
psnr Peak signal-to-noise ratio (pSNR). pSNR is derived from the mean square error,
and indicates the ratio of the maximum pixel intensity to the power of the
distortion. Like MSE, the pSNR metric is simple to calculate but might not align
well with perceived quality.
ssim Structural similarity (SSIM) index. The SSIM metric combines local image
structure, luminance, and contrast into a single local quality score. In this
metric, structures are patterns of pixel intensities, especially among
neighboring pixels, after normalizing for luminance and contrast. Because the
human visual system is good at perceiving structure, the SSIM quality metric
agrees more closely with the subjective quality score.
multissim, multissim3   Multi-scale structural similarity (MS-SSIM) index. The MS-SSIM metric expands
on the SSIM index by combining luminance information at the highest resolution level with structure
and contrast information at several downsampled resolutions, or scales. The multiple scales account
for variability in the perception of image details caused by factors such as viewing distance from the
image, distance from the scene to the sensor, and resolution of the image acquisition sensor.

Because structural similarity is computed locally, ssim, multissim, and multissim3 can generate a
map of quality over the image.
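A brief sketch of the full-reference workflow, using a sample image and a simulated distortion; the noise parameters are arbitrary.

ref = imread("cameraman.tif");           % reference image
A = imnoise(ref,"gaussian",0,0.01);      % simulated distortion
errMSE  = immse(A,ref);
peakSNR = psnr(A,ref);
[ssimVal,ssimMap] = ssim(A,ref);         % global score and local quality map
msssimVal = multissim(A,ref);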


No-Reference Quality Metrics


No-reference algorithms use statistical features of the input image to evaluate the image quality.

Metric Description
brisque Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). A BRISQUE
model is trained on a database of images with known distortions, and BRISQUE
is limited to evaluating the quality of images with the same type of distortion.
BRISQUE is opinion-aware, which means subjective quality scores accompany
the training images.
niqe Natural Image Quality Evaluator (NIQE). Although a NIQE model is trained on a
database of pristine images, NIQE can measure the quality of images with
arbitrary distortion. NIQE is opinion-unaware, and does not use subjective
quality scores. The tradeoff is that the NIQE score of an image might not
correlate as well as the BRISQUE score with human perception of quality.
piqe Perception based Image Quality Evaluator (PIQE). The PIQE algorithm is
opinion-unaware and unsupervised, which means it does not require a trained
model. PIQE can measure the quality of images with arbitrary distortion and in
most cases performs similar to NIQE. PIQE estimates block-wise distortion and
measures the local variance of perceptibly distorted blocks to compute the
quality score.

The BRISQUE and the NIQE algorithms calculate the quality score of an image with computational
efficiency after the model is trained. PIQE is less computationally efficient, but it provides local
measures of quality in addition to a global quality score. All no-reference quality metrics usually
outperform full-reference metrics in terms of agreement with a subjective human quality score.
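A minimal sketch of scoring a single image with the three no-reference metrics, using their default models; the image file is just an example.

A = imread("llama.jpg");     % example image
bScore = brisque(A);         % default BRISQUE model
nScore = niqe(A);            % default NIQE model
pScore = piqe(A);            % model-free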

See Also

More About
• “Train and Use No-Reference Quality Assessment Model” on page 14-4
• “Obtain Local Structural Similarity Index” on page 14-15
• “Compare Image Quality at Various Compression Levels” on page 14-17


Train and Use No-Reference Quality Assessment Model


The Natural Image Quality Evaluator (NIQE) and Blind/Referenceless Image Spatial Quality Evaluator
(BRISQUE) algorithms use a trained model to compute a quality score.

Both algorithms train a model using identical predictable statistical features, called natural scene
statistics (NSS). NSS are based on normalized luminance coefficients in the spatial domain, and are
modeled as a multidimensional Gaussian distribution. Distortions appear as perturbations to the
Gaussian distribution.

The algorithms differ in how they use the NSS features to train a model and compute a quality score.

NIQE Workflow
NIQE measures the quality of images with arbitrary distortion. A NIQE model is not trained using
subjective quality scores, but the tradeoff is that the NIQE score does not correlate as reliably as the
BRISQUE score with human perception of quality.

Train a NIQE Model

Note If the default NIQE model provides a sufficient quality score for your application, you do not
need to train a new model. You can skip to “Predict Image Quality Using a NIQE Model” on page 14-4.

To train a NIQE model, pass a datastore of pristine images to the fitniqe function. The function
divides each image into blocks and computes the NSS for each block. The training process includes
only blocks with statistically significant features.

The returned model, niqeModel, stores the multivariate Gaussian mean and standard deviation
derived from the NSS features.
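A hedged sketch of this training step; the folder of pristine images and the test image file are placeholders.

imds = imageDatastore("pristineImages");           % hypothetical folder of pristine images
myModel = fitniqe(imds);                           % train a custom NIQE model
score = niqe(imread("testImage.png"),myModel);     % hypothetical test image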

Predict Image Quality Using a NIQE Model

Use the niqe function to calculate an image quality score for an image with arbitrary distortion. The
niqe function extracts the NSS features from statistically significant blocks in the distorted image.
The function fits a multivariate Gaussian distribution to the image NSS features. The quality score is
the distance between the Gaussian distributions.

The diagram shows the full NIQE workflow.


BRISQUE Workflow
BRISQUE is limited to measuring the quality of images with the same type of distortion as the model.
A BRISQUE model is trained using subjective opinion scores, with the advantage that the BRISQUE
score correlates well with human perception of quality.

Train a BRISQUE Model

Note If the default BRISQUE model provides a sufficient quality score for your application, you do
not need to train a new model. You can skip to “Predict Image Quality Using a BRISQUE Model” on
page 14-6.

To train a BRISQUE model, pass to the fitbrisque function:

• A datastore containing images with known distortions and pristine copies of those images
• A subjective opinion score for each distorted image in the database

The function computes the NSS features for each image, without dividing the image into blocks. The
function uses the NSS features and corresponding opinion scores to train a support vector machine
regression model. The returned model, brisqueModel, stores the parameters of the support vector
regressor.
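A hedged sketch of this training step; the image folder, score file, and test image are placeholders.

imds = imageDatastore("distortedImages");          % hypothetical folder of distorted images
opinionScores = readmatrix("opinionScores.csv");   % hypothetical subjective score per image
myModel = fitbrisque(imds,opinionScores);          % train a custom BRISQUE model
score = brisque(imread("testImage.png"),myModel);  % hypothetical test image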

Predict Image Quality Using a BRISQUE Model

Use the brisque function to calculate an image quality score for an image with the same type of
distortions as the model. The brisque function extracts the NSS features from the distorted image,
and predicts a quality score using support vector regression.

The diagram shows the full BRISQUE workflow.


See Also
fitbrisque | brisqueModel | niqe | niqeModel | fitniqe | brisque

More About
• “Image Quality Metrics” on page 14-2
• “Compare No Reference Image Quality Metrics” on page 14-8


Compare No Reference Image Quality Metrics

This example shows how to compare the performance of various blind or no-reference image quality
metrics.

Evaluating the quality of an image is an important part of image acquisition, compression, and other
image enhancement workflows. It is desirable to have a fast, automated metric that closely mimics
subjective measures of image quality. This example compares the performance of three no-reference
quality metrics.

• BRISQUE - Blind/Referenceless image spatial quality evaluator


• NIQE - Naturalness image quality evaluator
• PIQE - Perception-based image quality evaluator

Each metric has different strengths depending on the images in the data set. To select the best metric
for your data, you can compare the performance of the three metrics on sample image data. This
example shows how to compare the performance in two different situations: varying levels of JPEG
compression on a single image and for a video stream.

Evaluate Response to Varying Compression on Single Image

Image compression is a tradeoff between visual quality and the compression ratio, or size of the
output data. The tradeoff also depends on the image content. For example, images with uniform areas
can compress to smaller file sizes and exhibit fewer artifacts than images with detailed features.
Image quality metrics can help analyze this tradeoff, while trying to minimize the impact of the image
content on the analysis.

Read an image into the workspace.

im = imread('llama.jpg');

Write copies of the image with different JPG compression ratios. Read each compressed image back
into the workspace.

jpegQuality = 10:10:100;
numObservations = numel(jpegQuality);
compressedFrames = cell(1,numObservations);
for ind = 1:numObservations
    q = jpegQuality(ind);
    tempFile = ['llama_compression_',num2str(q),'.jpg'];
    imwrite(im,tempFile,'Quality',q);
    compressedFrames{ind} = imread(tempFile);
end

Inspect the compressed images.

tiledlayout(1,3);

h1 = nexttile;
imshow(compressedFrames{1})
title('JPEG Quality: 10')

nexttile
imshow(compressedFrames{7})


title('JPEG Quality: 70')

nexttile
imshow(im)
title('Input Image')
linkaxes

Zoom in on the compressed image to see the nature of some specific artifacts. At JPEG quality 10, the
blocking artifacts are obvious.

h1.XLim = [650 700];
h1.YLim = [490 550];


For each compressed JPG image, calculate the quality score using the three quality metrics.

pQ = zeros(1, numObservations);
nQ = zeros(1, numObservations);
bQ = zeros(1, numObservations);

for ind = 1:numObservations
    bQ(ind) = brisque(compressedFrames{ind});
    nQ(ind) = niqe(compressedFrames{ind});
    pQ(ind) = piqe(compressedFrames{ind});
end

Visualize the score of the metrics as the JPEG quality increases. Normalize the scores so that each
score has the same value for the uncompressed image. For these three metrics, lower scores
correspond to higher image quality.

The BRISQUE score for JPEG quality factors of 50, 60, and 70 is unrealistically lower than for the
least-compressed images. Therefore, for images similar to this test image, NIQE and PIQE are
more reliable metrics.

figure
hold on
plot(jpegQuality,bQ/bQ(end),'*-');
plot(jpegQuality,nQ/nQ(end),'*-');
plot(jpegQuality,pQ/pQ(end),'*-');
legend('BRISQUE','NIQE','PIQE');
ylabel('Metric Score')


xlabel('JPEG Quality')
hold off

Evaluate Response to Varying Compression and Content using a Video

In applications such as streaming video, the receiver often must evaluate quality metrics without
access to the original pristine sample. Also, the content of each frame can vary significantly.
Simulate such a scenario to evaluate the performance characteristics of these metrics.

Create a VideoReader object that reads frames from the video 'rhinos.avi'. This video has 114
frames.

vidObjR = VideoReader('rhinos.avi');
vidObjW = VideoWriter('varyingCompressed.avi');
open(vidObjW)

Create a varying compression ratio schedule to mimic real-time varying bitrate transmissions.

numFrames = vidObjR.NumFrames;
varyingQuality = sin(2*pi*(1:numFrames)*0.01);
varyingQuality = round(rescale(varyingQuality)*100);
varyingQuality = max(varyingQuality,1); % min JPEG quality is 1

figure
plot(varyingQuality);
title('JPEG Quality Schedule');


ylabel('JPEG Quality')
xlabel('Frame Index')

For each frame in the video, compress the frame according to the JPEG quality schedule. Compute
the metrics of the compressed frame and add the compressed frame to the output video for
validation.

pQ = zeros(1,numFrames);
nQ = zeros(1,numFrames);
bQ = zeros(1,numFrames);

ind = 1;
while hasFrame(vidObjR)
    im = readFrame(vidObjR);

    % Compress it based on the schedule
    tempFile = 'rhinos_compressed_frame.jpg';
    imwrite(im,tempFile,'Quality',varyingQuality(ind));
    frame = imread(tempFile);

    writeVideo(vidObjW,frame);

    bQ(ind) = brisque(frame);
    nQ(ind) = niqe(frame);
    pQ(ind) = piqe(frame);
    ind = ind+1;
end
close(vidObjW);

Visualize the trend and confirm that it mimics the compression schedule. Rescale the metrics to focus
on the trend, and invert the quality schedule to obtain the compression ratio trend. Quality metrics can
still give a useful indication of perceived quality without access to the original reference frame.

figure
hold on
plot(rescale(bQ));
plot(rescale(nQ));
plot(rescale(pQ));
% Invert JPEG Quality to get the compression ratio
plot(1-rescale(varyingQuality),'k','LineWidth',2)
legend('BRISQUE','NIQE','PIQE','Compression Ratio');
title('Trend of Quality Metrics with Varying Compression and Content');
ylabel('Metric Score')
xlabel('Frame Index')
hold off

See Also
fitbrisque | brisqueModel | niqe | niqeModel | fitniqe | brisque

More About
• “Image Quality Metrics” on page 14-2


• “Train and Use No-Reference Quality Assessment Model” on page 14-4


Obtain Local Structural Similarity Index

This example shows how to measure the quality of regions of an image when compared with a
reference image. The ssim function calculates the structural similarity index for each pixel in an
image, based on its relationship to other pixels in an 11-by-11 neighborhood. The function returns
this information in an image that is the same size as the image whose quality is being measured. This
local, pixel-by-pixel, quality index can be viewed as an image, with proper scaling.

Read an image to use as the reference image.

ref = imread('pout.tif');

Create an image whose quality is to be measured, by making a copy of the reference image and
adding noise. To illustrate local similarity, isolate the noise to half of the image. Display the reference
image and the noisy image side-by-side.

A = ref;

A(:,ceil(end/2):end) = imnoise(ref(:,ceil(end/2):end),'salt & pepper', 0.1);

figure, imshowpair(A,ref,'montage')

Calculate the local structural similarity index for the modified image (A) when compared to the
reference image (ref). Visualize the local structural similarity index. Note how the left side of the
image, which is identical to the reference image, displays as white because all the local structural
similarity values are 1.


[global_sim local_sim] = ssim(A,ref);

figure, imshow(local_sim,[])

See Also
ssim

More About
• “Image Quality Metrics” on page 14-2


Compare Image Quality at Various Compression Levels

This example shows how to test image quality using ssim. The example creates images at various
compression levels and then plots the quality metrics. To run this example, you must have write
permission in your current folder.

Read an image into the workspace.

I = imread('cameraman.tif');

Write the image to a file using various quality values. The JPEG format supports the 'quality'
parameter. Use the ssim function to check the quality of each written image.

ssimValues = zeros(1,10);
qualityFactor = 10:10:100;
for i = 1:10
    imwrite(I,'compressedImage.jpg','jpg','quality',qualityFactor(i));
    ssimValues(i) = ssim(imread('compressedImage.jpg'),I);
end

Plot the results. Note how the image quality score improves as you increase the quality value
specified with imwrite.

plot(qualityFactor,ssimValues,'b-o');
xlabel('Compression Quality Factor');
ylabel('SSIM Value');


See Also
ssim

More About
• “Image Quality Metrics” on page 14-2


Anatomy of the Imatest Extended eSFR Chart


The Imatest edge spatial frequency response (eSFR) test charts contain visual features that enable
sharpness measurements according to the ISO 12233:2014 standard [1], [2]. You can use a single test
chart to measure chromatic aberration, noise, and scene illumination. Registration markers on the
chart enable automatic selection of regions of interest (ROIs) for measurements.

Image Processing Toolbox supports the Enhanced and Extended versions of the eSFR test chart. The
Extended version has a 16:9 aspect ratio and additional features to measure color accuracy. The
esfrChart object does not analyze other visual features in the chart, such as focusing targets and
wedges.

Slanted Edge Features

The Extended eSFR test chart has 15 gray boxes tilted 5° away from vertical. The left, top, right, and
bottom edges of each box are used to measure:

• Local SFR, which indicates image sharpness. Sharp edges have better contrast than blurry edges,
and they more clearly show the actual position of the edge.

• In sharp edges, pixel intensity values quickly transition across boundaries in the scene. Most
pixels clearly belong to one side of the boundary or the other, and few pixels have intermediate
values. The contrast is high because adjacent pixels on either side of the actual edge have
large differences in intensity.
• In blurry edges, the transition happens gradually over many pixels, which makes it unclear
where the boundary actually occurs. The contrast is low because adjacent pixels have similar
intensity values.

Sharpness is higher toward the center of the imaged region and decreases toward the periphery.
Horizontal sharpness is usually higher than vertical sharpness. To measure SFR, use the
measureSharpness function.
• Local chromatic aberration, or color fringing, which indicates how uniformly the camera optical
system focuses light in the red, green, and blue color channels. In captured images, chromatic
aberration appears as an artificial strip of color along edges. Chromatic aberration also lowers
contrast in the luminance channel, and therefore reduces image sharpness.

Chromatic aberration increases radially from the center of the image. To measure chromatic
aberration, use the measureChromaticAberration function. This function also returns the edge
profiles for each color channel, which is the averaged projection of pixel intensity values along the
edge.

If you capture an image of the test chart at the full 16:9 aspect ratio, then esfrChart automatically
identifies and labels all 60 available slanted-edge ROIs. You can also image the test chart at the 4:3
and 3:2 aspect ratios, as indicated on the chart. At these ratios, fewer edges are available, and
esfrChart indexes edges according to the convention used by the 16:9 aspect ratio.

In a proper capture of the test chart, orient the chart without rotation, so that the angle of the slanted
edges is close to 5°. The contrast of the edges must be greater than 20%. If the contrast is less than
20%, adjust the scene lighting and the camera exposure.
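A brief sketch of the corresponding slanted-edge measurements on an imported chart; the chart image is the sample used in the later example.

chart = esfrChart(imread("eSFRTestImage.jpg"));
sharpnessTable = measureSharpness(chart);        % local SFR at each slanted edge
chTable = measureChromaticAberration(chart);     % color fringing at each slanted edge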

Gray Patch Features

The Extended eSFR test chart has 20 gray patches of increasing brightness, arranged in a ring
around the center of the image. The gray patches are used to measure:

• Scene illumination, which estimates the color of the light illuminating the scene. You can use the
measured illumination to white-balance images acquired under similar lighting conditions. To
measure scene illumination, use the measureIlluminant function. To white-balance the image,
use chromadapt.
• Noise, which quantifies how much the camera electronics generate error in pixel values. To
estimate noise in each color channel, use the measureNoise function.

In a proper capture of the test chart, set the scene lighting and camera exposure so that each gray
patch appears distinct from the other patches and no pixels are clipped. If the darkest patches appear
identical, or have values of 0, increase the scene lighting or the exposure. If the brightest patches
appear identical, or if the brightest patch is saturated, decrease the scene lighting or the exposure.


Color Patch Features

The Extended eSFR test chart has 16 color patches arranged in four groups. The color patches are
used to measure:

• Color accuracy, which indicates how well the measured red, green, and blue values agree with
expected color values. To measure color accuracy, use the measureColor function. This function
also returns a color correction matrix, which you can use to adjust the image colors toward the
expected values.

Registration Markers

The eSFR test charts have registration markers used to orient the image properly. When you import a
chart, esfrChart detects four black-and-white checkered circles and uses their position to define
regions of interest automatically. You can optionally specify the [x, y] coordinates of the circle centers
manually.


References
[1] Imatest. "Esfr". https://www.imatest.com/mathworks/esfr/.

[2] ISO 12233:2014. "Photography – Electronic still picture imaging – Resolution and spatial
frequency responses." International Organization for Standardization; ISO/TC 42
Photography. URL: https://www.iso.org/standard/59419.html.

See Also
esfrChart | displayChart

More About
• “Evaluate Quality Metrics on eSFR Test Chart” on page 14-23


Evaluate Quality Metrics on eSFR Test Chart

This example shows how to perform standard quality measurements on an Imatest® edge spatial
frequency response (eSFR) test chart. Measured properties include sharpness, chromatic aberration,
noise, illumination, and color accuracy.

Create a Test Chart Object

Read an image of an eSFR chart into the workspace. Display the chart.

I = imread("eSFRTestImage.jpg");
imshow(I)
title("Captured Image of eSFR Chart")
text(size(I,2),size(I,1)+15,["Chart courtesy of Imatest",char(174)], ...
FontSize=10,HorizontalAlignment="right");

Create an eSFR test chart object that automatically defines regions of interest (ROIs) based on
detected registration markers.

chart = esfrChart(I);

Highlight and label the detected ROIs to visually confirm that the ROIs are suitable for
measurements.

displayChart(chart)


All 60 slanted edge ROIs (labeled in green) are visible and centered on appropriate edges. In
addition, 20 gray patch ROIs (labeled in red) and 16 color patch ROIs (labeled in white) are visible
and are contained within the boundary of each patch. The chart is correctly imported.

Measure Edge Sharpness

Measure the sharpness of all 60 slanted edge ROIs. Also measure the averaged horizontal and
vertical sharpness of these ROIs.

[sharpnessTable,aggregateSharpnessTable] = measureSharpness(chart);

Display the SFR plot for the first four ROIs.

plotSFR(sharpnessTable,ROIIndex=1:4,displayLegend=false,displayTitle=true)

14-24
Evaluate Quality Metrics on eSFR Test Chart

14-25
14 Image Quality Metrics

14-26
Evaluate Quality Metrics on eSFR Test Chart

14-27
14 Image Quality Metrics

Display the average SFR of the averaged vertical and horizontal edges. The average vertical SFR
drops off more rapidly than the average horizontal SFR. Therefore, the average vertical edge is less
sharp than the average horizontal edge.

plotSFR(aggregateSharpnessTable)


Measure Chromatic Aberration

Measure chromatic aberration at all slanted edge ROIs.

chTable = measureChromaticAberration(chart);

Plot the normalized intensity profile of the three color channels in the first ROI. Store the normalized
edge profile in a separate variable, edgeProfile, for clarity.

roi_index = 1;
edgeProfile = chTable.normalizedEdgeProfile{roi_index};

figure
p = length(edgeProfile.normalizedEdgeProfile_R);
plot(1:p,edgeProfile.normalizedEdgeProfile_R,"r", ...
1:p,edgeProfile.normalizedEdgeProfile_G,"g", ...
1:p,edgeProfile.normalizedEdgeProfile_B,"b")
xlabel("Pixel")
ylabel("Normalized Intensity")
title("ROI "+roi_index+" with Aberration "+chTable.aberration(1))


The color channels have similar normalized intensity profiles, and not much color fringing is visible
along the edge.

Measure Noise

Measure noise using the 20 gray patch ROIs.

noiseTable = measureNoise(chart);

Plot the average raw signal and the signal-to-noise ratio (SNR) in each grayscale ROI.

figure
subplot(1,2,1)
plot(noiseTable.ROI,noiseTable.MeanIntensity_R,"r", ...
noiseTable.ROI,noiseTable.MeanIntensity_G,"g", ...
noiseTable.ROI,noiseTable.MeanIntensity_B,"b")
title("Signal")
ylabel("Intensity")
xlabel("Gray ROI Number")
grid on

subplot(1,2,2)
plot(noiseTable.ROI,noiseTable.SNR_R,"r", ...
noiseTable.ROI,noiseTable.SNR_G,"g", ...
noiseTable.ROI,noiseTable.SNR_B,"b")
title("SNR")
ylabel("dB")

14-31
14 Image Quality Metrics

xlabel("Gray ROI Number")


grid on

Estimate Illuminant

Estimate the scene illumination using the 20 gray patch ROIs. The illuminant has a stronger blue
component and a weaker red component, which is consistent with the blue tint of the test chart image.

illum = measureIlluminant(chart)

illum = 1×3

110.9147 116.0008 123.2339

Measure Color Accuracy

Measure color accuracy using the 16 color patch ROIs.


[colorTable,ccm] = measureColor(chart);

Display the average measured color and the expected color of the ROIs. Display the color accuracy
measurement, Delta_E. The closer the Delta_E value is to 1, the less perceptible the color
difference is. Typical values of Delta_E range from 3 to 6 for printing, and up to 20 in other
commercial applications.
figure
displayColorPatch(colorTable)


Plot the measured and reference colors in the CIE 1976 L*a*b* color space on a chromaticity
diagram. Red circles indicate the reference color. Green circles indicate the measured color of each
color patch.

figure
plotChromaticity(colorTable)


You can use the color correction matrix, ccm, to color-correct the test chart images. For an example,
see “Correct Colors Using Color Correction Matrix” on page 14-35.

References
[1] Imatest. "Esfr". https://www.imatest.com/mathworks/esfr/.

See Also
esfrChart | displayChart | measureSharpness | measureChromaticAberration |
measureIlluminant | measureNoise | measureColor

More About
• “Anatomy of the Imatest Extended eSFR Chart” on page 14-19
• “Correct Colors Using Color Correction Matrix” on page 14-35


Correct Colors Using Color Correction Matrix

This example shows how to adjust the colors of an image to better match a standardized set of colors
on an Imatest® edge spatial frequency response (eSFR) test chart.

Obtain Color Correction Matrix from the Test Chart Image

Read an image of a test chart, and create a copy of the image in the linear RGB color space.

I = imread("eSFRTestImage.jpg");
Ilin = rgb2lin(I);

Create an esfrChart object that stores information about the test chart. Display the chart,
highlighting the 16 color patches. The image has a blue tint.

chart = esfrChart(I);
displayChart(chart,displayEdgeROIs=false, ...
displayGrayROIs=false,displayRegistrationPoints=false)

Measure the color accuracy of the 16 color patches by using the measureColor function. The
function also returns the color correction matrix that is used to perform the color correction.

[colorTable,ccm] = measureColor(chart);

Compare the measured and reference colors on a color patch diagram. The closer the Delta_E value
is to 1, the less perceptible the color difference is.


displayColorPatch(colorTable)

Color-Correct the Test Chart Image

Color-correct the original test chart image in the linear RGB color space.

Ilin_corrected = imapplymatrix(ccm(1:3,:)',Ilin,ccm(4,:));

Convert the color-corrected image to the sRGB color space and display the result.

I_corrected = lin2rgb(Ilin_corrected);
imshow(I_corrected)
title("Color-Corrected Image Using Color Patches")


Create an esfrChart object that stores information about the color-corrected test chart. Measure
the color accuracy of the 16 color-corrected color patches in the sRGB color space.

chart_corrected = esfrChart(I_corrected);
colorTable_corrected = measureColor(chart_corrected);

Compare the corrected and reference colors on a color patch diagram. The measured color errors,
Delta_E, are smaller for the color-corrected image than for the original image. Therefore, the colors
in this image better agree with the reference colors. However, the chart now has an overall yellow
tint and the contrast of the image has decreased.

displayColorPatch(colorTable_corrected)


Improve Color Correction Using Gray Patches

You can improve the color correction by including the gray patches as well as the color patches in the
least squares fit. Display the original chart, highlighting the 20 gray patches and 16 color patches.

displayChart(chart,displayEdgeROIs=false, ...
displayRegistrationPoints=false)


Get the reference L*a*b* values of the color and grayscale patches, which are stored in the
ReferenceColorLab and ReferenceGrayLab properties of the eSFR chart object. Convert these
values to the linear RGB color space.

referenceLab = [chart.ReferenceColorLab; chart.ReferenceGrayLab];
referenceRGB = lab2rgb(referenceLab,outputtype="uint8",ColorSpace="linear-rgb");

Measure the mean gray value on each of the 20 gray patches in the sRGB color space by using the
measureNoise function.

noiseTable = measureNoise(chart);
measuredGrayRGB = [noiseTable.MeanIntensity_R, ...
    noiseTable.MeanIntensity_G, ...
    noiseTable.MeanIntensity_B];

Concatenate all measured sRGB color values of the color and grayscale patches, then convert the
color values to the linear RGB color space.

measuredColorRGB = [colorTable.Measured_R, ...
    colorTable.Measured_G, ...
    colorTable.Measured_B];
measuredRGB = [measuredColorRGB; measuredGrayRGB];
measuredRGB = rgb2lin(measuredRGB);

Calculate the color correction matrix.

ccmWithGray = double([measuredRGB ones(36,1)]) \ double(referenceRGB);


Perform the color correction and display the result. The chart no longer has a yellow tint and the
overall appearance of the chart has improved.

Ilin_correctedWithGray = imapplymatrix(ccmWithGray(1:3,:)',Ilin,ccmWithGray(4,:)');
I_correctedWithGray = lin2rgb(Ilin_correctedWithGray);
imshow(I_correctedWithGray)
title("Color-Corrected Image Using Gray and Color Patches")

Compare the corrected and reference colors on a color patch diagram. Some of the measured color
errors have decreased, while others have increased.

chart_correctedWithGray = esfrChart(I_correctedWithGray);
colorTable_correctedWithGray = measureColor(chart_correctedWithGray);
displayColorPatch(colorTable_correctedWithGray)


References
[1] Imatest. "Esfr". https://www.imatest.com/mathworks/esfr/.

See Also
esfrChart | measureColor | displayColorPatch | plotChromaticity

More About
• “Evaluate Quality Metrics on eSFR Test Chart” on page 14-23

15

ROI-Based Processing

• “Specify ROI as Binary Mask” on page 15-2


• “Create ROI Shapes” on page 15-7
• “ROI Migration” on page 15-18
• “Create Binary Mask Using an ROI Function” on page 15-21
• “Overview of ROI Filtering” on page 15-24
• “Sharpen Region of Interest in an Image” on page 15-25
• “Apply Custom Filter to Region of Interest in Image” on page 15-28
• “Fill Region of Interest in an Image” on page 15-31
• “Calculate Properties of Image Regions Using Image Region Analyzer” on page 15-33
• “Filter Images on Properties Using Image Region Analyzer App” on page 15-38
• “Create Image Comparison Tool Using ROIs” on page 15-43
• “Use Freehand ROIs to Refine Segmentation Masks” on page 15-50
• “Rotate Image Interactively Using Rectangle ROI” on page 15-55
• “Subsample or Simplify a Freehand ROI” on page 15-61
• “Measure Distances in an Image” on page 15-71
• “Use Polyline to Create Angle Measurement Tool” on page 15-78
• “Create Freehand ROI Editing Tool” on page 15-82
• “Use Wait Function After Drawing ROI” on page 15-88
• “Interactive Image Inpainting Using Exemplar Matching” on page 15-91
• “Classify Pixels That Are Partially Enclosed by ROI” on page 15-95

Specify ROI as Binary Mask


A binary mask defines a region of interest (ROI) of an image. Mask pixel values of 1 indicate image
pixels that belong to the ROI. Mask pixel values of 0 indicate image pixels that are part of the
background.

Depending on the application, an ROI can consist of contiguous or discontiguous groups of pixels. A
contiguous region is a single group of connected pixels. A contiguous ROI could represent a single
object in an image, such as one car in an image of a street scene, or the body tissue in a medical
image. For example, a discontiguous ROI could represent all pixels corresponding to water in an
aerial photograph, or all tumorous cells in a medical image.

Image Processing Toolbox supports many options to create a binary mask. Here are some common
approaches, although this selection is not exhaustive.

Create Mask Using Thresholding


A common way to create a mask from an image is to classify each pixel based on the intensity value of
the pixel. For example, pixels in a region of interest can appear brighter than the background or have
a different color than the background. Various functions and apps enable you to apply thresholding to
grayscale, indexed, and color images.

Operation Description Sample Output (Input Image, Binary Mask, and Masked ROI)

Single threshold: The ROI consists of grayscale pixels whose intensity is above (or below) a
specified threshold. You can create a binary mask representing the ROI using mathematical
operations or functions such as imbinarize. For an example, see “Correct Nonuniform Illumination
and Analyze Foreground Objects” on page 1-9. (A minimal sketch of this case appears after the
table.)

Range of intensity or index values: The ROI consists of grayscale pixels whose intensity is within
a range of values. You can create a binary mask representing the ROI using mathematical operations
or by using the roicolor function. This function also supports indexed images, in which case the
mask specifies pixels with specific index values corresponding to colors in a colormap.

Range of color values: The ROI consists of the pixels in a color image whose color channels are
within a range of values. You can create a binary mask from an RGB image using the Color
Thresholder app. This app enables you to interactively select the range of values for the three
color channels based on different color spaces.

Grayscale flood fill: The ROI consists of connected pixels of similar intensity value. You specify
a seed point and tolerance. Perform a flood fill operation of a grayscale image using the
grayconnected function or the Image Segmenter app. Perform a flood fill operation on a color image
using the Image Segmenter app.

Color flood fill: The ROI consists of connected pixels of similar color value. You specify a seed
point and tolerance. Perform a flood fill operation on a color image using the Image Segmenter app.
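For instance, a minimal sketch of the single-threshold case, using a sample grayscale image and the default global threshold:

I = imread("coins.png");        % sample grayscale image
mask = imbinarize(I);           % single global (Otsu) threshold
maskedROI = I;
maskedROI(~mask) = 0;           % keep only pixels inside the ROI
imshow(maskedROI)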

Create Mask Based on Position


You can specify an ROI based on the position of pixels in an image. For example, you can define an
ROI consisting of pixels within an ROI shape that you draw, or within a rectangular patch whose
location is specified using array indexing.

Operation Description Sample Output (Input Image, Binary Mask, and Masked ROI)

Create geometrical or freehand shape using ROI objects: The ROI consists of all pixels whose
position is within a geometrical or a hand-drawn shape. First, you create an ROI object, then you
create a binary mask using the createMask function. You can draw shapes interactively. You can
also create shapes programmatically from known positional constraints, such as vertices for
polygons, waypoints for freehands, and radius and center for circles. For more information, see
Create ROI Shapes on page 15-7. For an example, see “Create Binary Mask Using an ROI Function”
on page 15-21. (A sketch of the programmatic case appears after the table.)

Create polygonal shape using polygon tool: The ROI consists of all pixels within a polygonal shape.
You can draw the shape interactively in the polygon tool by using the roipoly function. This
function returns a binary mask directly.

Create mask from polygon coordinates: The ROI consists of all pixels whose position is within a
polygon. You can create this binary mask by specifying the vertices of the polygon using the
roipoly function, or by specifying the vertices and the target size of the mask using the poly2mask
function. poly2mask does not require an input image.
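For instance, a hedged sketch of creating a mask from a programmatically specified circular ROI; the center and radius are arbitrary.

I = imread("coins.png");
imshow(I)
roi = images.roi.Circle(gca,Center=[100 100],Radius=40);   % programmatic circular ROI
mask = createMask(roi);                                     % binary mask the size of the displayed image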

Create Mask Using Automated and Semi-Automated Segmentation Algorithms

Some image segmentation algorithms predict an ROI based on features in the image and a coarse
estimate of the location of the ROI. For example, the active contours technique iteratively refines a
mask that you provide towards object boundaries. A benefit of automated and semi-automated
segmentation algorithms is that they can detect intricate ROI boundaries with high fidelity given an
imprecise initial location estimate.

The Image Segmenter and Volume Segmenter apps enable automated and semi-automated
segmentation techniques. You can also segment images into ROI and background using a variety of
functions in the toolbox. For more information about available segmentation techniques, see “Image
Segmentation”.


Operation Description Sample Output (Input Image, Binary Mask,


and Masked ROI)
Graph cut The graph cut algorithm
estimates an ROI using iterative
graph-based segmentation. You
specify and refine the ROI using
seed pixels for both the ROI and
the background. Graph cut
functionality is enabled by the
grabcut function and the
Image Segmenter app.

Lazy snapping The lazy snapping algorithm


estimates an ROI using graph-
based segmentation. You specify
an initial mask or pixel
coordinates for both the ROI
and the background. Lazy
snapping functionality is
enabled by the lazysnapping
function.

Active contours The active contours (snakes)


algorithm estimates an ROI
using a region growing
technique. You specify an initial
mask around the object
boundaries. Active contour
functionality is enabled by the
activecontour function and
the Image Segmenter app.
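As a minimal sketch of a semi-automated workflow, the following code refines a coarse rectangular mask by using activecontour. The image, seed rectangle, and iteration count are illustrative choices.

I = imread('coins.png');

% Coarse initial estimate of the ROI (example values).
seedMask = false(size(I));
seedMask(30:100,30:100) = true;

% Iteratively evolve the mask toward object boundaries.
refinedMask = activecontour(I,seedMask,300);
imshowpair(I,refinedMask)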

See Also
Color Thresholder | Image Segmenter | Volume Segmenter

More About
• “Create ROI Shapes” on page 15-7
• “Create Binary Mask Using an ROI Function” on page 15-21


Create ROI Shapes


You can create ROI objects that represent ROIs of various shapes, including circles, ellipses,
polygons, lines, polylines, rectangles, and hand-drawn shapes. You can also draw freehand shapes
that get "assistance" from the underlying image, automatically following the contour of edges. In this
figure, you see a polygonal ROI drawn over an image.

You can control aspects of the ROI position and appearance. You can create masks from ROIs and
perform other operations. You can also specify how the ROI responds to events that occur within the
ROI, such as mouse clicks and movement.

There are three ways to create ROI objects.


• Create an ROI interactively by using a creation convenience function. The creation functions
enable you to draw the ROI on an image. Use this approach if you do not have prior knowledge of
the size and position of the ROI and want to use the image content to assist in the placement of
the ROI. For more information, see “Create ROI Using Creation Convenience Functions” on page
15-10.
• Create an ROI programmatically by specifying information about the size and shape of the ROI. Use this approach if you already know details about the size and shape of the ROI, such as the coordinates of polygon vertices or the center coordinates and radius of a circle. A brief sketch of this approach appears after this list.
• Create an ROI programmatically, then use the draw function to interactively draw the ROI on an
image. Use this approach if you want to set the display properties and behavior of the ROI before
you specify the size and position of the ROI. The draw function also enables you to redraw an
existing ROI, preserving the appearance of the ROI. For more information, see “Create ROI Using
draw Function” on page 15-12.

The table shows the supported ROIs and their respective creation convenience functions.

ROI object (creation convenience function): Description

• AssistedFreehand (drawassisted): Freehand ROI that snaps to edges of existing objects in the image
• Circle (drawcircle): Circular ROI
• Crosshair (drawcrosshair): Linear ROI that consists of two perpendicular lines
• Cuboid (drawcuboid): 3-D cuboidal ROI
• Ellipse (drawellipse): Elliptical ROI
• Freehand (drawfreehand): Freehand ROI that follows the path of the mouse
• Line (drawline): Linear ROI that consists of a single line segment
• Point (drawpoint): Point ROI
• Polygon (drawpolygon): Polygonal ROI that consists of a closed set of line segments
• Polyline (drawpolyline): Polyline ROI that consists of an open set of line segments
• Rectangle (drawrectangle): Rectangular ROI

Create ROI Using Creation Convenience Functions

This example shows how to create an ROI object by using ROI creation convenience functions. These
functions enable you to create the ROI objects interactively by drawing the ROI over an image.

Read and display an image.


I = imread('pears.png');
imshow(I)

This example creates an Ellipse ROI. You can use a similar process to create any ROI object.

Create an Ellipse ROI by using the drawellipse function. Customize the look of the ROI by
specifying the StripeColor name-value pair argument as yellow.

roi = drawellipse('StripeColor','y');


Inspect properties of the ROI.


roi

roi =
Ellipse with properties:

Center: [446.0000 197.0000]


SemiAxes: [115.8836 71.6200]
RotationAngle: 298.3342
AspectRatio: 1.6180
Label: ''
Color: [0 0.4470 0.7410]
Parent: [1×1 Axes]
Visible: 'on'
Selected: 0

Show all properties

Create ROI Using draw Function

This example shows how to use the draw function to redraw an existing ROI interactively. This
approach is useful if you want to set the display properties and behavior of the ROI before you specify


the size and position of the ROI. For example, you may want to create and customize an ROI before
you have an axes in which to display the ROI.

This example creates and draws an Ellipse ROI. You can use a similar process to create any ROI object.

Create an Ellipse ROI programmatically by using the images.roi.Ellipse function. Specify properties to customize the appearance of the ellipse. Here, the face color is cyan and the border of the ROI has a red stripe. Do not specify the position of the ROI.

roi = images.roi.Ellipse('Color','c','StripeColor','r');

Inspect properties of the ROI.

roi

roi =
Ellipse with properties:

Center: []
SemiAxes: []
RotationAngle: 0
AspectRatio: 1.6180
Label: ''

Show all properties

Inspect the parent axes of the ROI. Because the ROI has not been drawn yet, the parent axes is empty.

roi.Parent

ans =
0×0 empty GraphicsPlaceholder array.

Read and display an image.

I = imread('pears.png');
imshow(I)


Draw the ROI on the image by using the draw function. Click and drag the cursor over the image to
create the elliptical shape. The displayed ROI has the face color and stripe color that you specified
when you created the ROI.

draw(roi)


Inspect properties of the ROI. Several properties of the ROI are updated after drawing.
roi

roi =
Ellipse with properties:

Center: [337 107.5000]


SemiAxes: [109.3766 67.5985]
RotationAngle: 42.2208
AspectRatio: 1.6180
Label: ''
Color: [0 1 1]
Parent: [1×1 Axes]
Visible: 'on'
Selected: 0

Show all properties

The ROI now has a parent axes. Get all graphics objects that share the same parent axes. In this example, the ROI has the same parent as the displayed image.
roi.Parent.Children

ans =
2×1 graphics array:


Ellipse
Image

Using ROIs in Apps Created with App Designer


You can use ROIs in apps created with App Designer, parenting an ROI in a UIAxes. You must
explicitly specify the UIAxes when calling the ROI creation function, as an input argument or using
the 'Parent' name/value pair. There are a few limitations when using ROIs in apps in this way:

• The mouse cursor does not update when you hover over the ROI. The cursor is always an arrow.
• The ROI does not change color when you hover over it.
• The ROI right-click menu (UIContextMenu) is not supported.

The following code, while not a typical app-creation workflow, shows how to specify an ROI in a
UIAxes in an app (UIFigure).

1 Create a UIAxes. When you call the uiaxes function, it creates a UIFigure automatically.
uax = uiaxes;

2 Create the ROI in the UIAxes. Call any of the ROI creation functions, such as drawcircle, or
the ROI classes, such as images.roi.Circle. Specify the UIAxes as an argument. Move the


cursor over the axes, and click and drag the mouse to draw the ROI. The shape of the cursor does
not change when used with a UIAxes.

h = drawcircle(uax);

You can also create an ROI using the object creation function, such as images.roi.Circle. If
you use the objects, you must use the draw function to define the shape and position of the ROI.
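A minimal sketch of that object-based variant, assuming the UIAxes uax from step 1 still exists:

% Create the ROI object in the UIAxes, then draw its shape interactively.
c = images.roi.Circle('Parent',uax,'Color','r');
draw(c)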

See Also

Related Examples
• “Create Binary Mask Using an ROI Function” on page 15-21

More About
• “Specify ROI as Binary Mask” on page 15-2
• “Display Graphics in App Designer”


ROI Migration
Starting in R2018b, a new set of ROI objects replaced the previous set of ROI objects. The new
objects provide better performance and more functional capabilities, such as face color transparency.
With the new objects, you can also receive notification of interactions with the object, such as clicks
or movement, using events. Although there are no plans to remove the old ROI objects at this time,
switch to the new ROIs to take advantage of the additional capabilities and flexibility. For more
information on the new ROIs, see “Create ROI Shapes” on page 15-7.

ROI Object Migration


If your code uses one of the previous ROI objects, replace it with a call to the corresponding new ROI object. Because the new ROIs offer shapes that weren't supported previously, in some cases you have several ROIs to choose from. Two ROIs in the new system have no corresponding ROI in the previous system: Crosshair and Cuboid.

Previous ROI object and the current ROI object to use instead:

• imellipse: Use Ellipse instead. If you used imellipse to draw a circular ROI with the previous set of ROIs, use Circle with the new ROIs.
• imfreehand: Use Freehand instead. You can also use AssistedFreehand to create a hand-drawn ROI that "assists" your drawing by automatically following the contours of edges in the underlying image.
• imline: Use Line instead.
• impoint: Use Point instead.
• impoly: Use Polygon instead. To create an open polygonal shape, use Polyline.
• imrect: Use Rectangle instead.
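As a minimal before-and-after sketch of such a migration, the following replaces an imrect call with a Rectangle ROI. The image and position values are illustrative.

imshow('pout.tif')

% Previous ROI system (for comparison):
%   h = imrect(gca,[10 10 100 100]);
%   pos = getPosition(h);

% Current ROI system:
h = drawrectangle(gca,'Position',[10 10 100 100]);
pos = h.Position;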

ROI Object Function Migration


The previous set of ROIs used object functions to customize many aspects of ROI appearance and
functioning. In many cases, the new ROIs replace these object functions with object properties.
Instead of calling an object function, you get the value of a property or set the value of a property. For
example, instead of using getColor to get the color of the ROI, access the value of the Color
property of the new ROI object. For detailed information about how to migrate code to the new ROI
system, see the Compatibility Considerations section of the object function reference pages
associated with the previous ROI objects.

Previous ROI object function and the equivalent approach with the new ROI objects:

• addNewPositionCallback: Use the addlistener object function to specify the function you want executed when the ROI moves. For more information about using events, see “ROI Events” on page 15-20.
• createMask: Use the equivalent createMask object function with the new ROIs.
• getColor: Retrieve the value of the Color property of the ROI. For example, roi_color = roi.Color;
• getPosition: Retrieve the value of the Position property of the ROI. For example, roi_pos = roi.Position;
• getPositionConstraintFcn: Use the DrawingArea property to specify position constraints.
• getVertices: Retrieve the value of the Vertices property of the ROI. For example, roi_vert = roi.Vertices;
• makeConstrainToRectFcn: Use the DrawingArea property to specify position constraints.
• removeNewPositionCallback: Use the addlistener object function to specify the function to be called when the ROI moves. To remove this callback function, delete the object returned by the addlistener object function.
• resume: Use uiresume instead.
• setClosed: Assign a value to the ROI Closed property. For example, roi.Closed = 'y'.
• setColor: Assign a value to the new ROI Color property. For example, roi.Color = 'y'.
• setConstrainedPosition: Use the DrawingArea property to specify position constraints.
• setFixedAspectRatioMode: Use the FixedAspectRatio property of the new ROIs, setting the value to true.
• setPosition: Assign a value to the new ROI Position property. The way to specify the position varies with each object. For example, roi.Position = [50 50].
• setPositionConstraintFcn: Use the DrawingArea property to specify position constraints.
• setResizable: Use the InteractionsAllowed property, setting the value to 'translate'.
• setString: Assign a value to the new ROI Label property. For example, roi.Label = 'My Label';
• setVerticesDraggable: Use the InteractionsAllowed property, setting the value to 'translate'.
• wait: Use the equivalent wait function with the new ROI objects. Note that the new wait function does not support a return value containing position information.

ROI Events
With the previous ROIs, you could use the addNewPositionCallback object function to receive
notification when the ROI moves. You specify the object and the function that you want executed
when the event occurs: id = addNewPositionCallback(h,fcn).

With the new ROIs, you use the addlistener object function to receive notification when the ROI moves. You specify the object, the name of the event for which you want to receive notification, and the name of the function you want executed when the event occurs: el = addlistener(roi,'MovingROI',@mycallbackfcn). With the new ROIs, you must specify the name of the event because you can receive notification of many other events, such as when the ROI is clicked.
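A minimal runnable sketch of the new event pattern uses an anonymous callback function. The image and the initial ROI position are illustrative.

imshow('pout.tif')
roi = drawrectangle('Position',[20 20 80 80]);

% Display the ROI position while the ROI is being moved.
el = addlistener(roi,'MovingROI',@(src,evt) disp(evt.CurrentPosition));

% Delete the listener when notifications are no longer needed:
% delete(el)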

To see an example, see the Compatibility Considerations section on the


addNewPositionCallback reference page.

See Also

More About
• “Create ROI Shapes” on page 15-7


Create Binary Mask Using an ROI Function

This example shows how to create a binary mask using one of the ROI creation functions, such as drawcircle, with the mask creation function createMask.

Read an image into the workspace and display it.

img = imread('pout.tif');
h_im = imshow(img);

Create an ROI on the image using one of the ROI creation functions.

circ = drawcircle('Center',[113,66],'Radius',60);


Create a binary mask from the ROI using createMask. The createMask function returns a binary
image the same size as the input image. The pixels inside the ROI are set to 1 and the pixel values
everywhere else are set to 0.

BW = createMask(circ);
imshow(BW)


See Also

More About
• “Specify ROI as Binary Mask” on page 15-2
• “Create ROI Shapes” on page 15-7


Overview of ROI Filtering


Filtering a region of interest (ROI) is the process of applying a filter to a region in an image, where a
binary mask defines the region. For example, you can apply an intensity adjustment filter to certain
regions of an image.

To filter an ROI in an image, use the roifilt2 function. When you call roifilt2, specify:

• The input grayscale image to be filtered


• A binary mask image that defines the ROI
• A filter (either a 2-D filter or function)

roifilt2 filters the input image and returns an image that consists of filtered values for pixels
where the binary mask contains 1s and unfiltered values for pixels where the binary mask contains
0s. This type of operation is called masked filtering.

roifilt2 is best suited for operations that return data in the same range as in the original image,
because the output image takes some of its data directly from the input image. Certain filtering
operations can result in values outside the normal image data range (that is, [0, 1] for images of class
double, [0, 255] for images of class uint8, and [0, 65535] for images of class uint16).
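As a minimal sketch of masked filtering, the following code applies an unsharp filter only inside a rectangular mask. The mask geometry and filter choice are illustrative; the next two examples show complete workflows.

I = imread('pout.tif');

% Example ROI: a rectangular block of pixels.
mask = false(size(I));
mask(75:150,60:180) = true;

% Filter only the pixels inside the mask; other pixels are unchanged.
h = fspecial('unsharp');
J = roifilt2(h,I,mask);
imshow(J)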

See Also
roifilt2

Related Examples
• “Sharpen Region of Interest in an Image” on page 15-25
• “Apply Custom Filter to Region of Interest in Image” on page 15-28


Sharpen Region of Interest in an Image

Read a grayscale image into the workspace.

I = imread('pout.tif');
imshow(I)

Draw a region of interest over the image to specify the area you want to filter. Use the drawcircle
function to create the region of interest, specifying the center of the circle and the radius of the
circle. Alternatively, if you want to draw the circle interactively, then do not specify the center or
radius of the circle.

hax = drawcircle(gca,'Center',[115 69],'Radius', 60);


Create the mask using the createMask function and specifying the ROI.

mask = createMask(hax);

Define the function you want to use as a filter. This function, named f, passes the input image x to the
imsharpen function and specifies the strength of the sharpening effect by using the 'Amount'
name-value pair argument.

f = @(x)imsharpen(x,'Amount',3)

f = function_handle with value:


@(x)imsharpen(x,'Amount',3)

Filter the ROI using the roifilt2 function and specifying the image, mask, and filtering function.

J = roifilt2(I,mask,f);

Display the result.

imshow(J)


See Also
drawcircle | Circle | createMask | roifilt2 | imsharpen


Apply Custom Filter to Region of Interest in Image

This example shows how to filter a region of interest (ROI), using the roifilt2 function to specify
the filter. roifilt2 enables you to specify your own function to operate on the ROI. This example
uses the imadjust function to lighten parts of an image.

Read an image into the workspace and display it.

I = imread('cameraman.tif');
figure
imshow(I)

Create the mask image. This example uses a binary image of text as the mask image. All the 1-valued
pixels define the regions of interest. The example crops the image because a mask image must be the
same size as the image to be filtered.

BW = imread('text.png');
mask = BW(1:256,1:256);
figure
imshow(mask)


Create the function you want to use as a filter.

f = @(x) imadjust(x,[],[],0.3);

Filter the ROI, specifying the image to be filtered, the mask that defines the ROI, and the filter that
you want to use.

I2 = roifilt2(I,mask,f);

Display the result.

figure
imshow(I2)


See Also
roifilt2 | imadjust


Fill Region of Interest in an Image


This example shows how to use regionfill to fill a region of interest (ROI) in an image. The
example uses the roipoly function to define the region of interest interactively with the mouse.
regionfill smoothly interpolates inward into the region from the pixel values on the boundary of
the polygon. You can use this function for image editing, including removal of extraneous details or
artifacts. The filling process replaces values in the region with values that blend with the background.

Read an image into the MATLAB workspace and display it.

I = imread('eight.tif');
imshow(I)

Create a mask image to specify the region of interest (ROI) you want to fill. Use the roipoly function to specify the region interactively. Call roipoly and move the pointer over the image. The pointer shape changes to cross hairs. Define the ROI by clicking the mouse to specify the vertices of a polygon. You can use the mouse to adjust the size and position of the ROI.

mask = roipoly(I);

Double-click to finish defining the region. roipoly creates a binary image with the region filled with
1-valued pixels.

Display the mask image.


figure
imshow(mask)

Fill the region, using regionfill, specifying the image to be filled and the mask image as inputs.
Display the result. Note the image contains one less coin.

J = regionfill(I,mask);
figure
imshow(J)

See Also
regionfill | roipoly | drawpolygon | createMask


Calculate Properties of Image Regions Using Image Region Analyzer

This example shows how to calculate the properties of regions in binary images by using the Image
Region Analyzer app. This example finds the largest region, measured by area, in the image.

Read a binary image into the workspace.

BW = imread("text.png");

Open the Image Region Analyzer app from the MATLAB® toolstrip. On the Apps tab, in the Image Processing and Computer Vision section, click Image Region Analyzer.

On the app toolstrip, click Load Image, and then select Load Image from Workspace to load the
image from the workspace into the app. In the Import From Workspace dialog box, select the image
you read into the workspace, and then click OK.

You can also open the app from the command line by using the imageRegionAnalyzer function,
specifying the image you want to analyze: imageRegionAnalyzer(BW);.

The Image Region Analyzer app displays the binary image and a table with region properties. In
the table, each row is a region identified in the image and each column is a property of that region,
such as the area, perimeter, and orientation.


To access the image exploration controls such as pan and zoom, move the cursor over the image.

The app initially displays a subset of the available properties. To add or remove properties from the
table, click Choose Properties and select the properties you want to view. The app updates the table
automatically.


Initially, the app lists the regions in the order it finds them, starting in the upper-left corner of the
image. To change the sorting order, click the sort icon next to a property name in the table. The app
sorts the regions in increasing order. Click again to sort the regions in decreasing order.

For example, to sort the regions in the image from largest area to smallest area, click on the sort icon
for the Area property twice.


To view the region in the image with the largest area, click the first row in the table. The app
highlights the corresponding region in the image.


See Also
Image Region Analyzer | bwpropfilt | bwareafilt | regionprops

Related Examples
• “Filter Images on Properties Using Image Region Analyzer App” on page 15-38


Filter Images on Properties Using Image Region Analyzer App

This example shows how to create a new binary image by filtering an existing binary image based on
properties of regions in the image.

Read a binary image into the workspace.

BW = imread("text.png");

Open the Image Region Analyzer app from the toolstrip. On the Apps tab, in the Image Processing and Computer Vision section, click Image Region Analyzer.

On the app toolstrip, click Load Image, and then select Load Image from Workspace to load the
image from the workspace into the app. In the Import from Workspace dialog box, select the image
you read into the workspace, and then click OK.

You can also open the app from the command line using the imageRegionAnalyzer function,
specifying the image you want to analyze:

imageRegionAnalyzer(BW)

Image Region Analyzer displays the binary image and a table with region properties. In the table,
each row is a region identified in the image and each column is a property of that region, such as the
area, perimeter, and orientation.


To filter on the value of a region property, on the app toolstrip, click Filter. Select a property on
which you want to filter and specify the filter criteria. For example, to create an image that removes
all but the largest regions, select the Area property, choose the greater than or equal to symbol (>=),
and then specify the minimum value.


To filter on another property, click Add. The app displays another row in which you can select a
property and specify filter criteria. The result is the intersection (logical AND) of the two filtering
operations.

As you apply filters on properties, the app updates the binary image and the table automatically.

If you are creating a mask image, then you can optionally perform cleanup operations on the mask,
such as clearing all foreground pixels that touch the border and filling holes in objects. Filling holes
can change the area of regions and therefore the regions that appear in the filtered image. For this
example, the area of letters such as "b", "d", and "g" is larger after filling holes. A new region (a letter
"o") with an area of 105 pixels now appears in the filtered image because the filled area of that region
is above the threshold.


When you are done filtering the image, you can save it. Click Export > Export Image. In the Export
to Workspace dialog box, accept the default name for the mask image, or specify another name. Then,
click OK.


You can save the list of properties as a structure or table. Click Export > Export Properties.

You can also export a function that filters binary images using the same filters and cleanup operations
that you specify. The function returns the filtered binary image and the property measurements in a
table. Click Export > Export Function, then save the function as an M file.

function [BW_out,properties] = filterRegions(BW_in)


%filterRegions Filter BW image using auto-generated code from imageRegionAnalyzer app.

% Auto-generated by imageRegionAnalyzer app on 31-Oct-2021


%---------------------------------------------------------

BW_out = BW_in;

% Fill holes in regions.


BW_out = imfill(BW_out, 'holes');

% Filter image based on image properties.


BW_out = bwpropfilt(BW_out,'Area',[85, 124]);

% Get properties.
properties = regionprops(BW_out, {'Area', 'ConvexArea', 'EulerNumber', 'FilledArea', 'MajorAxisLe
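Assuming the exported file is saved as filterRegions.m on the MATLAB path, you can then reuse it on any binary image, for example:

BW = imread("text.png");

% Apply the exported filters and cleanup operations.
[BW_filtered,props] = filterRegions(BW);
imshow(BW_filtered)
disp(props)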

See Also
Image Region Analyzer | bwpropfilt | bwareafilt | regionprops

More About
• “Calculate Properties of Image Regions Using Image Region Analyzer” on page 15-33


Create Image Comparison Tool Using ROIs

This example shows how to use the new ROI functions to create an interactive image comparison tool. When working with images, you often need to visually assess the impact that a function has on an image. Some effects, such as an edge filter, are clearly distinguishable, but other effects are more subtle and need closer inspection.

Read Sample Image into the Workspace

Read a sample image into the workspace and then create a grayscale version of the image. Display
the images side-by-side in a montage.

im = imread("peppers.png");
imgray = im2gray(im);
figure
montage({im,imgray})

Using an ROI, set the alpha layer (transparency) of two stacked images so that one image shows
through only inside the ROI. This selective view follows the ROI so it can be moved interactively.
Create a new figure and an axes.

hFigure = figure;
hAxes = axes("Parent", hFigure);


Stack both images on the same axes.

hImage1 = imshow(im, "Parent", hAxes);


drawnow; % Ensure the image gets drawn.
hold on
hImage2 = imshow(imgray, "Parent", hAxes);
hold off


Create a circular ROI on the axes.

centerLocation = [220, 100];


radius = 60;
hC = images.roi.Circle(...
"Parent", hAxes,...
"FaceAlpha",0,...
"Center",centerLocation,...
"Radius", radius);


Create a listener that listens to changes in the position of the ROI (the circle). The updateAlpha
function is defined at the end of this example.

addlistener(hC,"MovingROI", @updateAlpha);

Execute the callback manually the first time.

updateAlpha(hC)


Simulate zooming in to a region.

hC.Parent.XLim = [75 370];


hC.Parent.YLim = [0 205];


This file contains the source code for a function that implements this image comparison tool. The code listens for two additional events: pressing the 't' or 'T' key switches which image is on top, and turning the mouse scroll wheel increases or decreases the radius of the ROI.

edit helperImageComparer


An Animation of the Tool in Use

Callback Function to Update Alpha Layer

Callback function to update the alpha layer as the ROI object is moved.

function updateAlpha(hC, ~)
hImages = findobj(hC.Parent,"Type","image");
% Create a BW mask from the Circle ROI
mask = hC.createMask(hImages(1).CData);
% Set the alpha data so that the underlying image shows through
% only inside the circle
set(hImages(1),"AlphaData", ~mask);
end

See Also
Circle | addlistener | createMask

More About
• “Create Callbacks for Graphics Objects”
• “Overview Events and Listeners”


Use Freehand ROIs to Refine Segmentation Masks

This example shows how to refine a logical segmentation mask by converting it into a Freehand ROI
object. In this method, you take advantage of the interactive reshaping capability of the Freehand
ROI object to make a better fit of the edge of the segmentation mask to the shape of the image region
that you want to segment.

Introduction - Image Segmentation

Segmentation algorithms are used to segment interesting parts of an image. To illustrate, this
example uses K-means clustering to segment bone and tissue in an MRI image.

im = dicomread('knee1.dcm');
segmentedLabels = imsegkmeans(im,3);
boneMask = segmentedLabels==2;
imshowpair(im, boneMask);


Post-Process the Segmentation Masks

Often, the results of automated segmentation algorithms need additional post-processing to clean up
the masks. As a first step, select the two largest bones from the mask, the femur and the tibia.

boneMask = bwareafilt(boneMask, 2);


imshowpair(im, boneMask);


Convert Mask to Freehand ROI object

To refine the edges of the automatic k-means segmentation, convert the two masks into interactive
freehand ROI objects. First, retrieve the locations of boundary pixels that delineate these two
segmented regions. Note that these ROI objects are densely sampled—their Position property has
the same resolution as the image pixels.

blocations = bwboundaries(boneMask,'noholes');
figure
imshow(im, []);
for ind = 1:numel(blocations)
% Convert to x,y order.
pos = blocations{ind};
pos = fliplr(pos);
% Create a freehand ROI.


drawfreehand('Position', pos);
end

Edit the ROIs

The Freehand ROI object allows simple 'rubber-band' interactive edits. To edit the ROI, click and drag
any of the waypoints along the ROI boundary. You can add additional waypoints anywhere on the
boundary by double-clicking the ROI edge or by using the context menu accessible by right-clicking
the edge.

Convert the Freehand ROIs Back to Masks

After editing the ROIs, convert these ROI objects back to binary masks using the ROI object's
createMask method. Note the additional step required to include the boundary pixels in the final
mask.


% Convert edited ROI back to masks.


hfhs = findobj(gca, 'Type', 'images.roi.Freehand');
editedMask = false(size(im));

for ind = 1:numel(hfhs)


% Accumulate the mask from each ROI
editedMask = editedMask | hfhs(ind).createMask();

% Include the boundary of the ROI in the final mask.


% Ref: https://blogs.mathworks.com/steve/2014/03/27/comparing-the-geometries-of-bwboundaries-
% Here, we have a dense boundary, so we can take the slightly more
% performant approach of just including the boundary pixels directly in
% the mask.
boundaryLocation = hfhs(ind).Position;
bInds = sub2ind(size(im), boundaryLocation(:,2), boundaryLocation(:,1));
editedMask(bInds) = true;
end

See Also
dicomread | imsegkmeans | bwareafilt | bwboundaries | drawfreehand | Freehand |
createMask


Rotate Image Interactively Using Rectangle ROI

This example shows how to rotate an image by using a Rectangle ROI with a callback function that
calls imrotate when you move the ROI.

Image rotation is a common preprocessing step. In this example, an image needs to be rotated by an
unknown amount to align the horizon with the x-axis. You can use the imrotate function to rotate
the image, but you need prior knowledge of the rotation angle. By using an interactive rotatable ROI,
you can rotate the image in real time to match the rotation of the ROI.

Create the Rotatable Rectangle ROI

Display an image in an Axes.

im = imread('baby.jpg');
hIm = imshow(im);


Get the size of the image.

sz = size(im);

Determine the position and size of the Rectangle ROI as a 4-element vector of the form [x y w h]. The
ROI will be drawn at the center of the image and have half of the image width and height.

pos = [(sz(2)/4) + 0.5, (sz(1)/4) + 0.5, sz(2)/2, sz(1)/2];


Create a rotatable Rectangle ROI at the specified position and set the Rotatable property to true.
You can then rotate the rectangle by clicking and dragging near the corners. As the ROI moves, it
broadcasts an event MovingROI. By adding a listener for that event and a callback function that
executes when the event occurs, you can rotate the image in response to movements of the ROI.

h = drawrectangle('Rotatable',true,...
'DrawingArea','unlimited',...
'Position',pos,...
'FaceAlpha',0);


Place a prompt in the label.

h.Label = 'Rotate rectangle to rotate image';


Add a listener that listens for any movement of the ROI.


addlistener(h,'MovingROI',@(src,evt) rotateImage(src,evt,hIm,im));

Call imrotate in Callback Function

Define a callback function that executes as the Rectangle ROI moves. This function retrieves the
current rotation angle of the ROI, calls imrotate on the image with that rotation angle, and updates
the display. The function also updates the label to display the current rotation angle.


function rotateImage(src,evt,hIm,im)

% Only rotate the image when the ROI is rotated. Determine if the
% RotationAngle has changed
if evt.PreviousRotationAngle ~= evt.CurrentRotationAngle

% Update the label to display current rotation


src.Label = [num2str(evt.CurrentRotationAngle,'%30.1f') ' degrees'];

% Rotate the image and update the display


im = imrotate(im,evt.CurrentRotationAngle,'nearest','crop');
hIm.CData = im;

end

end

See Also
drawrectangle | Rectangle | imrotate | addlistener

More About
• “Create Callbacks for Graphics Objects”
• “Overview Events and Listeners”


Subsample or Simplify a Freehand ROI

This example shows how to subsample or reduce the number of points in a Freehand ROI object.

Introduction

The drawfreehand function creates a smooth-looking freehand region of interest (ROI). However, the edge of the ROI is actually made of discrete points distributed all along the boundary. Two factors contribute to how smooth a freehand ROI looks: 1) the density of points and 2) the Smoothing property of the freehand ROI object.

When drawing interactively, mouse motion determines the density of points. For large complex ROIs,
the number of points used can be quite large.

The Smoothing property controls how the boundary looks. By default, the Freehand object uses a Gaussian smoothing kernel with a sigma value of 1 and a filter size of 5. Changing this value only changes how the boundary looks; it does not change the underlying Position property of the object.
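For instance, after you draw a freehand ROI, you can increase the smoothing. The value used here is an illustrative choice; only the rendered boundary changes.

imshow('pout.tif')
fh = drawfreehand;    % draw a freehand ROI interactively
fh.Smoothing = 3;     % smoother-looking boundary; the Position data is unchanged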

Default Density of Points

Reducing the density of points can help reduce the space required to store the ROI data and may also
speed up any computation that depends on the number of these points. One way to reduce the density
of points is to subsample the points, for example, pick every other point.

Create a sample freehand ROI by converting a mask to an ROI. The ROI is very dense since every
boundary pixel will correspond to a point in the ROI.

im = imread('football.jpg');
bw = im(:,:,1)>200;
bw = bwareafilt(bw, 1);
bloc = bwboundaries(bw,'noholes');
roipos = fliplr(bloc{1});
imshow(im);
hfh = drawfreehand('Position', roipos);


To visualize the density of the points, turn every point in the ROI into a waypoint.

hfh.Waypoints(:) = true;

title('Original density');
snapnow

% Zoom in.
xlim([80 200]);
ylim([70 160]);
snapnow


Subsampling the Position Points

Subsample the points that make up the Position property of the freehand ROI. Because the freehand ROI is very dense, subsampling can substantially reduce the size without losing fidelity. Query the initial, full, fine-grained position.

fpos = hfh.Position;


Subsample, picking every other point.

cpos = fpos(1:2:end,:);

Update the Position property of the ROI.

hfh.Position = cpos;

To see the density, turn all points into waypoints.

hfh.Waypoints(:) = true;
title('Simple Subsample');
snapnow


Subsampling - Using Rate of Change

A better approach to subsampling is to selectively remove points that have low curvature. It makes more sense to remove a point along a relatively straight portion of the ROI than one near a curve. One simple way to define a curvature value is to measure the rate of change of the position locations.

Measure the rate of change. The neighbor of the first point is the last point.

dfpos = diff([fpos(end,:); fpos]);

Define an ad-hoc measure of curvature based on a simple low-pass filter.

cm = sum(abs(conv2(dfpos, ones(3,2),'same')),2);

Sort by curvature.

[~, cmInds] = sort(cm);

Pick 3/4 of the points with lower curvature values to remove from the ROI.

numPointsToCull = round(0.25*size(fpos,1));

Remove those positions.

cpos = fpos;
cpos(cmInds(1:numPointsToCull),:) = [];

Update the ROI, turning on all Waypoints to see the impact.

hfh.Position = cpos;
hfh.Waypoints(:) = true;


title('Curvature Based Subsample (factor of 4)');


snapnow

Subsampling - Using reduce method on freehand ROI objects

An even better approach is to use the reduce method on the ROI object. The reduce method operates directly on the Position property of the ROI object. You can control the number of points removed by specifying a tolerance value in the range [0, 1] as an optional input argument. The default tolerance value is 0.01.

Reset the Position property and call reduce on the ROI object.

hfh.Position = fpos;
reduce(hfh);

% View the updated ROI, turning all the points into waypoints to see the
% impact.
hfh.Waypoints(:) = true;
title('Subsampling using reduce method');
snapnow


Interactive Subsampling

Another way to subsample is to use events to make this process easier. First create a listener to
interactively change the number of points that the freehand ROI uses. Use the UserData property of
the Freehand object to cache the full resolution Position data, along with the current value of
tolerance. Then add a custom context menu to the ROI object by creating a new uimenu and
parenting it to the UIContextMenu of the Freehand object. This menu option allows you to finalize
the ROI, which deletes the temporary cache.

Restore the original ROI, and cache the original position along with the current tolerance value in UserData.

hfh.Waypoints(:) = true;
hfh.UserData.fpos = fpos;
hfh.UserData.tol = 0;


Respond to mouse scroll.

h = gcf;
h.WindowScrollWheelFcn = @(h, evt) changeSampleDensity(hfh, evt);

Add a context menu to finalize the ROI and perform any clean up needed.

uimenu(hfh.UIContextMenu, 'Text','Finalize',...
'MenuSelectedFcn', @(varargin)finalize(hfh));

title('Scroll to change density interactively');


Animation of the Interactive Subsampling

Callback Function - Change the Sample Density Based on Mouse Scroll

This function gets called on scroll action. Scrolling up increases the density, and scrolling down
decreases it. This allows you to interactively select the number of points to retain.

function changeSampleDensity(hfh, evt)


% Restore Position property of ROI.
hfh.Position = hfh.UserData.fpos;
% Change tolerance by a fixed amount based on the direction of the scroll.
% This code changes the tolerance by 0.01 for every scroll increment.
tol = hfh.UserData.tol + 0.01 * (evt.VerticalScrollCount);


% Restrict the range of tolerance values to be from 0 to 0.15, which is the


% useful range.
tol = max(min(tol, 0.15), 0);
% Call |reduce| with the specified tolerance.
reduce(hfh,tol);
hfh.UserData.tol = tol;
% Update the ROI and turn all the points into waypoints to show the
% density.
hfh.Waypoints(:) = true;
end

Callback Function - Finalize the Freehand ROI

Delete and create a new Freehand ROI with the subsampled points to save on space.

function finalize(hfh)
h = ancestor(hfh, 'figure');
% Reset the mouse scroll wheel callback.
h.WindowScrollWheelFcn = [];
% Save finalized set of points.
pos = hfh.Position;
% Delete and create a new Freehand ROI with the new |Position| value.
delete(hfh);
drawfreehand(gca, 'Position', pos);
end

See Also
bwareafilt | bwboundaries | drawfreehand | Freehand

More About
• “Create Callbacks for Graphics Objects”
• “Overview Events and Listeners”


Measure Distances in an Image

This example shows how to use line ROIs to measure distances in an image. You can also calibrate the
measurements to real world values and specify the units. The example illustrates how you can
seamlessly add, edit, and remove ROIs without needing to enter into any specific drawing mode.

Read Image into the Workspace and Display Image

Read image into the workspace.

im = imread('concordorthophoto.png');

Gather data about the image, such as its size, and store the data in a structure that you can pass to
callback functions.

sz = size(im);
myData.Units = 'pixels';
myData.MaxValue = hypot(sz(1),sz(2));
myData.Colormap = hot;
myData.ScaleFactor = 1;

Display the image in an axes.

hIm = imshow(im);


Specify a callback function for the ButtonDownFcn callback on the image. Pass the myData
structure to the callback function. This callback function creates the line objects and starts drawing
the ROIs.

hIm.ButtonDownFcn = @(~,~) startDrawing(hIm.Parent,myData);


Create Callback Function to Start Drawing ROIs

Create the function used with the ButtonDownFcn callback to create line ROIs. This function:

1. Instantiates a line ROI object.

2. Sets up listeners to react to clicks and movement of the ROI.

3. Adds a custom context menu to the ROIs that includes a 'Delete All' option.

4. Begins drawing the ROI, using the point clicked in the image as the starting point.

function startDrawing(hAx,myData)

% Create a line ROI object. Specify the initial color of the line and
% store the |myData| structure in the |UserData| property of the ROI.
h = images.roi.Line('Color',[0, 0, 0.5625],'UserData',myData);

% Set up a listener for movement of the line ROI. When the line ROI moves,
% the |updateLabel| callback updates the text in the line ROI label and
% changes the color of the line, based on its length.


addlistener(h,'MovingROI',@updateLabel);

% Set up a listener for clicks on the line ROI. When you click on the line
% ROI, the |updateUnits| callback opens a GUI that lets you specify the
% known distance in real-world units, such as, meters or feet.
addlistener(h,'ROIClicked',@updateUnits);

% Get the current mouse location from the |CurrentPoint| property of the
% axes and extract the _x_ and _y_ coordinates.
cp = hAx.CurrentPoint;
cp = [cp(1,1) cp(1,2)];

% Begin drawing the ROI from the current mouse location. Using the
% |beginDrawingFromPoint| method, you can draw multiple ROIs.
h.beginDrawingFromPoint(cp);

% Add a custom option to the line ROI context menu to delete all existing
% line ROIs.
c = h.UIContextMenu;
uimenu(c,'Label','Delete All','Callback',@deleteAll);

end

Create Callback Function to Update ROI Label and Color

Create the function that is called whenever the line ROI is moving, that is, when the 'MovingROI'
event occurs. This function updates the ROI label with the length of the line and changes the color of
the line based on its length.

This function is called repeatedly when the ROI moves. If you want to update the ROI only when the
movement has finished, listen for the 'ROIMoved' event instead.

function updateLabel(src,evt)


% Get the current line position.


pos = evt.Source.Position;

% Determine the length of the line.


diffPos = diff(pos);
mag = hypot(diffPos(1),diffPos(2));

% Choose a color from the colormap based on the length of the line. The
% line changes color as it gets longer or shorter.
color = src.UserData.Colormap(ceil(64*(mag/src.UserData.MaxValue)),:);

% Apply the scale factor to line length to calibrate the measurements.


mag = mag*src.UserData.ScaleFactor;

% Update the label.


set(src,'Label',[num2str(mag,'%30.1f') ' ' src.UserData.Units],'Color',color);

end

Create Callback Function to Update Measurement Units

Create the function that is called whenever you double-click the ROI label. This function opens a
popup dialog box in which you can enter information about the real-world distance and units.

This function listens for the 'ROIClicked' event, using event data to check the type of click and the
part of the ROI that was clicked.

The popup dialog box prompts you to enter the known distance and units for this measurement. With
this information, you can calibrate all the ROI measurements to real world units.

function updateUnits(src,evt)

% When you double-click the ROI label, the example opens a popup dialog box
% to get information about the actual distance. Use this information to
% scale all line ROI measurements.
if strcmp(evt.SelectionType,'double') && strcmp(evt.SelectedPart,'label')

% Display the popup dialog box.


answer = inputdlg({'Known distance','Distance units'},...
'Specify known distance',[1 20],{'10','meters'});

% Determine the scale factor based on the inputs.


num = str2double(answer{1});

% Get the length of the current line ROI.


pos = src.Position;
diffPos = diff(pos);
mag = hypot(diffPos(1),diffPos(2));

% Calculate the scale factor by dividing the known length value by the
% current length, measured in pixels.
scale = num/mag;

% Store the scale factor and the units information in the |myData|
% structure.
myData.Units = answer{2};
myData.MaxValue = src.UserData.MaxValue;


myData.Colormap = src.UserData.Colormap;
myData.ScaleFactor = scale;

% Reset the data stored in the |UserData| property of all existing line
% ROI objects. Use |findobj| to find all line ROI objects in the axes.
hAx = src.Parent;
hROIs = findobj(hAx,'Type','images.roi.Line');
set(hROIs,'UserData',myData);

% Update the label in each line ROI object, based on the information
% collected in the input dialog.
for i = 1:numel(hROIs)

pos = hROIs(i).Position;
diffPos = diff(pos);
mag = hypot(diffPos(1),diffPos(2));

set(hROIs(i),'Label',[num2str(mag*scale,'%30.1f') ' ' answer{2}]);

end

% Reset the |ButtonDownFcn| callback function with the current |myData|


% value.
hIm = findobj(hAx,'Type','image');
hIm.ButtonDownFcn = @(~,~) startDrawing(hAx,myData);

end

end

Create Callback Function to Delete All ROIs

Create the function to delete all ROIs. You added a custom context menu item to each line ROI in the
startDrawing callback function. This is the callback associated with that custom context menu. This
callback uses the findobj function to search for the ROI Type and deletes any found ROIs.


function deleteAll(src,~)

hFig = ancestor(src,'figure');
hROIs = findobj(hFig,'Type','images.roi.Line');
delete(hROIs)

end

See Also
addlistener | beginDrawingFromPoint | Line | drawline

More About
• “Create Callbacks for Graphics Objects”
• “Overview Events and Listeners”


Use Polyline to Create Angle Measurement Tool

This example shows how to create an interactive tool that displays the angle between three vertices
in a polyline ROI.

You can change the angle by clicking and dragging the polyline vertices. When the ROI moves, it
broadcasts an event named MovingROI. By adding a listener for that event and a callback function
that executes when the event occurs, the tool can measure and display changes to the angle in real
time.

Display Image and Polyline ROI

Read and display an image.

im = imread('gantrycrane.png');
imshow(im)

Get the size of the image.

[y,x,~] = size(im);

Get the coordinates of the center of the image. The example places the vertex of the angle
measurement tool at the center of the image.

midy = ceil(y/2);
midx = ceil(x/2);

Specify the coordinates of the first point in the polyline ROI. This example places the first point in the
polyline directly above the image center.


firstx = midx;
firsty = midy - ceil(y/4);

Specify the coordinates of the third point in the polyline ROI. This example places the third point in
the polyline directly to the right of the image center.

lastx = midx + ceil(x/4);


lasty = midy;

Create an empty context menu to replace the default menu.

c = uicontextmenu;

Draw the polyline in red over the image. Specify the coordinates of the three vertices, and add a label
with instructions to interact with the polyline.

h = drawpolyline("Parent",gca, ...
"Position",[firstx,firsty;midx,midy;lastx,lasty], ...
"Label","Modify angle to begin...", ...
"Color",[0.8,0.2,0.2], ...
"UIContextMenu",c);


Add a listener that listens for movement of the ROI. When the listener detects movement, it calls the
custom callback function updateAngle. This custom function is defined in the section "Update Angle
Label Using Callback Function".

addlistener(h,'MovingROI',@(src,evt) updateAngle(src,evt));

Polyline ROIs also support interactive addition and deletion of vertices. However, an angle
measurement tool requires exactly three vertices at any time, so the addition and deletion of vertices
are undesirable interactions with the ROI. Add listeners that listen for the addition or deletion of
vertices. When you attempt to change the number of vertices, the appropriate listener calls a custom
callback function to suppress the change. These custom functions, storePositionInUserData and
recallPositionInUserData, are defined in the section "Prevent Addition or Deletion of Vertices
Using Callback Functions".

addlistener(h,'AddingVertex',@(src,evt) storePositionInUserData(src,evt));
addlistener(h,'VertexAdded',@(src,evt) recallPositionInUserData(src,evt));
addlistener(h,'DeletingVertex',@(src,evt) storePositionInUserData(src,evt));
addlistener(h,'VertexDeleted',@(src,evt) recallPositionInUserData(src,evt));

Update Angle Label Using Callback Function

Define a callback function that executes as the polyline ROI moves. This function retrieves the
current position of the three vertices, calculates the angle in degrees between the vertices, and
updates the label to display the current rotation angle.

function updateAngle(src,evt)
% Get the current position
p = evt.CurrentPosition;

% Find the angle


v1 = [p(1,1)-p(2,1), p(1,2)-p(2,2)];
v2 = [p(3,1)-p(2,1), p(3,2)-p(2,2)];
theta = acos(dot(v1,v2)/(norm(v1)*norm(v2)));

% Convert the angle to degrees


angleDegrees = (theta * (180/pi));

% Update the label to display the angle


src.Label = sprintf('(%1.0f) degrees',angleDegrees);
end

Prevent Addition or Deletion of Vertices Using Callback Functions

Define a callback function that executes when the listeners detect the 'AddingVertex' or
'DeletingVertex' events. These events occur immediately before the vertex of interest is added to or
deleted from the polyline. Store the current three polyline vertices in the UserData property.

function storePositionInUserData(src,~)
src.UserData = src.Position;
end

Define a callback function that executes when the listeners detect the 'VertexAdded' or
'VertexDeleted' events. These events occur immediately after the vertex of interest is added to or
deleted from the polyline. Restore the stored set of three polyline vertices in the UserData property.

function recallPositionInUserData(src,~)
src.Position = src.UserData;
end

See Also
addlistener | drawpolyline | Polyline

More About
• “Create Callbacks for Graphics Objects”
• “Overview Events and Listeners”


Create Freehand ROI Editing Tool

This example shows how to create a simple tool to edit the shape of a freehand ROI using another
ROI object. By default, Freehand ROI objects include waypoints that you can click and drag to adjust
the shape of the ROI. You can also add waypoints interactively to any part of the boundary.

Another way to edit the shape of freehand ROIs, offered by many popular image manipulation
programs, is an 'eraser' or 'brush' tool. This example implements one of these tools, using another
ROI object to edit the freehand ROI.

Create Freehand ROI

Create a Freehand ROI that follows the shape of a segmentation mask. For more details on this
process, see “Use Freehand ROIs to Refine Segmentation Masks” on page 15-50.

Read MRI data into the workspace.

im = dicomread('knee1.dcm');

Segment the MRI image and select the two largest regions of the mask.

segmentedLabels = imsegkmeans(im,3);
boneMask = segmentedLabels==2;
boneMask = bwareafilt(boneMask, 1);

Get the coordinates of the boundaries of the two segmented regions.

blocations = bwboundaries(boneMask,'noholes');

Convert the locations returned by bwboundaries to x,y order.

pos = blocations{1};
pos = fliplr(pos);

Display the image.

figure
hImage = imshow(im,[]);


Create a freehand ROI inside the segmented mask.

hf = drawfreehand('Position', pos);


Create the Freehand ROI Editing Tool

Create a Circle ROI that will be used as the eraser or brush ROI editing tool. (You can use any of the
images.roi.* classes by making a small change, mentioned below).

he = images.roi.Circle(...
'Center', [50 50],...
'Radius', 10,...
'Parent', gca,...
'Color','r');


Associate two event listeners with the Circle ROI. One listens for ROI movement and the other listens for when movement stops. In the ROI moving callback function, the example snaps the position of the editor ROI to pixel locations and changes its color (red or green) to indicate whether the edit operation will remove from or add to the target freehand ROI. Once the editor ROI stops moving, the example creates corresponding binary masks for the editor ROI and the target freehand ROI and makes the required edit. Finally, it transforms the updated mask back into a freehand ROI object.

addlistener(he,'MovingROI', @(varargin)editorROIMoving(he, hf));


addlistener(he,'ROIMoved', @(varargin)editFreehand(hf, he));

Interactively Edit the Freehand ROI

This animation shows the add and remove edit operation.


This is the ROI moving callback function. This function ensures that the editor ROI snaps to the pixel grid, and it changes the color of the editor ROI to indicate whether it will add a region to the freehand ROI or remove a region from it. If the center of the editor ROI is outside the target freehand ROI, the operation removes a region; otherwise, it adds one.

function editorROIMoving(he, hf)


% Snap editor ROI to grid
he.Position = round(he.Position);

% Check if the circle ROI's center is inside or outside the freehand ROI.
center = he.Center;
isAdd = hf.inROI(center(1), center(2));
if isAdd
% Green if inside (since we will add to the freehand).
he.Color = 'g';
else
% Red otherwise.
he.Color = 'r';
end
end


This is the edit freehand ROI callback that adds or removes the region of the editor ROI that
intersects the target freehand ROI.

function editFreehand(hf, he)

% Create a mask for the target freehand.


tmask = hf.createMask();
[m, n,~] = size(tmask);
% Include the boundary pixel locations
boundaryInd = sub2ind([m,n], hf.Position(:,2), hf.Position(:,1));
tmask(boundaryInd) = true;

% Create a mask from the editor ROI


emask = he.createMask();
boundaryInd = sub2ind([m,n], he.Position(:,2), he.Position(:,1));
emask(boundaryInd) = true;

% Check if the center of the editor ROI is inside the target freehand ROI. If you
% use a different editor ROI, be sure to update the center computation.
center = he.Center;
isAdd = hf.inROI(center(1), center(2));
if isAdd
% Add the editor mask to the freehand mask
newMask = tmask|emask;
else
% Delete out the part of the freehand which intersects the editor
newMask = tmask&~emask;
end

% Update the freehand ROI


perimPos = bwboundaries(newMask, 'noholes');
hf.Position = [perimPos{1}(:,2), perimPos{1}(:,1)];

end

See Also
dicomread | imsegkmeans | bwareafilt | bwboundaries | drawfreehand | Freehand | Circle |
addlistener | inROI | createMask

More About
• “Create Callbacks for Graphics Objects”
• “Overview Events and Listeners”


Use Wait Function After Drawing ROI

This example shows how to define a custom wait function that blocks the MATLAB® command line
until you finish positioning a rectangle.

Display an image.

imshow('pears.png')

Draw a rectangle ROI in the top left corner of the image.

h = drawrectangle('Position',[1 1 100 100]);

Use a custom wait function to block the MATLAB command line while you interact with the rectangle.
This example specifies a function called customWait (defined at the end of the example).

While the command line is blocked, resize and reposition the rectangle so that it encompasses one
pear. Double-click on the rectangle to resume execution of the customWait function. The function
returns the final position of the rectangle.

pos = customWait(h)


pos = 1×4

262.0000 36.0000 144.0000 145.0000

This is the custom wait function that blocks the program execution when you click an ROI. When you
have finished interacting with the ROI, the function returns the position of the ROI.

function pos = customWait(hROI)

% Listen for mouse clicks on the ROI


l = addlistener(hROI,'ROIClicked',@clickCallback);

% Block program execution


uiwait;

% Remove listener
delete(l);

% Return the current position


pos = hROI.Position;

end

This click callback function resumes program execution when you double-click the ROI. Note that
event data is passed to the callback function as an images.roi.ROIClickedEventData object,


which enables you to define callback functions that respond to different types of actions. For example,
you could define a callback function to resume program execution when you click the ROI while
pressing the Shift key or when you click a specific part of the ROI such as the label.

function clickCallback(~,evt)

if strcmp(evt.SelectionType,'double')
uiresume;
end

end
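
As a sketch of the kind of variation described above, the following callback resumes execution when you Shift-click the ROI instead of double-clicking it. This is not part of the original example, and the 'shift' selection type string is an assumption about the values reported in the event data.

function shiftClickCallback(~,evt)

% Resume program execution on a Shift-click.
% Assumption: 'shift' is the SelectionType value reported for a Shift-click.
if strcmp(evt.SelectionType,'shift')
    uiresume;
end

end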

See Also
drawrectangle | Rectangle | addlistener | uiresume | uiwait

More About
• “Create Callbacks for Graphics Objects”
• “Overview Events and Listeners”


Interactive Image Inpainting Using Exemplar Matching

This example shows how to interactively select image regions and inpaint the selected regions by
using the exemplar based-matching method. Interactive inpainting allows you to select a region
multiple times and perform inpainting iteratively to achieve the desired results.

In this example, you perform region filling and object removal by:

• Interactively selecting the inpainting region.


• Dynamically updating the parameter values.
• Visualizing the results dynamically.

Read Image

Read an image to inpaint into the workspace. The image has missing image regions to be filled
through inpainting.

I = imread('greensdistorted.png');

Create Interactive Figure Window

Create an interactive figure window to display the image to be inpainted. In the window, you can
select the region of interest (ROI), and dynamically update the parameter values.

h = figure('Name','Interactive Image Inpainting','Position',[0,0,700,400]);

% Create a panel in the current figure to interactively set the parameter


% values.
dataPanel = uipanel(h,'Position',[0.01 0.5 0.25 0.5],'Title','Set parameter values','FontSize',10);

% Add a user control interface for specifying the patch size.
% Set the default patch size value to 9.
% (The Position values below are approximate layout coordinates within the panel.)
uicontrol(dataPanel,'Style','text','String','Enter Patch Size','FontSize',10,'Position',[1 150 120 20]);
data.patchSize = uicontrol(dataPanel,'Style','edit','String',num2str(9),'Position',[7 130 60 20]);

% Add a user control interface for selecting the fill order.
% Set the default fill order to gradient.
uicontrol(dataPanel,'Style','text','String','Select Filling Order','FontSize',10,'Position',[5 100 120 20]);
data.fillOrder = uicontrol(dataPanel,'Style','popupmenu','String',{'gradient','tensor'},'Position',[7 80 90 20]);

% Create a panel in the current figure to display the image.


viewPanel = uipanel(h,'Position',[0.25 0 0.8 1],'Title','Interactive Inpainting','FontSize',10);
ax = axes(viewPanel);

Display the image in the interactive figure window.

hImage = imshow(I,'Parent',ax);

Select and Inpaint Image Regions Interactively

Select ROIs interactively and dynamically inpaint the selected ROIs by using the callback function
clickCallback. Assign a function handle that references the clickCallback function to the
ButtonDownFcn property of the image object.

hImage.ButtonDownFcn = @(hImage,eventdata)clickCallback(hImage,eventdata,data);


Interactively inpaint the image by following these steps.

Step 1: Choose the patch size and the fill order for inpainting. To inpaint with local parameter values,
modify the patch size and the fill order to the desired values by using user controls in the interactive
figure window.

The choice of patch size and fill order influences the quality of inpainting, and the best values to use
depend on the characteristics of the image region to be inpainted.

The default patch size value is set to 9.

• To inpaint regions with regular textures, choose a larger patch size to achieve seamless
inpainting.
• To inpaint regions that are locally uniform with respect to a small neighborhood, choose a smaller
patch size.

The default fill order is set to 'gradient'. You can choose a 'gradient'-based or 'tensor'-based fill order
for inpainting image regions. A 'tensor'-based fill order is more suitable for inpainting image regions
with linear structures and regular textures.

Step 2: Create a freehand ROI interactively by using your mouse. Position the pointer on the axes
and click and drag to draw the ROI shape. Release the pointer to close the shape.

The function dynamically updates the parameter values specified by using the user control interface
and inpaints the selected ROI. Repeat steps 1 and 2 in order to inpaint all the desired regions in the
image.


Create Callback Function to Select and Inpaint ROIs

Create the clickCallback to be used with the ButtonDownFcn to interactively select and inpaint
ROIs.

function clickCallback(src,~,data)
% Get the parameter values for inpainting.
fillOrder = data.fillOrder.String{data.fillOrder.Value};
pSize = data.patchSize.String;
patchSize = str2double(pSize);
% Select and draw freehand ROI.
h = drawfreehand('Parent',src.Parent);
% Create a binary mask of the selected ROI.
mask = h.createMask(src.CData);
% Run exemplar-based inpainting algorithm with user given parameters.
newImage = inpaintExemplar(src.CData,mask,'PatchSize',patchSize,'FillOrder',fillOrder);
% Update input image with output.
src.CData = newImage;
% Delete ROI handle.
delete(h);
end

See Also
inpaintExemplar | createMask | Freehand | drawfreehand


More About
• “Create Callbacks for Graphics Objects”
• “Overview Events and Listeners”


Classify Pixels That Are Partially Enclosed by ROI


When creating a binary mask from a region of interest (ROI), an algorithm must determine which
pixels are included in the region. This determination can be difficult when pixels on the edge of a
region are only partially covered by the border line.

This figure illustrates a triangular region of interest, with a close-up view of one of its vertices. The
figure shows how pixels can be partially covered by the border of an ROI.

To determine which pixels are in the region, many Image Processing Toolbox functions use this
algorithm:

1 Divide each pixel into a 5-by-5 subpixel grid

The figure shows the pixel that contains the vertex of an ROI, and the 5-by-5 subpixel grid
dividing the pixel.

2 Adjust the position of the vertices


Move each vertex of the polygon to the nearest intersection of the subpixel grid. Round x and y
coordinates to the nearest subpixel grid corner. This creates a second, modified polygon.

The figure shows the modified vertex with a red "X".

3 Draw a path between adjusted vertices

Form a path from each adjusted vertex to the next, following the edges of the subpixel grid. The
figure shows a portion of this modified polygon by the thick dark lines.

4 Determine which border pixels are inside the polygon

Use the following rule to determine which border pixels are inside the polygon: if the pixel's
central subpixel is inside the boundaries defined by the path between adjusted vertices, then the
pixel is inside the region.

In the following figure, the central subpixels of pixels on the ROI border are shaded a dark gray
color. Pixels inside the polygon are shaded a lighter gray. Note that the pixel containing the
vertex is not part of the ROI because its central subpixel is not inside the modified polygon.
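
As a concrete check of this rule, the poly2mask function uses this same pixel classification when it converts a polygon to a binary mask. The following is a minimal sketch (the coordinates are arbitrary) that shows which pixels of a small grid are classified as inside a triangular ROI.

x = [2.00 7.25 4.50];         % x-coordinates of the triangle vertices
y = [2.00 3.50 7.75];         % y-coordinates of the triangle vertices
mask = poly2mask(x,y,10,10);  % classify the pixels of a 10-by-10 grid
imshow(mask,'InitialMagnification','fit')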


See Also
createMask | roipoly | poly2mask | regionfill

16

Color

This chapter describes the toolbox functions that help you work with color image data. Note that
"color" includes shades of gray; therefore much of the discussion in this chapter applies to grayscale
images as well as color images.

• “Display Colors” on page 16-2


• “Reduce the Number of Colors in an Image” on page 16-3
• “Profile-Based Color Space Conversions” on page 16-10
• “Device-Independent Color Spaces” on page 16-13
• “Understanding Color Spaces and Color Space Conversion” on page 16-15
• “Convert Between RGB and HSV Color Spaces” on page 16-20
• “Determine If L*a*b* Value Is in RGB Gamut” on page 16-24
• “Comparison of Auto White Balance Algorithms” on page 16-25
• “Calculate CIE94 Color Difference of Colors on Test Chart” on page 16-42

Display Colors
The number of bits per screen pixel determines the display's screen bit depth. The screen bit depth
determines the screen color resolution, which is how many distinct colors the display can produce.

Most computer displays use 8, 16, or 24 bits per screen pixel. Depending on your system, you might
be able to choose the screen bit depth you want to use. In general, 24-bit display mode produces the
best results. If you need to use a lower screen bit depth, 16-bit is generally preferable to 8-bit.
However, keep in mind that a 16-bit display has certain limitations, such as

• An image might have finer gradations of color than a 16-bit display can represent. If a color is
unavailable, MATLAB uses the closest approximation.
• There are only 32 shades of gray available. If you are working primarily with grayscale images,
you might get better display results using 8-bit display mode, which provides up to 256 shades of
gray.

To determine the bit depth of your system's screen, enter this command at the MATLAB prompt.

get(0,'ScreenDepth')
ans =

32

The integer MATLAB returns represents the number of bits per screen pixel:

Value Screen Bit Depth


8 8-bit displays support 256 colors. An 8-bit display can produce any of the colors
available on a 24-bit display, but only 256 distinct colors can appear at one time.
(There are 256 shades of gray available, but if all 256 shades of gray are used, they
take up all the available color slots.)
16 16-bit displays usually use 5 bits for each color component, resulting in 32 (that is, 2^5) levels
each of red, green, and blue. This supports 32,768 (that is, 2^15) distinct colors (of which 32 are
shades of gray). Some systems use the extra bit to increase the number of levels of green that can
be displayed. In this case, the number of different colors supported by a 16-bit display is actually
65,536 (that is, 2^16).
24 24-bit displays use 8 bits for each of the three color components, resulting in 256 (that is, 2^8)
levels each of red, green, and blue. This supports 16,777,216 (that is, 2^24) different colors. Of
these colors, 256 are shades of gray. Shades of gray occur where R=G=B. The 16 million possible
colors supported by a 24-bit display can render a lifelike image.
32 32-bit displays use 24 bits to store color information and use the remaining 8 bits
to store transparency data (alpha channel). For information about how MATLAB
supports the alpha channel, see “Add Transparency to Graphics Objects”.

Regardless of the number of colors your system can display, MATLAB can store and process images
with very high bit depths: 2^24 colors for uint8 RGB images, 2^48 colors for uint16 RGB images, and
2^159 for double RGB images. These images are displayed best on systems with 24-bit color, but
usually look fine on 16-bit systems as well. For information about reducing the number of colors used
by an image, see “Reduce the Number of Colors in an Image” on page 16-3.


Reduce the Number of Colors in an Image


On systems with 24-bit color displays, truecolor images can display up to 16,777,216 (that is, 2^24)
colors. On systems with lower screen bit depths, truecolor images are still displayed reasonably well
because MATLAB automatically uses color approximation and dithering if needed. Color
approximation is the process by which the software chooses replacement colors in the event that
direct matches cannot be found.

Indexed images, however, might cause problems if they have a large number of colors. In general,
you should limit indexed images to 256 colors for the following reasons:

• On systems with 8-bit display, indexed images with more than 256 colors will need to be dithered
or mapped and, therefore, might not display well.
• On some platforms, colormaps cannot exceed 256 entries.
• If an indexed image has more than 256 colors, MATLAB cannot store the image data in a uint8
array, but generally uses an array of data type double instead, making the storage size of the
image much larger (each pixel uses 64 bits).
• Most image file formats limit indexed images to 256 colors. If you write an indexed image with
more than 256 colors (using imwrite) to a format that does not support more than 256 colors,
you will receive an error.

This topic elaborates on methods to reduce the number of colors in an image.

Reduce Colors of Truecolor Image Using Color Approximation


To reduce the number of colors in an image, use the rgb2ind function. This function converts a
truecolor image to an indexed image, reducing the number of colors in the process. rgb2ind
provides the following methods for approximating the colors in the original image:

• Quantization (described in “Quantization” on page 16-3)

• Uniform quantization
• Minimum variance quantization
• Colormap mapping (described in “Colormap Mapping” on page 16-6)

The quality of the resulting image depends on the approximation method you use, the range of colors
in the input image, and whether or not you use dithering. Note that different methods work better for
different images. See “Reduce Colors Using Dithering” on page 16-7 for a description of dithering
and how to enable or disable it.

Quantization

Reducing the number of colors in an image involves quantization. The function rgb2ind uses
quantization as part of its color reduction algorithm. rgb2ind supports two quantization methods:
uniform quantization and minimum variance quantization.

An important term in discussions of image quantization is RGB color cube. The RGB color cube is a
three-dimensional array of all of the colors that are defined for a particular data type. Since RGB
images in MATLAB can be of type uint8, uint16, or double, three possible color cube definitions
exist. For example, if an RGB image is of data type uint8, 256 values are defined for each color plane
(red, blue, and green), and, in total, there will be 2^24 (or 16,777,216) colors defined by the color cube.
This color cube is the same for all uint8 RGB images, regardless of which colors they actually use.


The uint8, uint16, and double color cubes all have the same range of colors. In other words, the
brightest red in a uint8 RGB image appears the same as the brightest red in a double RGB image.
The difference is that the double RGB color cube has many more shades of red (and many more
shades of all colors). The following figure shows an RGB color cube for a uint8 image.

RGB Color Cube for uint8 Images

Quantization involves dividing the RGB color cube into a number of smaller boxes, and then mapping
all colors that fall within each box to the color value at the center of that box.

Uniform quantization and minimum variance quantization differ in the approach used to divide up the
RGB color cube. With uniform quantization, the color cube is cut up into equal-sized boxes (smaller
cubes). With minimum variance quantization, the color cube is cut up into boxes (not necessarily
cubes) of different sizes; the sizes of the boxes depend on how the colors are distributed in the image.
Uniform Quantization

To perform uniform quantization, call rgb2ind and specify a tolerance. The tolerance determines the
size of the cube-shaped boxes into which the RGB color cube is divided. The allowable range for a
tolerance setting is [0,1]. For example, if you specify a tolerance of 0.1, then the edges of the boxes
are one-tenth the length of the RGB color cube and the maximum total number of boxes is

n = (floor(1/tol)+1)^3

The commands below perform uniform quantization with a tolerance of 0.1.

RGB = imread('peppers.png');
[x,map] = rgb2ind(RGB, 0.1);

The following figure illustrates uniform quantization of a uint8 image. For clarity, the figure shows a
two-dimensional slice (or color plane) from the color cube where red=0 and green and blue range
from 0 to 255. The actual pixel values are denoted by the centers of the x markers.


Uniform Quantization on a Slice of the RGB Color Cube

After the color cube has been divided, all empty boxes are thrown out. Therefore, only the boxes that
contain colors from the input image are used to produce colors for the colormap. As shown earlier, the maximum length of a
colormap created by uniform quantization can be predicted, but the colormap can be smaller than the
prediction because rgb2ind removes any colors that do not appear in the input image.

Minimum Variance Quantization

To perform minimum variance quantization, call rgb2ind and specify the maximum number of colors
in the output image's colormap. The number you specify determines the number of boxes into which
the RGB color cube is divided. These commands use minimum variance quantization to create an
indexed image with 185 colors.

RGB = imread('peppers.png');
[X,map] = rgb2ind(RGB,185);

Minimum variance quantization works by associating pixels into groups based on the variance
between their pixel values. For example, a set of blue pixels might be grouped together because they
have a small variance from the center pixel of the group.

In minimum variance quantization, the boxes that divide the color cube vary in size, and do not
necessarily fill the color cube. If some areas of the color cube do not have pixels, there are no boxes
in these areas.

While you set the number of boxes, n, to be used by rgb2ind, the placement is determined by the
algorithm as it analyzes the color data in your image. Once the image is divided into n optimally
located boxes, the pixels within each box are mapped to the pixel value at the center of the box, as in
uniform quantization.

The resulting colormap usually has the number of entries you specify. This is because the color cube
is divided so that each region contains at least one color that appears in the input image. If the input
image uses fewer colors than the number you specify, the output colormap will have fewer than n
colors, and the output image will contain all of the colors of the input image.


The following figure shows the same two-dimensional slice of the color cube as shown in the
preceding figure (demonstrating uniform quantization). Eleven boxes have been created using
minimum variance quantization.

Minimum Variance Quantization on a Slice of the RGB Color Cube

For a given number of colors, minimum variance quantization produces better results than uniform
quantization, because it takes into account the actual data. Minimum variance quantization allocates
more of the colormap entries to colors that appear frequently in the input image. It allocates fewer
entries to colors that appear infrequently. As a result, the accuracy of the colors is higher than with
uniform quantization. For example, if the input image has many shades of green and few shades of
red, there will be more greens than reds in the output colormap. Note that the computation for
minimum variance quantization takes longer than that for uniform quantization.

Colormap Mapping

If you specify an actual colormap to use, rgb2ind uses colormap mapping (instead of quantization) to
find the colors in the specified colormap that best match the colors in the RGB image. This method is
useful if you need to create images that use a fixed colormap. For example, if you want to display
multiple indexed images on an 8-bit display, you can avoid color problems by mapping them all to the
same colormap. Colormap mapping produces a good approximation if the specified colormap has
similar colors to those in the RGB image. If the colormap does not have similar colors to those in the
RGB image, this method produces poor results.

This example illustrates mapping two images to the same colormap. The colormap used for the two
images is created on the fly using the MATLAB function colorcube, which creates an RGB colormap
containing the number of colors that you specify. (colorcube always creates the same colormap for a
given number of colors.) Because the colormap includes colors all throughout the RGB color cube, the
output images can reasonably approximate the input images.

RGB1 = imread('autumn.tif');
RGB2 = imread('peppers.png');
X1 = rgb2ind(RGB1,colorcube(128));
X2 = rgb2ind(RGB2,colorcube(128));


Note The function imshow is also helpful for displaying multiple indexed images. For more
information, see “Display Images Individually in the Same Figure” on page 4-16 or the reference page
for imshow.

Reduce Colors of Indexed Image Using imapprox


Use imapprox when you need to reduce the number of colors in an indexed image. imapprox is
based on rgb2ind and uses the same approximation methods. Essentially, imapprox first calls
ind2rgb to convert the image to RGB format, and then calls rgb2ind to return a new indexed image
with fewer colors.

For example, these commands create a version of the trees image with 64 colors, rather than the
original 128.

load trees
[Y,newcmap] = imapprox(X,map,64);
imshow(Y,newcmap)

The quality of the resulting image depends on the approximation method you use, the range of colors
in the input image, and whether or not you use dithering. Note that different methods work better for
different images. See “Reduce Colors Using Dithering” on page 16-7 for a description of dithering
and how to enable or disable it.

Reduce Colors Using Dithering

When you use rgb2ind or imapprox to reduce the number of colors in an image, the resulting
image might look inferior to the original, because some of the colors are lost. rgb2ind and
imapprox both perform dithering to increase the apparent number of colors in the output image.
Dithering changes the colors of pixels in a neighborhood so that the average color in each
neighborhood approximates the original RGB color.

For an example of how dithering works, consider an image that contains a number of dark orange
pixels for which there is no exact match in the colormap. To create the appearance of this shade of
orange, dithering selects a combination of colors from the colormap that, taken together as a six-
pixel group, approximate the desired shade of orange. From a distance, the pixels appear to be the
correct shade, but if you look up close at the image, you can see a blend of other shades. To illustrate
dithering, the following example loads a 24-bit truecolor image, and then uses rgb2ind to create an
indexed image with just eight colors. The first example does not use dithering, the second does use
dithering.

Read image and display it.

rgb = imread('onion.png');
imshow(rgb)


Create an indexed image with eight colors and without dithering.

[X_no_dither,map] = rgb2ind(rgb,8,'nodither');
imshow(X_no_dither,map)

Create an indexed image using eight colors with dithering. Notice that the dithered image has a
larger number of apparent colors but is somewhat fuzzy-looking. The image produced without
dithering has fewer apparent colors, but an improved spatial resolution when compared to the
dithered image. One risk in doing color reduction without dithering is that the new image can contain
false contours.

[X_dither,map] = rgb2ind(rgb,8,'dither');
imshow(X_dither,map)


Profile-Based Color Space Conversions


If two colors have the same CIE colorimetry, they will match if viewed under the same conditions.
However, because color images are typically produced for a wide variety of viewing environments, it
is necessary to go beyond simple application of the CIE system.

For this reason, the International Color Consortium (ICC) has defined a Color Management System
(CMS) that provides a means for communicating color information among input, output, and display
devices. The CMS uses device profiles that contain color information specific to a particular device.
Vendors that support CMS provide profiles that characterize the color reproduction of their devices,
and methods, called Color Management Modules (CMM), that interpret the contents of each profile
and perform the necessary image processing.

Device profiles contain the information that color management systems need to translate color data
between devices. Any conversion between color spaces is a mathematical transformation from some
domain space to a range space. With profile-based conversions, the domain space is often called the
source space and the range space is called the destination space. In the ICC color management
model, profiles are used to represent the source and destination spaces.

For more information about color management systems, go to the International Color Consortium
website, https://www.color.org.

Read ICC Profiles


To read an ICC profile into the workspace, use the iccread function. In this example, the function
reads in the profile for the color space that describes color monitors.

P = iccread('sRGB.icm');

You can use the iccfind function to find ICC color profiles on your system, or to find a particular
ICC color profile whose description contains a certain text string. To get the name of the directory
that is the default system repository for ICC profiles, use iccroot.
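
For example, this sketch (assuming that at least one profile whose description contains 'sRGB' exists in the default repository) retrieves matching profiles and their descriptions:

% Search the default profile repository for profiles whose
% Description field contains the text 'sRGB'.
[profiles,descriptions] = iccfind(iccroot,'sRGB');
descriptions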

iccread returns the contents of the profile in the structure P. All profiles contain a header, a tag
table, and a series of tagged elements. The header contains general information about the profile,
such as the device class, the device color space, and the file size. The tagged elements, or tags, are
the data constructs that contain the information used by the CMM. For more information about the
contents of this structure, see the iccread function reference page.

Using iccread, you can read both Version 2 (ICC.1:2001-04) or Version 4 (ICC.1:2001-12) ICC profile
formats. For detailed information about these specifications and their differences, visit the ICC
website, https://www.color.org.

Write ICC Profile Information to a File


To export ICC profile information from the workspace to a file, use the iccwrite function. This
example reads a profile into the workspace and then writes the profile information out to a new file.

P = iccread('sRGB.icm');
P_new = iccwrite(P,'my_profile.icm');


iccwrite returns the profile it writes to the file in P_new because it can be different than the input
profile P. For example, iccwrite updates the Filename field in P to match the name of the file
specified as the second argument.

When it creates the output file, iccwrite checks the validity of the input profile structure. If any
required fields are missing, iccwrite returns an error message. For more information about the
writing ICC profile data to a file, see the iccwrite function reference page. To determine if a
structure is a valid ICC profile, use the isicc function.

Using iccwrite, you can export profile information in both Version 2 (ICC.1:2001-04) or Version 4
(ICC.1:2001-12) ICC profile formats. The value of the Version field in the file profile header
determines the format version. For detailed information about these specifications and their
differences, visit the ICC website, https://www.color.org.

Convert RGB to CMYK Using ICC Profiles


This example shows how to convert color data from the RGB color space used by a monitor to the
CMYK color space used by a printer. This conversion requires two profiles: a monitor profile and a
printer profile. The source color space in this example is monitor RGB and the destination color space
is printer CMYK:

Import RGB color space data. This example imports an RGB color image into the workspace.

I_rgb = imread('peppers.png');

Read ICC profiles. Read the source and destination profiles into the workspace. This example uses
the sRGB profile as the source profile. The sRGB profile is an industry-standard color space that
describes a color monitor.

inprof = iccread('sRGB.icm');

For the destination profile, the example uses a profile that describes a particular color printer. The
printer vendor supplies this profile. (The following profile and several other useful profiles can be
obtained as downloads from www.adobe.com.)

outprof = iccread('USSheetfedCoated.icc');

Create a color transformation structure. You must create a color transformation structure to define
the conversion between the color spaces in the profiles. You use the makecform function to create
the structure, specifying a transformation type string as an argument. This example creates a color
transformation structure that defines a conversion from RGB color data to CMYK color data. The
color space conversion might involve an intermediate conversion into a device-independent color
space, called the Profile Connection Space (PCS), but this is transparent to the user.

C = makecform('icc',inprof,outprof);

Perform the conversion. You use the applycform function to perform the conversion, specifying as
arguments the color data you want to convert and the color transformation structure that defines the
conversion. The function returns the converted data.

I_cmyk = applycform(I_rgb,C);

Write the converted data to a file. To export the CMYK data, use the imwrite function, specifying the
format as TIFF. If the format is TIFF and the data is an m-by-n-by-4 array, imwrite writes CMYK data
to the file.


imwrite(I_cmyk,'pep_cmyk.tif','tif')

To verify that the CMYK data was written to the file, use imfinfo to get information about the file
and look at the PhotometricInterpretation field.

info = imfinfo('pep_cmyk.tif');
info.PhotometricInterpretation

ans =
'CMYK'

What is Rendering Intent in Profile-Based Conversions?


For most devices, the range of reproducible colors is much smaller than the range of colors
represented by the PCS. It is for this reason that four rendering intents (or gamut mapping
techniques) are defined in the profile format. Each one has distinct aesthetic and color-accuracy
tradeoffs.

When you create a profile-based color transformation structure, you can specify the rendering intent
for the source as well as the destination profiles. For more information, see the makecform reference
information.
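
For example, a sketch of specifying rendering intents when creating the transformation, reusing the monitor and printer profiles from the preceding example (the intent names shown are assumptions about commonly supported options, with 'Perceptual' being the usual default):

inprof = iccread('sRGB.icm');
outprof = iccread('USSheetfedCoated.icc');
C = makecform('icc',inprof,outprof,...
    'SourceRenderingIntent','Perceptual',...
    'DestRenderingIntent','RelativeColorimetric');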


Device-Independent Color Spaces


The standard terms used to describe colors, such as hue, brightness, and intensity, are subjective and
make comparisons difficult.

In 1931, the International Commission on Illumination, known by the acronym CIE, for Commission
Internationale de l'Éclairage, studied human color perception and developed a standard, called the
CIE XYZ. This standard defined a three-dimensional space where three values, called tristimulus
values, define a color. This standard is still widely used today.

In the decades since that initial specification, the CIE has developed several additional color space
specifications that attempt to provide alternative color representations that are better suited to some
purposes than XYZ. For example, in 1976, in an effort to get a perceptually uniform color space that
could be correlated with the visual appearance of colors, the CIE created the L*a*b* color space.

Convert Between Device-Independent Color Spaces


Image Processing Toolbox supports conversions between members of the CIE family of device-
independent color spaces. In addition, the toolbox also supports conversions between these CIE color
spaces and the sRGB color space. This color space was defined by an industry group to describe the
characteristics of a typical PC monitor.

This table lists all the device-independent color spaces that the toolbox supports.

• XYZ: The original, 1931 CIE color space specification. Supported conversions: xyY, uvL, u′v′L, and
L*a*b*.
• xyY: CIE specification that provides normalized chromaticity values. The capital Y value represents
luminance and is the same as in XYZ. Supported conversions: XYZ.
• uvL: CIE specification that attempts to make the chromaticity plane more visually uniform. L is
luminance and is the same as Y in XYZ. Supported conversions: XYZ.
• u′v′L: CIE specification in which u and v are rescaled to improve uniformity. Supported conversions:
XYZ.
• L*a*b*: CIE specification that attempts to make the luminance scale more perceptually uniform. L* is
a nonlinear scaling of L, normalized to a reference white point. Supported conversions: XYZ.
• L*ch: CIE specification where c is chroma and h is hue. These values are a polar coordinate
conversion of a* and b* in L*a*b*. Supported conversions: L*a*b*.
• sRGB: Standard adopted by major manufacturers that characterizes the average PC monitor.
Supported conversions: XYZ and L*a*b*.
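
As a brief sketch of one such conversion (not part of the original text), the following converts uint8 sRGB data to XYZ and then to L*a*b* using the makecform and applycform functions:

RGB = imread('peppers.png');                 % uint8 sRGB image
XYZ = applycform(RGB,makecform('srgb2xyz')); % XYZ has no uint8 encoding, so the result is uint16
lab = applycform(XYZ,makecform('xyz2lab'));  % L*a*b* values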

Color Space Data Encodings


When you convert between two device-independent color spaces, the data type used to encode the
color data can sometimes change, depending on what encodings the color spaces support. For example,
if the original image is uint8 data, the XYZ conversion is uint16 data, because the XYZ color space
does not define a uint8 encoding. The following table lists the data types that can be used to
represent values in all the device-independent color spaces.

Color Space Encodings


XYZ uint16 or double
xyY double
uvL double
u'v'L double
L*a*b* uint8, uint16, or double
L*ch double
RGB double, uint8, or uint16

As the table indicates, certain color spaces have data type limitations. For example, the XYZ color
space does not define a uint8 encoding. If you convert 8-bit CIE LAB data into the XYZ color space,
the data is returned in uint16 format. To change the encoding of XYZ data, use these functions:

• xyz2double
• xyz2uint16

To change the encoding of L*a*b* data, use these functions:

• lab2double
• lab2uint8
• lab2uint16

To change the encoding of RGB data, use these functions:

• im2double
• im2uint8
• im2uint16
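
For example, a short sketch of these encoding conversions (the numeric values are arbitrary):

lab8 = uint8([128 130 125]);   % an L*a*b* triplet encoded as uint8
lab_d = lab2double(lab8)       % the same color encoded as double

xyz_d = [0.25 0.40 0.10];      % an XYZ triplet encoded as double
xyz16 = xyz2uint16(xyz_d)      % the same color encoded as uint16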


Understanding Color Spaces and Color Space Conversion


The Image Processing Toolbox software typically represents colors as red, green, and blue (RGB)
numeric values. However, there are other models besides RGB for representing colors numerically.
The various models are referred to as color spaces because most of them can be mapped into a 2-D,
3-D, or 4-D coordinate system.

The various color spaces exist because they present color information in ways that make certain
calculations more convenient or because they provide a way to identify colors that is more intuitive.
For example, the RGB color space defines a color as the percentages of red, green, and blue hues
mixed together. Other color models describe colors by their hue (shade of color), saturation (amount
of gray or pure color), and luminance (intensity, or overall brightness).

The toolbox enables converting color data from one color space to another through mathematical
transformations.

RGB
The RGB color space represents images as an m-by-n-by-3 numeric array whose elements specify the
intensity values of the red, green, and blue color channels. The range of numeric values depends on
the data type of the image.

• For single or double arrays, RGB values range from [0, 1].
• For uint8 arrays, RGB values range from [0, 255].
• For uint16 arrays, RGB values range from [0, 65535].

The toolbox supports variations of the RGB color space.

RGB Color Space Description


Linear RGB Linear RGB values are raw data obtained from a camera sensor. The values
of R, G, and B are directly proportional to the amount of light that
illuminates the sensor. Preprocessing of raw image data, such as white
balance, color balance, and chromatic aberration compensation, is
performed on linear RGB values.

sRGB sRGB values apply a nonlinear function, called gamma correction on page
8-67, to linear RGB values. Images are frequently displayed in the sRGB
color space because they appear brighter and colors are easier to
distinguish. The parametric curve used to transform linear RGB values to
the sRGB color space is:

f(u) = -f(-u), u < 0

f(u) = c ⋅ u, 0 ≤ u < d

f(u) = a ⋅ u^ɣ + b, u ≥ d

where u represents one of the R, G, or B color values, with these parameters:

a = 1.055

b = –0.055

c = 12.92

d = 0.0031308

ɣ = 1/2.4
Adobe RGB (1998) Adobe RGB (1998) RGB values apply gamma correction to linear RGB
values using a simple power function:

v = u^ɣ, u ≥ 0

v = -(-u)^ɣ, u < 0,

with

ɣ = 1/2.19921875
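
The following sketch (not part of the original table) evaluates the sRGB parametric curve directly from the parameters listed above and compares it with the lin2rgb function, which performs gamma correction from linear RGB to sRGB:

u = [0.001 0.01 0.1 0.5 1]';   % sample linear RGB values in [0, 1]
a = 1.055; b = -0.055; c = 12.92; d = 0.0031308; g = 1/2.4;
v_formula = (u < d).*(c.*u) + (u >= d).*(a.*u.^g + b);
v_toolbox = lin2rgb(u);        % same gamma correction applied by the toolbox
[v_formula v_toolbox]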

HSV
The HSV (Hue, Saturation, Value) color space corresponds better to how people experience color
than the RGB color space does. For example, this color space is often used by people who are
selecting colors, such as paint or ink color, from a color wheel or palette.

Attribute Description
H Hue, which corresponds to the color’s position on a color wheel. H is in the
range [0, 1]. As H increases, colors transition from red to orange, yellow,
green, cyan, blue, magenta, and finally back to red. Both 0 and 1 indicate
red.
S Saturation, which is the amount of hue or departure from neutral. S is in
the range [0, 1]. As S increases, colors vary from unsaturated (shades of
gray) to fully saturated (no white component).

V Value, which is the maximum value among the red, green, and blue
components of a specific color. V is in the range [0, 1]. As V increases, the
corresponding colors become increasingly brighter.

Illustration of the HSV Color Space

Note MATLAB and the Image Processing Toolbox software do not support the HSI color space (Hue,
Saturation, Intensity). However, if you want to work with color data in terms of hue, saturation, and
intensity, the HSV color space is very similar. Another option is to use the LCH color space
(Luminosity, Chroma, and Hue), which is a polar transformation of the CIE L*a*b* color space — see
“Device-Independent Color Spaces” on page 16-13.

Use the rgb2hsv and hsv2rgb functions to convert between the RGB and HSV color spaces.

CIE 1931 XYZ and CIE 1976 L*a*b*


CIE 1931 XYZ and CIE 1976 L*a*b* are device-independent color spaces developed by the
International Commission on Illumination, known by the acronym CIE. These color spaces model
colors according to the typical sensitivity of the three types of cone cells in the human eye.

The XYZ color space is the original model developed by the CIE. The Y channel represents the
luminance of a color. The Z channel approximately relates to the amount of blue in an image, but the
value of Z in the XYZ color space is not identical to the value of B in the RGB color space. The X
channel does not have a clear color analogy. However, if you consider the XYZ color space as a 3-D
coordinate system, then X values lie along the axis that is orthogonal to the Y (luminance) axis and
the Z axis.

The L*a*b* color space provides a more perceptually uniform color space than the XYZ model. Colors
in the L*a*b* color space can exist outside the RGB gamut (the valid set of RGB colors). For example,
when you convert the L*a*b* value [100, 100, 100] to the RGB color space, the returned value is


[1.7682, 0.5746, 0.1940], which is not a valid RGB color. For more information, see “Determine If
L*a*b* Value Is in RGB Gamut” on page 16-24.

Attribute Description
L* Luminance or brightness of the image. Values are in the range [0, 100],
where 0 specifies black and 100 specifies white. As L* increases, colors
become brighter.
a* Amount of red or green tones in the image. A large positive a* value
corresponds to red/magenta. A large negative a* value corresponds to
green. Although there is no single range for a*, values commonly fall in the
range [-100, 100] or [-128, 127).
b* Amount of yellow or blue tones in the image. A large positive b* value
corresponds to yellow. A large negative b* value corresponds to blue.
Although there is no single range for b*, values commonly fall in the range
[-100, 100] or [-128, 127).

Device-independent color spaces include the effect of the illumination source, called the reference
white point. The source imparts a color hue to the raw image data according to the color temperature
of the illuminant. For example, sunlight during sunrise or sunset imparts a yellow hue to an image,
whereas sunlight around noontime imparts a blue hue.

Use the rgb2xyz and xyz2rgb functions to convert between the RGB and XYZ color spaces. Use the
rgb2lab and lab2rgb functions to convert between the RGB and L*a*b* color spaces.
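
For example, a small sketch (the color value is arbitrary) that converts a single sRGB color to both spaces:

rgb = [0.2 0.6 0.4];   % an arbitrary sRGB color
xyz = rgb2xyz(rgb)     % CIE 1931 XYZ tristimulus values
lab = rgb2lab(rgb)     % CIE 1976 L*a*b* values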

The toolbox supports several related color space specifications that are better suited to some
purposes than XYZ. For more information see “Device-Independent Color Spaces” on page 16-13.

YCbCr
The YCbCr color space is widely used for digital video. In this format, luminance information is stored
as a single component (Y) and chrominance information is stored as two color-difference components
(Cb and Cr). Cb and Cr represent the difference between a reference value and the blue or red
component, respectively. (YUV, another color space widely used for digital video, is very similar to
YCbCr but not identical.)

Attribute Description
Y Luminance or brightness of the image. Colors increase in brightness as Y
increases.
Cb Chrominance value that indicates the difference between the blue
component and a reference value.
Cr Chrominance value that indicates the difference between the red
component and a reference value.

The range of numeric values depends on the data type of the image. YCbCr does not use the full
range of the image data type so that the video stream can include additional (non-image) information.

• For single or double arrays, Y is in the range [16/255, 235/255] and Cb and Cr are in the range
[16/255, 240/255].
• For uint8 arrays, Y is in the range [16, 235] and Cb and Cr are in the range [16, 240].

• For uint16 arrays, Y is in the range [4112, 60395] and Cb and Cr are in the range [4112, 61680].

Use the rgb2ycbcr and ycbcr2rgb functions to convert between the RGB and YCbCr color spaces.
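
For example, a short sketch showing the limited code range for uint8 data (a white pixel and a black pixel):

RGB = uint8(cat(3,[255 0],[255 0],[255 0]));  % 1-by-2 RGB image: white, then black
YCBCR = rgb2ycbcr(RGB)                        % Y is 235 for white and 16 for black; Cb and Cr are 128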

YIQ
The National Television Systems Committee (NTSC) defines a color space known as YIQ. This color
space is used in televisions in the United States. This color space separates grayscale information
from color data, so the same signal can be used for both color and black and white television sets.

Attribute Description
Y Luma, or brightness of the image. Values are in the range [0, 1], where 0
specifies black and 1 specifies white. Colors increase in brightness as Y
increases.
I In-phase, which is approximately the amount of blue or orange tones in the
image. I is in the range [-0.5959, 0.5959], where negative numbers indicate
blue tones and positive numbers indicate orange tones. As the magnitude
of I increases, the saturation of the color increases.
Q Quadrature, which is approximately the amount of green or purple tones in
the image. Q is in the range [-0.5229, 0.5229], where negative numbers
indicate green tones and positive numbers indicate purple tones. As the
magnitude of Q increases, the saturation of the color increases.

Use the rgb2ntsc and ntsc2rgb functions to convert between the RGB and YIQ color spaces.

Because luminance is one of the components of the NTSC format, the RGB to NTSC conversion is also
useful for isolating the gray level information in an image. In fact, the toolbox functions rgb2gray
and ind2gray use the rgb2ntsc function to extract the grayscale information from a color image.
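
For instance, a minimal sketch of extracting the luma channel yourself:

RGB = imread('peppers.png');
YIQ = rgb2ntsc(RGB);   % YIQ values of data type double
Y = YIQ(:,:,1);        % luma channel, a grayscale version of the image
imshow(Y)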

See Also

Related Examples
• “Convert Between RGB and HSV Color Spaces” on page 16-20
• “Determine If L*a*b* Value Is in RGB Gamut” on page 16-24


Convert Between RGB and HSV Color Spaces

This example shows how to adjust the saturation of a color image by converting the image to the HSV
color space. The example then displays the separate HSV color planes (hue, saturation, and value) of
a synthetic image.

Convert RGB Image to HSV Image

Read an RGB image into the workspace. Display the image.

RGB = imread('peppers.png');
imshow(RGB)

Convert the image to the HSV color space.

HSV = rgb2hsv(RGB);

Process the HSV image. This example increases the saturation of the image by multiplying the S
channel by a scale factor.

[h,s,v] = imsplit(HSV);
saturationFactor = 2;


s_sat = s*saturationFactor;
HSV_sat = cat(3,h,s_sat,v);

Convert the processed HSV image back to the RGB color space. Display the new RGB image. Colors
in the processed image are more vibrant.

RGB_sat = hsv2rgb(HSV_sat);
imshow(RGB_sat)

Closer Look at the HSV Color Space

For closer inspection of the HSV color space, create a synthetic RGB image.

RGB = reshape(ones(64,1)*reshape(jet(64),1,192),[64,64,3]);

Convert the synthetic RGB image to the HSV colorspace.

HSV = rgb2hsv(RGB);

Split the HSV version of the synthetic image into its component planes: hue, saturation, and value.

[h,s,v] = imsplit(HSV);

Display the individual HSV color planes with the original image.


montage({h,s,v,RGB},"BorderSize",10,"BackgroundColor",'w');

As the hue plane image in the preceding figure illustrates, hue values make a linear transition from
high to low. If you compare the hue plane image against the original image, you can see that shades
of deep blue have the highest values, and shades of deep red have the lowest values. (As stated
previously, there are values of red on both ends of the hue scale. To avoid confusion, the sample
image uses only the red values from the beginning of the hue range.)

Saturation can be thought of as the purity of a color. As the saturation plane image shows, the colors
with the highest saturation have the highest values and are represented as white. In the center of the
saturation image, notice the various shades of gray. These correspond to a mixture of colors; the
cyans, greens, and yellow shades are mixtures of true colors. Value is roughly equivalent to

brightness, and you will notice that the brightest areas of the value plane correspond to the brightest
colors in the original image.


Determine If L*a*b* Value Is in RGB Gamut

This example shows how to use color space conversion to determine if an L*a*b* value is in the RGB
gamut. The set of colors that can be represented using a particular color space is called its gamut.
Some L*a*b* color values can be out-of-gamut when converted to RGB.

Convert an L*a*b* value to RGB. The negative values returned demonstrate that the L*a*b* color [80
-130 85] is not in the gamut of the sRGB color space, which is the default RGB color space used by
lab2rgb. An RGB color is out of gamut when any of its component values are less than 0 or greater
than 1.

lab = [80 -130 85];


lab2rgb(lab)

ans = 1×3

-0.6209 0.9537 -0.1927

Convert the L*a*b* value to RGB, this time specifying a different RGB colorspace, the Adobe® RGB
(1998) color space. The Adobe RGB (1998) has a larger gamut than sRGB. Use the ColorSpace
name-value argument. Because the output values are between 0.0 and 1.0 (inclusive), you can
conclude that the L*a*b* color [80 -130 85] is inside the Adobe RGB (1998) gamut.

lab2rgb(lab,ColorSpace="adobe-rgb-1998")

ans = 1×3

0.1236 0.9522 0.1072


Comparison of Auto White Balance Algorithms

This example shows how to estimate illumination and perform white balance of a scene using three
different illumination algorithms.

Eyes are very good at judging what is white under different lighting conditions. Digital cameras,
however, without some kind of adjustment, can easily capture unrealistic images with a strong color
cast. Automatic white balance (AWB) algorithms try to correct for the ambient light with minimum
input from the user, so that the resulting image looks like what our eyes would see.

Automatic white balancing is done in two steps:

• Step 1: Estimate the scene illuminant.


• Step 2: Correct the color balance of the image.

Several different algorithms exist to estimate scene illuminant.

• White Patch Retinex [1]


• Gray World [2]
• Cheng's Principal Component Analysis (PCA) method [3]

The performance of each algorithm depends on the scene, lighting, and imaging conditions. This
example judges the quality of three algorithms for one specific image by comparing them to the
ground truth scene illuminant calculated using a ColorChecker® chart.

Read and Preprocess Raw Camera Data

AWB algorithms are usually applied on the raw image data after a minimal amount of preprocessing,
before the image is compressed and saved to the memory card.

Read a 16-bit raw image into the workspace. foosballraw.tiff is an image file that contains raw
sensor data after correcting the black level and scaling the intensities to 16 bits per pixel. This image
is free of the white balancing done by the camera, as well as other preprocessing operations such as
demosaicing, denoising, chromatic aberration compensation, tone adjustments, and gamma
correction.

A = imread("foosballraw.tiff");

Interpolate to Recover Missing Color Information

Digital cameras use a color filter array superimposed on the imaging sensor to simulate color vision,
so that each pixel is sensitive to either red, green or blue. To recover the missing color information at
every pixel, interpolate using the demosaic function. The Bayer pattern used by the camera with
which the photo was captured (Canon EOS 30D) is RGGB.

A = demosaic(A,"rggb");

Gamma-Correct Image for Detection and Display

The image A contains linear RGB values. Linear RGB values are appropriate for estimating scene
illuminant and correcting the color balance of an image. However, if you try to display the linear RGB
image, it will appear very dim, because of the nonlinear characteristic of display devices. Therefore,
for display purposes, gamma-correct the image to the sRGB color space using the lin2rgb function.


A_sRGB = lin2rgb(A);

Display the demosaiced image before and after gamma correction.

montage({A,A_sRGB})
title("Original Image Before and After Gamma Correction")

Measure Ground Truth Illuminant Using ColorChecker Chart

Calculate the ground truth illuminant using the ColorChecker chart that is included in the scene. This
chart consists of 24 neutral and color patches with known spectral reflectances.

Detect the chart in the gamma-corrected image by using the colorChecker function. The linear
RGB image is too dark for colorChecker to detect the chart automatically.

chart_sRGB = colorChecker(A_sRGB);

Confirm that the chart is detected correctly.

displayChart(chart_sRGB)


Get the coordinates of the registration points at the four corners of the chart.
registrationPoints = chart_sRGB.RegistrationPoints;

Create a new colorChecker object from the linear RGB data. Specify the location of the chart using
the coordinates of the registration points.
chart = colorChecker(A,RegistrationPoints=registrationPoints);

Measure the ground truth illuminant of the linear RGB data using the measureIlluminant function.
illuminant_groundtruth = measureIlluminant(chart)

illuminant_groundtruth = 1×3
103 ×

4.5407 9.3226 6.1812

Create Mask of ColorChecker Chart

When testing the AWB algorithms, prevent the algorithms from unfairly taking advantage of the chart
by masking out the chart.

Create a polygon ROI over the chart by using the drawpolygon function. Specify the vertices of the
polygon as the registration points.


chartROI = drawpolygon(Position=registrationPoints);

Convert the polygon ROI to a binary mask by using the createMask function.

mask_chart = createMask(chartROI);

Invert the mask. Pixels within the chart are excluded from the mask and pixels of the rest of the
scene are included in the mask.

mask_scene = ~mask_chart;

To confirm the accuracy of the mask, display the mask over the image. Pixels included in the mask
have a blue tint.

imshow(labeloverlay(A_sRGB,mask_scene));


Angular Error

You can consider an illuminant as a vector in 3-D RGB color space. The magnitude of the estimated
illuminant does not matter as much as its direction, because the direction of the illuminant is what is
used to white balance an image.

To evaluate the quality of an estimated illuminant, compute the angular error between the estimated
illuminant and the ground truth. Angular error is the angle (in degrees) formed by the two vectors.
The smaller the angular error, the better the estimation is.

To better understand the concept of angular error, consider the following visualization of an arbitrary
illuminant and the ground truth measured using the ColorChecker chart. The plotColorAngle
helper function plots a unit vector of an illuminant in 3-D RGB color space, and is defined at the end
of the example.
sample_illuminant = [0.066 0.1262 0.0691];

p = plot3([0 1],[0 1],[0,1],LineStyle=":",Color="k");


ax = p.Parent;
hold on
plotColorAngle(illuminant_groundtruth,ax)
plotColorAngle(sample_illuminant,ax)
title("Illuminants in RGB space")
view(28,36)
legend("Achromatic Line","Ground Truth Illuminant","Sample Illuminant")


grid on
axis equal

White Patch Retinex

The White Patch Retinex algorithm for illuminant estimation assumes that the scene contains a bright
achromatic patch. This patch reflects the maximum light possible for each color band, which is the
color of the scene illuminant. Use the illumwhite function to estimate illumination using the White
Patch Retinex algorithm.

Include All Scene Pixels

Estimate the illuminant using all the pixels in the scene. Exclude the ColorChecker chart from the
scene by using the Mask name-value pair argument.

percentileToExclude = 0;
illuminant_wp1 = illumwhite(A,percentileToExclude,Mask=mask_scene);

Compute the angular error for the illuminant estimated with White Patch Retinex.

err_wp1 = colorangle(illuminant_wp1,illuminant_groundtruth);
disp(["Angular error for White Patch with percentile=0: " num2str(err_wp1)])

Angular error for White Patch with percentile=0: 16.5381


White balance the image using the chromadapt function. Specify the estimated illuminant and
indicate that color values are in the linear RGB color space.
B_wp1 = chromadapt(A,illuminant_wp1,ColorSpace="linear-rgb");

Display the gamma-corrected white-balanced image.


B_wp1_sRGB = lin2rgb(B_wp1);

figure
imshow(B_wp1_sRGB)
title("White-Balanced Image using White Patch Retinex with percentile=0")

Exclude Brightest Pixels

The White Patch Retinex algorithm does not perform well when pixels are overexposed. To improve
the performance of the algorithm, exclude the top 1% of the brightest pixels.
percentileToExclude = 1;
illuminant_wp2 = illumwhite(A,percentileToExclude,Mask=mask_scene);

Calculate the angular error for the estimated illuminant. The error is less than when estimating the
illuminant using all pixels.
err_wp2 = colorangle(illuminant_wp2,illuminant_groundtruth);
disp(["Angular error for White Patch with percentile=1: " num2str(err_wp2)])


Angular error for White Patch with percentile=1: 5.0324

White balance the image in the linear RGB color space using the estimated illuminant.
B_wp2 = chromadapt(A,illuminant_wp2,ColorSpace="linear-rgb");

Display the gamma-corrected white-balanced image with the new illuminant.


B_wp2_sRGB = lin2rgb(B_wp2);
imshow(B_wp2_sRGB)
title("White-Balanced Image using White Patch Retinex with percentile=1")

Gray World

The Gray World algorithm for illuminant estimation assumes that the average color of the world is
gray, or achromatic. Therefore, it calculates the scene illuminant as the average RGB value in the
image. Use the illumgray function to estimate illumination using the Gray World algorithm.
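
Conceptually, with no pixels excluded, the Gray World estimate is simply the per-channel mean over the scene pixels. The following sketch illustrates the idea only; it is not the illumgray implementation, and the variable name illuminant_meanSketch is hypothetical:

% Conceptual sketch: per-channel mean over the pixels outside the chart
R = A(:,:,1); G = A(:,:,2); B = A(:,:,3);
illuminant_meanSketch = [mean(R(mask_scene)) mean(G(mask_scene)) mean(B(mask_scene))]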

Include All Scene Pixels

First, estimate the scene illuminant using all pixels of the image, excluding those corresponding to
the ColorChecker chart. The illumgray function provides a parameter to specify the percentiles of
bottom and top values (ordered by brightness) to exclude. Here, specify the percentiles as 0.
percentileToExclude = 0;
illuminant_gw1 = illumgray(A,percentileToExclude,Mask=mask_scene);


Calculate the angular error between the estimated illuminant and the ground truth illuminant.

err_gw1 = colorangle(illuminant_gw1,illuminant_groundtruth);
disp(["Angular error for Gray World with percentiles=[0 0]: " num2str(err_gw1)])

Angular error for Gray World with percentiles=[0 0]: 5.0416

White balance the image in the linear RGB color space using the estimated illuminant.

B_gw1 = chromadapt(A,illuminant_gw1,ColorSpace="linear-rgb");

Display the gamma-corrected white-balanced image.

B_gw1_sRGB = lin2rgb(B_gw1);
imshow(B_gw1_sRGB)
title("White-Balanced Image using Gray World with percentiles=[0 0]")

Exclude Brightest and Darkest Pixels

The Gray World algorithm does not perform well when pixels are underexposed or overexposed. To
improve the performance of the algorithm, exclude the top 1% of the darkest and brightest pixels.

percentileToExclude = 1;
illuminant_gw2 = illumgray(A,percentileToExclude,Mask=mask_scene);


Calculate the angular error for the estimated illuminant. The error is less than when estimating the
illuminant using all pixels.

err_gw2 = colorangle(illuminant_gw2,illuminant_groundtruth);
disp(["Angular error for Gray World with percentiles=[1 1]: " num2str(err_gw2)])

Angular error for Gray World with percentiles=[1 1]: 5.1094

White balance the image in the linear RGB color space using the estimated illuminant.

B_gw2 = chromadapt(A,illuminant_gw2,ColorSpace="linear-rgb");

Display the gamma-corrected white-balanced image with the new illuminant.

B_gw2_sRGB = lin2rgb(B_gw2);
imshow(B_gw2_sRGB)
title("White-Balanced Image using Gray World with percentiles=[1 1]")

Cheng's Principal Component Analysis (PCA) Method

Cheng's illuminant estimation method draws inspiration from spatial domain methods such as Grey
Edge [4], which assumes that the gradients of an image are achromatic. They show that Grey Edge
can be improved by artificially introducing strong gradients by shuffling image blocks, and conclude
that the strongest gradients follow the direction of the illuminant. Their method consists of ordering
pixels according to the norm of their projection along the direction of the mean image color, and
retaining the bottom and top percentiles. These two groups correspond to strong gradients in the
image. Finally, they perform a principal component analysis (PCA) on the retained pixels and return
the first component as the estimated illuminant. Use the illumpca function to estimate illumination
using Cheng's PCA algorithm.
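
The following sketch outlines the core steps of this approach using basic matrix operations. It is a conceptual illustration only, not the illumpca implementation, and the variable names are hypothetical:

% Conceptual sketch of Cheng's PCA estimate (illustration only)
p = 3.5;                               % percentage of pixels to retain at each end
pixels = reshape(im2double(A),[],3);   % N-by-3 list of linear RGB values
pixels = pixels(mask_scene(:),:);      % keep only pixels outside the chart
proj = pixels*mean(pixels,1)';         % projection along the mean image color
[~,order] = sort(proj);
n = ceil(numel(order)*p/100);
retained = pixels([order(1:n); order(end-n+1:end)],:);
[U,~,~] = svd(retained'*retained);     % PCA without mean subtraction
illuminant_pcaSketch = abs(U(:,1))'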

Include Default Bottom and Top 3.5 Percent of Pixels

First, estimate the illuminant using the default percentage value of Cheng's PCA method, excluding
the pixels that correspond to the ColorChecker chart.

illuminant_ch2 = illumpca(A,Mask=mask_scene);

Calculate the angular error between the estimated illuminant and the ground truth illuminant.

err_ch2 = colorangle(illuminant_ch2,illuminant_groundtruth);
disp(["Angular error for Cheng with percentage=3.5: " num2str(err_ch2)])

Angular error for Cheng with percentage=3.5: 5.0162

White balance the image in the linear RGB color space using the estimated illuminant.

B_ch2 = chromadapt(A,illuminant_ch2,ColorSpace="linear-rgb");

Display the gamma-corrected white-balanced image.

B_ch2_sRGB = lin2rgb(B_ch2);
imshow(B_ch2_sRGB)
title("White-Balanced Image using Cheng with percentile=3.5")


Include Bottom and Top 5 Percent of Pixels

Now, estimate the scene illuminant using the bottom and top 5% of pixels along the direction of the
mean color. The second argument of the illumpca function specifies the percentage of bottom and
top pixels, ordered by the norm of their projection along the mean color direction, to retain.

illuminant_ch1 = illumpca(A,5,Mask=mask_scene);

Calculate the angular error between the estimated illuminant and the ground truth illuminant. The
error is less than when estimating the illuminant using the default percentage.

err_ch1 = colorangle(illuminant_ch1,illuminant_groundtruth);
disp(["Angular error for Cheng with percentage=5: " num2str(err_ch1)])

Angular error for Cheng with percentage=5: 4.7454

White balance the image in the linear RGB color space using the estimated illuminant.

B_ch1 = chromadapt(A,illuminant_ch1,ColorSpace="linear-rgb");

Display the gamma-corrected white-balanced image.

B_ch1_sRGB = lin2rgb(B_ch1);
imshow(B_ch1_sRGB)
title("White-Balanced Image using Cheng with percentage=5")


Find Optimal Parameters

To find the best parameter to use for each algorithm, you can sweep through a range and calculate
the angular error for each of them. The parameters of the three algorithms have different meanings,
but the similar ranges of these parameters make it easy to programmatically search for the best one
for each algorithm.

param_range = 0:0.25:5;
err = zeros(numel(param_range),3);
for k = 1:numel(param_range)
    % White Patch
    illuminant_wp = illumwhite(A,param_range(k),Mask=mask_scene);
    err(k,1) = colorangle(illuminant_wp,illuminant_groundtruth);
    % Gray World
    illuminant_gw = illumgray(A,param_range(k),Mask=mask_scene);
    err(k,2) = colorangle(illuminant_gw,illuminant_groundtruth);
    % Cheng
    if (param_range(k) ~= 0)
        illuminant_ch = illumpca(A,param_range(k),Mask=mask_scene);
        err(k,3) = colorangle(illuminant_ch,illuminant_groundtruth);
    else
        % Cheng's algorithm is undefined for percentage=0
        err(k,3) = NaN;
    end
end


Display a heatmap of the angular error using the heatmap function. Dark blue colors indicate a low
angular error while yellow colors indicate a high angular error. The optimal parameter has the
smallest angular error.
heatmap(err,Title="Angular Error",Colormap=parula(length(param_range)), ...
XData=["White Patch" "Gray World" "Cheng's PCA"], ...
YLabel="Parameter Value",YData=string(param_range));

Find the best parameter for each algorithm.


[~,idx_best] = min(err);
best_param_wp = param_range(idx_best(1));
best_param_gw = param_range(idx_best(2));
best_param_ch = param_range(idx_best(3));

fprintf("The best parameter for White Patch is %1.2f with angular error %1.2f degrees\n", ...
best_param_wp,err(idx_best(1),1));

The best parameter for White Patch is 0.25 with angular error 3.35 degrees

fprintf("The best parameter for Gray World is %1.2f with angular error %1.2f degrees\n", ...
best_param_gw,err(idx_best(2),2));

The best parameter for Gray World is 0.00 with angular error 5.04 degrees

fprintf("The best parameter for Cheng is %1.2f with angular error %1.2f degrees\n", ...
best_param_ch,err(idx_best(3),3));


The best parameter for Cheng is 0.50 with angular error 1.74 degrees

Calculate the estimated illuminant for each algorithm using the best parameter.
best_illum_wp = illumwhite(A,best_param_wp,Mask=mask_scene);
best_illum_gw = illumgray(A,best_param_gw,Mask=mask_scene);
best_illum_ch = illumpca(A,best_param_ch,Mask=mask_scene);

Display the angular error of each best illuminant in the RGB color space.
p = plot3([0 1],[0 1],[0,1],LineStyle=":",Color="k");
ax = p.Parent;
hold on
plotColorAngle(illuminant_groundtruth,ax)
plotColorAngle(best_illum_wp,ax)
plotColorAngle(best_illum_gw,ax)
plotColorAngle(best_illum_ch,ax)
title("Best Illuminants in RGB space")
view(28,36)
legend("Achromatic Line","Ground Truth","White Patch","Gray World","Cheng")
grid on
axis equal

Calculate the optimal white-balanced images for each algorithm using the best illuminant.
B_wp_best = chromadapt(A,best_illum_wp,ColorSpace="linear-rgb");
B_wp_best_sRGB = lin2rgb(B_wp_best);


B_gw_best = chromadapt(A,best_illum_gw,ColorSpace="linear-rgb");
B_gw_best_sRGB = lin2rgb(B_gw_best);
B_ch_best = chromadapt(A,best_illum_ch,ColorSpace="linear-rgb");
B_ch_best_sRGB = lin2rgb(B_ch_best);

Display the optimal white-balanced images for each algorithm in a montage.

figure
montage({B_wp_best_sRGB,B_gw_best_sRGB,B_ch_best_sRGB},Size=[1 3])
title("Montage of Best White-Balanced Images: White Point, Gray World, Cheng")

Conclusion

This comparison of two classic illuminant estimation algorithms and a more recent one shows that
Cheng's method, retaining the bottom and top 0.5% of pixels, performs best for this particular image.
However, this result should be taken with a grain of salt.

First, the ground truth illuminant was measured using a ColorChecker chart and is sensitive to shot
and sensor noise. The ground truth illuminant of a scene can be better estimated using a
spectrophotometer.

Second, the ground truth illuminant is estimated as the mean color of the neutral patches. It is
common to use the median instead of the mean, which could shift the ground truth by a significant
amount. For example, for the image in this study, using the same pixels, the median color and the
mean color of the neutral patches are 0.5 degrees apart, which in some cases can be more than the
angular error of the illuminants estimated by different algorithms.

Third, a full comparison of illuminant estimation algorithms should use a variety of images taken
under different conditions. One algorithm might work better than the others for a particular image,
but might perform poorly over the entire data set.

Supporting Function

The plotColorAngle function plots a unit vector of an illuminant in 3-D RGB color space. The input
argument illum specifies the illuminant as an RGB color and the input argument ax specifies the
axes on which to plot the unit vector.

function plotColorAngle(illum,ax)
    R = illum(1);
    G = illum(2);
    B = illum(3);
    magRGB = norm(illum);
    plot3([0 R/magRGB],[0 G/magRGB],[0 B/magRGB], ...
        Marker=".",MarkerSize=10,Parent=ax)
    xlabel("R")
    ylabel("G")
    zlabel("B")
    xlim([0 1])
    ylim([0 1])
    zlim([0 1])
end

References
[1] Ebner, Marc. White Patch Retinex, Color Constancy. John Wiley & Sons, 2007. ISBN
978-0-470-05829-9.

[2] Ebner, Marc. The Gray World Assumption, Color Constancy. John Wiley & Sons, 2007. ISBN
978-0-470-05829-9.

[3] Cheng, Dongliang, Dilip K. Prasad, and Michael S. Brown. "Illuminant estimation for color
constancy: why spatial-domain methods work and the role of the color distribution." JOSA A
31.5 (2014): 1049-1058.

[4] Van De Weijer, Joost, Theo Gevers, and Arjan Gijsenij. "Edge-based color constancy." IEEE
Transactions on Image Processing 16.9 (2007): 2207-2214.

See Also
colorChecker | measureColor | illumgray | illumpca | illumwhite | lin2rgb | rgb2lin |
chromadapt | colorangle

More About
• “Gamma Correction” on page 8-67


Calculate CIE94 Color Difference of Colors on Test Chart

This example shows how to calculate the color difference of measured and reference colors using the
CIE94 standard.

The measureColor function measures colors on a test chart and calculates the color difference
between measured and reference colors using the CIE76 standard. You can use the imcolordiff
function to calculate the color difference using the CIE94 or CIEDE2000 standard.

Read an image of a ColorChecker® chart into the workspace.

I = imread("colorCheckerTestImage.jpg");

Create a colorChecker object, and then display the chart with ROI annotations.

chart = colorChecker(I);
displayChart(chart)

Measure the color in each color patch ROI and return the measurements in a table, colorTable. The
color difference measurements in the Delta_E variable of the table follow the CIE76 standard.

colorTable = measureColor(chart);


On a color patch diagram, display the measured and reference colors with the corresponding CIE76
color difference superimposed on each patch.
displayColorPatch(colorTable)

Extract the reference L*a*b* and measured RGB color values from the color table into numeric arrays.
referenceLab = colorTable{:,["Reference_L","Reference_a","Reference_b"]};
measuredRGB = colorTable{:,["Measured_R","Measured_G","Measured_B"]};

Convert the measured RGB colors to the L*a*b* color space, specifying a D50 white point.
measuredLab = rgb2lab(measuredRGB,WhitePoint="d50");

Calculate the color difference using the imcolordiff function, specifying that the color
measurements are in the L*a*b* color space. By default, this function calculates color differences
using the CIE94 standard.
dE = imcolordiff(measuredLab,referenceLab,isInputLab=true);
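
The imcolordiff function also supports the CIEDE2000 standard. As a sketch of that variation (not used in the remainder of this example), you can specify the Standard name-value argument and leave the rest of the workflow unchanged:

dE2000 = imcolordiff(measuredLab,referenceLab,isInputLab=true,Standard="CIEDE2000");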

Create a new color table using the new color difference measurements.
colorTable94 = colorTable;
colorTable94{:,"Delta_E"} = dE;

On a color patch diagram, display the measured and reference colors with the corresponding CIE94
color difference superimposed on each patch.


displayColorPatch(colorTable94)

See Also
deltaE | imcolordiff | rgb2lab | displayChart | displayColorPatch | plotChromaticity

Related Examples
• “Correct Colors Using Color Correction Matrix” on page 14-35
• “Comparison of Auto White Balance Algorithms” on page 16-25

17

Blocked Image Processing

This topic describes how to work with big image data that does not fit in memory. Big images can
have multiple resolution levels.

• “Set Up Spatial Referencing for Blocked Images” on page 17-2


• “Process Blocked Images Efficiently Using Partial Images or Lower Resolutions” on page 17-13
• “Process Blocked Images Efficiently Using Mask” on page 17-22
• “Explore Blocked Image Details with Interactive ROIs” on page 17-34
• “Warp Blocked Image at Coarse and Fine Resolution Levels” on page 17-42
• “Create Labeled Blocked Image from ROIs and Masks” on page 17-47
• “Convert Image Labeler Polygons to Labeled Blocked Image for Semantic Segmentation” on page 17-57
• “Read Whole-Slide Images with Custom Blocked Image Adapter” on page 17-62
• “Detect and Count Cell Nuclei in Whole Slide Images” on page 17-69

Set Up Spatial Referencing for Blocked Images

This example shows how to set up and verify the spatial referencing information of a blockedImage
object.

Spatial Referencing in Blocked Images

Blocked images work with multiresolution images, in which image data of a scene is stored as a set of
images at different resolution levels. Blocked images assume that the spatial extents of each level are
the same, in other words, that all levels cover the same physical area in the real world. The first step
in working with a large multiresolution image is to validate this assumption.

Download Blocked Image Data

This example uses one image from the Camelyon16 data set. This data set contains 400 whole-slide
images (WSIs) of lymph nodes, stored as multiresolution TIF files that are too large to be loaded into
memory.

Create a directory to store the image.

imageDir = fullfile(tempdir,'Camelyon17');

To download the image tumor_091.tif for use in this example, go to the GigaDB page for the data
set. On the GigaDB page, scroll down to the table and open the Files tab. Navigate the table to find
the entry for tumor_091.tif, and click on the file name to download the image.

After you download the image, move the file to the directory specified by the imageDir variable.

Explore Default Spatial Referencing

Create a blockedImage object with the default spatial referencing information. By default, a blocked
image sets the spatial referencing of each level to have the same world extents as the finest layer. The
finest layer is the layer that has the highest resolution and the most pixels.

fileName = fullfile(imageDir,'tumor_091.tif');
bim = blockedImage(fileName);

Display the spatial referencing information at the finest level. The image size (specified by the Size
property) matches the extents in world coordinates. Notice that the default image coordinate system
puts the center of the first pixel at (1,1). Because the pixel extents are 1 unit wide in each dimension,
the left edge of the first pixel starts at (0.5,0.5).

finestLevel = 1;
finestStart = bim.WorldStart(finestLevel,:)
finestEnd = bim.WorldEnd(finestLevel,:)

finestStart =

0.5000 0.5000 0.5000

finestEnd =

1.0e+04 *


5.3761 6.1441 0.0003

Display the spatial referencing information at the coarsest level. The world extents are the same as
the finest level, but the coarse image size is only 512-by-512 pixels. Effectively, each pixel in this
coarse level corresponds to a 105-by-120 block of pixels in the finest resolution.
coarsestLevel = bim.NumLevels;
coarsestStart = bim.WorldStart(coarsestLevel,:)
coarsestEnd = bim.WorldEnd(coarsestLevel,:)

coarsestStart =

0.5000 0.5000 0.5000

coarsestEnd =

1.0e+04 *

5.3761 6.1441 0.0003
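
As a quick check of this 105-by-120 correspondence, you can compute the downsampling factors directly from the Size property of the blocked image:

downsampleFactors = bim.Size(finestLevel,1:2)./bim.Size(coarsestLevel,1:2)   % [105 120]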

Verify Aspect Ratio

Display the image size and aspect ratio at each level. The aspect ratio is not consistent, which
indicates that levels do not all span the same world area. Therefore, the default assumption is
incorrect for this image.
t = table((1:8)',bim.Size(:,1),bim.Size(:,2), ...
bim.Size(:,1)./bim.Size(:,2), ...
'VariableNames',["Level" "Height" "Width" "Aspect Ratio"]);
disp(t)

    Level    Height    Width    Aspect Ratio
    _____    ______    _____    ____________

      1      53760     61440       0.875
      2      27136     30720       0.88333
      3      13824     15360       0.9
      4       7168      7680       0.93333
      5       3584      4096       0.875
      6       2048      2048       1
      7       1024      1024       1
      8        512       512       1

Display Layers to Compare Spatial Extents

Display the blocked image by using the bigimageshow function. Display the coarsest resolution
level.
figure
subplot(1,2,1);
hl = bigimageshow(bim,'ResolutionLevel',coarsestLevel);
title('Coarsest Resolution Level (8)')
%


Display image data at the default resolution level in the same figure window. By default,
bigimageshow selects the level to display based on the screen resolution and the size of the
displayed region.

subplot(1,2,2);
hr = bigimageshow(bim);
title('Default Resolution Level')
%


Ensure that both displays show the same extents.

linkaxes([hl.Parent,hr.Parent]);


Check Default Spatial Referencing

Zoom in on a feature.

xlim([45000 50000]);
ylim([12000 17000]);


Change the resolution level of the image on the right side of the figure window. At level 6, the
features look aligned with the coarsest level.

hr.ResolutionLevel = 6;
title('Level 6');
snapnow
%


At level 1, the features are not aligned. Therefore, level 1 and level 8 do not span the same world
extents.

hr.ResolutionLevel = 1;
title('Level 1');
snapnow
%


Get Spatial Extents from Blocked Image Metadata

Usually the original source of the data has spatial referencing information encoded in its metadata.
For the Camelyon16 data set, spatial referencing information is stored as XML content in the
ImageDescription metadata field at the finest resolution level. The XML content has a
DICOM_PIXEL_SPACING attribute for each resolution level that specifies the pixel extents.

Get the ImageDescription metadata field at the finest resolution level of the blockedImage
object.

binfo = imfinfo(bim.Source);
binfo = binfo(1).ImageDescription;

Search the content for the string "DICOM_PIXEL_SPACING". There are nine matches found. The
second instance of the attribute corresponds to the pixel spacing of the finest level. The last instance
of the attribute corresponds to the pixel spacing of the coarsest level.

indx = strfind(binfo,"DICOM_PIXEL_SPACING");

Store the pixel spacing at the finest level. To extract the value of the pixel spacing from the XML text,
visually inspect the text following the second instance of the "DICOM_PIXEL_SPACING" attribute.

disp(binfo(indx(2):indx(2)+100))
pixelSpacing_L1 = 0.000227273;

DICOM_PIXEL_SPACING" Group="0x0028" Element="0x0030" PMSVR="IDoubleArray">&quot;0.000227273&quot;


Similarly, store the pixel spacing at the coarsest level. To extract the value of the pixel spacing from
the XML text, visually inspect the text following the last instance of the "DICOM_PIXEL_SPACING"
attribute.

disp(binfo(indx(end):indx(end)+100))
pixelSpacing_L8 = 0.0290909;

DICOM_PIXEL_SPACING" Group="0x0028" Element="0x0030" PMSVR="IDoubleArray">&quot;0.0290909&quot; &

Set Spatial Extents

Calculate the relative pixel width between level 8 and level 1.

pixelDims = pixelSpacing_L8/pixelSpacing_L1;

The finest level has the reference spatial extent. Calculate the corresponding extents for level 8 with
respect to the extents of level 1.

worldExtents = bim.Size(8,1:2).*pixelDims;

Update the spatial referencing of level 8.

bim.WorldEnd(8,1:2) = worldExtents(2);

Verify Alignment with Custom Spatial Referencing

Redisplay the data to confirm alignment of the key feature. Show level 8 on the left side and level 1
on the right side.

hl.CData = bim;
hl.ResolutionLevel = 8;
snapnow
hr.CData = bim;
hr.ResolutionLevel = 1;
snapnow



See Also
blockedImage | bigimageshow

More About
• “Define World Coordinate System of Image” on page 2-66


Process Blocked Images Efficiently Using Partial Images or Lower Resolutions

This example shows how to process a blocked image quickly using two strategies that enable
computations on smaller representative samples of the high-resolution image.

Processing blocked images can be time consuming, which makes iterative development of algorithms
prohibitively expensive. There are two common ways to shorten the feedback cycle: iterate on a lower
resolution image or iterate on a partial region of the blocked image. This example demonstrates both
of these approaches for creating a segmentation mask for a blocked image.

If you have Parallel Computing Toolbox™ installed, then you can further accelerate the processing by
using multiple workers.

Create a blockedImage object using a modified version of image "tumor_091.tif" from the
CAMELYON16 data set. The original image is a training image of a lymph node containing tumor
tissue. The original image has eight resolution levels, and the finest level has resolution
53760-by-61440. The modified image has only three coarse resolution levels. The spatial referencing of the
modified image has been adjusted to enforce a consistent aspect ratio and to register features at each
level.

bim = blockedImage("tumor_091R.tif");

Display the blocked image by using the bigimageshow function.

bigimageshow(bim);


Accelerate Processing Using Lower Resolution Image

Many blocked images contain multiple resolution levels, including coarse lower resolution versions of
the finest high-resolution image. In general, the distribution of individual pixel values should be
roughly equal across all the levels. Leveraging this assumption, you can compute global statistics at a
coarse level and then use the statistics to process the finer levels.

Extract the image at the coarsest level, then convert the image to grayscale.

imLowRes = gather(bim);
imLowResGray = im2gray(imLowRes);

Threshold the image into two classes and display the result.

thresh = graythresh(imLowResGray);
imLowResQuant = imbinarize(imLowResGray,thresh);
imshow(imLowResQuant)


Validate on the largest image. Negate the result to obtain a mask for the stained region.

bq = apply(bim,@(bs)~imbinarize(rgb2gray(bs.Data),thresh));

Visualize the result at the finest level.

bigimageshow(bq,CDataMapping="scaled");


Accelerate Processing Using Partial Regions of Blocked Image

Another approach while working with large images is to extract a smaller region with features of
interest. You can compute statistics from the ROI and then use the statistics to process the entire
high-resolution image.

Zoom in on a region of interest.

bigimageshow(bim);
xlim([2400,3300])
ylim([900 1700])


Extract the region being shown from the finest level.

xrange = xlim;
yrange = ylim;
imRegion = getRegion(bim,[900 2400],[1700 3300],Level=1);
imshow(imRegion);


Prototype with this region, then display the results.

imRegionGray = rgb2gray(imRegion);
thresh = graythresh(imRegionGray);
imLowResQuant = ~imbinarize(imRegionGray,thresh);

imshow(imLowResQuant)


Validate on the full blocked image and display the results.

bq = apply(bim,@(bs)~imbinarize(rgb2gray(bs.Data),thresh));
bigimageshow(bq,CDataMapping="scaled");


Accelerate Processing Using Parallel Computing Toolbox

If you have the Parallel Computing Toolbox installed, then you can distribute the processing across
multiple workers to accelerate the processing. To try processing the image in parallel, set the
runInParallel variable to true.
runInParallel = false;
if runInParallel
    % Location for output, which should be accessible on the client and all
    % workers and should be empty
    outDir = tempname;
    % Open a pool
    p = gcp;
    % Ensure workers are on the same folder as the file to be able to
    % access it using just the relative path
    sourceDir = fileparts(which("tumor_091R.tif"));
    spmd
        cd(sourceDir)
    end
    % Run in parallel
    bq = apply(bim, ...
        @(bs)~imbinarize(rgb2gray(bs.Data),thresh),UseParallel=true, ...
        OutputLocation=outDir);
end

See Also
blockedImage | bigimageshow | apply


Process Blocked Images Efficiently Using Mask

This example shows how to process a blocked image efficiently by using a mask to isolate regions of
interest (ROIs).

Some sources of large images have meaningful data in only a small portion of the image. You can
improve total processing time by limiting processing to the ROI containing meaningful data. Use a
mask to define ROIs. A mask is a logical image in which true pixels represent the ROI.

In the blocked image workflow, the mask represents the same spatial region as the image data, but it
does not need to be the same size as the image. To further improve the efficiency of the workflow,
create a mask from a coarse image, especially one that fits in memory. Then, use the coarse mask to
process the finer images.

Create a blocked image using a modified version of image tumor_091.tif from the CAMELYON16
data set. The original image is a training image of a lymph node containing tumor tissue. The original
image has eight resolution levels, and the finest level has resolution 53760-by-61440. The modified
image has only three coarse resolution levels. The spatial referencing of the modified image has been
adjusted to enforce a consistent aspect ratio and to register features at each level.

bim = blockedImage("tumor_091R.tif");

Display the blocked image by using the bigimageshow function.

bigimageshow(bim);


Create Mask

Determine the image size at the coarsest level. The coarsest level is the last level in the blocked
image.

coarseLevel = bim.NumLevels;
coarseLevelSize = bim.Size(coarseLevel,:)

coarseLevelSize = 1×3

625 670 3

Get the image at the coarsest resolution level.

imLowRes = gather(bim);

You can generate a mask from the coarse image using the Image Segmenter app. Because the app
expects a grayscale input image, you must extract the lightness channel from the coarse image.

imLowResL = rgb2lightness(imLowRes);


To run the Image Segmenter app, enter this command in the Command Window:

imageSegmenter(imLowResL)

After you define the mask, export the mask as BW, or export the code that the app uses to create the
mask. This section of the example uses code exported from the app. Run this code to create and
display a mask from the coarse input image.

%----------------------------------------------------
% Normalize input data to range in [0,1].
Xmin = min(imLowResL(:));
Xmax = max(imLowResL(:));
if isequal(Xmax,Xmin)
    imLowResL = 0*imLowResL;
else
    imLowResL = (imLowResL - Xmin) ./ (Xmax - Xmin);
end

% Threshold image - global threshold
BW = imbinarize(imLowResL);

% Invert mask
BW = imcomplement(BW);

% Open mask with square
width = 3;
se = strel("square", width);
BW = imopen(BW, se);
%----------------------------------------------------

imshow(BW)


Create a blocked image from the mask with the same spatial referencing as the input mask.

bmask = blockedImage(BW,WorldEnd=bim.WorldEnd(3,1:2));

Display the mask as a translucent green overlay on the original blocked image.

h = bigimageshow(bim);
showlabels(h,bmask,AlphaData=bmask,Alphamap=[0 0.5],Colormap=[0 0 0; 0 1 0])


Adjust Inclusion Threshold to Cover Region of Interest

The apply function processes blocked images one block at a time. You can use the
InclusionThreshold name-value argument with the mask to specify which blocks the apply
function uses. The inclusion threshold specifies the percentage of mask pixels that must be true for
apply to process the block.

Highlight the blocks for apply to process using the default inclusion threshold, 0.5. The function
processes only the center blocks, highlighted in green.

h = bigimageshow(bim);
showmask(h,bmask,1)
title("Mask with Default Inclusion Threshold")


To process more blocks of the image, decrease the inclusion threshold.

showmask(h,bmask,1,InclusionThreshold=0.4)
title("InclusionThreshold = 0.4")


You can also process all blocks that have at least a single true pixel in the mask. To use this option,
specify the InclusionThreshold name-value argument as 0. Note that not all blocks of the image
are included.

showmask(h,bmask,1,InclusionThreshold=0)
title("InclusionThreshold = 0")


Using the mask with any value of InclusionThreshold decreases the total execution time because
apply processes only a subset of blocks from the full image. The benefit of using a mask is more
significant at higher resolutions and as the processing pipeline increases in complexity.

Measure the execution time of filtering the full image.

tic
bout = apply(bim, ...
@(bs)imnlmfilt(bs.Data,DegreeOfSmoothing=15));
tFullProcessing = toc;

Measure the execution time of filtering only the blocks within the ROI.

bls = selectBlockLocations(bim,Mask=bmask,InclusionThreshold=0);
tic
boutMasked = apply(bim, ...
    @(bs)imnlmfilt(bs.Data,DegreeOfSmoothing=15), ...
    BlockLocationSet=bls);
tMaskedProcessing = toc;

bigimageshow(boutMasked)
defaultBlockSize = bim.BlockSize(1,:);
title("Processed Image Using Mask with Default BlockSize = [" + ...
num2str(defaultBlockSize)+"]");

Compare the execution time of processing the full image to the execution time of processing only the
blocks in the ROI.

disp("Speedup using mask: " + ...


num2str(tFullProcessing/tMaskedProcessing) + "x");

Speedup using mask: 1.6918x


Adjust Block Size to Follow Contours of Region of Interest

You can decrease the block size to create a tighter wrap around the ROI. For some block sizes, this
reduces the execution time because apply processes fewer pixels outside the ROI. However, if the
block size is too small, then performance decreases because the overhead of processing a larger
number of blocks offsets the reduction in the number of pixels processed.

Highlight the blocks for apply to process using a smaller block size. To specify the block size, use the
BlockSize name-value argument.
blockSize = [512 512];
h = bigimageshow(bim);
showmask(h,bmask,1,BlockSize=blockSize,InclusionThreshold=0)
title("BlockSize = [" + num2str(blockSize) + "] | InclusionThreshold = 0")

Measure the execution time of filtering all the blocks within the ROI with a decreased block size.


bls = selectBlockLocations(bim,Mask=bmask,InclusionThreshold=0);
tic
boutMasked = apply(bim, ...
@(bs)imnlmfilt(bs.Data,DegreeOfSmoothing=15), ...
BlockLocationSet=bls);
tSmallerBlockProcessing = toc;

bigimageshow(boutMasked);
title("Processed Image Using Mask with BlockSize = [" + ...
num2str(blockSize) + "]");

Compare the execution time of processing the entire ROI with smaller blocks to the execution time of
processing the entire ROI with the original blocks.

disp("Additional speedup using mask with decreased block size: " + ...
num2str(tMaskedProcessing/tSmallerBlockProcessing) + "x");


Additional speedup using mask with decreased block size: 0.9917x

See Also
blockedImage | bigimageshow | apply

Related Examples
• “Detect and Count Cell Nuclei in Whole Slide Images” on page 17-69


Explore Blocked Image Details with Interactive ROIs

This example shows how to display a detailed region of a blocked image selected interactively using
ROI tools.

bigimageshow displays blockedImage objects. If the blockedImage object has multiple levels,
then bigimageshow automatically picks the appropriate level based on the screen size and the view
port. bigimageshow always works in a single 'world coordinate' and displays each level based on its
spatial referencing information. This allows two displays of the same blockedImage object to show
image detail at different levels, but share the same coordinate system.

Create a blockedImage using a modified version of image "tumor_091.tif" from the CAMELYON16
data set. The original image is a training image of a lymph node containing tumor tissue. The original
image has eight resolution levels, and the finest level has resolution 53760-by-61440. The modified
image has only three coarse resolution levels. The spatial referencing of the modified image has been
adjusted to enforce a consistent aspect ratio and to register features at each level.

bim = blockedImage('tumor_091R.tif');

Display Overview Image and Detail View

Display the entire big image on the left side of a figure window by using the bigimageshow function.
The resolution level of the displayed overview automatically changes depending on the size of the
window and your screen size.

hf = figure;
haOView = subplot(1,2,1);
haOView.Tag = 'OverView';
hl = bigimageshow(bim,'Parent',haOView);


Fix the resolution level of the overview image as the coarsest resolution level.

coarsestLevel = bim.NumLevels;
hl.ResolutionLevel = coarsestLevel;
title('Overview');


Display a detail view of the big image on the right side of the figure window. Allow bigimageshow to
manage the level of the detail image automatically.

haDetailView = subplot(1,2,2);
haDetailView.Tag = 'DetailView';
hr = bigimageshow(bim,'Parent',haDetailView);


Zoom into the detailed view.

xlim([2800,3050])
ylim([500,750])
title('Detailed View');


Add Interactive Rectangle ROI to Control Display View

In the overview image, draw a rectangle ROI. This example specifies the initial size and position of
the rectangle programmatically by setting the Position property as a four-element vector of the
form [xmin,ymin,width,height]. After the ROI appears on the overview, you can adjust the size and
position of the ROI interactively.

xrange = xlim;
yrange = ylim;
roiPosition = [xrange(1) yrange(1) xrange(2)-xrange(1) yrange(2)-yrange(1)];
hrOView = drawrectangle(haOView,'Position',roiPosition,'Color','r');


Save the handles of the rectangle to use when defining the interaction between the rectangle and the
detail view.

hrOView.UserData.haDetailView = haDetailView;
haDetailView.UserData.hrOView = hrOView;

Add listeners to the detail view. These listeners detect changes in the spatial extents of the detail
view. When the spatial extents change, the listeners call the updateOverviewROI helper function,
which updates the extents of the ROI to match the extents of the detail view. The helper function is
defined at the end of this example.

addlistener(haDetailView,'XLim','PostSet',@updateOverviewROI);
addlistener(haDetailView,'YLim','PostSet',@updateOverviewROI);

Add a listener to the rectangle ROI. This listener detects changes in the position of the rectangle.
When the position changes, the listener calls the updateDetailView helper function, which
updates the extents of the detail image to match the extents of the ROI. The helper function is defined
at the end of this example.

addlistener(hrOView,'MovingROI',@updateDetailView);

You can now change the size and position of the rectangle ROI interactively to adjust the display view.
Similarly, when you zoom and pan the detail view, the size and position of the ROI updates.

This example changes the size and position of the ROI programmatically by setting the Position
property.


hrOView.Position = [2230,1300,980,840];
evt.CurrentPosition = hrOView.Position;
updateDetailView(hrOView,evt);

Define Callback Functions to Control Interactions

function updateOverviewROI(~,hEvt)
    % Update overview rectangle position whenever the right hand side
    % zooms/pans.
    ha = hEvt.AffectedObject;
    hr = hEvt.AffectedObject.UserData.hrOView;
    hr.Position = [ha.XLim(1),ha.YLim(1),diff(ha.XLim),diff(ha.YLim)];
end

function updateDetailView(hSrc,hEvt)
    % Update the right side detail view anytime the overview rectangle is
    % moved. bigimageshow automatically picks the appropriate image level.
    ha = hSrc.UserData.haDetailView;
    ha.XLim = [hEvt.CurrentPosition(1), ...
        hEvt.CurrentPosition(1)+hEvt.CurrentPosition(3)];
    ha.YLim = [hEvt.CurrentPosition(2), ...
        hEvt.CurrentPosition(2)+hEvt.CurrentPosition(4)];
end


See Also
blockedImage | bigimageshow


Warp Blocked Image at Coarse and Fine Resolution Levels

This example shows how to apply a geometric transformation (warping) to a blocked image.

Applying a geometric transformation to an image is a key step in many image processing applications
like image registration. You can use imwarp to warp coarse images that fit in memory. For large,
high-resolution images that do not fit in memory, use a blocked image. Set the spatial referencing of
the warped image to preserve characteristics of the image such as pixel extents.

Create a blocked image using a modified version of image "tumor_091.tif" from the CAMELYON16
data set. The original image is a training image of a lymph node containing tumor tissue. The original
image has eight resolution levels, and the finest level has resolution 53760-by-61440. The modified
image has only three coarse resolution levels. The spatial referencing of the modified image has been
adjusted to enforce a consistent aspect ratio and to register features at each level.

bim = blockedImage("tumor_091R.tif");

Apply Geometric Transformation to Coarse Image

Create an affinetform2d object that stores information about an affine geometric transformation.
This transformation applies translation and shear.

A = [0.99 0.17 120;
     0.01 0.98 -30;
     0    0    1];
tform = affinetform2d(A);

Get the image at the coarsest resolution level.

imCoarse = gather(bim);

Warp the coarse image by using imwarp. Then, display the warped image.

imCoarseWarped = imwarp(imCoarse,tform);
figure
imshow(imCoarseWarped)


Create Spatial Referencing for Warped Fine Image

Before applying the geometric transformation to the image at a fine resolution level, calculate the
spatial referencing of the blocked image after the warping. Use this spatial referencing when
transforming blocks.

Get the pixel extent of the original image from its spatial referencing information.
inPixelExtent = (bim.WorldEnd(1,:)-bim.WorldStart(1,:))./bim.Size(1,:);

Calculate the output horizontal and vertical spatial limits when the transformation is applied.
yWorldLimits = [bim.WorldStart(1,1),bim.WorldEnd(1,1)];
xWorldLimits = [bim.WorldStart(1,2),bim.WorldEnd(1,2)];
[xout,yout] = outputLimits(tform,xWorldLimits,yWorldLimits);

Calculate the size of the output image that preserves the pixel extent. Specify the image size in the
format [numrows, numcols].
outImgSize = [ceil(diff(yout)/inPixelExtent(1)), ...
ceil(diff(xout)/inPixelExtent(2))];


Store the spatial referencing information of the warped image. Set the world limits and image size of
the warped image.
outWorldStart = [yout(1) xout(1)];
outWorldEnd = [yout(2) xout(2)];

Calculate the corresponding output pixel dimensions.


outPixelExtent = (outWorldEnd-outWorldStart)./outImgSize;
halfPixelWidth = outPixelExtent/2;

Apply Block-Wise Warping to Fine Image

Create a writable blocked image by specifying the output spatial referencing information. Specify a
block size that is large enough to use memory efficiently.
outBlockSize = [1024 1024 3];
bwarped = blockedImage([],[outImgSize 3],outBlockSize,uint8(0), ...
WorldStart=outWorldStart,WorldEnd=outWorldEnd,Mode="w");

Loop through the output image, one block at a time. For each output block:

1 Find the coordinates of the four corners of the output block.
2 Inverse map these coordinates back to the input to get the input (source) region.
3 Read the contents of the input region.
4 Create spatial referencing describing the input region.
5 Calculate the output block content by using imwarp.
6 Write the output block to the output image by using the setBlock function.

If you have Parallel Computing Toolbox™, then you can replace the outer for statements with
parfor statements to run the loops in parallel.
inYWorldLimits = [bim.WorldStart(1,1), bim.WorldEnd(1,1)];
inXWorldLimits = [bim.WorldStart(1,2), bim.WorldEnd(1,2)];

for rBlockInd = 1:bwarped.SizeInBlocks(1)
    for cBlockInd = 1:bwarped.SizeInBlocks(2)

        % Get the indices of the block
        blockSub = [rBlockInd cBlockInd 1];

        % Convert the block indices to pixel subscripts to get the
        % subscript of the top left pixel. Based on the block size,
        % get the subscripts of the bottom right pixel
        blockStartSub = blocksub2sub(bwarped,blockSub);
        blockEndSub = blockStartSub + outBlockSize - 1;

        % Convert the pixel indices to world coordinates. The world
        % coordinates indicate the center of the top left and bottom
        % right pixels of the block in world units
        blockStartWorld = sub2world(bwarped,blockStartSub(1:2));
        blockEndWorld = sub2world(bwarped,blockEndSub(1:2));

        % Spatial referencing information for this block (Note: spatial
        % referencing is in x-y order, while blockStart etc are in y-x
        % order).
        outRegionRef = imref2d(fliplr(outBlockSize(1:2)));
        % Expand the region outwards by half a pixel to align with the
        % outer edge of the block
        outRegionRef.YWorldLimits = [blockStartWorld(1)-halfPixelWidth(1),...
            blockEndWorld(1)+halfPixelWidth(1)];
        outRegionRef.XWorldLimits = [blockStartWorld(2)-halfPixelWidth(2),...
            blockEndWorld(2)+halfPixelWidth(2)];

        % Output bounding box in world coordinates in x-y order
        outbbox = [
            fliplr(blockStartWorld)               % top left
            blockStartWorld(2) blockEndWorld(1)   % bottom left
            blockEndWorld(2) blockStartWorld(1)   % top right
            fliplr(blockEndWorld)                 % bottom right
            ];

        % Get corresponding input region. Note: This region need NOT be
        % rectangular if the transformation includes shear
        inRegion = transformPointsInverse(tform,outbbox);

        % Clamp region to image extents
        inRegion(:,2) = max(inRegion(:,2),inYWorldLimits(1));
        inRegion(:,2) = min(inRegion(:,2),inYWorldLimits(2));
        inRegion(:,1) = max(inRegion(:,1),inXWorldLimits(1));
        inRegion(:,1) = min(inRegion(:,1),inXWorldLimits(2));

        % Find the corresponding input bounding box
        inbboxStart = [min(inRegion(:,1)) min(inRegion(:,2))];
        inbboxEnd = [max(inRegion(:,1)) max(inRegion(:,2))];

        % Move to y-x (row-col) order
        inbboxStart = fliplr(inbboxStart);
        inbboxEnd = fliplr(inbboxEnd);

        % Convert to pixel subscripts
        inbboxStartSub = world2sub(bim,inbboxStart);
        inbboxEndSub = world2sub(bim,inbboxEnd);

        % Read corresponding input region
        inputRegion = getRegion(bim,inbboxStartSub,inbboxEndSub);

        % Get the input region's spatial referencing
        inRegionRef = imref2d(size(inputRegion));

        % Convert the actual region pixel's centers back to world
        % coordinates
        inbboxStart = sub2world(bim,inbboxStartSub);
        inbboxEnd = sub2world(bim,inbboxEndSub);

        % Convert to pixel edges from pixel centers
        inRegionRef.YWorldLimits = [inbboxStart(1)-halfPixelWidth(1),...
            inbboxEnd(1)+halfPixelWidth(2)];
        inRegionRef.XWorldLimits = [inbboxStart(2)-halfPixelWidth(1),...
            inbboxEnd(2)+halfPixelWidth(2)];

        % Warp this block
        warpedBlock = imwarp(inputRegion,inRegionRef,tform,OutputView=outRegionRef);

        % Set the block data in the output blocked image
        setBlock(bwarped,blockSub,warpedBlock);

    end
end

Display the warped image.

bwarped.Mode = "r";
figure
bigimageshow(bwarped)

See Also
blockedImage | bigimageshow | setBlock | affinetform2d | imref2d |
transformPointsInverse | getRegion


Create Labeled Blocked Image from ROIs and Masks

This example shows how to create a labeled blocked image from a set of ROIs.

In this example, you use two approaches to obtain and display labeled data. One approach uses
polygonal ROI objects that store the coordinates of the boundaries of tumor and normal tissue
regions. The polyToBlockedImage function converts the ROI coordinates into a labeled blocked
image. The second approach uses a mask to indicate a binary segmentation of the image into tissue
and background. This example combines the information in the ROI and mask images to create a
single blocked image with numeric pixel labels corresponding to tumor, normal tissue, and
background regions.

Create a blocked image using a modified version of a training image of a lymph node containing
tumor tissue, from the CAMELYON16 data set. The modified image has three coarse resolution levels.
The spatial referencing has been adjusted to enforce a consistent aspect ratio and to register features
at each level.
bim = blockedImage("tumor_091R.tif");

Load Label Data

The CAMELYON16 data set provides labels for tumor and normal regions as a set of coordinates
specifying the manually annotated region boundaries with respect to the finest resolution level. When
pixels exist within the boundaries of both a normal region and a tumor region, the correct label for
those pixels is normal tissue.

Load label data for the blocked image. This example uses a modified version of the labels of the
tumor_091.tif image from the CAMELYON16 data set. The original labels are stored in XML
format. The modified labels have been resampled and saved as a MAT file.
roiPoints = load("labelledROIs.mat")

roiPoints = struct with fields:


nonCancerRegions: {[46×2 double]}
cancerRegions: {6×1 cell}

Represent Labels as ROI Objects

Create polygonal ROI objects that store the coordinates of the tumor boundaries and normal tissue
boundaries.
tumorPolys = cellfun(@(position) images.roi.Polygon(Position=position, ...
Visible="on",Color="r"),roiPoints.cancerRegions);
normalPolys = cellfun(@(position) images.roi.Polygon(Position=position, ...
Visible="on",Color="g"),roiPoints.nonCancerRegions);

Display ROIs with Labels on Original Data

Display the image overlaid with the polygonal ROIs. The ROIs have the same coordinate system as
the image, so changing the resolution level of the displayed image still renders the ROIs accurately.
h = bigimageshow(bim);
set(tumorPolys,Parent=gca);
set(normalPolys,Parent=gca);
title("Resolution Level: " + num2str(h.ResolutionLevel));


Zoom in on one ROI. The tumor region boundary is shown in red, and surrounds an internal region of
normal tissue shown in green.

xlim([3940 4290])
ylim([2680 3010])
title("Resolution Level: " + num2str(h.ResolutionLevel));


Create Labeled Blocked Image of ROI Data

Use the polyToBlockedImage function to create a labeled blocked image of the ROI coordinate
data. The polyToBlockedImage function requires ROI coordinates, ROI labels, and the size of the
output blocked image as inputs.

Obtain the xy-coordinate data for the normal regions and the tumor regions, and combine them into a
single roiPositions cell array.

normalRegions = roiPoints.nonCancerRegions;
tumorRegions = roiPoints.cancerRegions;
roiPositions = [normalRegions; tumorRegions];

Find the number of normal and tumor regions, and assign the label 1 to normal tissue and the label 2
to tumor tissue. Assign labels in the same order specified by roiPositions. Specify labels as uint8
values to reduce the memory required for storage.

numNormalRegions = numel(normalRegions);
numTumorRegions = numel(tumorRegions);
roiLabelIDs = [ones(numNormalRegions,1,"uint8"); 2*ones(numTumorRegions,1,"uint8")];


Select the desired resolution level for the new blocked image. This choice is a tradeoff between
efficiency and accuracy. Using a coarser resolution level decreases processing time and storage size.
Using a finer resolution level increases the level of detail preserved in the mask. Coarse resolution
levels can be used for regular ROIs like polygons. For small, freehand ROIs, a fine resolution level can
be more appropriate. For this example, use an intermediate resolution level.

maskLevel = 2;

Specify the image size for the new blocked image to match that of the original image, bim, at the
desired resolution level.

imageSize = bim.Size(maskLevel,1:2);

Create a labeled blocked image. Maintain the spatial referencing of the original blocked image, bim,
at the desired resolution level. By default, pixels that do not fall inside any ROI are assigned the
numeric label 0.

bROILabels = polyToBlockedImage(roiPositions,roiLabelIDs,imageSize, ...
    BlockSize=bim.BlockSize(maskLevel,1:2), ...
    WorldStart=bim.WorldStart(maskLevel,1:2),WorldEnd=bim.WorldEnd(maskLevel,1:2));

Display Overlay of ROI Labels and Original Data

Display the labeled blocked image overlaid on the original image. The tumor regions are shown in
red, and the normal tissue region completely enclosed in a tumor region is shown in green. The
background and the normal tissue connected to the background are shown in blue, indicating that the
connected normal regions are incorrectly classified as background.

hbim = bigimageshow(bim);
showlabels(hbim,bROILabels,Colormap=[0 0 1; 0 1 0; 1 0 0]);


Zoom in on the normal tissue region displayed in green. Visually verify that the ROI boundaries are
represented with sufficient detail.

xlim([3940 4290])
ylim([2680 3010])


Separate Tissue and Background Using Mask

Use image segmentation to create a mask that correctly distinguishes the normal tissue from the
background. Since thresholding requires reading the underlying image data into memory and the
background and foreground regions are sufficiently large, the coarsest resolution level is appropriate.
The mask is 1 (true) for pixels whose grayscale value is less than 130. Fill small holes in the mask by
performing morphological closing using the bwmorph function.

btissueMask = apply(bim,@(bs)bwmorph(im2gray(bs.Data)<130,"close"),Level=3);
bigimageshow(btissueMask);


Create Single Labeled Blocked Image

Combine the tissue mask and ROI label data into a final labeled blocked image. Although the ROI
image has been created at a finer resolution level than the tissue mask, they can be processed
together because they are derived from the same image and have the same world extents. Use the
combineLabels helper function, which is defined at the end of this example.

bLabels = apply(bROILabels,@combineLabels,ExtraImages=btissueMask);

Display the labeled blocked image overlaid on the original image. The three labels (normal, tumor,
and background) appear in green, red, and blue respectively.

hbim = bigimageshow(bim);
showlabels(hbim,bLabels,Colormap=[0 0 1; 0 1 0; 1 0 0]);
title("Background (Blue), Normal Tissue (Green), Tumor (Red)")


Zoom in on a region of interest to examine in closer detail.

xlim([3940 4290])
ylim([2680 3010])


Supporting Function

The combineLabels helper function combines an ROI-based label image and mask-based label
image into a single label image.

function blabel = combineLabels(bs,btissueMask)
    % The tissue mask block is smaller in size
    btissueMask = imresize(btissueMask,size(bs.Data));
    % Convert to labels, 0 is background, 1 is foreground.
    blabel = uint8(btissueMask);
    % Label tumor regions with 2
    blabel(bs.Data==2) = 2;
end

See Also
blockedImage | bigimageshow | blockedImageDatastore


Related Examples
• “Preprocess Multiresolution Images for Training Classification Network” on page 19-219
• “Classify Tumors in Multiresolution Blocked Images” on page 19-235
• “Convert Image Labeler Polygons to Labeled Blocked Image for Semantic Segmentation” on page 17-57


Convert Image Labeler Polygons to Labeled Blocked Image for Semantic Segmentation

This example shows how to convert polygon labels stored in a groundTruth object into a labeled
blocked image suitable for semantic segmentation workflows.

You can use the Image Labeler app in Computer Vision Toolbox to label images that are too large to
fit into memory and multiresolution images. For more information, see “Label Large Images in the
Image Labeler” (Computer Vision Toolbox). The Image Labeler app does not support pixel labeling of
blocked images. You can only create labels using ROI shapes such as polygons, rectangles, and lines.
This example shows how you can use the polyToBlockedImage function to convert polygon ROIs
into a pixel-labeled blocked image for semantic segmentation workflows.

Create a blocked image using a modified version of a training image of a lymph node containing
tumor tissue, from the CAMELYON16 data set. The modified image has three coarse resolution levels.
The spatial referencing has been adjusted to enforce a consistent aspect ratio and to register features
at each level.

bim = blockedImage("tumor_091R.tif");

Load Label Ground Truth Data

This example loads a presaved groundTruth object, gTruth.mat, created by labeling the blocked
image data in bim using the Image Labeler app. The groundTruth object stores the polygon labels
displayed in the figure. A normal tissue ROI is outlined in green, and tumor tissue ROIs are outlined
in red. You can export your own labeled ground truth data from the Image Labeler app by selecting
Export and then To Workspace. Name the variable gTruth.


load gTruth.mat

Extract ROI Position and Label Data

The LabelData property of the gTruth object stores the polygon label data as a table with one
column for each label.

labelData = gTruth.LabelData

labelData=1×2 table
normal tumor
____________ __________

{4×2 double} {4×1 cell}


Use the helper function gTruthtoXY, defined at the end of this example, to convert the xy-position
data and labels stored in labelData into a format accepted as an input to the
polyToBlockedImage function. gTruthtoXY assigns the numeric label 1 to the normal tissue ROI
and the label 2 to the tumor tissue ROIs.

[roiPositions,roiLabels] = gTruthtoXY(labelData)

roiPositions=5×1 cell array


{ 4×2 double}
{14×2 double}
{16×2 double}
{20×2 double}
{12×2 double}

roiLabels = 5×1

1
2
2
2
2

Create Labeled Blocked Image

Select the desired resolution level for the new blocked image. This choice is a tradeoff between
efficiency and accuracy. Using a coarser resolution level decreases processing time and storage size.
Using a finer resolution level increases the level of detail preserved in the mask. You can use coarse
resolution levels for regular ROIs, like polygons. For small, freehand ROIs, a fine resolution level is
more appropriate. For this example, use an intermediate resolution level.

maskLevel = 2;

Specify the image size for the new blocked image to match that of the original image, bim, at the
desired resolution level.

imageSize = bim.Size(maskLevel,1:2);

Create a labeled blocked image. Maintain the spatial referencing of the original blocked image, bim,
at the desired resolution level. By default, polyToBlockedImage assigns pixels that do not fall
inside any ROI the numeric label 0.

bLabeled = polyToBlockedImage(roiPositions,roiLabels,imageSize, ...
    BlockSize=bim.BlockSize(maskLevel,1:2), ...
    WorldStart=bim.WorldStart(maskLevel,1:2),WorldEnd=bim.WorldEnd(maskLevel,1:2));

Display the labeled blocked image overlaid on the original image. The regions corresponding to the
tumor and normal tissue polygons of the groundTruth object are shown in red and green,
respectively. To distinguish the normal tissue outside of the ROIs from the background using a binary
mask, see “Create Labeled Blocked Image from ROIs and Masks” on page 17-47.

hbim = bigimageshow(bim);
showlabels(hbim,bLabeled,Colormap=[0 0 1; 0 1 0; 1 0 0])


Supporting Function

The gTruthtoXY helper function converts the polygon ROI coordinates and label data stored in the
table labelData into cell arrays suitable for input into the polyToBlockedImage function.
function [roiPositions,roiLabels] = gTruthtoXY(labelData)

totalROIs = numel(labelData{1,1}) + numel(labelData{1,2}{:});


roiPositions = cell(totalROIs,1);
roiLabels = zeros(totalROIs,1);

% Obtain label names from the labelData table


labelName = labelData.Properties.VariableNames;

roiIdx = 1; % Initialize ROI index

% Loop through all labels


% Assign a numeric label of 2 to tumor tissue; 1 for normal tissue
for j = 1:numel(labelData)

% All ROIs for a given label


data = labelData{1,j}{:};

if(isequal(labelName{j},"tumor"))
for k = 1:numel(data)
roiLabels(roiIdx) = 2;
roiPositions{roiIdx} = data{k};
roiIdx = roiIdx + 1;
end
else
% For other ROI labels
roiLabels(roiIdx) = 1;
roiPositions{roiIdx} = data;
roiIdx = roiIdx + 1;
end

end

end

See Also
polyToBlockedImage | blockedImage | Image Labeler

Related Examples
• “Label Large Images in the Image Labeler” (Computer Vision Toolbox)
• “Create Labeled Blocked Image from ROIs and Masks” on page 17-47


Read Whole-Slide Images with Custom Blocked Image Adapter

This example shows how to add support for reading whole-slide images by creating a custom blocked
image adapter. The example creates a MATLAB® interface to the OpenSlide C library, a C library that
provides a simple interface to read whole-slide images (also known as virtual slides). The OpenSlide
library is a product of the research group of M. Satyanarayanan at Carnegie Mellon University,
School of Computer Science.

The example first builds a C++ interface to the OpenSlide library using the MATLAB clibgen
function. The example then uses functions from the OpenSlide library to implement a custom blocked
image adapter.

Make sure helper files are on the path.


addpath(pwd)

Create a MATLAB Interface for OpenSlide Library

This section uses the MATLAB clibgen.generateLibraryDefinition function to generate an interface to the OpenSlide library functions.

Download the OpenSlide Library and Add it to the Path

Download the latest OpenSlide library for your computer and operating system. This example
assumes a Windows® computer.

Create a variable that points to where you extracted the OpenSlide Library. This folder is expected to
contain bin\, include\, and lib\ subfolders.
OpenSlideInstall = 'I:\my_example\openslide-win64-20171122';
dir(OpenSlideInstall)

. VERSIONS.txt lib
.. bin licenses
README.txt include

Add the location of the OpenSlide shared library to the system path.
sharedLibLoc = fullfile(OpenSlideInstall, 'bin');
systemPath = getenv('PATH');
setenv('PATH', [sharedLibLoc ';' systemPath])

Set Up Your Development Environment

Create variables that contain the names of folders containing key elements of your development
environment.

Create a variable that points to the folder where you want to store the predefined definition file for
the OpenSlide interface that you are creating.
ExampleDir = 'I:\my_example';

Create a variable to point to a test image file. Download CMU-1.zip test file from the OpenSlide test
data page, and update the variable below to point to the extracted image file.
imageLocation = 'I:\my_example\CMU-1.mrxs';


Generate an OpenSlide Interface Definition File

Generate a library interface definition file using the clibgen.generateLibraryDefinition function.

Create a variable that points to a writable folder to store the generated interface files. Create a folder
in which to write the MATLAB OpenSlide Interface file and change to that folder.

OpenSlideInterface = 'I:\my_example\interfaceFolder';

if ~isfolder(OpenSlideInterface)
mkdir(OpenSlideInterface)
end
cd(OpenSlideInterface)

Create variables to point to the OpenSlide library folder, the two OpenSlide header files, the path to
the header files, the name of the OpenSlide library, and define the name you want to assign to the
generated interface.

libPath = fullfile(OpenSlideInstall,'lib');
hppFiles = {'openslide.h', 'openslide-features.h'};
hppPath = fullfile(OpenSlideInstall, 'include', 'openslide');
libFile = 'libopenslide.lib';
myPkg = 'OpenSlideInterface';

Call the clibgen.generateLibraryDefinition function, specifying the variables you have set up.

• Header file names (hppFiles) and location (hppPath)
• Folder containing include files (hppPath)
• Shared library name (libFile) and location (libPath)
• Name to give to the generated interface library (myPkg) -- Optional

You can optionally set the 'Verbose' parameter to true to display messages produced during generation.

% Clear previous run (if any)


if isfile('defineOpenSlideInterface.m')
delete('defineOpenSlideInterface.m')
end
clibgen.generateLibraryDefinition(fullfile(hppPath,hppFiles),...
'IncludePath', hppPath,...
'Libraries', fullfile(libPath,libFile),...
'PackageName', myPkg,...
'Verbose',false)

Using MinGW64 Compiler (C++) compiler.


Generated definition file defineOpenSlideInterface.mlx and data file 'OpenSlideInterfaceData.xml'
21 construct(s) require(s) additional definition. To include these construct(s) in the interface,
Build using build(defineOpenSlideInterface).
Use the 'Verbose' option to see the warnings generated while parsing the files for generating int

Edit the Generated Interface Definition File

The clibgen.generateLibraryDefinition command creates two files: the library interface definition file defineOpenSlideInterface.m and defineOpenSlideInterface.mlx. To use the generated interface file to create a blocked image adapter, certain edits are required. The toolbox provides a template interface file that contains these edits, but you still need to provide location
information for certain key folders. This section performs these edits on the template file.

First, rename the interface file you generated with the clibgen.generateLibraryDefinition command and keep it as a backup file.

movefile('defineOpenSlideInterface.m','defineOpenSlideInterface_generated.m');

Delete the .mlx file created by the clibgen.generateLibraryDefinition function and then call
rehash.

delete defineOpenSlideInterface.mlx;
rehash

Edit the interface definition file template file that is included with the toolbox. The edits provide the
locations of key folders in your installation. First, open the interface definition template file, included
with this example, for read access. Read the contents into a variable, called interfaceContents,
and then close the template file.

fid = fopen(fullfile('defineOpenSlideInterface_template.m'),'rt');
interfaceContents = fread(fid, 'char=>char');
fclose(fid);

Update placeholder variables in the template file variable, interfaceContents, with your actual
folder names.

interfaceContents = strrep(interfaceContents','OPENSLIDE_INSTALL_LOCATION',OpenSlideInstall);
interfaceContents = strrep(interfaceContents,'OPENSLIDE_INTERFACE_LOCATION',OpenSlideInterface);

Now, write the updated interface definition template variable to a new file. Open an interface
definition file for write access, write the template file variable to the file, and then close the file.

fid = fopen('defineOpenSlideInterface.m','wt');
fwrite(fid, interfaceContents);
fclose(fid);

To verify that the changes to the interface file were successful, you can view the differences between
the generated interface file and the edited interface template file.
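
For example, you can open the two files in the MATLAB Comparison Tool with the visdiff function (shown here as an optional check):

visdiff('defineOpenSlideInterface_generated.m','defineOpenSlideInterface.m')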

Create the Library Interface

Using the OpenSlide interface definition file, use the build command to create a MATLAB
OpenSlideInterface shared library.

build(defineOpenSlideInterface)

Building interface file 'OpenSlideInterfaceInterface.dll'.


Interface file 'OpenSlideInterfaceInterface.dll' built in folder 'I:\my_example\interfaceFolder\o
To use the library, add the interface file folder to the MATLAB path.

Add the folder containing the generated interface library to the MATLAB path.

addpath osInterface\OpenSlideInterface\

Be sure to click the link in the message after the build is complete to add the interface file to the path.

To view the functional capabilities of the interface library, use the summary function.


summary(defineOpenSlideInterface)

MATLAB Interface to OpenSlideInterface Library

Class clib.OpenSlideInterface.openslide_t

No Constructors defined

No Methods defined

No Properties defined

Functions
clib.OpenSlideInterface.openslide_t clib.OpenSlideInterface.openslide_open(string)
int32 clib.OpenSlideInterface.openslide_get_level_count(clib.OpenSlideInterface.openslide_t)
[int64,int64] clib.OpenSlideInterface.openslide_get_level_dimensions(clib.OpenSlideInterface.open
Note: 'int64' used as MLTYPE for C++ pointer argument.
Passing nullptr is not supported with 'int64' types.
To allow nullptr as an input, set MLTYPE to clib.array.
double clib.OpenSlideInterface.openslide_get_level_downsample(clib.OpenSlideInterface.openslide_t
clib.OpenSlideInterface.openslide_read_region(clib.OpenSlideInterface.openslide_t,clib.array.Open
clib.OpenSlideInterface.openslide_close(clib.OpenSlideInterface.openslide_t)

Test the Library Interface

To test the library interface, try using the functions in the library with the sample image.

Load the sample image file using the openslide_open function.


ob = clib.OpenSlideInterface.openslide_open(imageLocation);

Get the number of levels of slides present in this example file.


levels = clib.OpenSlideInterface.openslide_get_level_count(ob);

Get the dimensions of the slide in level 0.


[w, h] = clib.OpenSlideInterface.openslide_get_level_dimensions(ob,int32(0),int64(0),int64(0));
disp([w, h])

109240 220696

Read a region from level 0 using the openslide_read_region function. Set up a clibArray of type UnsignedInt with the desired width and height dimensions. Specify the top-left x-coordinate and the top-left y-coordinate in the level 0 reference frame.
rawCData = clibArray('clib.OpenSlideInterface.UnsignedInt', [1024, 1024]);
clib.OpenSlideInterface.openslide_read_region(ob,rawCData,int64(33792),int64(113664),int32(0));

Post-process the acquired region from the clibArray to convert it into a uint8 RGB image.
rawImageData = uint32(rawCData);
RGBA = typecast(rawImageData(:), 'uint8');
% Ignore the A channel
RGB(:,:,1) = reshape(RGBA(3:4:end),1024,1024);
RGB(:,:,2) = reshape(RGBA(2:4:end),1024,1024);
RGB(:,:,3) = reshape(RGBA(1:4:end),1024,1024);

Display the processed image region.


figure;
imshow(RGB);

Create Blocked Image Custom Adapter for Reading Whole-Slide Images

To read whole-slide images, create a custom Adapter for block-based reading and writing that uses
the capabilities of the OpenSlide Interface built above.

Subclass the images.blocked.Adapter Class

To create a blocked image adapter, first create a class that subclasses the blocked image adapter
interface class, images.blocked.Adapter. To learn more about blocked images and creating a
blocked image adapter, view the images.blocked.Adapter documentation.


Create a read-only OpenSlide adapter by implementing the following methods:

• openToRead - open source for reading
• getInfo - gather information about source
• getIOBlock - get specified IO block

Use the OpenSlide interface functions generated above to implement these methods.

A sample adapter is included in this example, OpenSlideAdapter.m. To view this adapter, you can
open the file in an editor.
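
The following skeleton is a minimal sketch of the structure such an adapter can take. It is not the shipped OpenSlideAdapter.m implementation; the placeholder sizes and the getInfo fields shown here are assumptions that you would replace with values queried through the OpenSlide interface.

classdef MinimalSlideAdapter < images.blocked.Adapter
    % Structural sketch of a read-only adapter; values below are placeholders.
    properties (Access = private)
        SlideObj   % handle returned by openslide_open (assumption)
    end
    methods
        function openToRead(obj,source)
            % Open the whole-slide file for reading with the generated interface.
            obj.SlideObj = clib.OpenSlideInterface.openslide_open(source);
        end
        function info = getInfo(obj)
            % Describe the source. Replace these placeholder values with values
            % queried from the OpenSlide library (level count, level dimensions).
            % The field names follow the images.blocked.Adapter documentation.
            info.Size = [512 512 3];
            info.IOBlockSize = [512 512 3];
            info.Datatype = "uint8";
            info.InitialValue = uint8(0);
        end
        function data = getIOBlock(obj,ioBlockSub,level)
            % Read one IO block, for example with openslide_read_region, and
            % convert the raw data to RGB. A placeholder block is returned here.
            data = zeros(512,512,3,"uint8");
        end
    end
end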

Use the new adapter with the sample image by specifying it in the blockedImage object constructor:

bim = blockedImage(imageLocation, "Adapter", OpenSlideAdapter)

bim =
blockedImage with properties:

Read only properties


Source: "I:\my_example\CMU-1.mrxs"
Adapter: [1×1 OpenSlideAdapter]
Size: [10×3 double]
SizeInBlocks: [10×3 double]
ClassUnderlying: [10×1 string]

Settable properties
BlockSize: [10×3 double]
UserData: [1×1 struct]

disp(bim.Size)

220696 109240 3
110348 54620 3
55174 27310 3
27587 13655 3
13793 6827 3
6896 3413 3
3448 1706 3
1724 853 3
862 426 3
431 213 3

bigimageshow(bim)


See Also
bigimageshow | blockedImage | images.blocked.Adapter


Detect and Count Cell Nuclei in Whole Slide Images

This example shows how to detect and count cell nuclei in whole slide images (WSIs) of tissue stained
using hematoxylin and eosin (H&E).

Cell counting is an important step in most digital pathology workflows. The number and distribution
of cells in a tissue sample can be a biomarker of cancer or other diseases. Digital pathology uses
digital images of microscopy slides, or whole slide images (WSIs), for clinical diagnosis or research.
To capture tissue- and cellular-level detail, WSIs have high resolutions and can have sizes on the
order of 200,000-by-100,000 pixels. To facilitate efficient display, navigation, and processing of WSIs,
the best practice is to store them in a multiresolution format. In MATLAB, you can use blocked
images to work with WSIs without loading the full image into core memory.

This example outlines an approach to cell nuclei detection and counting using blocked images. To
ensure that the detection algorithm counts cells on block borders only once, this blocked image
approach uses overlapping borders. Although this example uses a simple cell detection algorithm, you
can use the blocked image approach to create more sophisticated algorithms and implement deep
learning techniques.

Download Sample Image from Camelyon17 Data Set

This example uses one WSI from the Camelyon17 challenge. Navigate to one of the online
repositories specified on the data page, then download the file patient_000_node_1.tif from the
directory training/center_0/patient_000. Update the fileName variable to point to the full
path of the downloaded image.

fileName = "patient_000_node_1.tif";

Create and Preprocess Blocked Image

Create a blockedImage object from the sample image. This image has nine resolution levels. The
finest resolution level, which is the first level, has a size of 197226-by-96651 pixels. The coarsest
resolution level, which is the last level, has a size of 770-by-377 pixels and easily fits in memory.

bim = blockedImage(fileName);
disp(bim.NumLevels)

disp(bim.Size)

197226 96651 3
98613 48325 3
49306 24162 3
24653 12081 3
12326 6040 3
6163 3020 3
3081 1510 3
1540 755 3
770 377 3

Adjust Spatial Extents of Each Level

The TIF image metadata contains information about the world coordinates. The exact format of the
metadata varies across different sources. The file in this example stores the mapping between pixel
subscripts and world coordinates using the XResolution, YResolution, and ResolutionUnit
fields. The TIFF standard indicates that XResolution and YResolution are the number of pixels
per ResolutionUnit in the corresponding directions.

fileInfo = imfinfo(bim.Source);
info = table((1:9)',[fileInfo.XResolution]',[fileInfo.YResolution]',{fileInfo.ResolutionUnit}', ...
    VariableNames=["Level","XResolution","YResolution","ResolutionUnit"]);
disp(info)

    Level    XResolution    YResolution    ResolutionUnit
    _____    ___________    ___________    ______________

      1          41136          41136      {'Centimeter'}
      2          20568          20568      {'Centimeter'}
      3          10284          10284      {'Centimeter'}
      4         5142.1         5142.1      {'Centimeter'}
      5           2571           2571      {'Centimeter'}
      6         1285.5         1285.5      {'Centimeter'}
      7         642.76         642.76      {'Centimeter'}
      8         321.38         321.38      {'Centimeter'}
      9         160.69         160.69      {'Centimeter'}

Calculate the size of a single pixel, or the pixel extent, in world coordinates at the finest resolution
level. The pixel extent is identical in the x and the y directions because the values of XResolution
and YResolution are equal. Then, calculate the corresponding full image size in world coordinates.

level = 1;
pixelExtentInWorld = 1/fileInfo(level).XResolution;
imageSizeInWorld = bim.Size(level,1:2)*pixelExtentInWorld;

Adjust the spatial extents of each level such that the levels cover the same physical area in the real
world. To ensure that subscripts map to pixel centers, offset the starting coordinate and the ending
coordinate by a half pixel.

bim.WorldStart = [pixelExtentInWorld/2 pixelExtentInWorld/2 0.5];


bim.WorldEnd = bim.WorldStart + [imageSizeInWorld 3];

Display the image using the bigimageshow function. bigimageshow displays the image in world
coordinates, with each axis measured in centimeters.

bigimageshow(bim)
title("WSI of Lymph Node")


Develop Basic Nuclei Detection Algorithm

This example develops the nuclei detection algorithm using a representative region of interest (ROI),
then applies the algorithm on all regions of the image that contain tissue.

Identify and display a representative ROI in the image.

xRegion = [0.53 0.55];


yRegion = [2.064 2.08];
xlim(xRegion)
ylim(yRegion)
title("Region of Interest")


Convert the world coordinates of the ROI to pixel subscripts. Then, extract the ROI data at the finest
resolution level using the getRegion function.

subStart = world2sub(bim,[yRegion(1) xRegion(1)]);


subEnd = world2sub(bim,[yRegion(2) xRegion(2)]);
imRegion = getRegion(bim,subStart,subEnd,Level=1);

Detect Nuclei Using Color Thresholding

The nuclei have a distinct stain color that enables segmentation using color thresholding. This
example provides a helper function, createNucleiMask, that segments nuclei using color
thresholding in the RGB color space. The thresholds have been selected, and the code for the helper
function has been generated, using the colorThresholder app. The helper function is attached to
the example as a supporting file.


If you want to segment the nuclei regions using a different color space or select different thresholds,
use the colorThresholder app. Open the app and load the image using this command:

colorThresholder(imRegion)

After you select a set of thresholds in the color space of your choice, generate the segmentation
function by selecting Export from the toolstrip and then selecting Export Function. Save the
generated function as createNucleiMask.m.
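
For reference, a function generated this way typically has the following shape. This is a hypothetical sketch only; the threshold values are illustrative and are not the values used by the attached createNucleiMask helper.

function BW = exampleNucleiMask(RGB)
% Hypothetical RGB color-threshold segmentation of nuclei. Replace the
% illustrative thresholds below with the values you select in colorThresholder.
BW = RGB(:,:,1) >= 40  & RGB(:,:,1) <= 160 & ...
     RGB(:,:,2) >= 10  & RGB(:,:,2) <= 110 & ...
     RGB(:,:,3) >= 110 & RGB(:,:,3) <= 220;
end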

Apply the segmentation to the ROI, and display the mask over the ROI as a falsecolor composite
image.

nucleiMask = createNucleiMask(imRegion);
imshowpair(imRegion,nucleiMask)


Select Nuclei Using Image Region Properties

Explore properties of the regions in the mask using the imageRegionAnalyzer app. Open the app
by using this command:

imageRegionAnalyzer(nucleiMask)

Clean up regions with holes inside the nuclei by selecting Fill Holes on the toolstrip.

Sort the regions by area. Regions with the largest areas typically correspond to overlapping nuclei.
Because your goal is to segment single cells, you can ignore the largest regions at the expense of
potentially underestimating the total number of nuclei.


Scroll down the list until you find the largest region that corresponds to a single nucleus. Specify
maxArea as the area of this region, and specify equivDiameter as the equivalent diameter,
represented by the EquivDiameter property in the Region Properties pane. Calculate the
equivalent radius of the region as half the value of equivDiameter.

maxArea = 405;
equivDiameter = 22.7082;
equivRadius = round(equivDiameter/2);


Regions with very small area typically correspond to partial nuclei. Continue scrolling until you find
the smallest region that corresponds to a whole nucleus. Set minArea as the area of this region.
Because your goal is to segment whole cells, you can ignore regions with areas smaller than
minArea.

minArea = 229;


Filter the image so it contains only the regions with areas between minArea and maxArea. From the
toolstrip, select Filter. The binary image updates in real time to reflect the current filtering
parameters.


You can optionally change the range of areas or add additional region filtering constraints. For
example, you can require that nuclei be approximately round by excluding regions with large
perimeters. Specify the filter parameters in the Filter Regions dialog box.
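
Outside the app, an additional roundness constraint might look like this sketch, which uses a standard circularity measure (the 0.75 cutoff is illustrative):

% Keep only regions whose circularity suggests a roughly round shape
stats = regionprops(nucleiMask,{'Area','Perimeter'});
circularity = 4*pi*[stats.Area]./([stats.Perimeter].^2);
isRound = circularity > 0.75;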

This example provides a helper function, calculateNucleiProperties, that calculates the area
and centroid of regions within the range of areas [229, 405]. The code for the helper function has
been generated using the imageRegionAnalyzer app. The helper function is attached to the
example as a supporting file.

If you specify different filtering constraints, then you must generate a new filtering function by
selecting Export from the toolstrip and then selecting Export Function. Save the generated function
as calculateNucleiProperties.m. After you generate the function, you must open the function
and edit the last line of code so that the function returns the region centroids. Add 'Centroid' to
the cell array of properties measured by the regionprops function. For example:
properties = regionprops(BW_out,{'Centroid','Area','Perimeter'});

Calculate the centroids for each likely detection by using the calculateNucleiProperties helper
function. Overlay the centroids as yellow points over the ROI.
[~,stats] = calculateNucleiProperties(nucleiMask);
centroidCoords = vertcat(stats.Centroid);
figure
imshow(imRegion)
hold on


plot(centroidCoords(:,1),centroidCoords(:,2),"y.",MarkerSize=10)
hold off

Apply Nuclei Detection Algorithm

To process WSI data efficiently at fine resolution levels, apply the nuclei detection algorithm on
blocks consisting of tissue and ignore blocks without tissue. You can identify blocks with tissue using
a coarse tissue mask. For more information, see “Process Blocked Images Efficiently Using Mask” on
page 17-22.

Create Tissue Mask

The tissue is brighter than the background, which enables segmentation using color thresholding.
This example provides a helper function, createTissueMask, that segments tissue using color
thresholding in the L*a*b* color space. The thresholds have been selected, and the code for the
helper function has been generated, using the colorThresholder app. The helper function is
attached to the example as a supporting file.


If you want to segment the tissue using a different color space or select different thresholds, use the
colorThresholder app. Open a low-resolution version of the image in the app using these
commands:

imCoarse = gather(bim);
colorThresholder(imCoarse)

After you select a set of thresholds, generate the segmentation function by selecting Export from the
toolstrip and then selecting Export Function. Save the generated function as
createTissueMask.m.

Create a mask of the regions of interest that contain tissue by using the apply function. In this code,
the apply function runs the createTissueMask function on each block of the original blocked
image bim and assembles the tissue masks into a single blockedImage with the correct world
coordinates. This example creates the mask at the seventh resolution level, which fits in memory and
follows the contours of the tissue better than a mask at the coarsest resolution level.

maskLevel = 7;
tissueMask = apply(bim,@(bs)createTissueMask(bs.Data),Level=maskLevel);

Overlay the mask on the original WSI image using the bigimageshow function.

hbim = bigimageshow(bim);
showlabels(hbim,tissueMask,AlphaData=tissueMask,Alphamap=[0.5 0])
title("Tissue Mask; Background Has Green Tint")


Select Blocks With Tissue

Select the set of blocks on which to run the nuclei detection algorithm by using the
selectBlockLocations function. Include only blocks that have at least one pixel inside the mask
by specifying the InclusionThreshold name-value argument as 0. Specify a large block size to
perform the nuclei detection more efficiently. The upper limit of the block size depends on the amount
of system RAM. If you perform the detection using parallel processing, you may need to reduce the
block size.

blockSize = [2048 2048 3];


bls = selectBlockLocations(bim,BlockSize=blockSize, ...
Masks=tissueMask,InclusionThreshold=0,Levels=1);

Apply Algorithm on Blocks with Tissue

Perform the nuclei detection on the selected blocks using the apply function. In this code, the apply
function runs the countNuclei helper function (defined at the end of the example) on each block of
the blocked image bim and returns a structure with information about the area and centroid of each
nucleus in the block. To ensure the function detects and counts nuclei on the edges of a block only
once, add a border size equal to the equivalent radius value of the largest nucleus region. To
accelerate the processing time, process the blocks in parallel.

if(exist("bndetections","dir"))
rmdir("bndetections","s")
end
bndetections = apply(bim,@(bs)countNuclei(bs), ...
BlockLocationSet=bls,BorderSize=[equivRadius equivRadius], ...
Adapter=images.blocked.MATBlocks,OutputLocation="bndetections", ...
UseParallel=true);

Starting parallel pool (parpool) using the 'Processes' profile ...


Connected to parallel pool with 6 workers.

Display Heatmap of Detected Nuclei

Create a heatmap that shows the distribution of detected nuclei across the tissue sample. Each pixel in the heatmap contains the count of detected nuclei in a 0.01-by-0.01 centimeter square in world coordinates.

Determine the number of pixels that this real world area contains:

numPixelsInSquare = world2sub(bim,[0.01 0.01]);

Calculate the size of the heatmap, in pixels, at the finest level. Then initialize a heatmap consisting of
all zeros.

bimSize = bim.Size(1,1:2);
heatMapSize = round(bimSize./numPixelsInSquare);
heatMap = zeros(heatMapSize);

Load the blocked image of detected nuclei into memory. Get the list of the (row, column) pixel indices
of the centroids of all detected nuclei.

ndetections = gather(bndetections);
centroidsPixelIndices = vertcat(ndetections.ncentroidsRC);

Map the nuclei centroid coordinates to pixels in the heatmap. First, normalize the (row, column)
centroid indices relative to the size of the blocked image, in pixels. Then, multiply the normalized
coordinates by the size of the heatmap to yield the (row, column) pixel indices of the centroids in the
heatmap.

centroidsRelative = centroidsPixelIndices./bimSize;
centroidsInHeatmap = ceil(centroidsRelative.*heatMapSize);

Loop through the nuclei and, for each nucleus, increment the value of the corresponding heatmap
pixel by one.

for ind = 1:size(centroidsInHeatmap,1)
    r = centroidsInHeatmap(ind,1);
    c = centroidsInHeatmap(ind,2);
    heatMap(r,c) = heatMap(r,c)+1;
end

Display the heatmap.

maxCount = max(heatMap,[],"all");
figure


imshow(heatMap,hot(maxCount))
hcb = colorbar;
hcb.Label.String = "Nuclei Density (count/0.01 cm square)";

Convert the heatmap into a blocked image with the correct world coordinates.
bHeatmap = blockedImage(heatMap, ...
WorldStart=bim.WorldStart(1,1:2),WorldEnd=bim.WorldEnd(1,1:2));

Display the original WSI data next to the heatmap.


tiledlayout(1,2)
nexttile
hl = bigimageshow(bim);
title("WSI Image")
nexttile
hr = bigimageshow(bHeatmap,CDataMapping="direct");


colormap(hr.Parent,hot(maxCount))
title("Nuclei Count Heatmap")

Link the axes of the two images to enable comparison.


linkaxes([hl.Parent hr.Parent])

Zoom in on an interesting region.


xlim(hl.Parent,[0.56514 0.85377])
ylim(hl.Parent,[2.6624 3.2514])

Inspect the detected nuclei across the entire WSI using the helper function
exploreNucleiDetectionResults, which is attached to the example as a supporting file. The
helper function displays the WSI with a draggable overview rectangle (shown in blue) next to a detail
view of the portion of the image within the overview rectangle. You can move and resize the overview
window interactively, and the detail view updates to display the new selected region and its centroids.


roi = [0.62333 2.0725 0.005 0.005];


h = exploreNucleiDetectionResults(bim,bndetections,roi);

Supporting Function

The countNuclei helper function performs these operations to count nuclei within a block of
blocked image data:

• Create a binary mask of potential nuclei based on a color threshold, using the
createNucleiMask helper function. This function has been generated using the
colorThresholder app and is attached to the example as a supporting file.
• Select individual nuclei within the binary mask based on region properties, and return the
measured area and centroid of each nucleus, using the calculateNucleiProperties helper
function. This function has been generated using the imageRegionAnalyzer app and is attached
to the example as a supporting file.


• Create a structure containing the (row, column) coordinates of the top-left and bottom-right
corners of the block, and the area and centroid measurements of nuclei within the block. Omit
nuclei whose centroids are near the border of the block, because the function counts these nuclei
in other partially overlapping blocks.

To implement a more sophisticated nuclei detection algorithm, replace the calls to the
createNucleiMask and calculateNucleiProperties functions with calls to your custom
functions. Preserve the rest of the countNuclei helper function for use with the apply function.

function nucleiInfo = countNuclei(bs)


im = bs.Data;

% Create a binary mask of regions where nuclei are likely to be present


mask = createNucleiMask(im);

% Calculate the area and centroid of likely nuclei


[~,stats] = calculateNucleiProperties(mask);

% Initialize result structure


nucleiInfo.topLeftRC = bs.Start(1:2);
nucleiInfo.botRightRC = bs.End(1:2);
nucleiInfo.narea = [];
nucleiInfo.ncentroidsRC = [];

if ~isempty(stats)
% Go from (x,y) coordinates to (row, column) pixel subscripts
% relative to the block
nucleiInfo.ncentroidsRC = fliplr(vertcat(stats.Centroid));
nucleiInfo.narea = vertcat(stats.Area);

% Trim out centroids on the borders to prevent double-counting


onBorder = nucleiInfo.ncentroidsRC(:,1) < bs.BorderSize(1) ...
| nucleiInfo.ncentroidsRC(:,1) > size(im,1)-2*bs.BorderSize(1) ...
| nucleiInfo.ncentroidsRC(:,2) < bs.BorderSize(2) ...
| nucleiInfo.ncentroidsRC(:,2) > size(im,2)-2*bs.BorderSize(2);
nucleiInfo.ncentroidsRC(onBorder,:) = [];
nucleiInfo.narea(onBorder) = [];

% Convert the coordinates of block centroids from (row, column)


% subscripts relative to the block to (row, column) subscripts relative
% to the global image
nucleiInfo.ncentroidsRC = nucleiInfo.ncentroidsRC + nucleiInfo.topLeftRC;
end
end

See Also
blockedImage | bigimageshow | apply | selectBlockLocations

Related Examples
• “Process Blocked Images Efficiently Using Mask” on page 17-22
• “Set Up Spatial Referencing for Blocked Images” on page 17-2

18 Neighborhood and Block Operations

This chapter discusses the generic block processing functions in the toolbox. Topics covered include

• “Neighborhood or Block Processing: An Overview” on page 18-2
• “Sliding Neighborhood Operations” on page 18-3
• “Distinct Block Processing” on page 18-6
• “Block Size and Performance” on page 18-9
• “Parallel Block Processing on Large Image Files” on page 18-13
• “Perform Block Processing on Image Files in Unsupported Formats” on page 18-15
• “Use Column-wise Processing to Speed Up Sliding Neighborhood or Distinct Block Operations”
on page 18-21
• “Block Processing Large Images” on page 18-24
• “Compute Statistics for Large Images” on page 18-29

Neighborhood or Block Processing: An Overview


Certain image processing operations involve processing an image in sections, called blocks or
neighborhoods, rather than processing the entire image at once. Several functions in the toolbox,
such as linear filtering and morphological functions, use this approach.

The toolbox includes several functions that you can use to implement image processing algorithms as
a block or neighborhood operation. These functions break the input image into blocks or
neighborhoods, call the specified function to process each block or neighborhood, and then
reassemble the results into an output image. The following table summarizes these functions.

Function     Description

nlfilter     Implements sliding neighborhood operations that you can use to process an
             input image in a pixel-wise fashion. For each pixel in the input image, the
             function performs the operation you specify on a block of neighboring pixels
             to determine the value of the corresponding pixel in the output image. For
             more information, see “Sliding Neighborhood Operations” on page 18-3.

blockproc    Implements distinct block operations that you can use to process an input
             image a block at a time. The function divides the image into rectangular
             blocks, and performs the operation you specify on each individual block to
             determine the values of the pixels in the corresponding block of the output
             image. For more information, see “Distinct Block Processing” on page 18-6.

colfilt      Implements column-wise processing operations, which provide a way of
             speeding up neighborhood or block operations by rearranging blocks into
             matrix columns. For more information, see “Use Column-wise Processing to
             Speed Up Sliding Neighborhood or Distinct Block Operations” on page 18-21.
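
For example, here is a minimal sketch of a sliding 3-by-3 maximum filter written with colfilt; each column passed to the anonymous function holds the pixels of one neighborhood.

I = imread("tire.tif");
% max along dimension 1 returns one value per neighborhood column
I2 = colfilt(I,[3 3],"sliding",@(cols) max(cols,[],1));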


Sliding Neighborhood Operations

In this section...
“Determine the Center Pixel” on page 18-3
“General Algorithm of Sliding Neighborhood Operations” on page 18-4
“Border Padding Behavior in Sliding Neighborhood Operations” on page 18-4
“Implementing Linear and Nonlinear Filtering as Sliding Neighborhood Operations” on page 18-4

A sliding neighborhood operation is an operation that is performed a pixel at a time, with the value of
any given pixel in the output image being determined by the application of an algorithm to the values
of the corresponding input pixel's neighborhood. A pixel's neighborhood is some set of pixels, defined
by their locations relative to that pixel, which is called the center pixel. The neighborhood is a
rectangular block, and as you move from one element to the next in an image matrix, the
neighborhood block slides in the same direction. (To operate on an image a block at a time, rather
than a pixel at a time, use the distinct block processing function. See “Distinct Block Processing” on
page 18-6 for more information.)

The following figure shows the neighborhood blocks for some of the elements in a 6-by-5 matrix with
2-by-3 sliding blocks. The center pixel for each neighborhood is marked with a dot. For information
about how the center pixel is determined, see “Determine the Center Pixel” on page 18-3.

Neighborhood Blocks in a 6-by-5 Matrix

Determine the Center Pixel


The center pixel is the actual pixel in the input image being processed by the operation. If the
neighborhood has an odd number of rows and columns, the center pixel is actually in the center of
the neighborhood. If one of the dimensions has even length, the center pixel is just to the left of
center or just above center. For example, in a 2-by-2 neighborhood, the center pixel is the upper left
one.

For any m-by-n neighborhood, the center pixel is

floor(([m n]+1)/2)

In the 2-by-3 block shown in the preceding figure, the center pixel is (1,2), or the pixel in the second
column of the top row of the neighborhood.
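
You can confirm this with a quick calculation at the command line:

m = 2; n = 3;
centerPixel = floor(([m n]+1)/2)   % returns [1 2]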


General Algorithm of Sliding Neighborhood Operations


To perform a sliding neighborhood operation,

1 Select a single pixel.
2 Determine the pixel's neighborhood.
3 Apply a function to the values of the pixels in the neighborhood. This function must return a
scalar.
4 Find the pixel in the output image whose position corresponds to that of the center pixel in the
input image. Set this output pixel to the value returned by the function.
5 Repeat steps 1 through 4 for each pixel in the input image.

For example, the function might be an averaging operation that sums the values of the neighborhood
pixels and then divides the result by the number of pixels in the neighborhood. The result of this
calculation is the value of the output pixel.
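
For example, this averaging operation can be written as a brief sketch using the nlfilter function, which is described later in this section:

I = im2double(imread("tire.tif"));
avgFun = @(x) mean(x(:));        % returns one scalar per neighborhood
I2 = nlfilter(I,[3 3],avgFun);   % each output pixel is its 3-by-3 neighborhood mean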

Border Padding Behavior in Sliding Neighborhood Operations


As the neighborhood block slides over the image, some of the pixels in a neighborhood might be
missing, especially if the center pixel is on the border of the image. For example, if the center pixel is
the pixel in the upper left corner of the image, the neighborhoods include pixels that are not part of
the image.

To process these neighborhoods, sliding neighborhood operations pad the borders of the image,
usually with 0's. In other words, these functions process the border pixels by assuming that the image
is surrounded by additional rows and columns of 0's. These rows and columns do not become part of
the output image and are used only as parts of the neighborhoods of the actual pixels in the image.
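
A small experiment makes this zero padding visible: with a minimum filter, the border pixels of the output pick up the padded zeros.

I = 200*ones(8);                 % constant 8-by-8 test image
f = @(x) min(x(:));
I2 = nlfilter(I,[3 3],f);        % interior pixels remain 200; border pixels become 0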

Implementing Linear and Nonlinear Filtering as Sliding Neighborhood Operations

You can use sliding neighborhood operations to implement many kinds of filtering operations. One
example of a sliding neighborhood operation is convolution, which is used to implement linear filtering.
MATLAB provides the conv and filter2 functions for performing convolution, and the toolbox
provides the imfilter function. See “What Is Image Filtering in the Spatial Domain?” on page 8-2
for more information about these functions.

In addition to convolution, there are many other filtering operations you can implement through
sliding neighborhoods. Many of these operations are nonlinear in nature. For example, you can
implement a sliding neighborhood operation where the value of an output pixel is equal to the
standard deviation of the values of the pixels in the input pixel's neighborhood.

To implement a variety of sliding neighborhood operations, use the nlfilter function. nlfilter
takes as input arguments an image, a neighborhood size, and a function that returns a scalar, and
returns an image of the same size as the input image. nlfilter calculates the value of each pixel in
the output image by passing the corresponding input pixel's neighborhood to the function.

Note Many operations that nlfilter can implement run much faster if the computations are
performed on matrix columns rather than rectangular neighborhoods. For information about this
approach, see “Use Column-wise Processing to Speed Up Sliding Neighborhood or Distinct Block
Operations” on page 18-21.


For example, this code computes each output pixel by taking the standard deviation of the values of
the input pixel's 3-by-3 neighborhood (that is, the pixel itself and its eight contiguous neighbors).

I = imread("tire.tif");
I2 = nlfilter(I,[3 3],"std2");

You can also write code to implement a specific function, and then use this function with nlfilter.
For example, this command processes the matrix I in 2-by-3 neighborhoods with a function called
myfun.m. The syntax @myfun is an example of a function handle.

I2 = nlfilter(I,[2 3],@myfun);

If you prefer not to write code to implement a specific function, you can use an anonymous function
instead. This example converts the image to class double because the square root function is not
defined for the uint8 data type.

I = im2double(imread("tire.tif"));
f = @(x) sqrt(min(x(:)));
I2 = nlfilter(I,[2 2],f);

(For more information on function handles, see “Create Function Handle”. For more information
about anonymous functions, see “Anonymous Functions”.)

The following example uses nlfilter to set each pixel to the maximum value in its 3-by-3
neighborhood.

Note This example is only intended to illustrate the use of nlfilter. For a faster way to perform
this local maximum operation, use imdilate.

I = imread("tire.tif");
f = @(x) max(x(:));
I2 = nlfilter(I,[3 3],f);
imshow(I);
figure, imshow(I2);

Each Output Pixel Set to Maximum Input Neighborhood Value


Distinct Block Processing


In this section...
“Implement Block Processing Using the blockproc Function” on page 18-6
“Apply Padding” on page 18-7

In distinct block processing, you divide an image matrix into rectangular blocks and perform image
processing operations on individual blocks. Blocks start in the upper left corner and completely cover
the image without overlap. If the blocks do not fit exactly over the image, then any incomplete blocks
are considered partial blocks. The figure shows a 15-by-30 pixel image divided into 4-by-8 pixel
blocks. The right and bottom edges have partial blocks.

Image Divided into Distinct Blocks

You can process partial blocks as is, or you can add padding to the image so that the image size is a
multiple of the block size. For more information, see “Apply Padding” on page 18-7.

Implement Block Processing Using the blockproc Function


To perform distinct block operations, use the blockproc function. The blockproc function extracts
each distinct block from an image and passes it to a function you specify for processing. The
blockproc function assembles the returned blocks to create an output image.

For example, the commands below process image I in 25-by-25 blocks with the function myfun. In
this case, the myfun function resizes the blocks to make a thumbnail. (For more information about
function handles, see “Create Function Handle”. For more information about anonymous functions,
see “Anonymous Functions”.)

myfun = @(block_struct) imresize(block_struct.data,0.15);


I = imread("tire.tif");
I2 = blockproc(I,[25 25],myfun);

Note Due to block edge effects, resizing an image using blockproc does not produce the same
results as resizing the entire image at once.


The example below uses the blockproc function to set every pixel in each 32-by-32 block of an
image to the average of the elements in that block. The anonymous function computes the mean of
the block, and then multiplies the result by a matrix of ones, so that the output block is the same size
as the input block. As a result, the output image is the same size as the input image. The blockproc function does not require that the output image be the same size as the input image. If you want the sizes to match, make sure that the function you specify returns blocks of the appropriate size:

myfun = @(block_struct) ...
    uint8(mean2(block_struct.data)* ...
    ones(size(block_struct.data)));
I2 = blockproc("moon.tif",[32 32],myfun);

Note Many operations that blockproc can implement run much faster if the computations are
performed on matrix columns rather than rectangular blocks. For information about this approach,
see “Use Column-wise Processing to Speed Up Sliding Neighborhood or Distinct Block Operations”
on page 18-21.

Apply Padding
When processing an image in blocks, you may wish to add padding for two reasons:

• To address partial blocks when the image size is not a multiple of the block size.
• To create overlapping borders to each block.

By default, partial blocks are processed as is, with no additional padding. Set the
PadPartialBlocks argument to true to pad the right or bottom edges of the image and make the
blocks full-sized.

Use the BorderSize argument to specify extra rows and columns of pixels outside the block whose
values are taken into account when processing the block. When there is a border, blockproc passes
the expanded block, including the border, to the specified function.

For example, this command processes image A in 4-by-8 pixel blocks, adding a 1-by-2 pixel border
around each block and zero-padding partial blocks to the full block size. This pixel border expands
each block by one additional pixel on the top and bottom edges and two pixels along the left and right
edges during processing. The figure depicts a sample image A and indicates in gray the pixel border
added to three sample blocks.

B = blockproc(A,[4 8],myfun,BorderSize=[1 2], ...
    PadPartialBlocks=true)

Image A Divided into Distinct Blocks with Specified Borders

Both padding of partial blocks and block borders add to the overall size of image A, as depicted in the
figure. Because partial blocks are padded, the original 15-by-30 pixel image increases in size to the
next multiple of the block size, in this case, 16-by-32 pixels. Because a 1-by-2 pixel border is added to
each block, blocks along the image edges include pixels that extend beyond the bounds of the original
image. The border pixels along the image edges increase the effective size of the input matrix to 18-by-36 pixels. The outermost rectangle in the figure delineates the new boundaries of the image after all padding is added.

By default, blockproc pads the image with zeros. If you need a different type of padding, use the
PadMethod name-value argument of the blockproc function.

See Also

More About
• “Neighborhood or Block Processing: An Overview” on page 18-2
• “Block Size and Performance” on page 18-9


Block Size and Performance


When using the blockproc function to either read or write image files, the number of times the file
is accessed can significantly affect performance. In general, selecting larger block sizes reduces the
number of times blockproc has to access the disk, at the cost of using more memory to process
each block. Knowing the file format layout on disk can help you select block sizes that minimize the
number of times the disk is accessed.

TIFF Image Characteristics


TIFF images organize their data on disk in one of two ways: in tiles or in strips. A tiled TIFF image
stores rectangular blocks of data contiguously in the file. Each tile is read and written as a single
unit. TIFF images with strip layout have data stored in strips; each strip spans the entire width of the
image and is one or more rows in height. (Stripped TIFF images are always organized in rows, never
in columns.) Like a tile, each strip is stored, read, and written as a single unit.

When selecting an appropriate block size for TIFF image processing, understanding the organization
of your TIFF image is important. To find out whether your image is organized in tiles or strips, use
the imfinfo function.

The struct returned by imfinfo for TIFF images contains the fields TileWidth and TileLength. If
these fields have valid (nonempty) values, then the image is a tiled TIFF, and these fields define the
size of each tile. If these fields contain values of empty ([]), then the TIFF is organized in strips. For
TIFFs with strip layout, refer to the struct field RowsPerStrip, which defines the size of each strip
of data.

When reading TIFF images, the minimum amount of data that can be read is a single tile or a single
strip, depending on the type of TIFF. To optimize the performance of blockproc, select block sizes
that correspond closely with how your TIFF image is organized on disk. In this way, you can avoid
rereading the same pixels multiple times.
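
As a sketch (the file name here is a placeholder), you can derive a block size directly from this layout information:

info = imfinfo("myImage.tif");                       % hypothetical TIFF file
if ~isempty(info.TileWidth) && ~isempty(info.TileLength)
    blockSize = [info.TileLength info.TileWidth];    % tiled TIFF: match the tile size
else
    blockSize = [info.RowsPerStrip info.Width];      % stripped TIFF: full-width strips
end
B = blockproc("myImage.tif",blockSize,@(bs) bs.data);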

Choose Block Size to Optimize blockproc Performance

This example shows how block size influences the performance of blockproc. In each of these cases,
the total number of pixels in each block is approximately the same. Only the dimensions of the blocks
are different.

Read an image file and convert it to a TIFF file.

I = imread('concordorthophoto.png','PNG');
imshow(I)


imwrite(I,'concordorthophoto.tif','TIFF');

Use imfinfo to determine whether concordorthophoto.tif is organized in strips or tiles. The


RowsPerStrip field of the info struct indicates that this TIFF image is organized in strips with 34
rows per strip. Each strip spans the width of the image and is 34 pixels tall.

info = imfinfo('concordorthophoto.tif');
info.RowsPerStrip

ans = 34

Get the image size from the Height and Width fields of info. This image has size 2215-by-2956
pixels.

h = info.Height

h = 2215

w = info.Width

w = 2956


Case 1: Square Blocks

Process the image using square blocks of size 500-by-500 pixels. Each time the blockproc function
accesses the disk, it reads in an entire strip and discards any part of the strip not included in the
current block. With 34 rows per strip and 500 rows per block, blockproc accesses the disk 15 times
for each block. The image is approximately 6 blocks wide (2956/500 = 5.912). blockproc reads the
same strip over and over again for each block that includes pixels contained in that strip. Since the
image is six blocks wide, blockproc reads every strip of the file six times.

blockSizeSquare = 500;
tic
im = blockproc('concordorthophoto.tif',[blockSizeSquare blockSizeSquare],@(s) s.data);
toc

Elapsed time is 0.373874 seconds.

Case 2: Column-Shaped Blocks

Process the image using blocks that span the full height of the image. Stripped TIFF files are
organized in rows, so this block layout is exactly opposite the actual file layout on disk.

Select a block width such that the blocks have approximately the same number of pixels as the
square block.

numCols = ceil(blockSizeSquare.^2 / h)

numCols = 113

The image is over 26 blocks wide (2956/numCols = 26.1593). Every strip must be read for every
block, therefore blockproc reads the entire image from disk 26 times.

tic
im = blockproc('concordorthophoto.tif',[h numCols],@(s) s.data);
toc

Elapsed time is 0.291722 seconds.

Case 3: Row-Shaped Blocks

Process the image using blocks that span the full width of the image. This block layout aligns with the
TIFF file layout on disk.

Select a block height such that the blocks have approximately the same number of pixels as the
square block.

numRows = ceil(blockSizeSquare.^2 / w)

numRows = 85

Each block spans the width of the image, therefore blockproc reads each strip exactly once. The
execution time is shortest when the block layout aligns with the TIFF image strips.

tic
im = blockproc('concordorthophoto.tif',[numRows w],@(s) s.data);
toc


Elapsed time is 0.133772 seconds.

See Also
blockproc

More About
• “Neighborhood or Block Processing: An Overview” on page 18-2
• “Distinct Block Processing” on page 18-6


Parallel Block Processing on Large Image Files


If you have a Parallel Computing Toolbox license, then the blockproc function can take advantage of
multiple processor cores on your machine to perform parallel block processing.

What is Parallel Block Processing?


Parallel block processing allows you to process many blocks simultaneously by distributing task
computations to a collection of MATLAB sessions, called workers. The MATLAB session with which
you interact is called the client. The client reserves a collection of workers, called a parallel pool.
Then, the client divides the input image into blocks and sends blocks to the worker MATLAB sessions.
Each worker processes a subset of blocks and sends the results back to the client. The client
aggregates the results into an output variable.

When to Use Parallel Block Processing


When processing small images, serial mode is expected to be faster than parallel mode. For larger
images, however, you may see significant performance gains from parallel processing. The
performance of parallel block processing depends on three factors:

• Function used for processing
• Image size
• Block size

In general, using larger blocks while block processing an image results in faster performance than
completing the same task using smaller blocks. However, sometimes the task or algorithm you are
applying to your image requires a certain block size, and you must use smaller blocks. When block
processing using smaller blocks, parallel block processing is typically faster than regular (serial)
block processing, often by a large margin. If you are using larger blocks, however, you might need to
experiment to determine whether parallel block processing saves computing time.
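
One practical way to decide is to time the same task both ways on your own data, as in this rough sketch (the file name is a placeholder):

fun = @(block_struct) imgaussfilt(block_struct.data,2);
tic, B1 = blockproc("myLargeImage.tif",[256 256],fun); tSerial = toc;
tic, B2 = blockproc("myLargeImage.tif",[256 256],fun,UseParallel=true); tParallel = toc;
fprintf("Serial: %.1f s, Parallel: %.1f s\n",tSerial,tParallel)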

How to Use Parallel Block Processing


You must meet two conditions to use parallel block processing:

• The source image is not specified as an ImageAdapter class.
• A Parallel Computing Toolbox license exists in the MATLAB installation.

If you meet these conditions, then you can invoke parallel processing in blockproc by specifying the
UseParallel name-value argument as true. When you do so, MATLAB automatically opens a
parallel pool of workers on your local machine and uses all available workers to process the input
image.

In the following example, compute a discrete cosine transform for each 8-by-8 block of an image in
parallel:
blockFun = @(block_struct) dct2(block_struct.data);
result = blockproc(input_image,[8 8],blockFun, ...
UseParallel=true);

Control parallel behavior with the parallel preferences, including scaling up to a cluster. See
parpool for information on configuring your parallel environment.


See Also
blockproc

More About
• “What Is Parallel Computing?” (Parallel Computing Toolbox)
• “Choose a Parallel Computing Solution” (Parallel Computing Toolbox)
• “Run Batch Parallel Jobs” (Parallel Computing Toolbox)


Perform Block Processing on Image Files in Unsupported Formats

In addition to reading TIFF or JPEG2000 files and writing TIFF files, the blockproc function can
read and write other formats. To work with image data in another file format, you must construct a
class that inherits from the ImageAdapter class. The ImageAdapter class is an abstract class that
is part of the Image Processing Toolbox software. It defines the signature for methods that
blockproc uses for file I/O with images on disk. You can associate instances of an Image Adapter
class with a file and use them as arguments to blockproc for file-based block processing.

This section demonstrates the process of writing an Image Adapter class by discussing an example
class (the LanAdapter class). The LanAdapter class is part of the toolbox. Use this simple, read-
only class to process arbitrarily large uint8 LAN files with blockproc.

Learning More About the LAN File Format


To understand how the LanAdapter class works, you must first know about the LAN file format.
Landsat thematic mapper imagery is stored in the Erdas LAN file format. Erdas LAN files contain a
128-byte header followed by one or more spectral bands of data, band-interleaved-by-line (BIL), in
order of increasing band number. The data is stored in little-endian byte order. The header contains
several pieces of important information about the file, including size, data type, and number of bands
of imagery contained in the file. The LAN file format specification defines the first 24 bytes of the file
header as shown in the table.

File Header Content

• Bytes 1–6 (6-byte character array identifying the version of the file format): 'HEADER' or 'HEAD74' (pre-7.4 files say 'HEADER')
• Bytes 7–8 (16-bit integer): pack type of the file (indicating bit depth)
• Bytes 9–10 (16-bit integer): number of bands of data
• Bytes 11–16 (6 bytes): unused
• Bytes 17–20 (32-bit integer): number of columns of data
• Bytes 21–24 (32-bit integer): number of rows of data

The remaining 104 bytes contain various other properties of the file, which this example does not use.

Parsing the Header


Typically, when working with LAN files, the first step is to learn more about the file by parsing the
header. The following code shows how to parse the header of the rio.lan file:

1 Open the file:

file_name = 'rio.lan';
fid = fopen(file_name,'r');
2 Read the first six bytes of the header:


headword = fread(fid,6,'uint8=>char')';
fprintf('Version ID: %s\n',headword);
3 Read the pack type:

pack_type = fread(fid,1,'uint16',0,'ieee-le');
fprintf('Pack Type: %d\n',pack_type);
4 Read the number of spectral bands:

num_bands = fread(fid,1,'uint16',0,'ieee-le');
fprintf('Number of Bands: %d\n',num_bands);
5 Read the image width and height:

unused_bytes = fread(fid,6,'uint8',0,'ieee-le');
width = fread(fid,1,'uint32',0,'ieee-le');
height = fread(fid,1,'uint32',0,'ieee-le');
fprintf('Image Size (w x h): %d x %d\n',width,height);
6 Close the file:

fclose(fid);

The output appears as follows:

Version ID: HEAD74


Pack Type: 0
Number of Bands: 7
Image Size (w x h): 512 x 512

The rio.lan file is a 512-by-512, 7-band image. The pack type of 0 indicates that each sample is an
8-bit, unsigned integer (uint8 data type).

Reading the File


In a typical, in-memory workflow, you would read this LAN file using the multibandread function.
The LAN format stores the RGB data from the visible spectrum in bands 3, 2, and 1, respectively. You
could create a truecolor image for further processing.

truecolor = multibandread('rio.lan', [512, 512, 7],...
    'uint8=>uint8', 128,'bil', 'ieee-le', {'Band','Direct',[3 2 1]});

For very large LAN files, however, reading and processing the entire image in memory using
multibandread can be impractical, depending on your system capabilities. To avoid memory
limitations, use the blockproc function. With blockproc, you can process images with a file-based
workflow. You can read, process, and then write the results, one block at a time.

The blockproc function only supports reading and writing certain file formats, but it is extensible
via the ImageAdapter class. To write an Image Adapter class for a particular file format, you must
be able to:

• Query the size of the file on disk


• Read a rectangular block of data from the file

If you meet these two conditions, you can write an Image Adapter class for LAN files. You can parse
the image header to query the file size, and you can modify the call to multibandread to read a
particular block of data. You can encapsulate the code for these two objectives in an Image Adapter
class structure, and then operate directly on large LAN files with the blockproc function. The


LanAdapter class is an Image Adapter class for LAN files, and is part of the Image Processing
Toolbox software.

Examining the LanAdapter Class


This section describes the constructor, properties, and methods of the LanAdapter class. Studying
the LanAdapter class helps prepare you for writing your own Image Adapter class. If you are new to
object-oriented programming, see Developing Classes—Typical Workflow for general information on
writing classes.

Open LanAdapter.m and look at the implementation of the LanAdapter class.

Classdef

The LanAdapter class begins with the keyword classdef. The classdef section defines the class
name and indicates that LanAdapter inherits from the ImageAdapter superclass. Inheriting from
ImageAdapter allows the new class to:

• Interact with blockproc


• Define common ImageAdapter properties
• Define the interface that blockproc uses to read and write to LAN files

Properties

Following the classdef section, the LanAdapter class contains two blocks of class properties. The
first block contains properties that are publicly visible, but not publicly modifiable. The second block
contains fully public properties. The LanAdapter class stores some information from the file header
as class properties. Other classes that also inherit from ImageAdapter, but that support different file
formats, can have different properties.

classdef LanAdapter < ImageAdapter

    properties(GetAccess = public, SetAccess = private)
        Filename
        NumBands
    end

    properties(Access = public)
        SelectedBands
    end

In addition to the properties defined in LanAdapter.m, the class inherits the ImageSize property
from the ImageAdapter superclass. The new class sets the ImageSize property in the constructor.

Methods: Class Constructor

The class constructor initializes the LanAdapter object. The LanAdapter constructor parses the
LAN file header information and sets the class properties. Implement the constructor, a class method,
inside a methods block.

The constructor contains much of the same code used to parse the LAN file header. The LanAdapter
class only supports uint8 data type files, so the constructor validates the pack type of the LAN file,
as well as the headword. The class properties store the remaining information. The method
responsible for reading pixel data uses these properties. The SelectedBands property allows you to
read a subset of the bands, with the default set to read all bands.


    methods

        function obj = LanAdapter(fname)
            % LanAdapter constructor for LanAdapter class.
            % When creating a new LanAdapter object, read the file
            % header to validate the file as well as save some image
            % properties for later use.

            % Open the file.
            obj.Filename = fname;
            fid = fopen(fname,'r');

            % Verify that the file begins with the headword 'HEADER' or
            % 'HEAD74', as per the Erdas LAN file specification.
            headword = fread(fid,6,'uint8=>char');
            if ~(strcmp(headword','HEADER') || strcmp(headword',...
                    'HEAD74'))
                error('Invalid LAN file header.');
            end

            % Read the data type from the header.
            pack_type = fread(fid,1,'uint16',0,'ieee-le');
            if ~isequal(pack_type,0)
                error(['Unsupported pack type. The LanAdapter example ' ...
                    'only supports reading uint8 data.']);
            end

            % Provide band information.
            obj.NumBands = fread(fid,1,'uint16',0,'ieee-le');
            % By default, return all bands of data.
            obj.SelectedBands = 1:obj.NumBands;

            % Specify image width and height.
            unused_field = fread(fid,6,'uint8',0,'ieee-le');
            width = fread(fid,1,'uint32',0,'ieee-le');
            height = fread(fid,1,'uint32',0,'ieee-le');
            obj.ImageSize = [height width];

            % Close the file handle.
            fclose(fid);

        end % LanAdapter

Methods: Required

Adapter classes have two required methods defined in the abstract superclass, ImageAdapter. All
Image Adapter classes must implement these methods. The blockproc function uses the first
method, readRegion, to read blocks of data from files on disk. The second method, close, performs
any necessary cleanup of the Image Adapter object.
        function data = readRegion(obj, region_start, region_size)
            % readRegion reads a rectangular block of data from the file.

            % Prepare various arguments to MULTIBANDREAD.
            header_size = 128;
            rows = region_start(1):(region_start(1) + region_size(1) - 1);
            cols = region_start(2):(region_start(2) + region_size(2) - 1);

            % Call MULTIBANDREAD to get data.
            full_size = [obj.ImageSize obj.NumBands];
            data = multibandread(obj.Filename, full_size,...
                'uint8=>uint8', header_size, 'bil', 'ieee-le',...
                {'Row',   'Direct', rows},...
                {'Column','Direct', cols},...
                {'Band',  'Direct', obj.SelectedBands});

        end % readRegion

readRegion has two input arguments, region_start and region_size. The region_start
argument, a two-element vector in the form [row col], defines the first pixel in the requested block of
data. The region_size argument, a two-element vector in the form [num_rows num_cols],
defines the size of the requested block of data. The readRegion method uses these input arguments
to read and return the requested block of data from the image.

The readRegion method is implemented differently for different file formats, depending on what
tools are available for reading the specific files. The readRegion method for the LanAdapter class
uses the input arguments to prepare custom input for multibandread. For LAN files,
multibandread provides a convenient way to read specific subsections of an image.
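
Although blockproc calls readRegion for you, you can also call the method directly to confirm that an adapter behaves as expected. This sketch reads one 50-by-50 block of rio.lan starting at row 101, column 201 (arbitrary values chosen for illustration).

adapter = LanAdapter('rio.lan');
block = adapter.readRegion([101 201],[50 50]);
size(block)      % 50-by-50-by-7, because SelectedBands defaults to all bands
close(adapter)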

The other required method is close. The close method of the LanAdapter class appears as
follows:

        function close(obj)
            % Close the LanAdapter object. This method is a part
            % of the ImageAdapter interface and is required.
            % Since the readRegion method is "atomic", there are
            % no open file handles to close, so this method is empty.

        end

    end % public methods

end % LanAdapter

As the comments indicate, the close method for LanAdapter has nothing to do, so close is empty.
The multibandread function does not require maintenance of open file handles, so the close
method has no handles to clean up. Image Adapter classes for other file formats may have more
substantial close methods including closing file handles and performing other class clean-up
responsibilities.

Methods (Optional)

As written, the LanAdapter class can only read LAN files, not write them. If you want to write output
to a LAN format file, or another file with a format that blockproc does not support, implement the
optional writeRegion method. Then, you can specify your class as a 'Destination' parameter in
blockproc and write output to a file of your chosen format.

The signature of the writeRegion method is as follows:

function [] = writeRegion(obj, region_start, region_data)

The first argument, region_start, indicates the first pixel of the block that the writeRegion
method writes. The second argument, region_data, contains the new data that the method writes
to the file.
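
As an illustration only, a writable adapter for a hypothetical headerless, single-band, column-major uint8 format might implement writeRegion along these lines. The constructor of such a class, which is not shown here and is not part of LanAdapter, would create and preallocate the output file and set the Filename and ImageSize properties.

function [] = writeRegion(obj, region_start, region_data)
    % Write one block into a preallocated, headerless, column-major,
    % single-band uint8 binary file, one column at a time.
    fid = fopen(obj.Filename,'r+');
    numRowsInFile = obj.ImageSize(1);
    for c = 1:size(region_data,2)
        col = region_start(2) + c - 1;
        byteOffset = (col-1)*numRowsInFile + (region_start(1)-1);
        fseek(fid,byteOffset,'bof');
        fwrite(fid,region_data(:,c),'uint8');
    end
    fclose(fid);
end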


Classes that implement the writeRegion method can be more complex than LanAdapter. When
creating a writable Image Adapter object, classes often have the additional responsibility of creating
new files in the class constructor. This file creation requires a more complex syntax in the
constructor, where you potentially need to specify the size and data type of a new file you want to
create. Constructors that create new files can also encounter other issues, such as operating system
file permissions or potentially difficult file-creation code.

Using the LanAdapter Class with blockproc


Now that you understand how the LanAdapter class works, you can use it to enhance the visible
bands of a LAN file. See the “Compute Statistics for Large Images” on page 18-29 example to see
how the blockproc function works with the LanAdapter class.
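
As a quick check before working through that example, this sketch reads the visible bands through the adapter and assembles them into an RGB image in memory, using a block function that returns each block unchanged.

adapter = LanAdapter('rio.lan');
adapter.SelectedBands = [3 2 1];
rgb = blockproc(adapter,[100 100],@(block_struct) block_struct.data);
imshow(rgb)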

See Also
blockproc | ImageAdapter | multibandread

More About
• “Compute Statistics for Large Images” on page 18-29


Use Column-wise Processing to Speed Up Sliding Neighborhood or Distinct Block Operations
In this section...
“Using Column Processing with Sliding Neighborhood Operations” on page 18-21
“Using Column Processing with Distinct Block Operations” on page 18-22

Performing sliding neighborhood and distinct block operations column-wise, when possible, can
reduce the execution time required to process an image.

For example, suppose the operation you are performing involves computing the mean of each block.
This computation is much faster if you first rearrange the blocks into columns, because you can
compute the mean of every column with a single call to the mean function, rather than calling mean
for each block individually.

To use column processing, use the colfilt function. This function

1 Reshapes each sliding or distinct block of an image matrix into a column in a temporary matrix
2 Passes the temporary matrix to a function you specify
3 Rearranges the resulting matrix back into the original shape
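
For instance, the following sketch computes a 3-by-3 sliding mean of the tire.tif sample image both ways. The two results agree to within floating-point rounding, and the colfilt version is typically faster because mean operates on every column of the temporary matrix in a single call.

I = im2double(imread("tire.tif"));

tic
meanSlow = nlfilter(I,[3 3],@(x) mean(x(:)));
toc

tic
meanFast = colfilt(I,[3 3],"sliding",@mean);
toc

max(abs(meanSlow(:) - meanFast(:)))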

Using Column Processing with Sliding Neighborhood Operations


For a sliding neighborhood operation, colfilt creates a temporary matrix that has a separate
column for each pixel in the original image. The column corresponding to a given pixel contains the
values of that pixel's neighborhood from the original image.

The following figure illustrates this process. In this figure, a 6-by-5 image matrix is processed in 2-
by-3 neighborhoods. colfilt creates one column for each pixel in the image, so there are a total of
30 columns in the temporary matrix. Each pixel's column contains the value of the pixels in its
neighborhood, so there are six rows. colfilt zero-pads the input image as necessary. For example,
the neighborhood of the upper left pixel in the figure has two zero-valued neighbors, due to zero
padding.

colfilt Creates a Temporary Matrix for Sliding Neighborhood


The temporary matrix is passed to a function, which must return a single value for each column.
(Many MATLAB functions work this way, for example, mean, median, std, sum, etc.) The resulting
values are then assigned to the appropriate pixels in the output image.

colfilt can produce the same results as nlfilter with faster execution time; however, it might
use more memory. The example below sets each output pixel to the maximum value in the input
pixel's neighborhood, producing the same result as the nlfilter example shown in “Implementing
Linear and Nonlinear Filtering as Sliding Neighborhood Operations” on page 18-4.

I2 = colfilt(I,[3 3],"sliding",@max);

Using Column Processing with Distinct Block Operations


For a distinct block operation, colfilt creates a temporary matrix by rearranging each block in the
image into a column. colfilt pads the original image with 0's, if necessary, before creating the
temporary matrix.

The following figure illustrates this process. A 6-by-16 image matrix is processed in 4-by-6 blocks.
colfilt first zero-pads the image to make the size 8-by-18 (six 4-by-6 blocks), and then rearranges
the blocks into six columns of 24 elements each.

colfilt Creates a Temporary Matrix for Distinct Block Operation

After rearranging the image into a temporary matrix, colfilt passes this matrix to the function. The
function must return a matrix of the same size as the temporary matrix. If the block size is m-by-n,
and the image is mm-by-nn, the size of the temporary matrix is (m*n)-by-(ceil(mm/m)*ceil(nn/n)).
After the function processes the temporary matrix, the output is rearranged into the shape of the
original image matrix.


This example sets all the pixels in each 8-by-8 block of an image to the mean pixel value for the block.

I = im2double(imread("tire.tif"));
f = @(x) ones(64,1)*mean(x);
I2 = colfilt(I,[8 8],"distinct",f);

The anonymous function in the example computes the mean of the block and then multiplies the
result by a vector of ones, so that the output block is the same size as the input block. As a result, the
output image is the same size as the input image.

Restrictions

You can use colfilt to implement many of the same distinct block operations that blockproc
performs. However, colfilt has certain restrictions that blockproc does not:

• The output image must be the same size as the input image.
• The blocks cannot overlap.

For situations that do not satisfy these constraints, use blockproc.


Block Processing Large Images

This example shows how to perform edge detection on a TIFF image by dividing the image into
blocks. When working with large images, normal image processing techniques can sometimes break
down. The images can either be too large to load into memory, or else they can be loaded into
memory but then be too large to process.

To avoid these problems, you can process large images incrementally: reading, processing, and finally
writing the results back to disk, one region at a time. The blockproc function helps you with this
process. Using blockproc, specify an image, a block size, and a function handle. blockproc then
divides the input image into blocks of the specified size, processes them using the function handle
one block at a time, and then assembles the results into an output image. blockproc returns the
output to memory or to a new file on disk.

First, consider the results of performing edge detection without block processing. This example uses
a small image, cameraman.tif, to illustrate the concepts, but block processing is often more useful
for large images.

file_name = "cameraman.tif";
I = imread(file_name);
normal_edges = edge(I,"canny");

imshow(I)
title("Original Image")

imshow(normal_edges)
title("Conventional Edge Detection")


Now try the same task using block processing. The blockproc function has built-in support for TIFF
images, so you do not have to read the file completely into memory using imread. Instead, call the
function using the string filename as input. blockproc reads in one block at a time, making this
workflow ideal for very large images.

When working with large images you will often use the "Destination" name-value argument to
specify a file into which blockproc will write the output image. However, in this example you will
return the results to a variable, in memory.

This example uses a block size of [50 50]. In general, choosing larger block sizes yields better
performance for blockproc. This is particularly true for file-to-file workflows where accessing the
disk will incur a significant performance cost. Appropriate block sizes vary based on the machine
resources available, but should likely be in the range of thousands of pixels per dimension.

% You can use an anonymous function to define the function handle. The
% function is passed a structure as input, a "block struct", with several
% fields containing the block data as well as other relevant information.
% The function should return the processed block data.
edgeFun = @(block_struct) edge(block_struct.data,"canny");

block_size = [50 50];


block_edges = blockproc(file_name,block_size,edgeFun);

imshow(block_edges)
title("Block Processing - Simplest Syntax")


Notice the significant artifacts from the block processing. Determining whether a pixel is an edge
pixel or not requires information from the neighboring pixels. This means that each block cannot be
processed completely separately from its surrounding pixels. To remedy this, use the blockproc
name-value argument "BorderSize" to specify vertical and horizontal borders around each block.
The necessary "BorderSize" varies depending on the task being performed.

border_size = [10 10];


block_edges = blockproc(file_name,block_size,edgeFun,"BorderSize",border_size);

imshow(block_edges)
title("Block Processing - Block Borders")


The blocks are now being processed with an additional 10 pixels of image data on each side. This
looks better, but the result is still significantly different from the original in-memory result. The
reason for this is that the Canny edge detector uses a threshold that is computed based on the
complete image histogram. Since the blockproc function calls the edge function for each block, the
Canny algorithm is working with incomplete histograms and therefore using varying thresholds
across the image.

When block processing images, it is important to understand these types of algorithm constraints.
Some functions will not directly translate to block processing for all syntaxes. In this case, the edge
function allows you to pass in a fixed threshold as an input argument instead of computing it. Modify
your function handle to use the three-argument syntax of edge, and thus remove one of the "global"
constraints of the function. Some trial and error finds that a threshold of 0.09 gives good results.

thresh = 0.09;
edgeFun = @(block_struct) edge(block_struct.data,"canny",thresh);
block_edges = blockproc(file_name,block_size,edgeFun,"BorderSize",border_size);

imshow(block_edges)
title("Block Processing - Borders & Fixed Threshold")


The result now closely matches the original in-memory result. You can see some additional artifacts
along the boundaries. These are due to the different methods of padding used by the Canny edge
detector. Currently, blockproc only supports zero-padding along the image boundaries.

See Also
blockproc | edge

Related Examples
• “Compute Statistics for Large Images” on page 18-29

More About
• “Distinct Block Processing” on page 18-6
• “Block Size and Performance” on page 18-9
• “Create Function Handle”


Compute Statistics for Large Images

This example shows how to use blockproc to compute statistics from large images and then use that
information to more accurately process the images blockwise. The blockproc function is well suited
for applying an operation to an image blockwise, assembling the results, and returning them as a new
image. Many image processing algorithms, however, require "global" information about the image,
which is not available when you are only considering one block of image data at a time. These
constraints can prove to be problematic when working with images that are too large to load
completely into memory.

This example performs a task similar to that found in the “Enhance Multispectral Color Composite
Images” on page 8-90 example, but adapted for large images using blockproc. This example
enhances the visible bands of the Erdas LAN file rio.lan. These types of block processing
techniques are typically more useful for large images, but a small image is used to illustrate concepts
in this example.

Step 1: Create a Truecolor Composite

Using blockproc, read the data from rio.lan, a file containing Landsat thematic mapper imagery
in the Erdas LAN file format. blockproc has built-in support for reading TIFF and JPEG2000 files
only. To read other types of files you must write an Image Adapter class to support I/O for your
particular file format. This example uses a pre-built Image Adapter class, the LanAdapter, which
supports reading LAN files. For more information on writing Image Adapter classes, see “Perform
Block Processing on Image Files in Unsupported Formats” on page 18-15.

The Erdas LAN format contains the visible red, green, and blue spectrum in bands 3, 2, and 1,
respectively. Use blockproc to extract the visible bands into an RGB image.

Create the LanAdapter object associated with rio.lan.

input_adapter = LanAdapter("rio.lan");

Select the visible R, G, and B bands.

input_adapter.SelectedBands = [3 2 1];

Create a simple block function that returns the block data unchanged.

identityFcn = @(block_struct) block_struct.data;

Create the initial truecolor image.

truecolor = blockproc(input_adapter,[100 100],identityFcn);

Display the results before enhancement.

imshow(truecolor)
title("Truecolor Composite (Not Enhanced)")


The resulting truecolor image is similar to that of paris.lan in the “Enhance Multispectral Color
Composite Images” on page 8-90 example. The RGB image appears dull, with little contrast.

Step 2: Enhance the Image - First Attempt

First, try to stretch the data across the dynamic range using blockproc. This first attempt simply
defines a new function handle that calls stretchlim and imadjust on each block of data
individually.

adjustFcn = @(block_struct) imadjust(block_struct.data, ...
    stretchlim(block_struct.data));
truecolor_enhanced = blockproc(input_adapter,[100 100],adjustFcn);

imshow(truecolor_enhanced)
title("Truecolor Composite with Blockwise Contrast Stretch")


You can see immediately that the results are incorrect. The problem is that the stretchlim function
computes the histogram on the input image and uses this information to compute the stretch limits.
Since each block is adjusted in isolation from its neighbors, each block is computing different limits
from its local histogram.

Step 3: Examine the Histogram Accumulator Class

To examine the distribution of data across the dynamic range of the image, you can compute the
histogram for each of the three visible bands.

When working with sufficiently large images, you cannot simply call imhist to create an image
histogram. One way to incrementally build the histogram is to use blockproc with a class that will
sum the histograms of each block as you move over the image.

Examine the HistogramAccumulator class.


type HistogramAccumulator

% HistogramAccumulator Accumulate incremental histogram.
%
% HistogramAccumulator is a class that incrementally builds up a
% histogram for an image. This class is appropriate for use with 8-bit
% or 16-bit integer images and is for educational purposes ONLY.

% Copyright 2009 The MathWorks, Inc.

classdef HistogramAccumulator < handle

    properties
        Histogram
        Range
    end

    methods

        function obj = HistogramAccumulator()
            obj.Range = [];
            obj.Histogram = [];
        end

        function addToHistogram(obj,new_data)
            if isempty(obj.Histogram)
                obj.Range = double(0:intmax(class(new_data)));
                obj.Histogram = hist(double(new_data(:)),obj.Range);
            else
                new_hist = hist(double(new_data(:)),obj.Range);
                obj.Histogram = obj.Histogram + new_hist;
            end
        end
    end
end

The class is a simple wrapper around the hist function, allowing you to add data to a histogram
incrementally. It is not specific to blockproc. Observe the following simple use of the
HistogramAccumulator class.

Create the HistogramAccumulator object.

hist_obj = HistogramAccumulator;

Split a sample image into 2 halves.

full_image = imread("liftingbody.png");
top_half = full_image(1:256,:);
bottom_half = full_image(257:end,:);

Compute the histogram incrementally.

addToHistogram(hist_obj,top_half);
addToHistogram(hist_obj,bottom_half);
computed_histogram = hist_obj.Histogram;

Compare against the results of the imhist function.

normal_histogram = imhist(full_image);


Examine the results. The histograms are numerically identical.

figure
subplot(1,2,1)
stem(computed_histogram,"Marker","none")
title("Incrementally Computed Histogram")
subplot(1,2,2)
stem(normal_histogram',"Marker","none")
title("imhist Histogram")

Step 4: Use the HistogramAccumulator Class with blockproc

Use the HistogramAccumulator class with blockproc to build the histogram of the red band of
data in rio.lan. You can define a function handle for blockproc that will invoke the
addToHistogram method on each block of data. By viewing this histogram, you can see that the data
is concentrated within a small part of the available dynamic range. The other visible bands have
similar distributions. This is one reason why the original truecolor composite appears dull.

Create the HistogramAccumulator object.

hist_obj = HistogramAccumulator;

Set up the blockproc function handle.

addToHistFcn = @(block_struct) addToHistogram(hist_obj, block_struct.data);


Compute the histogram of the red channel. Notice that the addToHistFcn function handle does not
generate any output. Since the function handle passed to blockproc does not return anything,
blockproc will not return anything either.

input_adapter.SelectedBands = 3;
blockproc(input_adapter,[100 100],addToHistFcn);
red_hist = hist_obj.Histogram;

Display the results.

figure
stem(red_hist,"Marker","none")
title("Histogram of Red Band (Band 3)")

Step 5: Enhance the Truecolor Composite with a Contrast Stretch

You can now perform a proper contrast stretch on the image. For conventional, in-memory workflows,
you can simply use the stretchlim function to compute the arguments to imadjust (like the
example “Enhance Multispectral Color Composite Images” on page 8-90). When working with large
images, as we have seen, stretchlim is not easily adapted for use with blockproc since it relies on
the full image histogram.

Once you have computed the image histograms for each of the visible bands, compute the proper
arguments to imadjust by hand (similar to how stretchlim does).

First compute the histograms for the green and blue bands.


hist_obj = HistogramAccumulator;
addToHistFcn = @(block_struct) addToHistogram(hist_obj,block_struct.data);
input_adapter.SelectedBands = 2;
blockproc(input_adapter,[100 100],addToHistFcn);
green_hist = hist_obj.Histogram;

hist_obj = HistogramAccumulator;
addToHistFcn = @(block_struct) addToHistogram(hist_obj,block_struct.data);
input_adapter.SelectedBands = 1;
blockproc(input_adapter,[100 100],addToHistFcn);
blue_hist = hist_obj.Histogram;

Compute the CDF of each histogram.

computeCDF = @(histogram) cumsum(histogram) / sum(histogram);


findLowerLimit = @(cdf) find(cdf > 0.01, 1, "first");
findUpperLimit = @(cdf) find(cdf >= 0.99, 1, "first");

red_cdf = computeCDF(red_hist);
red_limits(1) = findLowerLimit(red_cdf);
red_limits(2) = findUpperLimit(red_cdf);

green_cdf = computeCDF(green_hist);
green_limits(1) = findLowerLimit(green_cdf);
green_limits(2) = findUpperLimit(green_cdf);

blue_cdf = computeCDF(blue_hist);
blue_limits(1) = findLowerLimit(blue_cdf);
blue_limits(2) = findUpperLimit(blue_cdf);

Prepare the arguments for the imadjust function.

rgb_limits = [red_limits' green_limits' blue_limits'];

Scale to the range [0, 1].

rgb_limits = (rgb_limits - 1) / (255);

Create a new adjustFcn that applies the global stretch limits and use blockproc to adjust the
truecolor image.

adjustFcn = @(block_struct) imadjust(block_struct.data,rgb_limits);

Select the full RGB data.

input_adapter.SelectedBands = [3 2 1];
truecolor_enhanced = blockproc(input_adapter,[100 100],adjustFcn);

Display the result. The resulting image is much improved, with the data covering more of the dynamic
range. By using blockproc you avoid loading the whole image into memory.

imshow(truecolor_enhanced)
title("Truecolor Composite with Corrected Contrast Stretch")


See Also
Classes
ImageAdapter

Functions
blockproc | stretchlim | imadjust | imhist

Related Examples
• “Enhance Multispectral Color Composite Images” on page 8-90
• “Block Processing Large Images” on page 18-24


More About
• “Distinct Block Processing” on page 18-6
• “Perform Block Processing on Image Files in Unsupported Formats” on page 18-15
• “Create Function Handle”

19 Deep Learning

This topic describes functions that enable image denoising using convolutional neural networks, and
provides examples of other image processing applications using deep learning techniques.

• “Get Started with Image Preprocessing and Augmentation for Deep Learning” on page 19-2
• “Preprocess Images for Deep Learning” on page 19-6
• “Preprocess Volumes for Deep Learning” on page 19-10
• “Augment Images for Deep Learning Workflows” on page 19-17
• “Get Started with GANs for Image-to-Image Translation” on page 19-39
• “Create Modular Neural Networks” on page 19-44
• “Train and Apply Denoising Neural Networks” on page 19-46
• “Remove Noise from Color Image Using Pretrained Neural Network” on page 19-49
• “Increase Image Resolution Using Deep Learning” on page 19-55
• “JPEG Image Deblocking Using Deep Learning” on page 19-71
• “Image Processing Operator Approximation Using Deep Learning” on page 19-84
• “Develop Camera Processing Pipeline Using Deep Learning” on page 19-98
• “Brighten Extremely Dark Images Using Deep Learning” on page 19-120
• “Semantic Segmentation of Multispectral Images Using Deep Learning” on page 19-131
• “3-D Brain Tumor Segmentation Using Deep Learning” on page 19-149
• “Neural Style Transfer Using Deep Learning” on page 19-159
• “Unsupervised Day-to-Dusk Image Translation Using UNIT” on page 19-168
• “Quantify Image Quality Using Neural Image Assessment” on page 19-179
• “Unsupervised Medical Image Denoising Using CycleGAN” on page 19-192
• “Unsupervised Medical Image Denoising Using UNIT” on page 19-206
• “Preprocess Multiresolution Images for Training Classification Network” on page 19-219
• “Classify Tumors in Multiresolution Blocked Images” on page 19-235
• “Detect Image Anomalies Using Explainable FCDD Network” on page 19-247
• “Classify Defects on Wafer Maps Using Deep Learning” on page 19-260
• “Detect Image Anomalies Using Pretrained ResNet-18 Feature Embeddings” on page 19-276

Get Started with Image Preprocessing and Augmentation for Deep Learning

Data preprocessing consists of a series of deterministic operations that normalize or enhance desired
data features. For example, you can normalize data to a fixed range or rescale data to the size
required by the network input layer. Preprocessing is used for training, validation, and inference.

Preprocessing can occur at two stages in the deep learning workflow.

• Commonly, preprocessing occurs as a separate step that you complete before preparing the data
to be fed to the network. You load your original data, apply the preprocessing operations, then
save the result to disk. The advantage of this approach is that the preprocessing overhead is only
required once, then the preprocessed images are readily available as a starting place for all future
trials of training a network.
• If you load your data into a datastore, then you can also apply preprocessing during training by
using the transform and combine functions. For more information, see “Datastores for Deep
Learning” (Deep Learning Toolbox). The transformed images are not stored in memory. This
approach is convenient to avoid writing a second copy of training data to disk if your
preprocessing operations are not computationally expensive and do not noticeably impact the
speed of training the network.

Data augmentation consists of randomized operations that are applied to the training data while the
network is training. Augmentation increases the effective amount of training data and helps to make
the network invariant to common distortion in the data. For example, you can add artificial noise to
training data so that the network is invariant to noise.

To augment training data, start by loading your data into a datastore. Some built-in datastores apply
a specific and limited set of augmentation to data for specific applications. You can also apply your
own set of augmentation operations on data in the datastore by using the transform and combine
functions. During training, the datastore randomly perturbs the training data for each epoch, so that
each epoch uses a slightly different data set. For more information, see “Preprocess Images for Deep
Learning” on page 19-6 and “Preprocess Volumes for Deep Learning” on page 19-10.
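
For example, this sketch applies a simple preprocessing operation on the fly each time the datastore is read. The folder name and target size are placeholders for your own values.

imds = imageDatastore("myTrainingImages");
targetSize = [224 224];
dsTrain = transform(imds,@(im) imresize(im2single(im),targetSize));

sampleImage = preview(dsTrain);   % the transformation is applied at read time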

Preprocess and Augment Images


Common image preprocessing operations include noise removal, edge-preserving smoothing, color
space conversion, contrast enhancement, and morphology.

Augment image data to simulate variations in the image acquisition. For example, the most common
type of image augmentation operations are geometric transformations such as rotation and
translation, which simulate variations in the camera orientation with respect to the scene. Color jitter
simulates variations of lighting conditions and color in the scene. Artificial noise simulates distortions
caused by the electrical fluctuations in the sensor and analog-to-digital conversion errors. Blur
simulates an out-of-focus lens or movement of the camera with respect to the scene.

You can process and augment image data using the operations in this table, as well as any other
functionality in the toolbox. For an example that shows how to create and apply these
transformations, see “Augment Images for Deep Learning Workflows” on page 19-17.


Resize images
    Resize images by a fixed scaling factor or to a target size.
    Sample functions: imresize, imresize3

Crop images
    Crop an image to a target size from the center or a random position.
    Sample functions: centerCropWindow2d, centerCropWindow3d, randomWindow2d, randomCropWindow3d

Warp images
    Apply random reflection, rotation, scale, shear, and translation to images.
    Sample functions: randomAffine2d, randomAffine3d

Jitter color
    Randomly adjust image hue, saturation, brightness, or contrast.
    Sample functions: jitterColorHSV

Simulate noise
    Add random Gaussian, Poisson, salt and pepper, or multiplicative noise.
    Sample functions: imnoise

Simulate blur
    Add Gaussian or directional motion blur.
    Sample functions: imgaussfilt, imgaussfilt3, imfilter
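
For example, this sketch combines two of these operations on a sample image. The parameter ranges are arbitrary values chosen for illustration.

im = imread("peppers.png");

imJittered = jitterColorHSV(im,Saturation=0.2,Brightness=0.2);
tform = randomAffine2d(Rotation=[-20 20],XReflection=true);
outputView = affineOutputView(size(imJittered),tform);
imAugmented = imwarp(imJittered,tform,OutputView=outputView);

imshow(imAugmented)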

Preprocess and Augment Pixel Label Images for Semantic Segmentation

Semantic segmentation data consists of images and corresponding pixel labels represented as
categorical arrays. For more information, see “Getting Started with Semantic Segmentation Using
Deep Learning” (Computer Vision Toolbox).

If you have Computer Vision Toolbox, then you can use the Image Labeler and the Video Labeler apps
to interactively label pixels and export the label data for training a neural network.

When you transform an image for semantic segmentation, you must perform an identical
transformation to the corresponding pixel labeled image. You can preprocess pixel label images using
the functions in the table and any other function that supports categorical input. For an example that
shows how to create and apply these transformations, see “Augment Pixel Labels for Semantic
Segmentation” (Computer Vision Toolbox).


Resize pixel labels
    Resize pixel label images by a fixed scaling factor or to a target size.
    Sample functions: imresize

Crop pixel labels
    Crop a pixel label image to a target size from the center or a random position.
    Sample functions: imcrop, centerCropWindow2d, centerCropWindow3d, randomWindow2d, randomCropWindow3d

Warp pixel labels
    Apply random reflection, rotation, scale, shear, and translation to pixel label images.
    Sample functions: randomAffine2d, randomAffine3d
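
For example, this sketch applies one identical random warp to an image im and its categorical pixel label image C, both of which are assumed to already exist in the workspace and to have the same spatial size.

tform = randomAffine2d(Rotation=[-10 10],Scale=[0.9 1.1]);
outputView = affineOutputView(size(im),tform);

imWarped = imwarp(im,tform,OutputView=outputView);
CWarped = imwarp(C,tform,OutputView=outputView);   % categorical input is warped with nearest-neighbor interpolation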

See Also

Related Examples
• “Augment Images for Deep Learning Workflows” on page 19-17

More About
• “Preprocess Images for Deep Learning” on page 19-6
• “Preprocess Volumes for Deep Learning” on page 19-10
• “Preprocess Multiresolution Images for Training Classification Network” on page 19-219
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Select Datastore for File Format or Application”

19-5
19 Deep Learning

Preprocess Images for Deep Learning


To train a network and make predictions on new data, your images must match the input size of the
network. If you need to adjust the size of your images to match the network, then you can rescale or
crop your data to the required size.

You can effectively increase the amount of training data by applying randomized augmentation to
your data. Augmentation also enables you to train networks to be invariant to distortions in image
data. For example, you can add randomized rotations to input images so that a network is invariant to
the presence of rotation in input images. An augmentedImageDatastore provides a convenient way
to apply a limited set of augmentations to 2-D images for classification problems.

For more advanced preprocessing operations, to preprocess images for regression problems, or to
preprocess 3-D volumetric images, you can start with a built-in datastore. You can also preprocess
images according to your own pipeline by using the transform and combine functions.

Resize Images Using Rescaling and Cropping


You can store image data as a numeric array, an ImageDatastore object, or a table. An
ImageDatastore enables you to import data in batches from image collections that are too large to
fit in memory. You can use an augmented image datastore or a resized 4-D array for training,
prediction, and classification. You can use a resized 3-D array for prediction and classification only.

There are two ways to resize image data to match the input size of a network.

• Rescaling multiplies the height and width of the image by a scaling factor. If the scaling factor is
not identical in the vertical and horizontal directions, then rescaling changes the spatial extents of
the pixels and the aspect ratio.
• Cropping extracts a subregion of the image and preserves the spatial extent of each pixel. You can
crop images from the center or from random positions in the image.

Rescaling
    Data formats: 3-D array representing a single color or multispectral image; 3-D array representing a stack of grayscale images; 4-D array representing a stack of images
    Resizing function: imresize
    Sample code:
        im = imresize(I,outputSize);
    outputSize specifies the dimensions of the rescaled image.

    Data formats: 4-D array representing a stack of images; ImageDatastore; table
    Resizing function: augmentedImageDatastore
    Sample code:
        auimds = augmentedImageDatastore(outputSize,I);
    outputSize specifies the dimensions of the rescaled image.

Cropping
    Data formats: 3-D array representing a single color or multispectral image
    Resizing function: imcrop
    Sample code:
        im = imcrop(I,rect);
    rect specifies the size and position of the 2-D cropping window.

    Data formats: 3-D array representing a stack of grayscale images; 4-D array representing a stack of color or multispectral images
    Resizing function: imcrop3
    Sample code:
        im = imcrop3(I,cuboid);
    cuboid specifies the size and position of the 3-D cropping window.

    Data formats: 4-D array representing a stack of images; ImageDatastore; table
    Resizing function: augmentedImageDatastore
    Sample code:
        auimds = augmentedImageDatastore(outputSize,I,'OutputSizeMode',m);
    Specify m as 'centercrop' to crop from the center of the input image. Specify m as 'randcrop' to crop from a random location in the input image.

Augment Images for Training with Random Geometric Transformations


For image classification problems, you can use an augmentedImageDatastore to augment images
with a random combination of resizing, rotation, reflection, shear, and translation transformations.

The diagram shows how trainNetwork uses an augmented image datastore to transform training
data for each epoch. When you use data augmentation, one randomly augmented version of each
image is used during each epoch of training. For an example of the workflow, see “Train Network
with Augmented Images” (Deep Learning Toolbox).

1 Specify training images.


2 Configure image transformation options, such as the range of rotation angles and whether to
apply reflection at random, by creating an imageDataAugmenter.

Tip To preview the transformations applied to sample images, use the augment function.


3 Create an augmentedImageDatastore. Specify the training images, the size of output images,
and the imageDataAugmenter. The size of output images must be compatible with the size of
the imageInputLayer of the network.
4 Train the network, specifying the augmented image datastore as the data source for
trainNetwork. For each iteration of training, the augmented image datastore applies a random
combination of transformations to images in the mini-batch of training data.

When you use an augmented image datastore as a source of training images, the datastore
randomly perturbs the training data for each epoch, so that each epoch uses a slightly different
data set. The actual number of training images at each epoch does not change. The transformed
images are not stored in memory.
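
The following sketch condenses these four steps. It assumes that imds is an ImageDatastore of labeled training images, that layers and options already exist, and that the network input size is 224-by-224-by-3; the augmentation ranges are arbitrary examples.

augmenter = imageDataAugmenter( ...
    RandRotation=[-15 15], ...
    RandXReflection=true, ...
    RandXTranslation=[-10 10], ...
    RandYTranslation=[-10 10]);

auimds = augmentedImageDatastore([224 224],imds,DataAugmentation=augmenter);

net = trainNetwork(auimds,layers,options);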

Perform Additional Image Processing Operations Using Built-In Datastores

Some datastores perform specific and limited image preprocessing operations when they read a
batch of data. These application-specific datastores are listed in the table. You can use these
datastores as a source of training, validation, and test data sets for deep learning applications that
use Deep Learning Toolbox™. All of these datastores return data in a format supported by
trainNetwork.

augmentedImageDatastore
    Apply random affine geometric transformations, including resizing, rotation, reflection, shear, and translation, for training deep neural networks. For an example, see “Transfer Learning Using Pretrained Network” (Deep Learning Toolbox).

randomPatchExtractionDatastore
    Extract multiple pairs of random patches from images or pixel label images (requires Image Processing Toolbox). You optionally can apply identical random affine geometric transformations to the pairs of patches. For an example, see “Increase Image Resolution Using Deep Learning” (Deep Learning Toolbox).

denoisingImageDatastore
    Apply randomly generated Gaussian noise for training denoising networks (requires Image Processing Toolbox).

Apply Custom Image Processing Pipelines Using Combine and Transform

To perform more general and complex image preprocessing operations than offered by the
application-specific datastores, you can use the transform and combine functions. For more
information, see “Datastores for Deep Learning” (Deep Learning Toolbox).

Transform Datastores with Image Data

The transform function creates an altered form of a datastore by transforming the data read from
another datastore, called the underlying datastore, according to a transformation function that
you define.

The custom transformation function must accept data in the format returned by the read function of
the underlying datastore. For image data in an ImageDatastore, the format depends on the
ReadSize property.


• When ReadSize is 1, the transformation function must accept an integer array. The size of the
array is consistent with the type of images in the ImageDatastore. For example, a grayscale
image has dimensions m-by-n, a truecolor image has dimensions m-by-n-by-3, and a multispectral
image with c channels has dimensions m-by-n-by-c.
• When ReadSize is greater than 1, the transformation function must accept a cell array of image
data. Each element corresponds to an image in the batch.

The transform function must return data that matches the input size of the network. The
transform function does not support one-to-many observation mappings.

Tip The transform function supports prefetching when the underlying ImageDatastore reads a
batch of JPG or PNG image files. For these image types, do not use the readFcn argument of
ImageDatastore to apply image preprocessing, as this option is usually significantly slower. If you
use a custom read function, then ImageDatastore does not prefetch.
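
For example, a transformation function for a datastore with ReadSize greater than 1 might look like this sketch, where the folder name and inputSize are placeholders for your own values.

imds = imageDatastore("myTrainingImages");
imds.ReadSize = 4;

inputSize = [224 224];
dsTrain = transform(imds,@(batch) cellfun( ...
    @(im) imresize(im2single(im),inputSize),batch,UniformOutput=false));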

Combine Datastores with Image Data

The combine function concatenates the data read from multiple datastores and maintains parity
between the datastores.

• Concatenate data into a two-column table or two-column cell array for training networks with a
single input, such as image-to-image regression networks.
• Concatenate data to a (numInputs+1)-column cell array for training networks with multiple
inputs.
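
For example, this sketch pairs network inputs with responses for an image-to-image regression problem. The folder names are placeholders.

dsInput = imageDatastore("noisyImages");
dsResponse = imageDatastore("cleanImages");

dsTrain = combine(dsInput,dsResponse);
sampleRow = read(dsTrain);   % 1-by-2 cell array: {input, response}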

See Also
trainNetwork | imresize | transform | combine | ImageDatastore

Related Examples
• “Train Network with Augmented Images” (Deep Learning Toolbox)
• “Train Deep Learning Network to Classify New Images” (Deep Learning Toolbox)
• “Create and Explore Datastore for Image Classification” (Deep Learning Toolbox)
• “Prepare Datastore for Image-to-Image Regression” (Deep Learning Toolbox)

More About
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Preprocess Volumes for Deep Learning” (Deep Learning Toolbox)
• “Deep Learning in MATLAB” (Deep Learning Toolbox)


Preprocess Volumes for Deep Learning

Read Volumetric Data


Supported file formats for volumetric image data include MAT-files, Digital Imaging and
Communications in Medicine (DICOM) files, and Neuroimaging Informatics Technology Initiative
(NIfTI) files.

Read volumetric image data into an ImageDatastore. Read volumetric pixel label data into a
PixelLabelDatastore. For more information, see “Datastores for Deep Learning” (Deep Learning
Toolbox).

The table shows typical usages of imageDatastore and pixelLabelDatastore for each of the
supported file formats. When you create the datastore, specify the FileExtensions name-value
argument as the file extensions of your data. Specify the ReadFcn property as a function handle that
reads data of the file format. The filepath argument specifies the path to the files or folder
containing image data. For pixel label images, the additional classNames and pixelLabelID
arguments specify the mapping of voxel label values to class names.

MAT

volds = imageDatastore(filepath, ...
    "FileExtensions",".mat","ReadFcn",@(x) fcn(x));

pxds = pixelLabelDatastore(filepath,classNames,pixelLabelID, ...
    "FileExtensions",".mat","ReadFcn",@(x) fcn(x));

fcn is a custom function that reads data from a MAT file. For example, this code defines a function
called matRead that loads volume data from the first variable of a MAT file. Save the function in a
file called matRead.m.

function data = matRead(filename)
    inp = load(filename);
    f = fields(inp);
    data = inp.(f{1});
end

DICOM volume in single file

volds = imageDatastore(filepath, ...
    "FileExtensions",".dcm","ReadFcn",@(x) dicomread(x));

pxds = pixelLabelDatastore(filepath,classNames,pixelLabelID, ...
    "FileExtensions",".dcm","ReadFcn",@(x) dicomread(x));

For more information about reading DICOM files, see dicomread.

DICOM volume in multiple files

Follow these steps. For an example, see “Create Image Datastore Containing Single and Multi-File
DICOM Volumes” on page 3-26.

• Aggregate the files into a single study by using the dicomCollection function.
• Read the DICOM data in the study by using the dicomreadVolume function.
• Write each volume as a MAT file.
• Create the ImageDatastore or PixelLabelDatastore from the collection of MAT files by following
  the procedure for MAT files.

NIfTI

volds = imageDatastore(filepath, ...
    "FileExtensions",".nii","ReadFcn",@(x) niftiread(x));

pxds = pixelLabelDatastore(filepath,classNames,pixelLabelID, ...
    "FileExtensions",".nii","ReadFcn",@(x) niftiread(x));

For more information about reading NIfTI files, see niftiread.

Pair Image and Label Data


To associate volumetric image and label data for semantic segmentation, or two volumetric image
datastores for regression, use a randomPatchExtractionDatastore. A random patch extraction
datastore extracts corresponding randomly-positioned patches from two datastores. Patching is a
common technique to prevent running out of memory when training with arbitrarily large volumes.
Specify a patch size that matches the input size of the network and, for memory efficiency, is smaller
than the full size of the volume, such as 64-by-64-by-64 voxels.

You can also use the combine function to associate two datastores. However, associating two
datastores using a randomPatchExtractionDatastore has some benefits over combine.

• randomPatchExtractionDatastore supports parallel training, multi-GPU training, and


prefetch reading. Specify parallel or multi-GPU training using the ExecutionEnvironment
name-value argument of trainingOptions. Specify prefetch reading using the
DispatchInBackground name-value argument of trainingOptions. Prefetch reading requires
Parallel Computing Toolbox.
• randomPatchExtractionDatastore inherently supports patch extraction. In contrast, to
extract patches from a CombinedDatastore, you must define your own function that crops
images into patches, and then use the transform function to apply the cropping operations.
• randomPatchExtractionDatastore can generate several image patches from one test image.
One-to-many patch extraction effectively increases the amount of available training data.
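
For example, this sketch pairs a volumetric image datastore volds with a pixel label datastore pxds, created as shown in the table above, and extracts corresponding 64-by-64-by-64 voxel patches.

patchSize = [64 64 64];
patchds = randomPatchExtractionDatastore(volds,pxds,patchSize, ...
    PatchesPerImage=16);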

Preprocess Volumetric Data


Deep learning frequently requires the data to be preprocessed and augmented. For example, you may
want to normalize image intensities, enhance image contrast, or add randomized affine
transformations to prevent overfitting.

To preprocess volumetric data, use the transform function. transform creates an altered form of a
datastore by transforming the data read from another datastore, called the underlying datastore,
according to the set of operations you define in a custom function. Image Processing Toolbox provides
several functions that accept volumetric input. For a full list of functions, see 3-D Volumetric Image
Processing. You can also preprocess volumetric images using functions in MATLAB that work on
multidimensional arrays.

The custom transformation function must accept data in the format returned by the read function of
the underlying datastore.


ImageDatastore
    The input to the custom transformation function depends on the ReadSize property.
    • When ReadSize is 1, the transformation function must accept an integer array. The size of the
      array is consistent with the type of images in the ImageDatastore. For example, a grayscale
      image has size m-by-n, a truecolor image has size m-by-n-by-3, and a multispectral image with c
      channels has size m-by-n-by-c.
    • When ReadSize is greater than 1, the transformation function must accept a cell array of image
      data corresponding to each image in the batch.
    For more information, see the read function of ImageDatastore.

PixelLabelDatastore
    The input to the custom transformation function depends on the ReadSize property.
    • When ReadSize is 1, the transformation function must accept a categorical matrix.
    • When ReadSize is greater than 1, the transformation function must accept a cell array of
      categorical matrices.
    For more information, see the read function of PixelLabelDatastore.

RandomPatchExtractionDatastore
    The input to the custom transformation function must be a table with two columns.
    For more information, see the read function of RandomPatchExtractionDatastore.

The transform function must return data that matches the input size of the network. The
transform function does not support one-to-many observation mappings.

To apply random affine transformations to volumetric data in RandomPatchExtractionDatastore,
you must use the transform function. The DataAugmentation property of this datastore does not
support volumetric data.

Examples
Transform Batch of Volumetric Data in Image Datastore

This example shows how to transform volumetric data in an image datastore using a sample image
preprocessing pipeline.

Specify a set of volumetric images saved as MAT files.


filepath = fullfile(matlabroot,"toolbox","images","imdata","mristack.mat");
files = [filepath; filepath; filepath];

Create an image datastore that stores multiple volumetric images. Specify that the ReadSize of the
datastore is greater than 1. Specify a custom read function, matRead. This function is defined in the
Supporting Functions section of this example.


volDS = imageDatastore(files,FileExtensions=".mat", ...
    ReadSize=3,ReadFcn=@(x) matRead(x));

Specify the input size of the network.

inputSize = [128 128];

Preprocess the volumetric images in volDS using the custom preprocessing pipeline defined in the
preprocessVolumetricIMDS supporting function.

dsTrain = transform(volDS,@(x) preprocessVolumetricIMDS(x,inputSize));

Read a batch of data.

minibatch = read(dsTrain)

minibatch=3×1 cell array
    {128x128x21 uint8}
    {128x128x21 uint8}
    {128x128x21 uint8}

Supporting Functions

The matRead function loads volume data from the first variable of a MAT file.

function data = matRead(filename)
    inp = load(filename);
    f = fields(inp);
    data = inp.(f{1});
end

The preprocessVolumetricIMDS function performs the desired transformations of data read from
an underlying image datastore. Because the read size of the image datastore is greater than 1, the
function must accept a cell array of image data. The function loops through each read image and
transforms the data according to this preprocessing pipeline:

• Randomly rotate the image about the z-axis.
• Resize the volume to the size expected by the network.
• Create a noisy version of the image with Gaussian noise.
• Return the image in a cell array.

function batchOut = preprocessVolumetricIMDS(batchIn,inputSize)

numRows = size(batchIn,1);
batchOut = cell(numRows,1);

for idx = 1:numRows

    % Perform a randomized 90 degree rotation about the z-axis
    imRotated = imrotate3(batchIn{idx,1},90*(randi(4)-1),[0 0 1]);

    % Resize each slice of the volume to the size expected by the network
    imResized = imresize(imRotated,inputSize);

    % Add zero-mean Gaussian noise with a normalized variance of 0.01
    imNoisy = imnoise(imResized,"gaussian",0,0.01);

    % Return the preprocessed data in a cell array
    batchOut(idx) = {imNoisy};

end
end

Transform Volumetric Data in Random Patch Extraction Datastore

This example shows how to transform pairs of volumetric data in a random patch extraction datastore
using a sample image preprocessing pipeline.

Specify two sets of volumetric images saved as MAT files. Each set contains five volumetric images.

dir = fullfile(matlabroot,"toolbox","images","imdata","BrainMRILabeled");
filesVol1 = fullfile(dir,"images");
filesVol2 = fullfile(dir,"labels");

Store each set of volumetric images in an image datastore. Specify a custom read function, matRead.
This function is defined in the Supporting Functions section of this example. Use the default
ReadSize of 1.

vol1DS = imageDatastore(filesVol1,FileExtensions=".mat",ReadFcn=@(x) matRead(x));
vol2DS = imageDatastore(filesVol2,FileExtensions=".mat",ReadFcn=@(x) matRead(x));

Specify the input size of the network.

inputSize = [128 128];

Create a random patch extraction datastore that extracts corresponding patches from the two
datastores. Select three patches per image.

patchVolDS = randomPatchExtractionDatastore(vol1DS,vol2DS,inputSize,PatchesPerImage=3);

Preprocess the volumetric images in patchVolDS using the custom preprocessing pipeline defined in
the preprocessVolumetricPatchDS supporting function.

dsTrain = transform(patchVolDS,@(x) preprocessVolumetricPatchDS(x));

Read a batch of data.

minibatch = read(dsTrain)

minibatch=15×2 table
        InputImage             ResponseImage
    ____________________    ___________________

    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}
    {128x128x155 uint16}    {128x128x155 uint8}

Supporting Functions

The matRead function loads volume data from the first variable of a MAT file.

function data = matRead(filename)
    inp = load(filename);
    f = fields(inp);
    data = inp.(f{1});
end

The preprocessVolumetricPatchDS function performs the desired transformations of data read from
the underlying random patch extraction datastore. The function must accept a table. The function
transforms the data according to this preprocessing pipeline:

• Randomly select one of five augmentations.
• Apply the same augmentation to the data in both columns of the table.
• Return the augmented image pair in a table.

function batchOut = preprocessVolumetricPatchDS(batchIn)

numRows = size(batchIn,1);
batchOut = batchIn;

% 5 augmentations: nil,rot90,fliplr,flipud,rot90(fliplr)
augType = {@(x) x,@rot90,@fliplr,@flipud,@(x) rot90(fliplr(x))};

for idx = 1:numRows

img = batchIn{idx,1}{1};
resp = batchIn{idx,2}{1};

rndIdx = randi(5,1);
imgAug = augType{rndIdx}(img);
respAug = augType{rndIdx}(resp);

batchOut(idx,:) = {imgAug,respAug};

end
end

See Also
trainNetwork | imageDatastore | pixelLabelDatastore |
randomPatchExtractionDatastore | transform

Related Examples
• “Create Image Datastore Containing Single and Multi-File DICOM Volumes” on page 3-26
• “3-D Brain Tumor Segmentation Using Deep Learning” (Deep Learning Toolbox)


More About
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Deep Learning in MATLAB” (Deep Learning Toolbox)
• “Create Functions in Files”


Augment Images for Deep Learning Workflows

This example shows how you can perform common kinds of randomized image augmentation such as
geometric transformations, cropping, and adding noise.

Image Processing Toolbox functions enable you to implement common styles of image augmentation.
This example demonstrates five common types of transformations:

• Random Image Warping Transformations on page 19-18
• Cropping Transformations on page 19-24
• Color Transformations on page 19-25
• Synthetic Noise on page 19-30
• Synthetic Blur on page 19-31

The example then shows how to apply augmentation to image data in datastores on page 19-32
using a combination of multiple types of transformations.

You can use augmented training data to train a network. For an example of training a network using
augmented images, see “Prepare Datastore for Image-to-Image Regression” (Deep Learning Toolbox).

Read and display a sample image. To compare the effect of the different types of image augmentation,
each transformation uses the same input image.
imOriginal = imresize(imread("kobi.png"),0.25);
imshow(imOriginal)


Random Image Warping Transformations

The randomAffine2d function creates a randomized 2-D affine transformation from a combination
of rotation, translation, scale (resizing), reflection, and shear. You can specify which transformations
to include and the range of transformation parameters. If you specify the range as a 2-element
numeric vector, then randomAffine2d selects the value of a parameter from a uniform probability
distribution over the specified interval. For more control of the range of parameter values, you can
specify the range using a function handle.

Control the spatial bounds and resolution of the warped image created by imwarp by using the
affineOutputView function.

Rotation

Create a randomized rotation transformation that rotates the input image by an angle selected
randomly from the range [-45, 45] degrees.
tform = randomAffine2d(Rotation=[-45 45]);
outputView = affineOutputView(size(imOriginal),tform);
imAugmented = imwarp(imOriginal,tform,OutputView=outputView);
imshow(imAugmented)

Translation

Create a translation transformation that shifts the input image horizontally and vertically by a
distance selected randomly from the range [-50, 50] pixels.
tform = randomAffine2d(XTranslation=[-50 50],YTranslation=[-50 50]);
outputView = affineOutputView(size(imOriginal),tform);

imAugmented = imwarp(imOriginal,tform,OutputView=outputView);
imshow(imAugmented)

Scale

Create a scale transformation that resizes the input image using a scale factor selected randomly
from the range [1.2, 1.5]. This transformation resizes the image by the same factor in the horizontal
and vertical directions.

tform = randomAffine2d(Scale=[1.2,1.5]);
outputView = affineOutputView(size(imOriginal),tform);
imAugmented = imwarp(imOriginal,tform,OutputView=outputView);
imshow(imAugmented)


Reflection

Create a reflection transformation that flips the input image with 50% probability in each dimension.

tform = randomAffine2d(XReflection=true,YReflection=true);
outputView = affineOutputView(size(imOriginal),tform);
imAugmented = imwarp(imOriginal,tform,OutputView=outputView);
imshow(imAugmented)


Shear

Create a horizontal shear transformation with the shear angle selected randomly from the range
[-30, 30] degrees.

tform = randomAffine2d(XShear=[-30 30]);
outputView = affineOutputView(size(imOriginal),tform);
imAugmented = imwarp(imOriginal,tform,OutputView=outputView);
imshow(imAugmented)


Control Range of Transformation Parameters Using Custom Selection Function

In the preceding transformations, the range of transformation parameters was specified by two-
element numeric vectors. For more control of the range of the transformation parameters, specify a
function handle instead of a numeric vector. The function handle takes no input arguments and yields
a valid value for each parameter.

For example, this code selects a rotation angle from a discrete set of 90 degree rotation angles.

angles = 0:90:270;
tform = randomAffine2d(Rotation=@() angles(randi(4)));
outputView = affineOutputView(size(imOriginal),tform);
imAugmented = imwarp(imOriginal,tform,OutputView=outputView);
imshow(imAugmented)


Control Fill Value

When you warp an image using a geometric transformation, pixels in the output image can map to a
location outside the bounds of the input image. In that case, imwarp assigns a fill value to those
pixels in the output image. By default, imwarp selects black as the fill value. You can change the fill
value by specifying the 'FillValues' name-value argument.

Create a random rotation transformation, then apply the transformation and specify a gray fill value.

tform = randomAffine2d(Rotation=[-45 45]);
outputView = affineOutputView(size(imOriginal),tform);
imAugmented = imwarp(imOriginal,tform,OutputView=outputView, ...
    FillValues=[128 128 128]);
imshow(imAugmented)


Cropping Transformations

To create output images of a desired size, use the randomWindow2d and centerCropWindow2d
functions. Be careful to select a window that includes the desired content in the image.

Specify the desired size of the cropped region as a 2-element vector of the form [height, width].

targetSize = [200,100];

Crop the image to the target size from the center of the image.

win = centerCropWindow2d(size(imOriginal),targetSize);
imCenterCrop = imcrop(imOriginal,win);
imshow(imCenterCrop)


Crop the image to the target size from a random location in the image.
win = randomWindow2d(size(imOriginal),targetSize);
imRandomCrop = imcrop(imOriginal,win);
imshow(imRandomCrop)

Color Transformations

You can randomly adjust the hue, saturation, brightness, and contrast of a color image by using the
jitterColorHSV function. You can specify which color transformations are included and the range
of transformation parameters.

You can randomly adjust the brightness and contrast of grayscale images by using basic math
operations.


Hue Jitter

Hue specifies the shade of color, or a color's position on a color wheel. As hue varies from 0 to 1,
colors vary from red through yellow, green, cyan, blue, purple, magenta, and back to red. Hue jitter
shifts the apparent shade of colors in an image.

Adjust the hue of the input image by a small positive offset selected randomly from the range [0.05,
0.15]. Colors that were red now appear more orange or yellow, colors that were orange appear yellow
or green, and so on.

imJittered = jitterColorHSV(imOriginal,Hue=[0.05 0.15]);
montage({imOriginal,imJittered})

Saturation Jitter

Saturation is the purity of color. As saturation varies from 0 to 1, hues vary from gray (indicating a
mixture of all colors) to a single pure color. Saturation jitter shifts how dull or vibrant colors are.

Adjust the saturation of the input image by an offset selected randomly from the range [-0.4, -0.1].
The colors in the output image appear more muted, as expected when the saturation decreases.

imJittered = jitterColorHSV(imOriginal,Saturation=[-0.4 -0.1]);
montage({imOriginal,imJittered})


Brightness Jitter

Brightness is the amount of hue. As brightness varies from 0 to 1, colors go from black to white.
Brightness jitter shifts the darkness and lightness of an input image.

Adjust the brightness of the input image by an offset selected randomly from the range [-0.3, -0.1].
The image appears darker, as expected when the brightness decreases.

imJittered = jitterColorHSV(imOriginal,Brightness=[-0.3 -0.1]);
montage({imOriginal,imJittered})

Contrast Jitter

Contrast jitter randomly adjusts the difference between the darkest and brightest regions in an input
image.


Adjust the contrast of the input image by a scale factor selected randomly from the range [1.2, 1.4].
The contrast increases, such that shadows become darker and highlights become brighter.

imJittered = jitterColorHSV(imOriginal,Contrast=[1.2 1.4]);
montage({imOriginal,imJittered})

Brightness and Contrast Jitter of Grayscale Images

You can apply randomized brightness and contrast jitter to grayscale images by using basic math
operations.

Convert the sample image to grayscale. Specify a random contrast scale factor in the range [0.8, 1]
and a random brightness offset in the range [-0.15, 0.15]. Multiply the image by the contrast scale
factor, then add the brightness offset.

imGray = im2gray(im2double(imOriginal));
contrastFactor = 1-0.2*rand;
brightnessOffset = 0.3*(rand-0.5);
imJittered = imGray.*contrastFactor + brightnessOffset;
imJittered = im2uint8(imJittered);
montage({imGray,imJittered})


Randomized Color-to-Grayscale

One type of color augmentation randomly drops the color information from an RGB image while
preserving the number of channels expected by the network. This code shows a "random grayscale"
transformation in which an RGB image is randomly converted with 80% probability to a three channel
output image where R == G == B.

desiredProbability = 0.8;
if rand <= desiredProbability
imJittered = repmat(rgb2gray(imOriginal),[1 1 3]);
end
imshow(imJittered)


Other Image Processing Operations

Use the transform function to apply any combination of Image Processing Toolbox functions to
input images. Adding noise and blur are two common image processing operations used in deep
learning applications.

Synthetic Noise

To apply synthetic noise to an input image, use the imnoise function. You can specify which noise
model to use, such as Gaussian, Poisson, salt and pepper, and multiplicative noise. You can also
specify the strength of the noise.

imSaltAndPepperNoise = imnoise(imOriginal,"salt & pepper",0.1);
imGaussianNoise = imnoise(imOriginal,"gaussian");
montage({imSaltAndPepperNoise,imGaussianNoise})


Synthetic Blur

To apply randomized Gaussian blur to an image, use the imgaussfilt function. You can specify the
amount of smoothing.
sigma = 1+5*rand;
imBlurred = imgaussfilt(imOriginal,sigma);
imshow(imBlurred)


Apply Augmentation to Image Data in Datastores

In practical deep learning problems, the image augmentation pipeline typically combines multiple
operations. Datastores are a convenient way to read and augment collections of images.

This section of the example shows how to define data augmentation pipelines that augment
datastores in the context of training image classification and image regression problems.

First, create an imageDatastore that contains unprocessed images. The image datastore in this
example contains digit images with labels.

digitDatasetPath = fullfile(matlabroot,"toolbox","nnet", ...
    "nndemos","nndatasets","DigitDataset");
imds = imageDatastore(digitDatasetPath, ...
    IncludeSubfolders=true,LabelSource="foldernames");
imds.ReadSize = 6;

Image Classification

In image classification, the classifier should learn that a randomly altered version of an image still
represents the same image class. To augment data for image classification, it is sufficient to augment
the input images while leaving the corresponding categorical labels unchanged.

Augment images in the pristine image datastore with random Gaussian blur, salt and pepper noise,
and randomized scale and rotation. These operations are defined in the helper function
classificationAugmentationPipeline at the end of this example. Apply data augmentation to
the training data by using the transform function.

dsTrain = transform(imds,@classificationAugmentationPipeline, ...
    IncludeInfo=true);

Visualize a sample of the output coming from the augmented pipeline.

dataPreview = preview(dsTrain);
montage(dataPreview(:,1))
title("Augmented Images for Image Classification")


Image Regression

Image augmentation for image-to-image regression is more complicated because you must apply
identical geometric transformations to the input and response images. Associate pairs of input and
response images by using the combine function. Transform one or both images in each pair by using
the transform function.

Combine two identical copies of the image datastore imds. When data is read from the combined
datastore, image data is returned in a two-column cell array, where the first column represents
network input images and the second column contains network responses.

dsCombined = combine(imds,imds);
montage(preview(dsCombined)',Size=[6 2])
title("Combined Input and Response Pairs Before Augmentation")


Augment each pair of training images with a series of image processing operations:


• Resize the input and response image to 32-by-32 pixels.
• Add salt and pepper noise to the input image only.
• Create a transformation that has randomized scale and rotation.
• Apply the same transformation to the input and response image.

These operations are defined in the helper function imageRegressionAugmentationPipeline at the
end of this example. Apply data augmentation to the training data by using the transform function.

dsTrain = transform(dsCombined,@imageRegressionAugmentationPipeline);
montage(preview(dsTrain)',Size=[6 2])
title("Combined Input and Response Pairs After Augmentation")


For a complete example that includes training and evaluating an image-to-image regression network,
see “Prepare Datastore for Image-to-Image Regression” (Deep Learning Toolbox).

Supporting Functions

The classificationAugmentationPipeline helper function augments images for classification.
dataIn and dataOut are two-element cell arrays, where the first element is the network input image
and the second element is the categorical label.
function [dataOut,info] = classificationAugmentationPipeline(dataIn,info)

dataOut = cell([size(dataIn,1),2]);

for idx = 1:size(dataIn,1)

    temp = dataIn{idx};

    % Add randomized Gaussian blur
    temp = imgaussfilt(temp,1.5*rand);

    % Add salt and pepper noise
    temp = imnoise(temp,"salt & pepper");

    % Add randomized rotation and scale
    tform = randomAffine2d(Scale=[0.95,1.05],Rotation=[-30 30]);
    outputView = affineOutputView(size(temp),tform);
    temp = imwarp(temp,tform,OutputView=outputView);

    % Form the second column expected by trainNetwork, which is the
    % expected response (the categorical label in this case)
    dataOut(idx,:) = {temp,info.Label(idx)};
end

end

The imageRegressionAugmentationPipeline helper function augments images for image-to-image
regression. dataIn and dataOut are two-element cell arrays, where the first element is the network
input image and the second element is the network response image.
function dataOut = imageRegressionAugmentationPipeline(dataIn)

dataOut = cell([size(dataIn,1),2]);
for idx = 1:size(dataIn,1)

    % Resize images to 32-by-32 pixels and convert to data type single
    inputImage = im2single(imresize(dataIn{idx,1},[32 32]));
    targetImage = im2single(imresize(dataIn{idx,2},[32 32]));

    % Add salt and pepper noise
    inputImage = imnoise(inputImage,"salt & pepper");

    % Add randomized rotation and scale
    tform = randomAffine2d(Scale=[0.9,1.1],Rotation=[-30 30]);
    outputView = affineOutputView(size(inputImage),tform);

    % Use imwarp with the same tform and outputView to augment both images
    % the same way
    inputImage = imwarp(inputImage,tform,OutputView=outputView);
    targetImage = imwarp(targetImage,tform,OutputView=outputView);

    dataOut(idx,:) = {inputImage,targetImage};
end

end

See Also
transform | combine

Related Examples
• “Prepare Datastore for Image-to-Image Regression” (Deep Learning Toolbox)

More About
• “Get Started with Image Preprocessing and Augmentation for Deep Learning” on page 19-2
• “Preprocess Images for Deep Learning” on page 19-6


Get Started with GANs for Image-to-Image Translation


An image domain is a set of images with similar characteristics. For example, an image domain can
be a group of images acquired in certain lighting conditions or images with a common set of noise
distortions.

Image-to-image translation is the task of transferring styles and characteristics from one image
domain to another. The source domain is the domain of the starting image. The target domain is the
desired domain after translation. Applications of domain translation for three sample image domains
include:

Application                    Source Domain                     Target Domain
Day-to-dusk style conversion   Images acquired in the daytime    Images acquired at dusk
Image denoising                Images with noise distortion      Images without visible noise
Super-resolution               Low-resolution images             High-resolution images

Select a GAN
You can perform image-to-image translation using deep learning generative adversarial networks
(GANs). A GAN consists of a generator network and one or more discriminator networks that are
trained simultaneously to maximize the overall performance. The objective of the generator network
is to generate realistic images in the target domain that cannot be distinguished from real images in
that domain. The objective of the discriminator networks is to correctly classify original training
data as real and generator-synthesized images as fake.

The type of GAN depends on the training data.

• Supervised GANs have a one-to-one mapping between images in the source and target domains.
For an example, see “Generate Image from Segmentation Map Using Deep Learning” (Computer
Vision Toolbox). In this example, the source domain consists of images captured of street scenes.
The target domain consists of categorical images representing the semantic segmentation maps.
The data set provides a ground truth segmentation map for every input training image.
• Unsupervised GANs do not have a one-to-one mapping between images in the source and target
domains. For an example, see “Unsupervised Day-to-Dusk Image Translation Using UNIT” on page
19-168. In this example, the source and target domains consist of images captured in daytime and
dusk conditions, respectively. However, the scene content of the daytime and dusk images differs,
so the daytime images do not have a corresponding dusk image with identical scene content.

Create GAN Networks


Image Processing Toolbox offers functions that enable you to create popular GAN networks. You can
optionally modify the networks by changing properties such as the number of downsampling
operations and the type of activation and normalization. The table describes the functions that enable
you to create and modify GAN networks.


Network Creation and Modification Functions

pix2pixHD generator network [1]
A pix2pixHD GAN performs supervised learning. The network consists of a single generator and a single discriminator.
Create a pix2pixHD generator network using the pix2pixHDGlobalGenerator function. Add a local enhancer to a pix2pixHD network using the addPix2PixHDLocalEnhancer function.

CycleGAN generator network [2]
A CycleGAN network performs unsupervised learning. The network consists of two generators and two discriminators. The first generator takes images from domain A and generates images in domain B. The corresponding discriminator takes images generated by the first generator and real images in domain B, and attempts to correctly classify the images as real and fake. Conversely, the second generator takes images from domain B and generates images in domain A. The corresponding discriminator takes images generated by the second generator and real images in domain A, and attempts to correctly classify the images as fake and real.
Create a CycleGAN generator network using the cycleGANGenerator function.

UNIT generator network [3]
An unsupervised image-to-image translation (UNIT) GAN performs unsupervised learning. The network consists of one generator and two discriminators. The generator takes images in both domains, A and B, and returns four output images: two translated images (A-to-B and B-to-A) and two self-reconstructed images (A-to-A and B-to-B). The first discriminator takes a real and a generated image from domain A and returns the likelihood that the image is real. Similarly, the second discriminator takes a real and a generated image from domain B and returns the likelihood that the image is real.
Create a UNIT generator network using the unitGenerator function. Perform image-to-image translation on a trained UNIT network using the unitPredict function.

PatchGAN discriminator network [4]
A PatchGAN discriminator network can serve as the discriminator network for pix2pixHD, CycleGAN, and UNIT GANs, as well as custom GANs.
Create a PatchGAN discriminator network using the patchGANDiscriminator function. The discriminator decides at a patch level whether an image is real or fake. By operating on a patch instead of pixels, the PatchGAN focuses on the general style of the input rather than the specific content.
You can also use the patchGANDiscriminator function to create a pixel discriminator network. This network is a PatchGAN discriminator network whose patch size is a single pixel.

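As a quick illustration, the following sketch creates a CycleGAN generator and a PatchGAN discriminator for 256-by-256 RGB inputs. The input size is a placeholder, and both functions accept additional name-value arguments for customizing the architecture.

inputSize = [256 256 3];
generator = cycleGANGenerator(inputSize);          % generator for one translation direction
discriminator = patchGANDiscriminator(inputSize);  % patch-based discriminator
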
Some networks require additional modification beyond the options available in the network creation
functions. For example, you may want to replace the addition layers with depth concatenation layers,
or you may want the initial leaky ReLU layer of a UNIT network to have a scale factor other than 0.2.
To refine an existing GAN network, you can use Deep Network Designer. For more information, see
“Build Networks with Deep Network Designer” (Deep Learning Toolbox).

If you need a network that is not available through the built-in creation functions, then you can create
custom GAN networks from modular components. First, create the encoder and decoder modules,
then combine the modules using the encoderDecoderNetwork function. You can optionally include
a bridge connection, skip connections, or additional layers at the end of the network. For more
information, see “Create Modular Neural Networks” on page 19-44.

Train GAN Network


To train GAN generator and discriminator networks, you must use a custom training loop. There are
several steps involved in preparing a custom training loop; a skeletal version of the loop is sketched
after these steps. For an example that shows the complete workflow, see “Train Generative
Adversarial Network (GAN)” (Deep Learning Toolbox).

• Create the generator and discriminator networks.
• Create one or more datastores that read, preprocess, and augment training data. For more
information, see “Datastores for Deep Learning” (Deep Learning Toolbox). Then, create a
minibatchqueue object for each datastore that manages the mini-batching of observations in a
custom training loop.
• Define the model gradients function for each network. The function takes as input the network
and a mini-batch of input data, and returns the gradients of the loss. Optionally, you can pass extra
arguments to the gradients function (for example, if the loss function requires extra information),
or return extra arguments (for example, the loss values). For more information, see “Define Model
Loss Function for Custom Training Loop” (Deep Learning Toolbox).
• Define the loss functions. Certain types of loss functions are commonly used for image-to-image
translation applications, although the implementation of each loss can vary.


• Adversarial loss is commonly used by generator and discriminator networks. This loss relies on
the pixelwise or patchwise difference between the correct classification and the predicted
classification by the discriminator.
• Cycle consistency loss is commonly used by unsupervised generator networks. This loss is
based on the principle that an image translated from one domain to another, then back to the
original domain, should be identical to the original image.
• Specify training options such as the solver type and the number of epochs. For more information,
see “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox).
• Create the custom training loop that loops over mini-batches in every epoch. The loop reads each
mini-batch of data, evaluates the model gradients using the dlfeval function, and updates the
network parameters.

Optionally, include display functions such as plots of scores or batches of generated images that
enable you to monitor the training progress. For more information, see “Monitor GAN Training
Progress and Identify Common Failure Modes” (Deep Learning Toolbox).
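
The following is a skeletal sketch of such a loop. It assumes that you have already created dlnetwork objects netG and netD, a minibatchqueue object mbq, and a hypothetical modelLoss function that returns the gradients and losses for both networks. The number of epochs and the learning rate are placeholder values.

numEpochs = 10;                             % placeholder values
learnRate = 2e-4;
[avgG,avgSqG,avgD,avgSqD] = deal([]);

for epoch = 1:numEpochs
    shuffle(mbq);
    iteration = 0;
    while hasdata(mbq)
        iteration = iteration + 1;
        X = next(mbq);                      % read one mini-batch

        % Evaluate the model gradients and losses using dlfeval
        [gradG,gradD,lossG,lossD] = dlfeval(@modelLoss,netG,netD,X);

        % Update the generator and discriminator parameters
        % (optionally plot lossG and lossD to monitor training)
        [netG,avgG,avgSqG] = adamupdate(netG,gradG,avgG,avgSqG,iteration,learnRate);
        [netD,avgD,avgSqD] = adamupdate(netD,gradD,avgD,avgSqD,iteration,learnRate);
    end
end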

References
[1] Wang, Ting-Chun, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. "High-
Resolution Image Synthesis and Semantic Manipulation with Conditional GANs." In 2018
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8798–8807. Salt Lake
City, UT, USA: IEEE, 2018. https://doi.org/10.1109/CVPR.2018.00917.

[2] Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired Image-to-Image
Translation Using Cycle-Consistent Adversarial Networks." In 2017 IEEE International
Conference on Computer Vision (ICCV), 2242–2251. Venice: IEEE, 2017.
https://ieeexplore.ieee.org/document/8237506.

[3] Liu, Ming-Yu, Thomas Breuel, and Jan Kautz. "Unsupervised Image-to-Image Translation
Networks." Advances in Neural Information Processing Systems 30 (NIPS 2017). Long Beach,
CA: 2017. https://arxiv.org/abs/1703.00848.

[4] Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-Image Translation with
Conditional Adversarial Networks." In 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 5967–5976. Honolulu, HI: IEEE, 2017. https://arxiv.org/abs/1611.07004.

See Also
encoderDecoderNetwork | blockedNetwork | pretrainedEncoderNetwork |
cycleGANGenerator | patchGANDiscriminator | pix2pixHDGlobalGenerator |
unitGenerator

Related Examples
• “Unsupervised Day-to-Dusk Image Translation Using UNIT” on page 19-168
• “Generate Image from Segmentation Map Using Deep Learning” (Computer Vision Toolbox)

More About
• “Create Modular Neural Networks” on page 19-44
• “Train Generative Adversarial Network (GAN)” (Deep Learning Toolbox)


• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
• “Define Model Loss Function for Custom Training Loop” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
• “Train Network Using Custom Training Loop” (Deep Learning Toolbox)


Create Modular Neural Networks


Many neural networks used for image processing applications have an architecture that follows a
modular pattern. The pattern consists of an encoder module that downsamples the input followed by
a decoder that upsamples the data. Bridge layers optionally connect the encoder and decoder
modules. The modular pattern is used by convolutional neural networks (CNNs), such as U-Net, and
generative adversarial network (GAN) generator and discriminator networks, such as CycleGAN and
PatchGAN.

Create Encoder and Decoder Modules


To create encoder and decoder modules, you can:

• Create an encoder network from a pretrained network, such as SqueezeNet, using the
pretrainedEncoderNetwork function. The function prunes the pretrained network such that
the encoder includes the number of downsampling operations that you specify.
• Create encoder and decoder modules from building blocks of layers that follow a repeating
pattern. To create a module, define a function that specifies the pattern, then assemble blocks into
a module using the blockedNetwork function.

An encoder module consists of an initial block of layers, downsampling blocks, and residual blocks. A
decoder module consists of upsampling blocks and a final block that provides the network output. The
table describes the blocks of layers that commonly comprise encoder and decoder modules.

Type of Block and Description

Initial block:
• An imageInputLayer
• A convolution2dLayer with a stride of [1 1]
• An optional normalization layer
• An activation layer

Downsampling block:
• A downsampling layer, such as a pooling layer or a convolution2dLayer with a stride greater than 1
• An optional normalization layer
• An activation layer

Residual block:
• A convolution2dLayer
• An optional normalization layer
• An activation layer
• An optional dropout layer
• A second convolution2dLayer
• An optional second normalization layer
• An additionLayer that provides a skip connection between every block

Upsampling block:
• Layers that perform upsampling, such as a transposed convolution layer, or a convolution layer followed by a resizing or depth-to-space layer
• An optional normalization layer
• An activation layer

Final block:
• A convolution2dLayer
• An optional output layer

Create Networks from Encoder and Decoder Modules


After you have an encoder and a decoder module, you can combine the modules to form a CNN, GAN
generator, or GAN discriminator network using the encoderDecoderNetwork function. You can
optionally include a bridge connection, skip connections, or additional layers at the end of the
network.
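
For example, this sketch assembles a simple encoder-decoder network. It assumes that the GoogLeNet support package for Deep Learning Toolbox is installed, and the layer choices and sizes are placeholders rather than a recommended architecture.

depth = 3;                                              % number of downsampling operations
encoder = pretrainedEncoderNetwork("googlenet",depth);

% Decoder: one upsampling block per downsampling operation in the encoder
upsampleBlock = @(block) [ ...
    transposedConv2dLayer(2,64,Stride=2) ...
    convolution2dLayer(3,64,Padding="same") ...
    reluLayer];
decoder = blockedNetwork(upsampleBlock,depth);

% Combine the encoder and decoder modules into a single network
net = encoderDecoderNetwork([224 224 3],encoder,decoder);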

You can also create popular GAN generator and discriminator networks directly by using functions
available in Image Processing Toolbox. These networks include CycleGAN, PatchGAN, pix2pixHD, and
UNIT. For more information, see “Get Started with GANs for Image-to-Image Translation” on page 19-
39.

See Also
encoderDecoderNetwork | blockedNetwork | pretrainedEncoderNetwork

More About
• “Get Started with GANs for Image-to-Image Translation” on page 19-39
• “List of Deep Learning Layers” (Deep Learning Toolbox)


Train and Apply Denoising Neural Networks


Image Processing Toolbox and Deep Learning Toolbox provide many options to remove noise from
images. The simplest and fastest solution is to use the built-in pretrained denoising neural network,
called DnCNN. However, the pretrained network does not offer much flexibility in the type of noise
recognized. For more flexibility, train your own network using predefined layers, or train a fully
custom denoising neural network.

Remove Gaussian Noise Using Pretrained Network


You can use the built-in pretrained DnCNN network to remove Gaussian noise without the challenges
of training a network. Removing noise with the pretrained network has these limitations:

• Noise removal works only with 2-D single-channel images. If you have multiple color channels, or
if you are working with 3-D images, remove noise by treating each channel or plane separately.
For an example, see “Remove Noise from Color Image Using Pretrained Neural Network” on page
19-49.
• The network recognizes only Gaussian noise, with a limited range of standard deviation.

To load the pretrained DnCNN network, use the denoisingNetwork function. Then, pass the
DnCNN network and a noisy 2-D single-channel image to denoiseImage. The image shows the
workflow to denoise an image using the pretrained DnCNN network.
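
For example, this minimal sketch loads the pretrained network and denoises a simulated noisy grayscale image (cameraman.tif ships with the toolbox):

I = im2double(imread("cameraman.tif"));    % 2-D single-channel image
noisyI = imnoise(I,"gaussian",0,0.01);     % simulate Gaussian noise
net = denoisingNetwork("dncnn");
denoisedI = denoiseImage(noisyI,net);
montage({noisyI,denoisedI})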

Train a Denoising Network Using Built-In Layers


You can train a network to detect a larger range of Gaussian noise standard deviations from grayscale
images, starting with built-in layers provided by Image Processing Toolbox. To train a denoising
network using predefined layers, follow these steps; a condensed sketch of the workflow appears after
the list. The diagram shows the training workflow in the dark gray box.

• Create an ImageDatastore object that stores pristine images.
• Create a denoisingImageDatastore object that generates noisy training data from the pristine
images. To specify the range of the Gaussian noise standard deviations, set the

GaussianNoiseLevel property. You must use the default value of PatchSize (50) and
ChannelFormat ('grayscale') so that the size of the training data matches the input size of the
network.
• Get the predefined denoising layers using the dnCNNLayers function.
• Define training options using the trainingOptions function.
• Train the network, specifying the denoising image datastore as the data source for
trainNetwork. For each iteration of training, the denoising image datastore generates one mini-
batch of training data by randomly cropping pristine images from the ImageDatastore, then
adding randomly generated zero-mean Gaussian white noise to each image patch. The standard
deviation of the added noise is unique for each image patch, and has a value within the range
specified by the GaussianNoiseLevel property of the denoising image datastore.

After you have trained the network, pass the network and a noisy grayscale image to denoiseImage.
The diagram shows the denoising workflow in the light gray box.
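
This condensed sketch shows the training and denoising workflow end to end. The folder name, noise range, and training options are placeholders; adjust them for your data and hardware.

imds = imageDatastore("pristineImages");         % placeholder folder of pristine grayscale images
dnds = denoisingImageDatastore(imds, ...
    GaussianNoiseLevel=[0.01 0.1], ...           % range of noise standard deviations
    PatchesPerImage=512);                        % keep the default PatchSize and ChannelFormat
layers = dnCNNLayers;
options = trainingOptions("adam",MaxEpochs=30,MiniBatchSize=128);
net = trainNetwork(dnds,layers,options);

denoisedI = denoiseImage(noisyI,net);            % noisyI is a noisy grayscale image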

Train Fully Customized Denoising Neural Network


To train a denoising neural network with maximum flexibility, you can use a custom datastore to
generate training data or define your own network architecture. For example, you can:


• Train a network that detects a larger variety of noise, such as non-Gaussian noise distributions, in
single-channel images. You can define the network architecture by using the layers returned by
the dnCNNLayers function. To generate training images compatible with this network, use the
transform and combine functions to create batches of noisy images and the corresponding noise
signals. For more information, see “Preprocess Images for Deep Learning” (Deep Learning
Toolbox).

After you train a denoising network using the DnCNN network architecture, you can use the
denoiseImage function to remove image noise.

Tip The DnCNN network can also detect high-frequency image artifacts caused by other types of
distortion. For example, you can train the DnCNN network to increase image resolution or remove
JPEG compression artifacts. The “JPEG Image Deblocking Using Deep Learning” on page 19-71
example shows how to train a DnCNN network to remove JPEG compression artifacts.
• Train a network that detects a range of Gaussian noise distributions for color images. To generate
training images for this network, you can use a denoisingImageDatastore and set the
ChannelFormat property to 'rgb'. You must define a custom convolutional neural network
architecture that supports RGB input images.

After you train a denoising network using a custom network architecture, you can use the
activations function to isolate the noise or high-frequency artifacts in a distorted image. Then,
subtract the noise from the distorted image to obtain a denoised image.
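
The subtraction step might look like this sketch, where net is your trained network, "FinalConvLayer" is a hypothetical name for its last convolutional layer, and the network is assumed to preserve the spatial size and number of channels of its input:

noisyI = im2single(noisyRGB);       % noisyRGB is your distorted RGB image
noiseEstimate = activations(net,noisyI,"FinalConvLayer");
denoisedI = noisyI - noiseEstimate;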

See Also
denoisingImageDatastore | dnCNNLayers | denoisingNetwork | trainingOptions |
trainNetwork | denoiseImage | activations | combine | transform

Related Examples
• “Remove Noise from Color Image Using Pretrained Neural Network” on page 19-49
• “JPEG Image Deblocking Using Deep Learning” on page 19-71
• “Prepare Datastore for Image-to-Image Regression” (Deep Learning Toolbox)

More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “Deep Learning in MATLAB” (Deep Learning Toolbox)


Remove Noise from Color Image Using Pretrained Neural Network

This example shows how to remove Gaussian noise from an RGB image using a denoising
convolutional neural network.

Read a color image into the workspace and convert the data to data type double. Display the pristine
color image.

pristineRGB = imread("lighthouse.png");
pristineRGB = im2double(pristineRGB);
imshow(pristineRGB)
title("Pristine Image")


Add zero-mean Gaussian white noise with a variance of 0.01 to the image. The imnoise function
adds noise to each color channel independently. Display the noisy color image.

noisyRGB = imnoise(pristineRGB,"gaussian",0,0.01);
imshow(noisyRGB)
title("Noisy Image")


The pretrained denoising convolutional neural network, DnCNN, operates on single-channel images.
Split the noisy RGB image into its three individual color channels.

[noisyR,noisyG,noisyB] = imsplit(noisyRGB);

Load the pretrained DnCNN network.


net = denoisingNetwork("dncnn");

Use the DnCNN network to remove noise from each color channel.

denoisedR = denoiseImage(noisyR,net);
denoisedG = denoiseImage(noisyG,net);
denoisedB = denoiseImage(noisyB,net);

Recombine the denoised color channels to form the denoised RGB image. Display the denoised color
image.

denoisedRGB = cat(3,denoisedR,denoisedG,denoisedB);
imshow(denoisedRGB)
title("Denoised Image")


Calculate the peak signal-to-noise ratio (PSNR) for the noisy and denoised images. A larger PSNR
indicates that noise has a smaller relative signal, and is associated with higher image quality.
noisyPSNR = psnr(noisyRGB,pristineRGB);
fprintf("\n The PSNR value of the noisy image is %0.4f.",noisyPSNR);

The PSNR value of the noisy image is 20.6395.


denoisedPSNR = psnr(denoisedRGB,pristineRGB);
fprintf("\n The PSNR value of the denoised image is %0.4f.",denoisedPSNR);

The PSNR value of the denoised image is 29.6857.

Calculate the structural similarity (SSIM) index for the noisy and denoised images. An SSIM index
close to 1 indicates good agreement with the reference image, and higher image quality.

noisySSIM = ssim(noisyRGB,pristineRGB);
fprintf("\n The SSIM value of the noisy image is %0.4f.",noisySSIM);

The SSIM value of the noisy image is 0.7393.

denoisedSSIM = ssim(denoisedRGB,pristineRGB);
fprintf("\n The SSIM value of the denoised image is %0.4f.",denoisedSSIM);

The SSIM value of the denoised image is 0.9507.

In practice, image color channels frequently have correlated noise. To remove correlated image noise,
first convert the RGB image to a color space with a luminance channel, such as the L*a*b* color
space. Remove noise on the luminance channel only, then convert the denoised image back to the
RGB color space.
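
A sketch of that workflow, reusing noisyRGB and net from this example, might look like the following. The L* channel is rescaled to [0, 1] for the network and back to [0, 100] afterward.

noisyLAB = rgb2lab(noisyRGB);
L = noisyLAB(:,:,1)/100;                     % rescale luminance to [0, 1]
denoisedL = denoiseImage(L,net);
noisyLAB(:,:,1) = 100*double(denoisedL);     % restore the L* range
denoisedFromLuminance = lab2rgb(noisyLAB);
imshow(denoisedFromLuminance)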

See Also
denoisingNetwork | denoiseImage | rgb2lab | lab2rgb | psnr | ssim | imnoise

More About
• “Train and Apply Denoising Neural Networks” on page 19-46


Increase Image Resolution Using Deep Learning

This example shows how to create a high-resolution image from a low-resolution image using a very-
deep super-resolution (VDSR) neural network.

Super-resolution is the process of creating high-resolution images from low-resolution images. This
example considers single image super-resolution (SISR), where the goal is to recover one high-
resolution image from one low-resolution image. SISR is challenging because high-frequency image
content typically cannot be recovered from the low-resolution image. Without high-frequency
information, the quality of the high-resolution image is limited. Further, SISR is an ill-posed problem
because one low-resolution image can yield several possible high-resolution images.

Several techniques, including deep learning algorithms, have been proposed to perform SISR. This
example explores one deep learning algorithm for SISR, called very-deep super-resolution (VDSR) [1
on page 19-69].

The VDSR Network

VDSR is a convolutional neural network architecture designed to perform single image super-
resolution [1 on page 19-69]. The VDSR network learns the mapping between low- and high-
resolution images. This mapping is possible because low-resolution and high-resolution images have
similar image content and differ primarily in high-frequency details.

VDSR employs a residual learning strategy, meaning that the network learns to estimate a residual
image. In the context of super-resolution, a residual image is the difference between a high-resolution
reference image and a low-resolution image that has been upscaled using bicubic interpolation to
match the size of the reference image. A residual image contains information about the high-
frequency details of an image.

The VDSR network detects the residual image from the luminance of a color image. The luminance
channel of an image, Y, represents the brightness of each pixel through a linear combination of the
red, green, and blue pixel values. In contrast, the two chrominance channels of an image, Cb and Cr,
are different linear combinations of the red, green, and blue pixel values that represent color-
difference information. VDSR is trained using only the luminance channel because human perception
is more sensitive to changes in brightness than to changes in color.

If Y_highres is the luminance of the high-resolution image and Y_lowres is the luminance of a
low-resolution image that has been upscaled using bicubic interpolation, then the input to the VDSR
network is Y_lowres and the network learns to predict Y_residual = Y_highres − Y_lowres from the
training data.

After the VDSR network learns to estimate the residual image, you can reconstruct high-resolution
images by adding the estimated residual image to the upsampled low-resolution image, then
converting the image back to the RGB color space.

A scale factor relates the size of the reference image to the size of the low-resolution image. As the
scale factor increases, SISR becomes more ill-posed because the low-resolution image loses more
information about the high-frequency image content. VDSR solves this problem by using a large
receptive field. This example trains a VDSR network with multiple scale factors using scale
augmentation. Scale augmentation improves the results at larger scale factors because the network
can take advantage of the image context from smaller scale factors. Additionally, the VDSR network
can generalize to accept images with noninteger scale factors.

Download Training and Test Data

Download the IAPR TC-12 Benchmark, which consists of 20,000 still natural images [2 on page 19-
69]. The data set includes photos of people, animals, cities, and more. The size of the data file is
~1.8 GB. If you do not want to download the training data set, then you can load the pretrained VDSR
network by typing load("trainedVDSRNet.mat"); at the command line. Then, go directly to the
Perform Single Image Super-Resolution Using VDSR Network on page 19-60 section in this example.

Use the helper function, downloadIAPRTC12Data, to download the data. This function is attached to
the example as a supporting file. Specify dataDir as the desired location of the data.


dataDir = tempdir;   % for example; specify any folder with about 2 GB of free space
downloadIAPRTC12Data(dataDir);

This example will train the network with a small subset of the IAPR TC-12 Benchmark data. Load the
imageCLEF training data. All images are 32-bit JPEG color images.

trainImagesDir = fullfile(dataDir,"iaprtc12","images","02");
exts = [".jpg",".bmp",".png"];
pristineImages = imageDatastore(trainImagesDir,FileExtensions=exts);

List the number of training images.

numel(pristineImages.Files)

ans = 616

Prepare Training Data

To create a training data set, generate pairs of images consisting of upsampled images and the
corresponding residual images.

The upsampled images are stored on disk as MAT files in the directory upsampledDirName. The
computed residual images representing the network responses are stored on disk as MAT files in the
directory residualDirName. The MAT files are stored as data type double for greater precision
when training the network.

upsampledDirName = trainImagesDir+filesep+"upsampledImages";
residualDirName = trainImagesDir+filesep+"residualImages";

Use the helper function createVDSRTrainingSet to preprocess the training data. This function is
attached to the example as a supporting file.

The helper function performs these operations for each pristine image in trainImages:

• Convert the image to the YCbCr color space.
• Downsize the luminance (Y) channel by different scale factors to create sample low-resolution images, then resize the images to the original size using bicubic interpolation.
• Calculate the difference between the pristine and resized images.
• Save the resized and residual images to disk.

scaleFactors = [2 3 4];
createVDSRTrainingSet(pristineImages,scaleFactors,upsampledDirName,residualDirName);

Define Preprocessing Pipeline for Training Set

In this example, the network inputs are low-resolution images that have been upsampled using
bicubic interpolation. The desired network responses are the residual images. Create an image
datastore called upsampledImages from the collection of input image files. Create an image
datastore called residualImages from the collection of computed residual image files. Both
datastores require a helper function, matRead, to read the image data from the image files. This
function is attached to the example as a supporting file.

upsampledImages = imageDatastore(upsampledDirName,FileExtensions=".mat",ReadFcn=@matRead);
residualImages = imageDatastore(residualDirName,FileExtensions=".mat",ReadFcn=@matRead);


Create an imageDataAugmenter (Deep Learning Toolbox) that specifies the parameters of data
augmentation. Use data augmentation during training to vary the training data, which effectively
increases the amount of available training data. Here, the augmenter specifies random rotation by 90
degrees and random reflections in the x-direction.

augmenter = imageDataAugmenter( ...
    RandRotation=@()randi([0,1],1)*90, ...
    RandXReflection=true);

Create a randomPatchExtractionDatastore that performs randomized patch extraction from the
upsampled and residual image datastores. Patch extraction is the process of extracting a large set of
small image patches, or tiles, from a single larger image. This type of data augmentation is frequently
used in image-to-image regression problems, where many network architectures can be trained on
very small input image sizes. This means that a large number of patches can be extracted from each
full-sized image in the original training set, which greatly increases the size of the training set.

patchSize = [41 41];
patchesPerImage = 64;
dsTrain = randomPatchExtractionDatastore(upsampledImages,residualImages,patchSize, ...
    DataAugmentation=augmenter,PatchesPerImage=patchesPerImage);

The resulting datastore, dsTrain, provides mini-batches of data to the network at each iteration of
the epoch. Preview the result of reading from the datastore.

inputBatch = preview(dsTrain);
disp(inputBatch)

      InputImage       ResponseImage
    ______________    ______________

    {41×41 double}    {41×41 double}
    {41×41 double}    {41×41 double}
    {41×41 double}    {41×41 double}
    {41×41 double}    {41×41 double}
    {41×41 double}    {41×41 double}
    {41×41 double}    {41×41 double}
    {41×41 double}    {41×41 double}
    {41×41 double}    {41×41 double}

Set Up VDSR Layers

This example defines the VDSR network using 41 individual layers from Deep Learning Toolbox™,
including:

• imageInputLayer (Deep Learning Toolbox) - Image input layer
• convolution2dLayer (Deep Learning Toolbox) - 2-D convolution layer for convolutional neural networks
• reluLayer (Deep Learning Toolbox) - Rectified linear unit (ReLU) layer
• regressionLayer (Deep Learning Toolbox) - Regression output layer for a neural network

The first layer, imageInputLayer, operates on image patches. The patch size is based on the
network receptive field, which is the spatial image region that affects the response of the top-most
layer in the network. Ideally, the network receptive field is the same as the image size so that the field
can see all the high-level features in the image. In this case, for a network with D convolutional
layers, the receptive field is (2D+1)-by-(2D+1).


VDSR has 20 convolutional layers so the receptive field and the image patch size are 41-by-41. The
image input layer accepts images with one channel because VDSR is trained using only the luminance
channel.

networkDepth = 20;
firstLayer = imageInputLayer([41 41 1],Name="InputLayer",Normalization="none");

The image input layer is followed by a 2-D convolutional layer that contains 64 filters of size 3-by-3.
Zero-pad the inputs to each convolutional layer
so that the feature maps remain the same size as the input after each convolution. He's method [3 on
page 19-69] initializes the weights to random values so that there is asymmetry in neuron learning.
Each convolutional layer is followed by a ReLU layer, which introduces nonlinearity in the network.

convLayer = convolution2dLayer(3,64,Padding=1, ...
    WeightsInitializer="he",BiasInitializer="zeros",Name="Conv1");

Specify a ReLU layer.

relLayer = reluLayer(Name="ReLU1");

The middle layers contain 18 alternating convolutional and rectified linear unit layers. Every
convolutional layer contains 64 filters of size 3-by-3-by-64, where a filter operates on a 3-by-3 spatial
region across 64 channels. As before, a ReLU layer follows every convolutional layer.

middleLayers = [convLayer relLayer];
for layerNumber = 2:networkDepth-1
    convLayer = convolution2dLayer(3,64,Padding=[1 1], ...
        WeightsInitializer="he",BiasInitializer="zeros", ...
        Name="Conv"+num2str(layerNumber));

    relLayer = reluLayer(Name="ReLU"+num2str(layerNumber));
    middleLayers = [middleLayers convLayer relLayer];
end

The penultimate layer is a convolutional layer with a single filter of size 3-by-3-by-64 that
reconstructs the image.

convLayer = convolution2dLayer(3,1,Padding=[1 1], ...
    WeightsInitializer="he",BiasInitializer="zeros", ...
    NumChannels=64,Name="Conv"+num2str(networkDepth));

The last layer is a regression layer instead of a ReLU layer. The regression layer computes the mean-
squared error between the residual image and network prediction.

finalLayers = [convLayer regressionLayer(Name="FinalRegressionLayer")];

Concatenate all the layers to form the VDSR network.

layers = [firstLayer middleLayers finalLayers];

Specify Training Options

Train the network using stochastic gradient descent with momentum (SGDM) optimization. Specify
the hyperparameter settings for SGDM by using the trainingOptions (Deep Learning Toolbox)
function. The learning rate is initially 0.1 and decreased by a factor of 10 every 10 epochs. Train for
100 epochs.


Training a deep network is time-consuming. Accelerate the training by specifying a high learning
rate. However, this can cause the gradients of the network to explode or grow uncontrollably,
preventing the network from training successfully. To keep the gradients in a meaningful range,
enable gradient clipping by specifying "GradientThreshold" as 0.01, and specify
"GradientThresholdMethod" to use the L2-norm of the gradients.
maxEpochs = 100;
epochIntervals = 1;
initLearningRate = 0.1;
learningRateFactor = 0.1;
l2reg = 0.0001;
miniBatchSize = 64;
options = trainingOptions("sgdm", ...
Momentum=0.9, ...
InitialLearnRate=initLearningRate, ...
LearnRateSchedule="piecewise", ...
LearnRateDropPeriod=10, ...
LearnRateDropFactor=learningRateFactor, ...
L2Regularization=l2reg, ...
MaxEpochs=maxEpochs, ...
MiniBatchSize=miniBatchSize, ...
GradientThresholdMethod="l2norm", ...
GradientThreshold=0.01, ...
Plots="training-progress", ...
Verbose=false);

Train the Network

By default, the example loads a pretrained version of the VDSR network that has been trained to
super-resolve images for scale factors 2, 3 and 4. The pretrained network enables you to perform
super-resolution of test images without waiting for training to complete.

To train the VDSR network, set the doTraining variable in the following code to true. Train the
network using the trainNetwork (Deep Learning Toolbox) function.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). Training takes about 6 hours on an NVIDIA Titan X.

doTraining = false; % set to true to train the network instead of loading the pretrained version
if doTraining
net = trainNetwork(dsTrain,layers,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save("trainedVDSR-"+modelDateTime+".mat","net");
else
load("trainedVDSRNet.mat");
end

Perform Single Image Super-Resolution Using VDSR Network

To perform single image super-resolution (SISR) using the VDSR network, follow the remaining steps
of this example:

• Create a sample low-resolution image from a high-resolution reference image.
• Perform SISR on the low-resolution image using bicubic interpolation, a traditional image
processing solution that does not rely on deep learning.


• Perform SISR on the low-resolution image using the VDSR neural network.
• Visually compare the reconstructed high-resolution images using bicubic interpolation and VDSR.
• Evaluate the quality of the super-resolved images by quantifying the similarity of the images to the
high-resolution reference image.

Create Sample Low-Resolution Image

The test data set, testImages, contains 20 undistorted images shipped in Image Processing
Toolbox™. Load the images into an imageDatastore and display the images in a montage.

fileNames = ["sherlock.jpg","peacock.jpg","fabric.png","greens.jpg", ...


"hands1.jpg","kobi.png","lighthouse.png","office_4.jpg", ...
"onion.png","pears.png","yellowlily.jpg","indiancorn.jpg", ...
"flamingos.jpg","sevilla.jpg","llama.jpg","parkavenue.jpg", ...
"strawberries.jpg","trailer.jpg","wagon.jpg","football.jpg"];
filePath = fullfile(matlabroot,"toolbox","images","imdata")+filesep;
filePathNames = strcat(filePath,fileNames);
testImages = imageDatastore(filePathNames);

Display the test images as a montage.

montage(testImages)


Select one of the test images to use for testing the super-resolution network.

testImage = ;
Ireference = imread(testImage);
Ireference = im2double(Ireference);
imshow(Ireference)
title("High-Resolution Reference Image")


Create a low-resolution version of the high-resolution reference image by using imresize with a
scaling factor of 0.25. The high-frequency components of the image are lost during the downscaling.

scaleFactor = 0.25;
Ilowres = imresize(Ireference,scaleFactor,"bicubic");
imshow(Ilowres)
title("Low-Resolution Image")


Improve Image Resolution Using Bicubic Interpolation

A standard way to increase image resolution without deep learning is to use bicubic interpolation.
Upscale the low-resolution image using bicubic interpolation so that the resulting high-resolution
image is the same size as the reference image.

[nrows,ncols,np] = size(Ireference);
Ibicubic = imresize(Ilowres,[nrows ncols],"bicubic");
imshow(Ibicubic)
title("High-Resolution Image Obtained Using Bicubic Interpolation")


Improve Image Resolution Using Pretrained VDSR Network

Recall that VDSR is trained using only the luminance channel of an image because human perception
is more sensitive to changes in brightness than to changes in color.

Convert the low-resolution image from the RGB color space to luminance (Iy) and chrominance (Icb
and Icr) channels by using the rgb2ycbcr function.
Iycbcr = rgb2ycbcr(Ilowres);
Iy = Iycbcr(:,:,1);
Icb = Iycbcr(:,:,2);
Icr = Iycbcr(:,:,3);

Upscale the luminance and two chrominance channels using bicubic interpolation. The upsampled
chrominance channels, Icb_bicubic and Icr_bicubic, require no further processing.
Iy_bicubic = imresize(Iy,[nrows ncols],"bicubic");
Icb_bicubic = imresize(Icb,[nrows ncols],"bicubic");
Icr_bicubic = imresize(Icr,[nrows ncols],"bicubic");

Pass the upscaled luminance component, Iy_bicubic, through the trained VDSR network. Observe
the activations (Deep Learning Toolbox) from the final layer (a regression layer). The output of the
network is the desired residual image.
Iresidual = activations(net,Iy_bicubic,41);
Iresidual = double(Iresidual);


imshow(Iresidual,[])
title("Residual Image from VDSR")

Add the residual image to the upscaled luminance component to get the high-resolution VDSR
luminance component.

Isr = Iy_bicubic + Iresidual;

Concatenate the high-resolution VDSR luminance component with the upscaled color components.
Convert the image to the RGB color space by using the ycbcr2rgb function. The result is the final
high-resolution color image using VDSR.

Ivdsr = ycbcr2rgb(cat(3,Isr,Icb_bicubic,Icr_bicubic));
imshow(Ivdsr)
title("High-Resolution Image Obtained Using VDSR")


Visual and Quantitative Comparison

To get a better visual understanding of the high-resolution images, examine a small region inside
each image. Specify a region of interest (ROI) using vector roi in the format [x y width height]. The
elements define the x- and y-coordinate of the top left corner, and the width and height of the ROI.

roi = [360 50 400 350];

Crop the high-resolution images to this ROI, and display the result as a montage. The VDSR image
has clearer details and sharper edges than the high-resolution image created using bicubic
interpolation.

montage({imcrop(Ibicubic,roi),imcrop(Ivdsr,roi)})
title("High-Resolution Results Using Bicubic Interpolation (Left) vs. VDSR (Right)");


Use image quality metrics to quantitatively compare the high-resolution image using bicubic
interpolation to the VDSR image. The reference image is the original high-resolution image,
Ireference, before preparing the sample low-resolution image.

Measure the peak signal-to-noise ratio (PSNR) of each image against the reference image. Larger
PSNR values generally indicate better image quality. See psnr for more information about this
metric.

bicubicPSNR = psnr(Ibicubic,Ireference)

bicubicPSNR = 38.4747

vdsrPSNR = psnr(Ivdsr,Ireference)

vdsrPSNR = 39.2346

Measure the structural similarity index (SSIM) of each image. SSIM assesses the visual impact of
three characteristics of an image: luminance, contrast and structure, against a reference image. The
closer the SSIM value is to 1, the better the test image agrees with the reference image. See ssim for
more information about this metric.

bicubicSSIM = ssim(Ibicubic,Ireference)

bicubicSSIM = 0.9861

vdsrSSIM = ssim(Ivdsr,Ireference)

vdsrSSIM = 0.9874

Measure perceptual image quality using the Naturalness Image Quality Evaluator (NIQE). Smaller
NIQE scores indicate better perceptual quality. See niqe for more information about this metric.

bicubicNIQE = niqe(Ibicubic)

bicubicNIQE = 5.1721


vdsrNIQE = niqe(Ivdsr)

vdsrNIQE = 4.7612

Calculate the average PSNR and SSIM of the entire set of test images for the scale factors 2, 3, and
4. For simplicity, you can use the helper function, vdsrMetrics, to compute the average metrics.
This function is attached to the example as a supporting file.

scaleFactors = [2 3 4];
vdsrMetrics(net,testImages,scaleFactors);

Results for Scale factor 2

Average PSNR for Bicubic = 31.467070
Average PSNR for VDSR = 31.481973
Average SSIM for Bicubic = 0.935820
Average SSIM for VDSR = 0.947057

Results for Scale factor 3

Average PSNR for Bicubic = 28.107057
Average PSNR for VDSR = 28.430546
Average SSIM for Bicubic = 0.883927
Average SSIM for VDSR = 0.894634

Results for Scale factor 4

Average PSNR for Bicubic = 27.066129
Average PSNR for VDSR = 27.846590
Average SSIM for Bicubic = 0.863270
Average SSIM for VDSR = 0.878101

VDSR has better metric scores than bicubic interpolation for each scale factor.

References

[1] Kim, J., J. K. Lee, and K. M. Lee. "Accurate Image Super-Resolution Using Very Deep Convolutional
Networks." Proceedings of the IEEE® Conference on Computer Vision and Pattern Recognition.
2016, pp. 1646-1654.

[2] Grubinger, M., P. Clough, H. Müller, and T. Deselaers. "The IAPR TC-12 Benchmark: A New
Evaluation Resource for Visual Information Systems." Proceedings of the OntoImage 2006 Language
Resources For Content-Based Image Retrieval. Genoa, Italy. Vol. 5, May 2006, p. 10.

[3] He, K., X. Zhang, S. Ren, and J. Sun. "Delving Deep into Rectifiers: Surpassing Human-Level
Performance on ImageNet Classification." Proceedings of the IEEE International Conference on
Computer Vision, 2015, pp. 1026-1034.

See Also
randomPatchExtractionDatastore | rgb2ycbcr | ycbcr2rgb | trainingOptions |
trainNetwork | transform | combine


More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)


JPEG Image Deblocking Using Deep Learning

This example shows how to reduce JPEG compression artifacts in an image using a denoising
convolutional neural network (DnCNN).

Image compression is used to reduce the memory footprint of an image. One popular and powerful
compression method is employed by the JPEG image format, which uses a quality factor to specify the
amount of compression. Reducing the quality value results in higher compression and a smaller
memory footprint, at the expense of visual quality of the image.

JPEG compression is lossy, meaning that the compression process causes the image to lose
information. For JPEG images, this information loss appears as blocking artifacts in the image. As
shown in the figure, more compression results in more information loss and stronger artifacts.
Textured regions with high-frequency content, such as the grass and clouds, look blurry. Sharp edges,
such as the roof of the house and the guardrails atop the lighthouse, exhibit ringing.

JPEG deblocking is the process of reducing the effects of compression artifacts in JPEG images.
Several JPEG deblocking methods exist, including more effective methods that use deep learning.
This example implements one such deep learning-based method that attempts to minimize the effect
of JPEG compression artifacts.

The DnCNN Network

This example uses a built-in deep feed-forward convolutional neural network, called DnCNN. The
network was primarily designed to remove noise from images. However, the DnCNN architecture can
also be trained to remove JPEG compression artifacts or increase image resolution.

The reference paper [1 on page 19-83] employs a residual learning strategy, meaning that the
DnCNN network learns to estimate the residual image. A residual image is the difference between a
pristine image and a distorted copy of the image. The residual image contains information about the
image distortion. For this example, distortion appears as JPEG blocking artifacts.

The DnCNN network is trained to detect the residual image from the luminance of a color image. The
luminance channel of an image, Y, represents the brightness of each pixel through a linear


combination of the red, green, and blue pixel values. In contrast, the two chrominance channels of an
image, Cb and Cr, are different linear combinations of the red, green, and blue pixel values that
represent color-difference information. DnCNN is trained using only the luminance channel because
human perception is more sensitive to changes in brightness than changes in color.
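
For example, you can inspect the three channels of any RGB image with the rgb2ycbcr function (the image file used here is just an illustration; any RGB image works):

% Split an RGB image into its luminance (Y) and chrominance (Cb, Cr) channels.
RGB = imread("peppers.png");
YCbCr = rgb2ycbcr(RGB);
montage({YCbCr(:,:,1),YCbCr(:,:,2),YCbCr(:,:,3)},Size=[1 3])
title("Y, Cb, and Cr Channels (Left to Right)")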

If Y_Original is the luminance of the pristine image and Y_Compressed is the luminance of the image
containing JPEG compression artifacts, then the input to the DnCNN network is Y_Compressed and the
network learns to predict Y_Residual = Y_Compressed − Y_Original from the training data.

Once the DnCNN network learns how to estimate a residual image, it can reconstruct an undistorted
version of a compressed JPEG image by adding the residual image to the compressed luminance
channel, then converting the image back to the RGB color space.
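
A minimal sketch of that reconstruction, assuming a trained network net and a JPEG-compressed RGB image Icompressed (hypothetical variable names; the full workflow appears later in this example):

% Deblock the luminance channel, then reassemble and convert back to RGB.
Iycbcr = rgb2ycbcr(Icompressed);
Ydeblocked = denoiseImage(Iycbcr(:,:,1),net);                % network removes the estimated residual
Ideblocked = ycbcr2rgb(cat(3,Ydeblocked,Iycbcr(:,:,2:3)));   % reattach the chrominance channels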

Download Training Data

Download the IAPR TC-12 Benchmark, which consists of 20,000 still natural images [2 on page 19-
83]. The data set includes photos of people, animals, cities, and more. The size of the data file is
~1.8 GB. If you do not want to download the training data or train the network, then you can load the
pretrained DnCNN network by typing load("trainedJPEGDnCNN.mat") at the command line.
Then, go directly to the Perform JPEG Deblocking Using DnCNN Network on page 19-76 section in
this example.

Use the helper function, downloadIAPRTC12Data, to download the data. This function is attached to
the example as a supporting file. Specify dataDir as the desired location of the data.

dataDir = ;
downloadIAPRTC12Data(dataDir);

This example trains the network with a small subset of the IAPR TC-12 Benchmark data. Load the
imageCLEF training data. All images are 24-bit JPEG color images.


trainImagesDir = fullfile(dataDir,"iaprtc12","images","00");
exts = [".jpg",".bmp",".png"];
imdsPristine = imageDatastore(trainImagesDir,FileExtensions=exts);

List the number of training images.


numel(imdsPristine.Files)

ans = 251

Prepare Training Data

To create a training data set, read in pristine images and write out images in the JPEG file format
with various levels of compression.

Specify the JPEG image quality values used to render image compression artifacts. Quality values
must be in the range [0, 100]. Small quality values result in more compression and stronger
compression artifacts. Use a denser sampling of small quality values so the training data has a broad
range of compression artifacts.
JPEGQuality = [5:5:40 50 60 70 80];

The compressed images are stored on disk as MAT files in the directory compressedImagesDir. The
computed residual images are stored on disk as MAT files in the directory residualImagesDir. The
MAT files are stored as data type double for greater precision when training the network.
compressedImagesDir = fullfile(dataDir,"iaprtc12","JPEGDeblockingData","compressedImages");
residualImagesDir = fullfile(dataDir,"iaprtc12","JPEGDeblockingData","residualImages");

Use the helper function createJPEGDeblockingTrainingSet to preprocess the training data. This
function is attached to the example as a supporting file.

For each pristine training image, the helper function writes a copy of the image with quality factor
100 to use as a reference image and copies of the image with each quality factor to use as the
network inputs. The function computes the luminance (Y) channel of the reference and compressed
images in data type double for greater precision when calculating the residual images. The
compressed images are stored on disk as .MAT files in the directory compressedDirName. The
computed residual images are stored on disk as .MAT files in the directory residualDirName.
[compressedDirName,residualDirName] = createJPEGDeblockingTrainingSet(imdsPristine,JPEGQuality);
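
For reference, a hedged sketch of what the helper does for one pristine image and one quality value (illustrative only; the shipping helper may organize the files and directories differently):

% Write a reference copy and a compressed copy, then compute the luminance residual.
I = readimage(imdsPristine,1);
imwrite(I,fullfile(tempdir,"ref.jpg"),"Quality",100);      % reference copy
imwrite(I,fullfile(tempdir,"q20.jpg"),"Quality",20);       % compressed copy (network input)
ref = rgb2ycbcr(imread(fullfile(tempdir,"ref.jpg")));
cmp = rgb2ycbcr(imread(fullfile(tempdir,"q20.jpg")));
inputImage = double(cmp(:,:,1));                           % compressed luminance
residualImage = inputImage - double(ref(:,:,1));           % target the network learns to predict
save(fullfile(tempdir,"input_q20.mat"),"inputImage");
save(fullfile(tempdir,"residual_q20.mat"),"residualImage");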

Create Random Patch Extraction Datastore for Training

Use a random patch extraction datastore to feed the training data to the network. This datastore
extracts random corresponding patches from two image datastores that contain the network inputs
and desired network responses.

In this example, the network inputs are the compressed images. The desired network responses are
the residual images. Create an image datastore called imdsCompressed from the collection of
compressed image files. Create an image datastore called imdsResidual from the collection of
computed residual image files. Both datastores require a helper function, matRead, to read the image
data from the image files. This function is attached to the example as a supporting file.
imdsCompressed = imageDatastore(compressedDirName,FileExtensions=".mat",ReadFcn=@matRead);
imdsResidual = imageDatastore(residualDirName,FileExtensions=".mat",ReadFcn=@matRead);

Create an imageDataAugmenter (Deep Learning Toolbox) that specifies the parameters of data
augmentation. Use data augmentation during training to vary the training data, which effectively


increases the amount of available training data. Here, the augmenter specifies random rotation by 90
degrees and random reflections in the x-direction.

augmenter = imageDataAugmenter( ...
    RandRotation=@()randi([0,1],1)*90, ...
    RandXReflection=true);

Create the randomPatchExtractionDatastore from the two image datastores. Specify a patch
size of 50-by-50 pixels. Each image generates 128 random patches of size 50-by-50 pixels. Specify a
mini-batch size of 128.

patchSize = 50;
patchesPerImage = 128;
dsTrain = randomPatchExtractionDatastore(imdsCompressed,imdsResidual,patchSize, ...
PatchesPerImage=patchesPerImage, ...
DataAugmentation=augmenter);
dsTrain.MiniBatchSize = patchesPerImage;

The random patch extraction datastore dsTrain provides mini-batches of data to the network at each
iteration of the epoch. Preview the result of reading from the datastore.

inputBatch = preview(dsTrain);
disp(inputBatch)

InputImage ResponseImage
______________ ______________

{50×50 double} {50×50 double}
{50×50 double} {50×50 double}
{50×50 double} {50×50 double}
{50×50 double} {50×50 double}
{50×50 double} {50×50 double}
{50×50 double} {50×50 double}
{50×50 double} {50×50 double}
{50×50 double} {50×50 double}

Set up DnCNN Layers

Create the layers of the built-in DnCNN network by using the dnCNNLayers function. By default, the
network depth (the number of convolution layers) is 20.

layers = dnCNNLayers

layers =
1×59 Layer array with layers:

1 'InputLayer' Image Input 50×50×1 images
2 'Conv1' Convolution 64 3×3×1 convolutions with stride [1 1]
3 'ReLU1' ReLU ReLU
4 'Conv2' Convolution 64 3×3×64 convolutions with stride [1 1]
5 'BNorm2' Batch Normalization Batch normalization with 64 channels
6 'ReLU2' ReLU ReLU
7 'Conv3' Convolution 64 3×3×64 convolutions with stride [1 1]
8 'BNorm3' Batch Normalization Batch normalization with 64 channels
9 'ReLU3' ReLU ReLU
10 'Conv4' Convolution 64 3×3×64 convolutions with stride [1 1]
11 'BNorm4' Batch Normalization Batch normalization with 64 channels
12 'ReLU4' ReLU ReLU

13 'Conv5' Convolution 64 3×3×64 convolutions with stride [1 1]
14 'BNorm5' Batch Normalization Batch normalization with 64 channels
15 'ReLU5' ReLU ReLU
16 'Conv6' Convolution 64 3×3×64 convolutions with stride [1 1]
17 'BNorm6' Batch Normalization Batch normalization with 64 channels
18 'ReLU6' ReLU ReLU
19 'Conv7' Convolution 64 3×3×64 convolutions with stride [1 1]
20 'BNorm7' Batch Normalization Batch normalization with 64 channels
21 'ReLU7' ReLU ReLU
22 'Conv8' Convolution 64 3×3×64 convolutions with stride [1 1]
23 'BNorm8' Batch Normalization Batch normalization with 64 channels
24 'ReLU8' ReLU ReLU
25 'Conv9' Convolution 64 3×3×64 convolutions with stride [1 1]
26 'BNorm9' Batch Normalization Batch normalization with 64 channels
27 'ReLU9' ReLU ReLU
28 'Conv10' Convolution 64 3×3×64 convolutions with stride [1 1]
29 'BNorm10' Batch Normalization Batch normalization with 64 channels
30 'ReLU10' ReLU ReLU
31 'Conv11' Convolution 64 3×3×64 convolutions with stride [1 1]
32 'BNorm11' Batch Normalization Batch normalization with 64 channels
33 'ReLU11' ReLU ReLU
34 'Conv12' Convolution 64 3×3×64 convolutions with stride [1 1]
35 'BNorm12' Batch Normalization Batch normalization with 64 channels
36 'ReLU12' ReLU ReLU
37 'Conv13' Convolution 64 3×3×64 convolutions with stride [1 1]
38 'BNorm13' Batch Normalization Batch normalization with 64 channels
39 'ReLU13' ReLU ReLU
40 'Conv14' Convolution 64 3×3×64 convolutions with stride [1 1]
41 'BNorm14' Batch Normalization Batch normalization with 64 channels
42 'ReLU14' ReLU ReLU
43 'Conv15' Convolution 64 3×3×64 convolutions with stride [1 1]
44 'BNorm15' Batch Normalization Batch normalization with 64 channels
45 'ReLU15' ReLU ReLU
46 'Conv16' Convolution 64 3×3×64 convolutions with stride [1 1]
47 'BNorm16' Batch Normalization Batch normalization with 64 channels
48 'ReLU16' ReLU ReLU
49 'Conv17' Convolution 64 3×3×64 convolutions with stride [1 1]
50 'BNorm17' Batch Normalization Batch normalization with 64 channels
51 'ReLU17' ReLU ReLU
52 'Conv18' Convolution 64 3×3×64 convolutions with stride [1 1]
53 'BNorm18' Batch Normalization Batch normalization with 64 channels
54 'ReLU18' ReLU ReLU
55 'Conv19' Convolution 64 3×3×64 convolutions with stride [1 1]
56 'BNorm19' Batch Normalization Batch normalization with 64 channels
57 'ReLU19' ReLU ReLU
58 'Conv20' Convolution 1 3×3×64 convolutions with stride [1 1]
59 'FinalRegressionLayer' Regression Output mean-squared-error

Select Training Options

Train the network using stochastic gradient descent with momentum (SGDM) optimization. Specify
the hyperparameter settings for SGDM by using the trainingOptions (Deep Learning Toolbox)
function.

Training a deep network is time-consuming. Accelerate the training by specifying a high learning
rate. However, this can cause the gradients of the network to explode or grow uncontrollably,
preventing the network from training successfully. To keep the gradients in a meaningful range,


enable gradient clipping by setting "GradientThreshold" to 0.005, and specify
"GradientThresholdMethod" to use the absolute value of the gradients.

maxEpochs = 30;
initLearningRate = 0.1;
l2reg = 0.0001;
batchSize = 64;

options = trainingOptions("sgdm", ...


Momentum=0.9, ...
InitialLearnRate=initLearningRate, ...
LearnRateSchedule="piecewise", ...
GradientThresholdMethod="absolute-value", ...
GradientThreshold=0.005, ...
L2Regularization=l2reg, ...
MiniBatchSize=batchSize, ...
MaxEpochs=maxEpochs, ...
Plots="training-progress", ...
Verbose=false);

Train the Network

By default, the example loads a pretrained DnCNN network. The pretrained network enables you to
perform JPEG deblocking without waiting for training to complete.

To train the network, set the doTraining variable in the following code to true. Train the DnCNN
network using the trainNetwork (Deep Learning Toolbox) function.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). Training takes about 40 hours on an NVIDIA™ Titan X.

doTraining = false; % set to true to train the network instead of loading the pretrained model
if doTraining
[net,info] = trainNetwork(dsTrain,layers,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save("trainedJPEGDnCNN-"+modelDateTime+".mat","net");
else
load("trainedJPEGDnCNN.mat");
end

You can now use the DnCNN network to remove JPEG compression artifacts from images.

Perform JPEG Deblocking Using DnCNN Network

To perform JPEG deblocking using DnCNN, follow the remaining steps of this example:

• Create sample test images with JPEG compression artifacts at three different quality levels.
• Remove the compression artifacts using the DnCNN network.
• Visually compare the images before and after deblocking.
• Evaluate the quality of the compressed and deblocked images by quantifying their similarity to the
undistorted reference image.


Create Sample Images with Blocking Artifacts

The test data set, testImages, contains 20 undistorted images shipped in Image Processing
Toolbox™. Load the images into an imageDatastore.
fileNames = ["sherlock.jpg","peacock.jpg","fabric.png","greens.jpg", ...
"hands1.jpg","kobi.png","lighthouse.png","office_4.jpg", ...
"onion.png","pears.png","yellowlily.jpg","indiancorn.jpg", ...
"flamingos.jpg","sevilla.jpg","llama.jpg","parkavenue.jpg", ...
"strawberries.jpg","trailer.jpg","wagon.jpg","football.jpg"];
filePath = fullfile(matlabroot,"toolbox","images","imdata")+filesep;
filePathNames = strcat(filePath,fileNames);
testImages = imageDatastore(filePathNames);

Display the test images as a montage.


montage(testImages)

Select one of the test images to use for testing the JPEG deblocking network.


testImage = ;
Ireference = imread(testImage);
imshow(Ireference)
title("Uncompressed Reference Image")


Create three compressed test images with the JPEG Quality values of 10, 20, and 50.


imwrite(Ireference,fullfile(tempdir,"testQuality10.jpg"),"Quality",10);
imwrite(Ireference,fullfile(tempdir,"testQuality20.jpg"),"Quality",20);
imwrite(Ireference,fullfile(tempdir,"testQuality50.jpg"),"Quality",50);

Preprocess Compressed Images

Read the compressed versions of the image into the workspace.


I10 = imread(fullfile(tempdir,"testQuality10.jpg"));
I20 = imread(fullfile(tempdir,"testQuality20.jpg"));
I50 = imread(fullfile(tempdir,"testQuality50.jpg"));

Display the compressed images as a montage.


montage({I50,I20,I10},Size=[1 3])
title("JPEG-Compressed Images with Quality Factor: 50, 20 and 10 (left to right)")

Recall that DnCNN is trained using only the luminance channel of an image because human
perception is more sensitive to changes in brightness than changes in color. Convert the JPEG-
compressed images from the RGB color space to the YCbCr color space using the rgb2ycbcr
function.
I10ycbcr = rgb2ycbcr(I10);
I20ycbcr = rgb2ycbcr(I20);
I50ycbcr = rgb2ycbcr(I50);

Apply the DnCNN Network

To perform the forward pass of the network, use the denoiseImage function. The function applies the
same procedure used to denoise an image: it estimates the residual in the luminance channel and
removes it from the input. You can think of the JPEG compression artifacts as a type of image noise.
I10y_predicted = denoiseImage(I10ycbcr(:,:,1),net);
I20y_predicted = denoiseImage(I20ycbcr(:,:,1),net);
I50y_predicted = denoiseImage(I50ycbcr(:,:,1),net);


The chrominance channels do not need processing. Concatenate the deblocked luminance channel
with the original chrominance channels to obtain the deblocked image in the YCbCr color space.

I10ycbcr_predicted = cat(3,I10y_predicted,I10ycbcr(:,:,2:3));
I20ycbcr_predicted = cat(3,I20y_predicted,I20ycbcr(:,:,2:3));
I50ycbcr_predicted = cat(3,I50y_predicted,I50ycbcr(:,:,2:3));

Convert the deblocked YCbCr image to the RGB color space by using the ycbcr2rgb function.

I10_predicted = ycbcr2rgb(I10ycbcr_predicted);
I20_predicted = ycbcr2rgb(I20ycbcr_predicted);
I50_predicted = ycbcr2rgb(I50ycbcr_predicted);

Display the deblocked images as a montage.

montage({I50_predicted,I20_predicted,I10_predicted},Size=[1 3])
title("Deblocked Images with Quality Factor 50, 20 and 10 (Left to Right)")

To get a better visual understanding of the improvements, examine a smaller region inside each
image. Specify a region of interest (ROI) using vector roi in the format [x y width height]. The
elements define the x- and y-coordinate of the top left corner, and the width and height of the ROI.

roi = [30 440 100 80];

Crop the compressed images to this ROI, and display the result as a montage.

i10 = imcrop(I10,roi);
i20 = imcrop(I20,roi);
i50 = imcrop(I50,roi);
montage({i50 i20 i10},Size=[1 3])
title("Patches from JPEG-Compressed Images with Quality Factor 50, 20 and 10 (Left to Right)")


Crop the deblocked images to this ROI, and display the result as a montage.
i10predicted = imcrop(I10_predicted,roi);
i20predicted = imcrop(I20_predicted,roi);
i50predicted = imcrop(I50_predicted,roi);
montage({i50predicted,i20predicted,i10predicted},Size=[1 3])
title("Patches from Deblocked Images with Quality Factor 50, 20 and 10 (Left to Right)")

Quantitative Comparison

Quantify the quality of the deblocked images through four metrics. You can use the
jpegDeblockingMetrics helper function to compute these metrics for compressed and deblocked
images at the quality factors 10, 20, and 50. This function is attached to the example as a supporting
file.

• Structural Similarity Index (SSIM). SSIM assesses the visual impact of three characteristics of an
image: luminance, contrast and structure, against a reference image. The closer the SSIM value is
to 1, the better the test image agrees with the reference image. Here, the reference image is the
undistorted original image, Ireference, before JPEG compression. See ssim for more
information about this metric.
• Peak signal-to-noise ratio (PSNR). The larger the PSNR value, the stronger the signal compared to
the distortion. See psnr for more information about this metric.
• Naturalness Image Quality Evaluator (NIQE). NIQE measures perceptual image quality using a
model trained from natural scenes. Smaller NIQE scores indicate better perceptual quality. See
niqe for more information about this metric.


• Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). BRISQUE measures perceptual
image quality using a model trained from natural scenes with image distortion. Smaller BRISQUE
scores indicate better perceptual quality. See brisque for more information about this metric.

jpegDeblockingMetrics(Ireference,I10,I20,I50,I10_predicted,I20_predicted,I50_predicted)

------------------------------------------
SSIM Comparison
===============
I10: 0.90624 I10_predicted: 0.91286
I20: 0.94904 I20_predicted: 0.95444
I50: 0.97238 I50_predicted: 0.97482
------------------------------------------
PSNR Comparison
===============
I10: 26.6046 I10_predicted: 27.0793
I20: 28.8015 I20_predicted: 29.3378
I50: 31.4512 I50_predicted: 31.8584
------------------------------------------
NIQE Comparison
===============
I10: 7.2194 I10_predicted: 3.9469
I20: 4.5158 I20_predicted: 3.0681
I50: 2.8874 I50_predicted: 2.4107
NOTE: Smaller NIQE score signifies better perceptual quality
------------------------------------------
BRISQUE Comparison
==================
I10: 52.372 I10_predicted: 38.9272
I20: 45.3772 I20_predicted: 30.8993
I50: 27.7093 I50_predicted: 24.3847
NOTE: Smaller BRISQUE score signifies better perceptual quality

References

[1] Zhang, K., W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian Denoiser: Residual
Learning of Deep CNN for Image Denoising." IEEE® Transactions on Image Processing. Feb 2017.

[2] Grubinger, M., P. Clough, H. Müller, and T. Deselaers. "The IAPR TC-12 Benchmark: A New
Evaluation Resource for Visual Information Systems." Proceedings of the OntoImage 2006 Language
Resources For Content-Based Image Retrieval. Genoa, Italy. Vol. 5, May 2006, p. 10.

See Also
rgb2ycbcr | ycbcr2rgb | dnCNNLayers | denoiseImage | trainingOptions | trainNetwork |
randomPatchExtractionDatastore

More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)


Image Processing Operator Approximation Using Deep Learning

This example shows how to approximate an image filtering operation using a multiscale context
aggregation network (CAN).

Operator approximation finds alternative ways to process images such that the result resembles the
output from a conventional image processing operation or pipeline. The goal of operator
approximation is often to reduce the time required to process an image.

Several classical and deep learning techniques have been proposed to perform operator
approximation. Some classical techniques improve the efficiency of a single algorithm but cannot be
generalized to other operations. Another common technique approximates a wide range of operations
by applying the operator to a low resolution copy of an image, but the loss of high-frequency content
limits the accuracy of the approximation.

Deep learning solutions enable the approximation of more general and complex operations. For
example, the multiscale context aggregation network (CAN) presented by Q. Chen [1 on page 19-97]
can approximate multiscale tone mapping, photographic style transfer, nonlocal dehazing, and pencil
drawing. Multiscale CAN trains on full-resolution images for greater accuracy in processing high-
frequency details. After the network is trained, the network can bypass the conventional processing
operation and process images directly.

This example explores how to train a multiscale CAN to approximate a bilateral image filtering
operation, which reduces image noise while preserving edge sharpness. The example presents the
complete training and inference workflow, which includes the process of creating a training
datastore, selecting training options, training the network, and using the network to process test
images.

The Operator Approximation Network

The multiscale CAN is trained to minimize the l2 loss between the conventional output of an image
processing operation and the network response after processing the input image using multiscale
context aggregation. Multiscale context aggregation looks for information about each pixel from
across the entire image, rather than limiting the search to a small neighborhood surrounding the
pixel.


To help the network learn global image properties, the multiscale CAN architecture has a large
receptive field. The first and last layers have the same size because the operator should not change
the size of the image. Successive intermediate layers are dilated by exponentially increasing scale
factors (hence the "multiscale" nature of the CAN). Dilation enables the network to look for spatially
separated features at various spatial frequencies, without reducing the resolution of the image. After
each convolution layer, the network uses adaptive normalization to balance the impact of batch
normalization and the identity mapping on the approximated operator.
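
Conceptually, adaptive normalization forms a learned blend of the identity mapping and the batch-normalized activations (a sketch of the idea from the reference paper; lambda and mu are illustrative names that correspond to the custom scale layers used later in this example):

% Adaptive normalization: output = lambda*x + mu*BN(x), with lambda and mu learned per layer.
adaptiveNorm = @(x,bnOfX,lambda,mu) lambda.*x + mu.*bnOfX;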

Download Training and Test Data

Download the IAPR TC-12 Benchmark, which consists of 20,000 still natural images [2 on page 19-
97]. The data set includes photos of people, animals, cities, and more. The size of the data file is
~1.8 GB. If you do not want to download the training data set needed to train the network, then you
can load the pretrained CAN by typing load("trainedBilateralFilterNet.mat"); at the
command line. Then, go directly to the Perform Bilateral Filtering Approximation Using Multiscale
CAN on page 19-90 section in this example.

Use the helper function, downloadIAPRTC12Data, to download the data. This function is attached to
the example as a supporting file. Specify dataDir as the desired location of the data.

dataDir = ;
downloadIAPRTC12Data(dataDir);

This example trains the network with a small subset of the IAPR TC-12 Benchmark data.
trainImagesDir = fullfile(dataDir,"iaprtc12","images","39");
exts = [".jpg",".bmp",".png"];
pristineImages = imageDatastore(trainImagesDir,FileExtensions=exts);

List the number of training images.


numel(pristineImages.Files)

ans = 916

Prepare Training Data

To create a training data set, read in pristine images and write out images that have been bilateral
filtered. The filtered images are stored on disk in the directory specified by preprocessDataDir.
preprocessDataDir = trainImagesDir+filesep+"preprocessedDataset";

Use the helper function bilateralFilterDataset to preprocess the training data. This function is
attached to the example as a supporting file. The helper function performs these operations for each
pristine image in inputImages:

• Calculate the degree of smoothing for bilateral filtering. Smoothing the filtered image reduces
image noise.
• Perform bilateral filtering using imbilatfilt.
• Save the filtered image to disk using imwrite.
bilateralFilterDataset(pristineImages,preprocessDataDir);
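
A hedged sketch of the per-image operations listed above (illustrative only; the shipping helper chooses its own smoothing heuristic and output file names):

% Bilateral-filter one pristine image and save the result to the preprocessed data directory.
I = readimage(pristineImages,1);
degreeOfSmoothing = var(double(I(:)));         % smoothing tied to pixel variance (assumed heuristic)
Ifiltered = imbilatfilt(I,degreeOfSmoothing);
if ~isfolder(preprocessDataDir)
    mkdir(preprocessDataDir)
end
imwrite(Ifiltered,fullfile(preprocessDataDir,"example_0001.png"));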

Define Random Patch Extraction Datastore for Training

Use a random patch extraction datastore to feed the training data to the network. This datastore
extracts random corresponding patches from two image datastores that contain the network inputs
and desired network responses.

In this example, the network inputs are the pristine images in pristineImages. The desired
network responses are the processed images after bilateral filtering. Create an image datastore
called bilatFilteredImages from the collection of bilateral filtered image files.
bilatFilteredImages = imageDatastore(preprocessDataDir,FileExtensions=exts);

Create a randomPatchExtractionDatastore from the two image datastores. Specify a patch size
of 256-by-256 pixels. Specify "PatchesPerImage" to extract one randomly-positioned patch from
each pair of images during training. Specify a mini-batch size of one.
miniBatchSize = 1;
patchSize = [256 256];
dsTrain = randomPatchExtractionDatastore(pristineImages,bilatFilteredImages, ...
patchSize,PatchesPerImage=1);
dsTrain.MiniBatchSize = miniBatchSize;

The randomPatchExtractionDatastore provides mini-batches of data to the network at each
iteration of the epoch. Perform a read operation on the datastore to explore the data.
inputBatch = read(dsTrain);
disp(inputBatch)

InputImage ResponseImage
_________________ _________________

{256×256×3 uint8} {256×256×3 uint8}

Set Up Multiscale CAN Layers

This example defines the multiscale CAN using layers from Deep Learning Toolbox™, including:


• imageInputLayer (Deep Learning Toolbox) — Image input layer
• convolution2dLayer (Deep Learning Toolbox) — 2-D convolution layer for convolutional neural networks
• batchNormalizationLayer (Deep Learning Toolbox) — Batch normalization layer
• leakyReluLayer (Deep Learning Toolbox) — Leaky rectified linear unit layer
• regressionLayer (Deep Learning Toolbox) — Regression output layer for a neural network

Two custom scale layers are added to implement an adaptive batch normalization layer. These layers
are attached as supporting files to this example.

• adaptiveNormalizationMu — Scale layer that adjusts the strengths of the batch-normalization branch
• adaptiveNormalizationLambda — Scale layer that adjusts the strengths of the identity branch

The first layer, imageInputLayer, operates on image patches. The patch size is based on the
network receptive field, which is the spatial image region that affects the response of the top-most
layer in the network. Ideally, the network receptive field is the same as the image size so that it can
see all the high-level features in the image. For this bilateral filter approximation, the image patch
size is fixed to 256-by-256.

networkDepth = 10;
numberOfFilters = 32;
firstLayer = imageInputLayer([256 256 3],Name="InputLayer",Normalization="none");

The image input layer is followed by a 2-D convolution layer that contains 32 filters of size 3-by-3.
Zero-pad the inputs to each convolution layer so that feature maps remain the same size as the input
after each convolution. Initialize the weights to the identity matrix.

Wgts = zeros(3,3,3,numberOfFilters);
for ii = 1:3
Wgts(2,2,ii,ii) = 1;
end
convolutionLayer = convolution2dLayer(3,numberOfFilters,Padding=1, ...
Weights=Wgts,Name="Conv1");

Each convolution layer is followed by a batch normalization layer and an adaptive normalization scale
layer that adjusts the strengths of the batch-normalization branch. Later, this example will create the
corresponding adaptive normalization scale layer that adjusts the strength of the identity branch. For
now, follow the adaptiveNormalizationMu layer with an addition layer. Finally, specify a leaky
ReLU layer with a scalar multiplier of 0.2 for negative inputs.

batchNorm = batchNormalizationLayer(Name="BN1");
adaptiveMu = adaptiveNormalizationMu(numberOfFilters,"Mu1");
addLayer = additionLayer(2,Name="add1");
leakyrelLayer = leakyReluLayer(0.2,Name="Leaky1");

Specify the middle layers of the network following the same pattern. Successive convolution layers
have a dilation factor that scales exponentially with the network depth.

midLayers = [convolutionLayer batchNorm adaptiveMu addLayer leakyrelLayer];

Wgts = zeros(3,3,numberOfFilters,numberOfFilters);
for ii = 1:numberOfFilters
Wgts(2,2,ii,ii) = 1;


end

for layerNumber = 2:networkDepth-2
    dilationFactor = 2^(layerNumber-1);
padding = dilationFactor;
conv2dLayer = convolution2dLayer(3,numberOfFilters, ...
Padding=padding,DilationFactor=dilationFactor, ...
Weights=Wgts,Name="Conv"+num2str(layerNumber));
batchNorm = batchNormalizationLayer(Name="BN"+num2str(layerNumber));
adaptiveMu = adaptiveNormalizationMu(numberOfFilters,"Mu"+num2str(layerNumber));
addLayer = additionLayer(2,Name="add"+num2str(layerNumber));
leakyrelLayer = leakyReluLayer(0.2,Name="Leaky"+num2str(layerNumber));
midLayers = [midLayers conv2dLayer batchNorm adaptiveMu addLayer leakyrelLayer];
end

Do not apply a dilation factor to the second-to-last convolution layer.

conv2dLayer = convolution2dLayer(3,numberOfFilters, ...
    Padding=1,Weights=Wgts,Name="Conv9");

batchNorm = batchNormalizationLayer(Name="AN9");
adaptiveMu = adaptiveNormalizationMu(numberOfFilters,"Mu9");
addLayer = additionLayer(2,Name="add9");
leakyrelLayer = leakyReluLayer(0.2,Name="Leaky9");
midLayers = [midLayers conv2dLayer batchNorm adaptiveMu addLayer leakyrelLayer];

The last convolution layer uses 1-by-1 convolutions with weights of size 1-by-1-by-32-by-3 (three filters, each spanning the 32 input channels) to reconstruct the three-channel image.

Wgts = sqrt(2/(9*numberOfFilters))*randn(1,1,numberOfFilters,3);
conv2dLayer = convolution2dLayer(1,3,NumChannels=numberOfFilters, ...
Weights=Wgts,Name="Conv10");

The last layer is a regression layer instead of a leaky ReLU layer. The regression layer computes the
mean-squared error between the bilateral-filtered image and the network prediction.

finalLayers = [conv2dLayer
regressionLayer(Name="FinalRegressionLayer")
];

Concatenate all the layers.

layers = [firstLayer midLayers finalLayers'];


lgraph = layerGraph(layers);

Create skip connections, which act as the identity branch for the adaptive normalization equation.
Connect the skip connections to the addition layers.

skipConv1 = adaptiveNormalizationLambda(numberOfFilters,"Lambda1");
skipConv2 = adaptiveNormalizationLambda(numberOfFilters,"Lambda2");
skipConv3 = adaptiveNormalizationLambda(numberOfFilters,"Lambda3");
skipConv4 = adaptiveNormalizationLambda(numberOfFilters,"Lambda4");
skipConv5 = adaptiveNormalizationLambda(numberOfFilters,"Lambda5");
skipConv6 = adaptiveNormalizationLambda(numberOfFilters,"Lambda6");
skipConv7 = adaptiveNormalizationLambda(numberOfFilters,"Lambda7");
skipConv8 = adaptiveNormalizationLambda(numberOfFilters,"Lambda8");
skipConv9 = adaptiveNormalizationLambda(numberOfFilters,"Lambda9");

lgraph = addLayers(lgraph,skipConv1);


lgraph = connectLayers(lgraph,"Conv1","Lambda1");
lgraph = connectLayers(lgraph,"Lambda1","add1/in2");

lgraph = addLayers(lgraph,skipConv2);
lgraph = connectLayers(lgraph,"Conv2","Lambda2");
lgraph = connectLayers(lgraph,"Lambda2","add2/in2");

lgraph = addLayers(lgraph,skipConv3);
lgraph = connectLayers(lgraph,"Conv3","Lambda3");
lgraph = connectLayers(lgraph,"Lambda3","add3/in2");

lgraph = addLayers(lgraph,skipConv4);
lgraph = connectLayers(lgraph,"Conv4","Lambda4");
lgraph = connectLayers(lgraph,"Lambda4","add4/in2");

lgraph = addLayers(lgraph,skipConv5);
lgraph = connectLayers(lgraph,"Conv5","Lambda5");
lgraph = connectLayers(lgraph,"Lambda5","add5/in2");

lgraph = addLayers(lgraph,skipConv6);
lgraph = connectLayers(lgraph,"Conv6","Lambda6");
lgraph = connectLayers(lgraph,"Lambda6","add6/in2");

lgraph = addLayers(lgraph,skipConv7);
lgraph = connectLayers(lgraph,"Conv7","Lambda7");
lgraph = connectLayers(lgraph,"Lambda7","add7/in2");

lgraph = addLayers(lgraph,skipConv8);
lgraph = connectLayers(lgraph,"Conv8","Lambda8");
lgraph = connectLayers(lgraph,"Lambda8","add8/in2");

lgraph = addLayers(lgraph,skipConv9);
lgraph = connectLayers(lgraph,"Conv9","Lambda9");
lgraph = connectLayers(lgraph,"Lambda9","add9/in2");

Visualize the network using the Deep Network Designer (Deep Learning Toolbox) app.

deepNetworkDesigner(lgraph)

Specify Training Options

Train the network using the Adam optimizer. Specify the hyperparameter settings by using the
trainingOptions (Deep Learning Toolbox) function. Use the default values of 0.9 for "Momentum"
and 0.0001 for "L2Regularization" (weight decay). Specify a constant learning rate of 0.0001.
Train for 181 epochs.

maxEpochs = 181;
initLearningRate = 0.0001;
miniBatchSize = 1;

options = trainingOptions("adam", ...


InitialLearnRate=initLearningRate, ...
MaxEpochs=maxEpochs, ...
MiniBatchSize=miniBatchSize, ...
Plots="training-progress", ...
Verbose=false);


Train the Network

By default, the example loads a pretrained multiscale CAN that approximates a bilateral filter. The
pretrained network enables you to perform an approximation of bilateral filtering without waiting for
training to complete.

To train the network, set the doTraining variable in the following code to true. Train the multiscale
CAN using the trainNetwork (Deep Learning Toolbox) function.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). Training takes about 15 hours on an NVIDIA™ Titan X.

doTraining = false; % set to true to train the network instead of loading the pretrained model
if doTraining
net = trainNetwork(dsTrain,lgraph,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save("trainedBilateralFilterNet-"+modelDateTime+".mat","net");
else
load("trainedBilateralFilterNet.mat");
end

Perform Bilateral Filtering Approximation Using Multiscale CAN

To process an image using a trained multiscale CAN network that approximates a bilateral filter,
follow the remaining steps of this example. The remainder of the example shows how to:

• Create a sample noisy input image from a reference image.
• Perform conventional bilateral filtering of the noisy image using the imbilatfilt function.
• Perform an approximation to bilateral filtering on the noisy image using the CAN.
• Visually compare the denoised images from operator approximation and conventional bilateral
filtering.
• Evaluate the quality of the denoised images by quantifying the similarity of the images to the
pristine reference image.

Create Sample Noisy Image

Create a sample noisy image that will be used to compare the results of operator approximation to
conventional bilateral filtering.

The test data set, testImages, contains 20 undistorted images shipped in Image Processing
Toolbox™. Load the images into an imageDatastore and display the images in a montage.
fileNames = ["sherlock.jpg","peacock.jpg","fabric.png","greens.jpg", ...
"hands1.jpg","kobi.png","lighthouse.png","office_4.jpg", ...
"onion.png","pears.png","yellowlily.jpg","indiancorn.jpg", ...
"flamingos.jpg","sevilla.jpg","llama.jpg","parkavenue.jpg", ...
"strawberries.jpg","trailer.jpg","wagon.jpg","football.jpg"];
filePath = fullfile(matlabroot,"toolbox","images","imdata")+filesep;
filePathNames = strcat(filePath,fileNames);
testImages = imageDatastore(filePathNames);

Display the test images as a montage.


montage(testImages)


Select one of the images to use as the reference image for bilateral filtering. Convert the image to
data type uint8.

testImage = ;
Ireference = imread(testImage);
Ireference = im2uint8(Ireference);

You can optionally use your own image as the reference image. Note that the size of the test image
must be at least 256-by-256. If the test image is smaller than 256-by-256, then increase the image
size by using the imresize function. The network also requires an RGB test image. If the test image
is grayscale, then convert the image to RGB by using the cat function to concatenate three copies of
the original image along the third dimension.
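
A hedged sketch of that optional preparation for a custom image (the file name is hypothetical):

% Prepare a user-supplied reference image: ensure RGB and at least 256-by-256 pixels.
I = imread("myImage.png");                    % hypothetical file name
if size(I,3) == 1
    I = cat(3,I,I,I);                         % replicate a grayscale image into three channels
end
scale = max(256 ./ [size(I,1) size(I,2)]);
if scale > 1
    I = imresize(I,scale);                    % enlarge so both dimensions are at least 256
end
Ireference = im2uint8(I);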

Display the reference image.

imshow(Ireference)
title("Pristine Reference Image")


Use the imnoise function to add zero-mean Gaussian white noise with a variance of 0.00001 to the
reference image.

Inoisy = imnoise(Ireference,"gaussian",0.00001);
imshow(Inoisy)
title("Noisy Image")


Filter Image Using Bilateral Filtering

Conventional bilateral filtering is a standard way to reduce image noise while preserving edge
sharpness. Use the imbilatfilt function to apply a bilateral filter to the noisy image. Specify a
degree of smoothing equal to the variance of pixel values.

degreeOfSmoothing = var(double(Inoisy(:)));
Ibilat = imbilatfilt(Inoisy,degreeOfSmoothing);
imshow(Ibilat)
title("Denoised Image Obtained Using Bilateral Filtering")


Process Image Using Trained Network

Pass the normalized input image through the trained network and observe the activations (Deep
Learning Toolbox) from the final layer (a regression layer). The output of the network is the desired
denoised image.

Iapprox = activations(net,Inoisy,"FinalRegressionLayer");

Image Processing Toolbox™ requires floating point images to have pixel values in the range [0, 1].
Use the rescale function to scale the pixel values to this range, then convert the image to uint8.

Iapprox = rescale(Iapprox);
Iapprox = im2uint8(Iapprox);
imshow(Iapprox)
title("Denoised Image Obtained Using Multiscale CAN")


Visual and Quantitative Comparison

To get a better visual understanding of the denoised images, examine a small region inside each
image. Specify a region of interest (ROI) using vector roi in the format [x y width height]. The
elements define the x- and y-coordinate of the top left corner, and the width and height of the ROI.

roi = [300 30 50 50];

Crop the images to this ROI, and display the result as a montage. The CAN removes more noise than
conventional bilateral filtering. Both techniques preserve edge sharpness.

montage({imcrop(Ireference,roi),imcrop(Inoisy,roi), ...
imcrop(Ibilat,roi),imcrop(Iapprox,roi)}, ...
Size=[1 4]);
title("Reference Image | Noisy Image | Bilateral-Filtered Image | CAN Prediction");


Use image quality metrics to quantitatively compare the noisy input image, the bilateral-filtered
image, and the operator-approximated image. The reference image is the original reference image,
Ireference, before adding noise.

Measure the peak signal-to-noise ratio (PSNR) of each image against the reference image. Larger
PSNR values generally indicate better image quality. See psnr for more information about this
metric.

noisyPSNR = psnr(Inoisy,Ireference);
bilatPSNR = psnr(Ibilat,Ireference);
approxPSNR = psnr(Iapprox,Ireference);
PSNR_Score = [noisyPSNR bilatPSNR approxPSNR]';

Measure the structural similarity index (SSIM) of each image. SSIM assesses the visual impact of
three characteristics of an image: luminance, contrast and structure, against a reference image. The
closer the SSIM value is to 1, the better the test image agrees with the reference image. See ssim for
more information about this metric.

noisySSIM = ssim(Inoisy,Ireference);
bilatSSIM = ssim(Ibilat,Ireference);
approxSSIM = ssim(Iapprox,Ireference);
SSIM_Score = [noisySSIM bilatSSIM approxSSIM]';

Measure perceptual image quality using the Naturalness Image Quality Evaluator (NIQE). Smaller
NIQE scores indicate better perceptual quality. See niqe for more information about this metric.

noisyNIQE = niqe(Inoisy);
bilatNIQE = niqe(Ibilat);
approxNIQE = niqe(Iapprox);
NIQE_Score = [noisyNIQE bilatNIQE approxNIQE]';

Display the metrics in a table.

table(PSNR_Score,SSIM_Score,NIQE_Score, ...
RowNames=["Noisy Image","Bilateral Filtering","Operator Approximation"])

ans=3×3 table
                              PSNR_Score    SSIM_Score    NIQE_Score
                              __________    __________    __________

    Noisy Image                 20.283       0.76238       11.611
    Bilateral Filtering         25.774       0.91549       7.4163
    Operator Approximation      26.181       0.92601       6.1291

References

[1] Chen, Q., J. Xu, and V. Koltun. "Fast Image Processing with Fully-Convolutional Networks." In
Proceedings of the 2017 IEEE Conference on Computer Vision. Venice, Italy, Oct. 2017, pp.
2516-2525.

[2] Grubinger, M., P. Clough, H. Müller, and T. Deselaers. "The IAPR TC-12 Benchmark: A New
Evaluation Resource for Visual Information Systems." Proceedings of the OntoImage 2006 Language
Resources For Content-Based Image Retrieval. Genoa, Italy. Vol. 5, May 2006, p. 10.

See Also
randomPatchExtractionDatastore | trainNetwork | trainingOptions | layerGraph |
activations | imbilatfilt

More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)


Develop Camera Processing Pipeline Using Deep Learning

This example shows how to convert RAW camera data to an aesthetically pleasing color image using a
U-Net.

DSLRs and many modern phone cameras offer the ability to save data collected directly from the
camera sensor as a RAW file. Each pixel of RAW data corresponds directly to the amount of light
captured by a corresponding camera photosensor. The data depends on fixed characteristics of the
camera hardware, such as the sensitivity of each photosensor to a particular range of wavelengths of
the electromagnetic spectrum. The data also depends on camera acquisition settings such as
exposure time, and factors of the scene such as the light source.

Demosaicing is the only required operation to convert single-channel RAW data to a three-channel
RGB image. However, without additional image processing operations, the resulting RGB image has
subjectively poor visual quality.

A traditional image processing pipeline performs a combination of additional operations including
denoising, linearization, white-balancing, color correction, brightness adjustment, and contrast
adjustment [1 on page 19-118]. The challenge of designing a pipeline lies in refining algorithms to
optimize the subjective appearance of the final RGB image regardless of variations in the scene and
acquisition settings.
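
For intuition only, here is a crude sketch of two of these operations applied to a demosaiced, linear RGB image Ilinear with values in [0, 1] (Ilinear and the gain values are assumptions; real camera pipelines are far more sophisticated and carefully tuned):

% Toy white balance and gamma adjustment on linear RGB data.
wbGains = [2.0 1.0 1.6];                      % assumed per-channel white-balance gains
Iwb = Ilinear .* reshape(wbGains,1,1,3);      % white balance
Idisplay = min(max(Iwb,0),1) .^ (1/2.2);      % simple gamma curve for brightness/contrast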

Deep learning techniques enable direct RAW to RGB conversion without the necessity of developing a
traditional processing pipeline. For instance, one technique compensates for underexposure when
converting RAW images to RGB [2 on page 19-118]. This example shows how to convert RAW images
from a lower end phone camera to RGB images that approximate the quality of a higher end DSLR
camera.

Download Zurich RAW to RGB Data Set

This example uses the Zurich RAW to RGB data set [3 on page 19-118]. The size of the data set is 22
GB. The data set contains 48,043 spatially registered pairs of RAW and RGB training image patches
of size 448-by-448. The data set contains two separate test sets. One test set consists of 1,204


spatially registered pairs of RAW and RGB image patches of size 448-by-448. The other test set
consists of unregistered full-resolution RAW and RGB images.

Specify dataDir as the desired location of the data.

dataDir = fullfile(tempdir,"ZurichRAWToRGB");

To download the data set, go to https://data.vision.ee.ethz.ch/ihnatova/public/zr2d/Zurich-RAW-to-DSLR-Dataset.zip.
Extract the data into the directory specified by the dataDir variable. When
extracted successfully, dataDir contains three directories named full_resolution, test, and
train.

Create Datastores for Training, Validation, and Testing

Create Datastore for RGB Image Patch Training Data

Create an imageDatastore that reads the target RGB training image patches acquired using a high-
end Canon DSLR.

trainImageDir = fullfile(dataDir,"train");
dsTrainRGB = imageDatastore(fullfile(trainImageDir,"canon"),ReadSize=16);

Preview an RGB training image patch.

groundTruthPatch = preview(dsTrainRGB);
imshow(groundTruthPatch)


Create Datastore for RAW Image Patch Training Data

Create an imageDatastore that reads the input RAW training image patches acquired using a
Huawei phone camera. The RAW images are captured with 10-bit precision and are represented as
both 8-bit and 16-bit PNG files. The 8-bit files provide a compact representation of patches with data
in the range [0, 255]. No scaling has been done on any of the RAW data.

dsTrainRAW = imageDatastore(fullfile(trainImageDir,"huawei_raw"),ReadSize=16);

Preview an input RAW training image patch. The datastore reads this patch as an 8-bit uint8 image
because the sensor counts are in the range [0, 255]. To simulate the 10-bit dynamic range of the
training data, divide the image intensity values by 4. If you zoom in on the image, then you can see
the RGGB Bayer pattern.

inputPatch = preview(dsTrainRAW);
inputPatchRAW = inputPatch/4;
imshow(inputPatchRAW)


To simulate the minimal traditional processing pipeline, demosaic the RGGB Bayer pattern of the
RAW data using the demosaic function. Display the processed image and brighten the display.
Compared to the target RGB image, the minimally-processed RGB image is dark and has imbalanced
colors and noticeable artifacts. A trained RAW-to-RGB network performs preprocessing operations so
that the output RGB image resembles the target image.

inputPatchRGB = demosaic(inputPatch,"rggb");
imshow(rescale(inputPatchRGB))


Partition Test Images into Validation and Test Sets

The test data contains RAW and RGB image patches and full-sized images. This example partitions
the test image patches into a validation set and test set. The example uses the full-sized test images
for qualitative testing only. See Evaluate Trained Image Processing Pipeline on Full-Sized Images on
page 19-112.

Create image datastores that read the RAW and RGB test image patches.
testImageDir = fullfile(dataDir,"test");
dsTestRAW = imageDatastore(fullfile(testImageDir,"huawei_raw"),ReadSize=16);
dsTestRGB = imageDatastore(fullfile(testImageDir,"canon"),ReadSize=16);

Randomly split the test data into two sets for validation and testing. The validation data set contains
200 images. The test set contains the remaining images.
numTestImages = dsTestRAW.numpartitions;
numValImages = 200;


testIdx = randperm(numTestImages);
validationIdx = testIdx(1:numValImages);
testIdx = testIdx(numValImages+1:numTestImages);

dsValRAW = subset(dsTestRAW,validationIdx);
dsValRGB = subset(dsTestRGB,validationIdx);

dsTestRAW = subset(dsTestRAW,testIdx);
dsTestRGB = subset(dsTestRGB,testIdx);

Preprocess and Augment Data

The sensor acquires color data in a repeating Bayer pattern that includes one red, two green, and one
blue photosensor. Preprocess the data into the four-channel image format expected by the network using the
transform function. The transform function processes the data using the operations specified in
the preprocessRAWDataForRAWToRGB helper function. The helper function is attached to the
example as a supporting file.

The preprocessRAWDataForRAWToRGB helper function converts an H-by-W-by-1 RAW image to an
H/2-by-W/2-by-4 multichannel image consisting of one red, two green, and one blue channel. The
function also casts the data to data type single scaled to the range [0, 1].

dsTrainRAW = transform(dsTrainRAW,@preprocessRAWDataForRAWToRGB);
dsValRAW = transform(dsValRAW,@preprocessRAWDataForRAWToRGB);
dsTestRAW = transform(dsTestRAW,@preprocessRAWDataForRAWToRGB);
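
The supporting file itself is not listed in this example. A minimal sketch of an equivalent
single-image preprocessing function, assuming an RGGB Bayer layout and ignoring details such as the
batched cell-array reads produced by ReadSize=16, could look like this. The normalization constant is
an assumption tied to the 8-bit patch files described above.

function out = preprocessRAWDataForRAWToRGB(in)
% Split the H-by-W RGGB mosaic into an H/2-by-W/2-by-4 array with one red,
% two green, and one blue channel, then cast to single in the range [0, 1].
    red    = in(1:2:end,1:2:end);
    green1 = in(1:2:end,2:2:end);
    green2 = in(2:2:end,1:2:end);
    blue   = in(2:2:end,2:2:end);
    out = single(cat(3,red,green1,green2,blue));
    out = out./255;   % assumed scale for 8-bit patch files; adjust for other bit depths
end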

The target RGB images are stored on disk as unsigned 8-bit data. To make the computation of metrics
and the network design more convenient, preprocess the target RGB training images using the
transform function and the preprocessRGBDataForRAWToRGB helper function. The helper
function is attached to the example as a supporting file.

The preprocessRGBDataForRAWToRGB helper function casts images to data type single scaled to
the range [0, 1].

dsTrainRGB = transform(dsTrainRGB,@preprocessRGBDataForRAWToRGB);
dsValRGB = transform(dsValRGB,@preprocessRGBDataForRAWToRGB);
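
For reference, a minimal single-image version of this helper could be as simple as the cast shown
below; the actual supporting file also handles the batched cell-array reads produced by the datastore.

function out = preprocessRGBDataForRAWToRGB(in)
% Convert 8-bit RGB patches to data type single in the range [0, 1].
    out = im2single(in);
end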

Combine the input RAW and target RGB data for the training, validation, and test image sets by using
the combine function.

dsTrain = combine(dsTrainRAW,dsTrainRGB);
dsVal = combine(dsValRAW,dsValRGB);
dsTest = combine(dsTestRAW,dsTestRGB);

Randomly augment the training data using the transform function and the
augmentDataForRAWToRGB helper function. The helper function is attached to the example as a
supporting file.


The augmentDataForRAWToRGB helper function randomly applies 90 degree rotation and horizontal
reflection to pairs of input RAW and target RGB training images.

dsTrainAug = transform(dsTrain,@augmentDataForRAWToRGB);
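
A sketch of this style of paired augmentation is shown below, assuming each row of the input cell
array holds a {RAW, RGB} pair; the essential point is that the same random transformation is applied
to both images in a pair.

function out = augmentDataForRAWToRGB(in)
% Apply an identical random 90-degree rotation and horizontal reflection
% to each input RAW patch and its corresponding target RGB patch.
    out = in;
    for idx = 1:size(in,1)
        k = randi([0 3]);           % number of 90-degree rotations
        doReflect = rand > 0.5;
        raw = rot90(in{idx,1},k);
        rgb = rot90(in{idx,2},k);
        if doReflect
            raw = flip(raw,2);
            rgb = flip(rgb,2);
        end
        out(idx,:) = {raw,rgb};
    end
end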

Preview the augmented training data.

exampleAug = preview(dsTrainAug)

exampleAug=8×2 cell array


{224×224×4 single} {448×448×3 single}
{224×224×4 single} {448×448×3 single}
{224×224×4 single} {448×448×3 single}
{224×224×4 single} {448×448×3 single}
{224×224×4 single} {448×448×3 single}
{224×224×4 single} {448×448×3 single}
{224×224×4 single} {448×448×3 single}
{224×224×4 single} {448×448×3 single}

Display the network input and target image in a montage. The network input has four channels, so
display the first channel rescaled to the range [0, 1]. The input RAW and target RGB images have
identical augmentation.

exampleInput = exampleAug{1,1};
exampleOutput = exampleAug{1,2};
montage({rescale(exampleInput(:,:,1)),exampleOutput})

Batch Training and Validation Data During Training

This example uses a custom training loop. The minibatchqueue (Deep Learning Toolbox) object is
useful for managing the mini-batching of observations in custom training loops. The
minibatchqueue object also casts data to a dlarray (Deep Learning Toolbox) object that enables
automatic differentiation in deep learning applications.
miniBatchSize = 2;
valBatchSize = 10;
trainingQueue = minibatchqueue(dsTrainAug,MiniBatchSize=miniBatchSize, ...
PartialMiniBatch="discard",MiniBatchFormat="SSCB");
validationQueue = minibatchqueue(dsVal,MiniBatchSize=valBatchSize,MiniBatchFormat="SSCB");

The next (Deep Learning Toolbox) function of minibatchqueue yields the next mini-batch of data.
Preview the outputs from one call to the next function. The outputs have data type dlarray. The
data is already cast to gpuArray, on the GPU, and ready for training.
[inputRAW,targetRGB] = next(trainingQueue);
whos inputRAW

Name Size Bytes Class Attributes

inputRAW 224x224x4x2 1605640 dlarray

whos targetRGB

Name Size Bytes Class Attributes

targetRGB 448x448x3x2 4816904 dlarray

Set Up U-Net Network Layers

This example uses a variation of the U-Net network. In U-Net, the initial series of convolutional layers
are interspersed with max pooling layers, successively decreasing the resolution of the input image.
These layers are followed by a series of convolutional layers interspersed with upsampling operators,
successively increasing the resolution of the input image. The name U-Net comes from the fact that
the network can be drawn with a symmetric shape like the letter U.

This example uses a simple U-Net architecture with two modifications. First, the network replaces the
final transposed convolution operation with a custom pixel shuffle upsampling (also known as a
depth-to-space) operation. Second, the network uses a custom hyperbolic tangent activation layer as
the final layer in the network.

Pixel Shuffle Upsampling

Convolution followed by pixel shuffle upsampling can define subpixel convolution for super resolution
applications. Subpixel convolution prevents the checkerboard artifacts that can arise from transposed
convolution [6 on page 19-119]. Because the model needs to map H/2-by-W/2-by-4 RAW inputs to
H-by-W-by-3 RGB outputs, the final upsampling stage of the model can be thought of similarly to super
resolution, where the number of spatial samples grows from the input to the output.

The figure shows how pixel shuffle upsampling works for a 2-by-2-by-4 input. The first two dimensions
are spatial dimensions and the third dimension is a channel dimension. In general, pixel shuffle
upsampling by a factor of S takes an H-by-W-by-C input and yields an S*H-by-S*W-by-C/S^2 output.


The pixel shuffle function grows the spatial dimensions of the output by mapping information from
channel dimensions at a given spatial location into S-by-S spatial blocks in the output in which each
channel contributes to a consistent spatial position relative to its neighbors during upsampling.
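
As a complement to the figure, you can reproduce the rearrangement for a small array with reshape and
permute. This sketch upsamples a 2-by-2-by-4 input by a factor of S = 2 into a 4-by-4-by-1 output. The
ordering of channels within each S-by-S block is a convention, so treat this as an illustration rather
than the implementation used by the custom pixelShuffleLayer.

X = reshape(1:16,[2 2 4]);        % H-by-W-by-C input with H = W = 2 and C = 4
S = 2;                            % upsampling factor
[H,W,C] = size(X);
Y = reshape(X,[H W S S C/S^2]);   % split the channel dimension into an S-by-S block
Y = permute(Y,[3 1 4 2 5]);       % interleave the block indices with the spatial dimensions
Y = reshape(Y,[S*H S*W C/S^2])    % 4-by-4-by-1 output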

Scaled and Hyperbolic Tangent Activation

A hyperbolic tangent activation layer applies the tanh function to the layer inputs. This example uses
a scaled and shifted version of the tanh function, which encourages but does not strictly enforce that
the RGB network outputs are in the range [0, 1].

f(x) = 0.58 * tanh(x) + 0.5
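
Evaluating this mapping at its extremes shows why the outputs are encouraged, but not forced, into
[0, 1]: the limits of the function are approximately -0.08 and 1.08.

f = @(x) 0.58*tanh(x) + 0.5;   % scaled and shifted tanh
f([-10 0 10])                  % approximately -0.0800, 0.5000, 1.0800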

Calculate Training Set Statistics for Input Normalization

Use tall to compute per-channel mean reduction across the training data set. The input layer of the
network performs mean centering of inputs during training and testing using the mean statistics.
dsIn = copy(dsTrainRAW);
dsIn.UnderlyingDatastore.ReadSize = 1;
t = tall(dsIn);
perChannelMean = gather(mean(t,[1 2]));

Create U-Net

Create layers of the initial subnetwork, specifying the per-channel mean.


inputSize = [256 256 4];
initialLayer = imageInputLayer(inputSize,Normalization="zerocenter", ...
Mean=perChannelMean,Name="ImageInputLayer");

Add layers of the first encoding subnetwork. The first encoder has 32 convolutional filters.
numEncoderStages = 4;
numFiltersFirstEncoder = 32;
encoderNamePrefix = "Encoder-Stage-";

encoderLayers = [
convolution2dLayer([3 3],numFiltersFirstEncoder,Padding="same", ...
WeightsInitializer="narrow-normal",Name=encoderNamePrefix+"1-Conv-1")
leakyReluLayer(0.2,Name=encoderNamePrefix+"1-ReLU-1")
convolution2dLayer([3 3],numFiltersFirstEncoder,Padding="same", ...
WeightsInitializer="narrow-normal",Name=encoderNamePrefix+"1-Conv-2")
leakyReluLayer(0.2,Name=encoderNamePrefix+"1-ReLU-2")
maxPooling2dLayer([2 2],Stride=[2 2],Name=encoderNamePrefix+"1-MaxPool")
];

Add layers of additional encoding subnetworks. These subnetworks add channel-wise instance
normalization after each convolutional layer using a groupNormalizationLayer. Each encoder
subnetwork has twice the number of filters as the previous encoder subnetwork.

cnIdx = 1;
for stage = 2:numEncoderStages

numFilters = numFiltersFirstEncoder*2^(stage-1);
layerNamePrefix = encoderNamePrefix+num2str(stage);

encoderLayers = [
encoderLayers
convolution2dLayer([3 3],numFilters,Padding="same", ...
WeightsInitializer="narrow-normal",Name=layerNamePrefix+"-Conv-1")
groupNormalizationLayer("channel-wise",Name="cn"+num2str(cnIdx))
leakyReluLayer(0.2,Name=layerNamePrefix+"-ReLU-1")
convolution2dLayer([3 3],numFilters,Padding="same", ...
WeightsInitializer="narrow-normal",Name=layerNamePrefix+"-Conv-2")
groupNormalizationLayer("channel-wise",Name="cn"+num2str(cnIdx+1))
leakyReluLayer(0.2,Name=layerNamePrefix+"-ReLU-2")
maxPooling2dLayer([2 2],Stride=[2 2],Name=layerNamePrefix+"-MaxPool")
];

cnIdx = cnIdx + 2;
end

Add bridge layers. The bridge subnetwork has twice the number of filters as the final encoder
subnetwork and first decoder subnetwork.

numFilters = numFiltersFirstEncoder*2^numEncoderStages;
bridgeLayers = [
convolution2dLayer([3 3],numFilters,Padding="same", ...
WeightsInitializer="narrow-normal",Name="Bridge-Conv-1")
groupNormalizationLayer("channel-wise",Name="cn7")
leakyReluLayer(0.2,Name="Bridge-ReLU-1")
convolution2dLayer([3 3],numFilters,Padding="same", ...
WeightsInitializer="narrow-normal",Name="Bridge-Conv-2")
groupNormalizationLayer("channel-wise",Name="cn8")
leakyReluLayer(0.2,Name="Bridge-ReLU-2")];

Add layers of the first three decoder subnetworks.

numDecoderStages = 4;
cnIdx = 9;
decoderNamePrefix = "Decoder-Stage-";

decoderLayers = [];
for stage = 1:numDecoderStages-1

numFilters = numFiltersFirstEncoder*2^(numDecoderStages-stage);
layerNamePrefix = decoderNamePrefix+num2str(stage);


decoderLayers = [
decoderLayers
transposedConv2dLayer([3 3],numFilters,Stride=[2 2],Cropping="same", ...
WeightsInitializer="narrow-normal",Name=layerNamePrefix+"-UpConv")
leakyReluLayer(0.2,Name=layerNamePrefix+"-UpReLU")
depthConcatenationLayer(2,Name=layerNamePrefix+"-DepthConcatenation")
convolution2dLayer([3 3],numFilters,Padding="same", ...
WeightsInitializer="narrow-normal",Name=layerNamePrefix+"-Conv-1")
groupNormalizationLayer("channel-wise",Name="cn"+num2str(cnIdx))
leakyReluLayer(0.2,Name=layerNamePrefix+"-ReLU-1")
convolution2dLayer([3 3],numFilters,Padding="same", ...
WeightsInitializer="narrow-normal",Name=layerNamePrefix+"-Conv-2")
groupNormalizationLayer("channel-wise",Name="cn"+num2str(cnIdx+1))
leakyReluLayer(0.2,Name=layerNamePrefix+"-ReLU-2")
];

cnIdx = cnIdx + 2;
end

Add layers of the last decoder subnetwork. This subnetwork excludes the channel-wise instance
normalization performed by the other decoder subnetworks. Each decoder subnetwork has half the
number of filters as the previous subnetwork.
numFilters = numFiltersFirstEncoder;
layerNamePrefix = decoderNamePrefix+num2str(stage+1);

decoderLayers = [
decoderLayers
transposedConv2dLayer([3 3],numFilters,Stride=[2 2],Cropping="same", ...
WeightsInitializer="narrow-normal",Name=layerNamePrefix+"-UpConv")
leakyReluLayer(0.2,Name=layerNamePrefix+"-UpReLU")
depthConcatenationLayer(2,Name=layerNamePrefix+"-DepthConcatenation")
convolution2dLayer([3 3],numFilters,Padding="same", ...
WeightsInitializer="narrow-normal",Name=layerNamePrefix+"-Conv-1")
leakyReluLayer(0.2,Name=layerNamePrefix+"-ReLU-1")
convolution2dLayer([3 3],numFilters,Padding="same", ...
WeightsInitializer="narrow-normal",Name=layerNamePrefix+"-Conv-2")
leakyReluLayer(0.2,Name=layerNamePrefix+"-ReLU-2")];

Add the final layers of the U-Net. The pixel shuffle layer rearranges the H/2-by-W/2-by-12 activations
from the final convolution into H-by-W-by-3 activations using pixel shuffle upsampling. The final layer
encourages outputs toward the desired range [0, 1] using a hyperbolic tangent function.
finalLayers = [
convolution2dLayer([3 3],12,Padding="same",WeightsInitializer="narrow-normal", ...
Name="Decoder-Stage-4-Conv-3")
pixelShuffleLayer("pixelShuffle",2)
tanhScaledAndShiftedLayer("tanhActivation")];

layers = [initialLayer;encoderLayers;bridgeLayers;decoderLayers;finalLayers];
lgraph = layerGraph(layers);
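
The pixelShuffleLayer and tanhScaledAndShiftedLayer custom layers are supporting files and are not
listed in this example. As a minimal sketch of the simpler of the two, a scaled and shifted tanh
activation layer needs only a constructor and a predict method; pixelShuffleLayer can be written the
same way around the depth-to-space rearrangement illustrated earlier.

classdef tanhScaledAndShiftedLayer < nnet.layer.Layer
    methods
        function layer = tanhScaledAndShiftedLayer(name)
            % Name the layer so that it can be connected in the layer graph
            layer.Name = name;
        end
        function Z = predict(~,X)
            % Scaled and shifted hyperbolic tangent activation
            Z = 0.58*tanh(X) + 0.5;
        end
    end
end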

Connect layers of the encoding and decoding subnetworks.


lgraph = connectLayers(lgraph,"Encoder-Stage-1-ReLU-2", ...
"Decoder-Stage-4-DepthConcatenation/in2");


lgraph = connectLayers(lgraph,"Encoder-Stage-2-ReLU-2", ...
    "Decoder-Stage-3-DepthConcatenation/in2");
lgraph = connectLayers(lgraph,"Encoder-Stage-3-ReLU-2", ...
"Decoder-Stage-2-DepthConcatenation/in2");
lgraph = connectLayers(lgraph,"Encoder-Stage-4-ReLU-2", ...
"Decoder-Stage-1-DepthConcatenation/in2");
net = dlnetwork(lgraph);

Visualize the network architecture using the Deep Network Designer (Deep Learning Toolbox) app.

deepNetworkDesigner(lgraph)

Load the Feature Extraction Network

This example modifies a pretrained VGG-16 deep neural network to extract image features at various
layers. These multilayer features are used to compute content loss.

To get a pretrained VGG-16 network, install vgg16 (Deep Learning Toolbox). If you do not have the
required support package installed, then the software provides a download link.

vggNet = vgg16;

To make the VGG-16 network suitable for feature extraction, use the layers up to "relu5_3".

vggNet = vggNet.Layers(1:31);
vggNet = dlnetwork(layerGraph(vggNet));

Define Model Gradients and Loss Functions

The helper function modelGradients calculates the gradients and overall loss for batches of
training data. This function is defined in the Supporting Functions on page 19-117 section of this
example.

The overall loss is a weighted sum of two losses: mean of absolute error (MAE) loss and content loss.
The content loss is weighted such that the MAE loss and content loss contribute approximately
equally to the overall loss:

lossOverall = lossMAE + weightFactor * lossContent

The MAE loss penalizes the L1 distance between samples of the network predictions and samples of
the target image. L1 is often a better choice than L2 for image processing applications because it can
help reduce blurring artifacts [4 on page 19-118]. This loss is implemented using the maeLoss helper
function defined in the Supporting Functions on page 19-117 section of this example.

The content loss helps the network learn both high-level structural content and low-level edge and
color information. The loss function calculates a weighted sum of the mean square error (MSE)
between predictions and targets for each activation layer. This loss is implemented using the
contentLoss helper function defined in the Supporting Functions on page 19-117 section of this
example.

Calculate Content Loss Weight Factor

The modelGradients helper function requires the content loss weight factor as an input argument.
Calculate the weight factor for a sample batch of training data such that the MAE loss is equal to the
weighted content loss.


Preview a batch of training data, which consists of pairs of RAW network inputs and RGB target
outputs.

trainingBatch = preview(dsTrainAug);
networkInput = dlarray((trainingBatch{1,1}),"SSC");
targetOutput = dlarray((trainingBatch{1,2}),"SSC");

Predict the response of the untrained U-Net network using the forward (Deep Learning Toolbox)
function.

predictedOutput = forward(net,networkInput);

Calculate the MAE and content losses between the predicted and target RGB images.

sampleMAELoss = maeLoss(predictedOutput,targetOutput);
sampleContentLoss = contentLoss(vggNet,predictedOutput,targetOutput);

Calculate the weight factor.

weightContent = sampleMAELoss/sampleContentLoss;

Specify Training Options

Define the training options that are used within the custom training loop to control aspects of Adam
optimization. Train for 20 epochs.

learnRate = 5e-5;
numEpochs = 20;

Train Network or Download Pretrained Network

By default, the example downloads a pretrained version of the RAW-to-RGB network by using the
downloadTrainedNetwork helper function. The pretrained network enables you to run the entire
example without waiting for training to complete.
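
The downloadTrainedNetwork supporting file is likewise not listed. A minimal sketch of such a helper,
assuming the URL points either to a MAT file or to a ZIP archive that should be extracted into the
destination folder, could be:

function downloadTrainedNetwork(url,destination)
% Download a pretrained network to the destination folder if it is not
% already there, and extract it when the URL refers to a ZIP archive.
    if ~exist(destination,"dir")
        mkdir(destination);
    end
    [~,name,ext] = fileparts(url);
    localFile = fullfile(destination,name+ext);
    if ~isfile(localFile)
        websave(localFile,url);
    end
    if ext == ".zip"
        unzip(localFile,destination);
    end
end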

To train the network, set the doTraining variable in the following code to true. Train the model in
a custom training loop. For each iteration:

• Read the data for current mini-batch using the next (Deep Learning Toolbox) function.
• Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function and the
modelGradients helper function.
• Update the network parameters using the adamupdate (Deep Learning Toolbox) function and the
gradient information.
• Update the training progress plot for every iteration and display various computed losses.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). Training takes about 88 hours on an NVIDIA™ Titan RTX and can take even
longer depending on your GPU hardware.

doTraining = false;
if doTraining

% Create a directory to store checkpoints


checkpointDir = fullfile(dataDir,"checkpoints",filesep);
if ~exist(checkpointDir,"dir")
mkdir(checkpointDir);
end

% Initialize training plot


[hFig,batchLine,validationLine] = initializeTrainingPlotRAWToRGB;

% Initialize Adam solver state


[averageGrad,averageSqGrad] = deal([]);
iteration = 0;

start = tic;
for epoch = 1:numEpochs
reset(trainingQueue);
shuffle(trainingQueue);
while hasdata(trainingQueue)
[inputRAW,targetRGB] = next(trainingQueue);

[grad,loss] = dlfeval(@modelGradients, ...
    net,vggNet,inputRAW,targetRGB,weightContent);

iteration = iteration + 1;

[net,averageGrad,averageSqGrad] = adamupdate(net, ...
    grad,averageGrad,averageSqGrad,iteration,learnRate);

updateTrainingPlotRAWToRGB(batchLine,validationLine,iteration, ...
loss,start,epoch,validationQueue,numValImages,valBatchSize, ...
net,vggNet,weightContent);
end
% Save checkpoint of network state
save(checkpointDir+"epoch"+epoch,"net");
end

% Save the final network state


modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save(fullfile(dataDir,"trainedRAWToRGBNet-"+modelDateTime+".mat"),"net");

else
trainedNet_url = "https://ssd.mathworks.com/supportfiles"+ ...
"/vision/data/trainedRAWToRGBNet.mat";
downloadTrainedNetwork(trainedNet_url,dataDir);
load(fullfile(dataDir,"trainedRAWToRGBNet.mat"));
end

Calculate Image Quality Metrics

Reference-based quality metrics such as MSSIM or PSNR enable a quantitative measure of image
quality. You can calculate the MSSIM and PSNR of the patched test images because they are spatially
registered and the same size.

Iterate through the test set of patched images using a minibatchqueue object.
patchTestSet = combine(dsTestRAW,dsTestRGB);
testPatchQueue = minibatchqueue(patchTestSet, ...
MiniBatchSize=16,MiniBatchFormat="SSCB");

Iterate through the test set and calculate the MSSIM and PSNR for each test image using the
multissim and psnr functions. Calculate MSSIM for color images by using a mean of the metric for
each color channel as an approximation because the metric is not well defined for multi-channel
inputs.
totalMSSIM = 0;
totalPSNR = 0;
while hasdata(testPatchQueue)
[inputRAW,targetRGB] = next(testPatchQueue);
outputRGB = forward(net,inputRAW);
targetRGB = targetRGB ./ 255;
mssimOut = sum(mean(multissim(outputRGB,targetRGB),3),4);
psnrOut = sum(psnr(outputRGB,targetRGB),4);
totalMSSIM = totalMSSIM + mssimOut;
totalPSNR = totalPSNR + psnrOut;
end

Calculate the mean MSSIM and mean PSNR over the test set. This result is consistent with the
similar U-Net approach from [3 on page 19-118] for mean MSSIM and competitive with the PyNet
approach in [3 on page 19-118] in mean PSNR. The differences in loss functions and use of pixel
shuffle upsampling compared to [3 on page 19-118] likely account for these differences.
numObservations = dsTestRGB.numpartitions;
meanMSSIM = totalMSSIM / numObservations

meanMSSIM =
1(S) × 1(S) × 1(C) × 1(B) single gpuArray dlarray

0.8401

meanPSNR = totalPSNR / numObservations

meanPSNR =
1(S) × 1(S) × 1(C) × 1(B) single gpuArray dlarray

21.0730

Evaluate Trained Image Processing Pipeline on Full-Sized Images

Because of sensor differences between the phone camera and DSLR used to acquire the full-
resolution test images, the scenes are not registered and are not the same size. Reference-based
comparison of the full-resolution images from the network and the DSLR ISP is difficult. However, a
qualitative comparison of the images is useful because a goal of image processing is to create an
aesthetically pleasing image.

Create an image datastore that contains full-sized RAW images acquired by a phone camera.
testImageDir = fullfile(dataDir,"test");
testImageDirRAW = "huawei_full_resolution";
dsTestFullRAW = imageDatastore(fullfile(testImageDir,testImageDirRAW));

Get the names of the image files in the full-sized RAW test set.
targetFilesToInclude = extractAfter(string(dsTestFullRAW.Files), ...
fullfile(testImageDirRAW,filesep));
targetFilesToInclude = extractBefore(targetFilesToInclude,".png");

Preprocess the RAW data by converting the data to the form expected by the network using the
transform function. The transform function processes the data using the operations specified in
the preprocessRAWDataForRAWToRGB helper function. The helper function is attached to the
example as a supporting file.

dsTestFullRAW = transform(dsTestFullRAW,@preprocessRAWDataForRAWToRGB);

Create an image datastore that contains full-sized RGB test images captured from the high-end DSLR.
The Zurich RAW-to-RGB data set contains more full-sized RGB images than RAW images, so include
only the RGB images with a corresponding RAW image.

dsTestFullRGB = imageDatastore(fullfile(dataDir,"full_resolution","canon"));
dsTestFullRGB.Files = dsTestFullRGB.Files( ...
contains(dsTestFullRGB.Files,targetFilesToInclude));

Read in the target RGB images, then display a montage of the first few images.

targetRGB = readall(dsTestFullRGB);
montage(targetRGB,Size=[5 2],Interpolation="bilinear")


Iterate through the test set of full-sized images using a minibatchqueue object. If you have a GPU
device with sufficient memory to process full-resolution images, then you can run prediction on a GPU
by specifying the output environment as "gpu".

testQueue = minibatchqueue(dsTestFullRAW,MiniBatchSize=1, ...
    MiniBatchFormat="SSCB",OutputEnvironment="cpu");

For each full-sized RAW test image, predict the output RGB image by calling forward (Deep
Learning Toolbox) on the network.

outputSize = 2*size(preview(dsTestFullRAW),[1 2]);


outputImages = zeros([outputSize,3,dsTestFullRAW.numpartitions],"uint8");

idx = 1;
while hasdata(testQueue)
inputRAW = next(testQueue);
rgbOut = forward(net,inputRAW);
rgbOut = gather(extractdata(rgbOut));
outputImages(:,:,:,idx) = im2uint8(rgbOut);
idx = idx+1;
end

Get a sense of the overall output by looking at a montage view. The network produces images that are
aesthetically pleasing, with similar characteristics across the different scenes.

montage(outputImages,Size=[5 2],Interpolation="bilinear")


Compare one target RGB image with the corresponding image predicted by the network. The network
produces colors which are more saturated than the target DSLR images. Although the colors from the
simple U-Net architecture are not the same as the DSLR targets, the images are still qualitatively
pleasing in many cases.

imgIdx = 1;
imTarget = targetRGB{imgIdx};
imPredicted = outputImages(:,:,:,imgIdx);
montage({imTarget,imPredicted},Interpolation="bilinear")

To improve the performance of the RAW-to-RGB network, a network architecture could combine detailed
localized spatial features learned at multiple scales with global features that describe color and
contrast [3 on page 19-118].

Supporting Functions

Model Gradients Function

The modelGradients helper function calculates the gradients and overall loss. The gradient
information is returned as a table which includes the layer, parameter name and value for each
learnable parameter in the model.

function [gradients,loss] = modelGradients(dlnet,vggNet,X,T,weightContent)


Y = forward(dlnet,X);
lossMAE = maeLoss(Y,T);
lossContent = contentLoss(vggNet,Y,T);
loss = lossMAE + weightContent.*lossContent;
gradients = dlgradient(loss,dlnet.Learnables);
end

Mean Absolute Error Loss Function

The helper function maeLoss computes the mean absolute error between network predictions, Y, and
target images, T.

function loss = maeLoss(Y,T)


loss = mean(abs(Y-T),"all");
end


Content Loss Function

The helper function contentLoss calculates a weighted sum of the MSE between network
predictions, Y, and target images, T, for each activation layer. The contentLoss helper function
calculates the MSE for each activation layer using the mseLoss helper function. Weights are selected
such that the loss from each activation layer contributes roughly equally to the overall content loss.

function loss = contentLoss(net,Y,T)

layers = ["relu1_1","relu1_2","relu2_1","relu2_2", ...
    "relu3_1","relu3_2","relu3_3","relu4_1"];
[T1,T2,T3,T4,T5,T6,T7,T8] = forward(net,T,Outputs=layers);
[X1,X2,X3,X4,X5,X6,X7,X8] = forward(net,Y,Outputs=layers);

l1 = mseLoss(X1,T1);
l2 = mseLoss(X2,T2);
l3 = mseLoss(X3,T3);
l4 = mseLoss(X4,T4);
l5 = mseLoss(X5,T5);
l6 = mseLoss(X6,T6);
l7 = mseLoss(X7,T7);
l8 = mseLoss(X8,T8);

layerLosses = [l1 l2 l3 l4 l5 l6 l7 l8];


weights = [1 0.0449 0.0107 0.0023 6.9445e-04 2.0787e-04 2.0118e-04 6.4759e-04];
loss = sum(layerLosses.*weights);
end

Mean Square Error Loss Function

The helper function mseLoss computes the MSE between network predictions, Y, and target images,
T.

function loss = mseLoss(Y,T)


loss = mean((Y-T).^2,"all");
end

References

1) Sumner, Rob. "Processing RAW Images in MATLAB". May 19, 2014. https://rcsumner.net/raw_guide/RAWguide.pdf.

2) Chen, Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. “Learning to See in the Dark.”
ArXiv:1805.01934 [Cs], May 4, 2018. http://arxiv.org/abs/1805.01934.

3) Ignatov, Andrey, Luc Van Gool, and Radu Timofte. “Replacing Mobile Camera ISP with a Single
Deep Learning Model.” ArXiv:2002.05509 [Cs, Eess], February 13, 2020. http://arxiv.org/abs/2002.05509.
Project Website.

4) Zhao, Hang, Orazio Gallo, Iuri Frosio, and Jan Kautz. “Loss Functions for Neural Networks for
Image Processing.” ArXiv:1511.08861 [Cs], April 20, 2018. http://arxiv.org/abs/1511.08861.

5) Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. “Perceptual Losses for Real-Time Style Transfer
and Super-Resolution.” ArXiv:1603.08155 [Cs], March 26, 2016. http://arxiv.org/abs/1603.08155.

6) Shi, Wenzhe, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel
Rueckert, and Zehan Wang. “Real-Time Single Image and Video Super-Resolution Using an Efficient
Sub-Pixel Convolutional Neural Network.” ArXiv:1609.05158 [Cs, Stat], September 23, 2016.
http://arxiv.org/abs/1609.05158.

See Also
imageDatastore | trainingOptions | trainNetwork | transform | combine

Related Examples
• “Brighten Extremely Dark Images Using Deep Learning” on page 19-120

More About
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)


Brighten Extremely Dark Images Using Deep Learning

This example shows how to recover brightened RGB images from RAW camera data collected in
extreme low-light conditions using a U-Net.

Low-light image recovery in cameras is a challenging problem. A typical solution is to increase the
exposure time, which allows more light in the scene to hit the sensor and increases the brightness of
the image. However, longer exposure times can result in motion blur artifacts when objects in the
scene move or when the camera is perturbed during acquisition.

Deep learning offers solutions that recover reasonable images for RAW data collected from DSLRs
and many modern phone cameras despite low light conditions and short exposure times. These
solutions take advantage of the full information present in RAW data to outperform brightening
techniques performed in postprocessed RGB data [1 on page 19-130].

Low Light Image (Left) and Recovered Image (Right)

This example shows how to train a network to implement a low-light camera pipeline using data from
a particular camera sensor, and how to recover well exposed RGB images from very low-light,
underexposed RAW data from the same type of camera sensor.

Download See-in-the-Dark Data Set

This example uses the Sony camera data from the See-in-the-Dark (SID) data set [1 on page 19-130].
The SID data set provides registered pairs of RAW images of the same scene. In each pair, one image
has a short exposure time and is underexposed, and the other image has a longer exposure time and
is well exposed. The size of the Sony camera data from the SID data set is 25 GB.

Set dataDir as the desired location of the data set.

dataDir = fullfile(tempdir,"SID");

To download the data set, go to this link: https://storage.googleapis.com/isl-datasets/SID/Sony.zip.
Extract the data into the directory specified by the dataDir variable. When extraction is successful,
dataDir contains the directory Sony with two subdirectories: long and short. The files in the long
subdirectory have a long exposure and are well exposed. The files in the short subdirectory have a
short exposure and are quite underexposed and dark.


The data set also provides text files that describe how to partition the files into training, validation,
and test data sets. Move the files Sony_train_list.txt, Sony_val_list.txt, and
Sony_test_list.txt to the directory specified by the dataDir variable.

Create Datastores for Training, Validation, and Testing

Import the list of files to include in the training, validation, and test data sets using the
importSonyFileInfo helper function. This function is attached to the example as a supporting file.

trainInfo = importSonyFileInfo(fullfile(dataDir,"Sony_train_list.txt"));
valInfo = importSonyFileInfo(fullfile(dataDir,"Sony_val_list.txt"));
testInfo = importSonyFileInfo(fullfile(dataDir,"Sony_test_list.txt"));
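
The importSonyFileInfo supporting file is not shown here. Assuming each line of the SID list files
contains the short-exposure file name, the long-exposure file name, and two acquisition fields
separated by spaces (the variable names below are illustrative assumptions), a minimal import sketch
could be:

function info = importSonyFileInfo(filename)
% Read one of the SID list files as a table of whitespace-delimited columns.
    opts = delimitedTextImportOptions(NumVariables=4, ...
        VariableNames=["ShortExposureFile","LongExposureFile","ISO","Aperture"], ...
        Delimiter=" ");
    info = readtable(filename,opts);
end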

Combine and Preprocess RAW and RGB Data Using Datastores

Create combined datastores that read and preprocess pairs of underexposed and well exposed RAW
images using the createCombinedDatastoreForLowLightRecovery helper function. This
function is attached to the example as a supporting file.

The createCombinedDatastoreForLowLightRecovery helper function performs these


operations:

• Create an imageDatastore that reads the short exposure RAW images using a custom read
function. The read function reads a RAW image using the rawread function, then separates the
RAW Bayer pattern into separate channels for each of the four sensors using the raw2planar
function. Normalize the data to the range [0, 1] by transforming the imageDatastore object.
• Create an imageDatastore object that reads long-exposure RAW images and converts the data
to an RGB image in one step using the raw2rgb function. Normalize the data to the range [0, 1]
by transforming the imageDatastore object.
• Combine the imageDatastore objects using the combine function.
• Apply a simple multiplicative gain to the pairs of images. The gain corrects for the exposure time
difference between the shorter exposure time of the dark inputs and the longer exposure time of
the output images. This gain is defined by taking the ratio of the long and short exposure times
provided in the image file names.
• Associate the images with metadata such as exposure time, ISO, and aperture.

dsTrainFull = createCombinedDatastoreForLowLightRecovery(dataDir,trainInfo);
dsValFull = createCombinedDatastoreForLowLightRecovery(dataDir,valInfo);
dsTestFull = createCombinedDatastoreForLowLightRecovery(dataDir,testInfo);
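
As an illustration of the gain step only, assuming file names that encode the exposure time as a
trailing "<time>s" token (the names below are hypothetical examples of that pattern), the per-pair
gain could be computed and applied like this:

shortFile = "00001_00_0.04s.ARW";   % hypothetical short-exposure file name
longFile  = "00001_00_10s.ARW";     % hypothetical long-exposure file name

shortExposure = str2double(extractBetween(shortFile,"_00_","s"));
longExposure  = str2double(extractBetween(longFile,"_00_","s"));

gain = longExposure/shortExposure;    % 250 for this pair
applyGain = @(raw) min(raw.*gain,1);  % raw data assumed already normalized to [0, 1]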

Use a subset of the validation images to make computation of validation metrics quicker. Do not apply
additional augmentation.

numVal = 30;
dsValFull = shuffle(dsValFull);
dsVal = subset(dsValFull,1:numVal);

Preprocess Training and Validation Data

Preprocess the training data set using the transform function and the extractRandomPatch
helper function. The helper function is defined at the end of this example. The
extractRandomPatch helper function crops multiple random patches of size 512-by-512-by-4 pixels
from a planar RAW image and corresponding patches of size 1024-by-1024-by-3 pixels from an RGB
image. The scene content in the patches matches. Extract 12 patches per training image.


inputSize = [512,512,4];
patchesPerImage = 12;
dsTrain = transform(dsTrainFull, ...
@(data) extractRandomPatch(data,inputSize,patchesPerImage));

Preview an original full-sized image and a random training patch.


previewFull = preview(dsTrainFull);
previewPatch = preview(dsTrain);
montage({previewFull{1,2},previewPatch{1,2}},BackgroundColor="w");

Preprocess the validation data set using the transform function and the extractCenterPatch
helper function. The helper function is defined at the end of this example. The
extractCenterPatch helper function crops a single patch of size 512-by-512-by-4 pixels from the
center of a planar RAW image and corresponding patches of size 1024-by-1024-by-3 pixels from an
RGB image. The scene content in the patches matches.
dsVal = transform(dsVal,@(data) extractCenterPatch(data,inputSize));

The testing data set does not require preprocessing. Test images are fed at full size into the network.

Augment Training Data

Augment the training data set using the transform function and the
augmentPatchesForLowLightRecovery helper function. The helper function is included at the
end of this example. The augmentPatchesForLowLightRecovery helper function adds random
horizontal and vertical reflection and randomized 90-degree rotations to pairs of training image
patches.
dsTrain = transform(dsTrain,@(data) augmentPatchesForLowLightRecovery(data));

Verify that the preprocessing and augmentation operations work as expected by previewing one
channel from the planar RAW image patch and the corresponding RGB decoded patch. The planar
RAW data and the target RGB data depict patches of the same scene, randomly extracted from the
original source image. Significant noise is visible in the RAW patch because of the short acquisition
time of the RAW data, causing a low signal-to-noise ratio.
imagePairs = read(dsTrain);
rawImage = imagePairs{1,1};
rgbPatch = imagePairs{1,2};
montage({rawImage(:,:,1),rgbPatch});

Define Network

Use a network architecture similar to U-Net. The example creates the encoder and decoder
subnetworks using the blockedNetwork function. This function creates the encoder and decoder
subnetworks programmatically using the buildEncoderBlock and buildDecoderBlock helper
functions, respectively. The helper functions are defined at the end of this example. The example uses
instance normalization between convolution and activation layers in all network blocks except the
first and last, and uses a leaky ReLU layer as the activation layer.

Create an encoder subnetwork that consists of four encoder modules. The first encoder module has
32 channels, or feature maps. Each subsequent module doubles the number of feature maps from the
previous encoder module.

numModules = 4;
numChannelsEncoder = 2.^(5:8);
encoder = blockedNetwork(@(block) buildEncoderBlock(block,numChannelsEncoder), ...
numModules,NamePrefix="encoder");

Create a decoder subnetwork that consists of four decoder modules. The first decoder module has
256 channels, or feature maps. Each subsequent decoder module halves the number of feature maps
from the previous decoder module.

numChannelsDecoder = fliplr(numChannelsEncoder);
decoder = blockedNetwork(@(block) buildDecoderBlock(block,numChannelsDecoder), ...
numModules,NamePrefix="decoder");

Specify the bridge layers that connect the encoder and decoder subnetworks.


bridgeLayers = [
convolution2dLayer(3,512,Padding="same",PaddingValue="replicate")
groupNormalizationLayer("channel-wise")
leakyReluLayer(0.2)
convolution2dLayer(3,512,Padding="same",PaddingValue="replicate")
groupNormalizationLayer("channel-wise")
leakyReluLayer(0.2)];

Specify the final layers of the network.


finalLayers = [
convolution2dLayer(1,12)
depthToSpace2dLayer(2)];

Combine the encoder subnetwork, bridge layers, decoder subnetwork, and final layers using the
encoderDecoderNetwork function.
net = encoderDecoderNetwork(inputSize,encoder,decoder, ...
LatentNetwork=bridgeLayers, ...
SkipConnections="concatenate", ...
FinalNetwork=finalLayers);
net = layerGraph(net);

Use mean centering normalization on the input as part of training.


net = replaceLayer(net,"encoderImageInputLayer", ...
imageInputLayer(inputSize,Normalization="zerocenter"));

Define the overall loss using the custom layer ssimLossLayerGray. This layer definition is attached
to this example as a supporting file. The ssimLossLayerGray layer uses a loss of the form

lossOverall = α × lossSSIM + (1 − α) × lossL1

The layer calculates a multiscale structural similarity (SSIM) loss for the grayscale representations of
the predicted and target RGB images using the multissim function. The layer specifies the
weighting factor α as 7/8 and uses five scales.
finalLayerName = net.Layers(end).Name;
lossLayer = ssimLossLayerGray;
net = addLayers(net,lossLayer);
net = connectLayers(net,finalLayerName,lossLayer.Name);
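
The ssimLossLayerGray layer itself is a supporting file. Purely as a numeric illustration of the loss
above (not the layer implementation, which operates on dlarray data inside the training loop), the
combined loss for a pair of RGB images in [0, 1] could be computed as follows, taking lossSSIM as one
minus the mean multiscale SSIM of the grayscale representations:

alpha = 7/8;
toGray = @(I) 0.2989*I(:,:,1) + 0.5870*I(:,:,2) + 0.1140*I(:,:,3);

Y = rand(512,512,3,"single");   % placeholder prediction
T = rand(512,512,3,"single");   % placeholder target

lossSSIM = 1 - mean(multissim(toGray(Y),toGray(T),NumScales=5),"all");
lossL1 = mean(abs(Y-T),"all");
lossOverall = alpha*lossSSIM + (1-alpha)*lossL1;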

Specify Training Options

For training, use the Adam solver with an initial learning rate of 1e-3. Train for 30 epochs.
miniBatchSize = 12;
maxEpochs = 30;
options = trainingOptions("adam", ...
Plots="training-progress", ...
MiniBatchSize=miniBatchSize, ...
InitialLearnRate=1e-3, ...
MaxEpochs=maxEpochs, ...
ValidationFrequency=400);

Train Network or Download Pretrained Network

By default, the example loads a pretrained version of the low-light recovery network. The pretrained
network enables you to run the entire example without waiting for training to complete.


To train the network, set the doTraining variable in the following code to true. Train the model
using the trainNetwork (Deep Learning Toolbox) function.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox).

doTraining = false;

if doTraining
checkpointsDir = fullfile(dataDir,"checkpoints");
if ~exist(checkpointsDir,"dir")
mkdir(checkpointsDir);
end
options.CheckpointPath=checkpointsDir;

netTrained = trainNetwork(dsTrain,net,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save(fullfile(dataDir,"trainedLowLightCameraPipelineNet-"+modelDateTime+".mat"), ...
"netTrained");

else
trainedNet_url = "https://ssd.mathworks.com/supportfiles/"+ ...
"vision/data/trainedLowLightCameraPipelineNet.zip";
downloadTrainedNetwork(trainedNet_url,dataDir);
load(fullfile(dataDir,"trainedLowLightCameraPipelineNet.mat"));
end

Examine Results from Trained Network

Visually examine the results of the trained low-light camera pipeline network.

Read a pair of images and accompanying metadata from the test set. Get the file names of the short
and long exposure images from the metadata.

[testPair,info] = read(dsTestFull);
testShortFilename = info.ShortExposureFilename;
testLongFilename = info.LongExposureFilename;

Convert the original underexposed RAW image to an RGB image in one step using the raw2rgb
function. Display the result, scaling the display range to the range of pixel values. The image looks
almost completely black, with only a few bright pixels.

testShortImage = raw2rgb(testShortFilename);
testShortTime = info.ShortExposureTime;
imshow(testShortImage,[])
title(["Short Exposure Test Image";"Exposure Time = "+num2str(testShortTime)]+" s")


Convert the original well exposed RAW image to an RGB image in one step using the raw2rgb
function. Display the result.

testLongImage = raw2rgb(testLongFilename);
testLongTime = info.LongExposureTime;
imshow(testLongImage)
title(["Long Exposure Target Image";"Exposure Time = "+num2str(testLongTime)]+" s")


Display the network prediction. The trained network recovers an impressive image under challenging
acquisition conditions with very little noise or other visual artifacts. The colors of the network
prediction are less saturated and vibrant than in the ground truth long-exposure image of the scene.

outputFromNetwork = im2uint8(activations(netTrained,testPair{1},"FinalNetworkLayer2"));
imshow(outputFromNetwork)
title("Low-Light Recovery Network Prediction")


Supporting Functions

The extractRandomPatch helper function crops multiple random patches from a planar RAW image
and corresponding patches from an RGB image. The RAW data patch has size m-by-n-by-4 and the
RGB image patch has size 2m-by-2n-by-3, where [m n] is the value of the targetRAWSize input
argument. Both patches have the same scene content.

function dataOut = extractRandomPatch(data,targetRAWSize,patchesPerImage)


dataOut = cell(patchesPerImage,2);
raw = data{1};
rgb = data{2};
for idx = 1:patchesPerImage
windowRAW = randomCropWindow3d(size(raw),targetRAWSize);
windowRGB = images.spatialref.Rectangle( ...
2*windowRAW.XLimits+[-1,0],2*windowRAW.YLimits+[-1,0]);
dataOut(idx,:) = {imcrop3(raw,windowRAW),imcrop(rgb,windowRGB)};
end
end

The extractCenterPatch helper function crops a single patch from the center of a planar RAW
image and the corresponding patch from an RGB image. The RAW data patch has size m-by-n-by-4
and the RGB image patch has size 2m-by-2n-by-3, where [m n] is the value of the targetRAWSize
input argument. Both patches have the same scene content.


function dataOut = extractCenterPatch(data,targetRAWSize)


raw = data{1};
rgb = data{2};
windowRAW = centerCropWindow3d(size(raw),targetRAWSize);
windowRGB = images.spatialref.Rectangle( ...
2*windowRAW.XLimits+[-1,0],2*windowRAW.YLimits+[-1,0]);
dataOut = {imcrop3(raw,windowRAW),imcrop(rgb,windowRGB)};
end

The buildEncoderBlock helper function defines the layers of a single encoder module within the
encoder subnetwork.

function block = buildEncoderBlock(blockIdx,numChannelsEncoder)

if blockIdx < 2
instanceNorm = [];
else
instanceNorm = instanceNormalizationLayer;
end

filterSize = 3;
numFilters = numChannelsEncoder(blockIdx);
block = [
convolution2dLayer(filterSize,numFilters,Padding="same", ...
PaddingValue="replicate",WeightsInitializer="he")
instanceNorm
leakyReluLayer(0.2)
convolution2dLayer(filterSize,numFilters,Padding="same", ...
PaddingValue="replicate",WeightsInitializer="he")
instanceNorm
leakyReluLayer(0.2)
maxPooling2dLayer(2,Stride=2,Padding="same")];
end

The buildDecoderBlock helper function defines the layers of a single decoder module within the
decoder subnetwork.

function block = buildDecoderBlock(blockIdx,numChannelsDecoder)

if blockIdx < 4
instanceNorm = instanceNormalizationLayer;
else
instanceNorm = [];
end

filterSize = 3;
numFilters = numChannelsDecoder(blockIdx);
block = [
transposedConv2dLayer(filterSize,numFilters,Stride=2, ...
WeightsInitializer="he",Cropping="same")
convolution2dLayer(filterSize,numFilters,Padding="same", ...
PaddingValue="replicate",WeightsInitializer="he")
instanceNorm
leakyReluLayer(0.2)
convolution2dLayer(filterSize,numFilters,Padding="same", ...
PaddingValue="replicate",WeightsInitializer="he")
instanceNorm
leakyReluLayer(0.2)];
end

References

[1] Chen, Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. "Learning to See in the Dark." Preprint,
submitted May 4, 2018. https://arxiv.org/abs/1805.01934.

See Also
imageDatastore | trainingOptions | trainNetwork | transform | combine

Related Examples
• “Develop Camera Processing Pipeline Using Deep Learning” on page 19-98

More About
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)


Semantic Segmentation of Multispectral Images Using Deep Learning

This example shows how to perform semantic segmentation of a multispectral image with seven
channels using U-Net.

Semantic segmentation involves labeling each pixel in an image with a class. One application of
semantic segmentation is tracking deforestation, which is the change in forest cover over time.
Environmental agencies track deforestation to assess and quantify the environmental and ecological
health of a region.

Deep learning based semantic segmentation can yield a precise measurement of vegetation cover
from high-resolution aerial photographs. One challenge is differentiating classes with similar visual
characteristics, such as trying to classify a green pixel as grass, shrubbery, or tree. To increase
classification accuracy, some data sets contain multispectral images that provide additional
information about each pixel. For example, the Hamlin Beach State Park data set supplements the
color images with three near-infrared channels that provide a clearer separation of the classes.

This example first shows you how to perform semantic segmentation using a pretrained U-Net and
then use the segmentation results to calculate the extent of vegetation cover. Then, you can optionally
train a U-Net network on the Hamlin Beach State Park data set using a patch-based training
methodology.

Download Pretrained U-Net

Specify dataDir as the desired location of the trained network and data set.


dataDir = fullfile(tempdir,"rit18_data");

Download a pretrained U-Net network.

trainedNet_url = "https://www.mathworks.com/supportfiles/"+ ...
    "vision/data/trainedMultispectralUnetModel.zip";
downloadTrainedNetwork(trainedNet_url,dataDir);
load(fullfile(dataDir,"trainedMultispectralUnetModel", ...
"trainedMultispectralUnetModel.mat"));

Download Data Set

This example uses a high-resolution multispectral data set to train the network [1 on page 19-147].
The image set was captured using a drone over the Hamlin Beach State Park, NY. The data contains
labeled training, validation, and test sets, with 18 object class labels. The size of the data file is 3.0
GB.

Download the MAT file version of the data set using the downloadHamlinBeachMSIData helper
function. This function is attached to the example as a supporting file.

downloadHamlinBeachMSIData(dataDir);

Load the data set.

load(fullfile(dataDir,"rit18_data.mat"));
whos train_data val_data test_data

Name Size Bytes Class Attributes

test_data 7x12446x7654 1333663576 uint16


train_data 7x9393x5642 741934284 uint16
val_data 7x8833x6918 855493716 uint16

The multispectral image data is arranged as numChannels-by-width-by-height arrays. However, in


MATLAB®, multichannel images are arranged as width-by-height-by-numChannels arrays. To reshape
the data so that the channels are in the third dimension, use the permute function.

train_data = permute(train_data,[2 3 1]);


val_data = permute(val_data,[2 3 1]);
test_data = permute(test_data,[2 3 1]);

Confirm that the data has the correct structure.

whos train_data val_data test_data

Name Size Bytes Class Attributes

test_data 12446x7654x7 1333663576 uint16


train_data 9393x5642x7 741934284 uint16
val_data 8833x6918x7 855493716 uint16

Visualize Multispectral Data

Display the center of each spectral band in nanometers.

disp(band_centers)

490 550 680 720 800 900


In this data set, the RGB color channels are the 3rd, 2nd, and 1st image channels, respectively.
Display the RGB component of the training, validation, and test images as a montage. To make the
images appear brighter on the screen, equalize their histograms by using the histeq function.

rgbTrain = histeq(train_data(:,:,[3 2 1]));


rgbVal = histeq(val_data(:,:,[3 2 1]));
rgbTest = histeq(test_data(:,:,[3 2 1]));

montage({rgbTrain,rgbVal,rgbTest},BorderSize=10,BackgroundColor="white")
title("RGB Component of Training, Validation, and Test Image (Left to Right)")

The 4th, 5th, and 6th channels of the data correspond to near-infrared bands. Equalize the histogram
of these three channels for the training image, then display these channels as a montage. The
channels highlight different components of the image based on their heat signatures. For example,
the trees are darker in the 4th channel than in the other two infrared channels.

ir4Train = histeq(train_data(:,:,4));
ir5Train = histeq(train_data(:,:,5));
ir6Train = histeq(train_data(:,:,6));

montage({ir4Train,ir5Train,ir6Train},BorderSize=10,BackgroundColor="white")
title("Infrared Channels 4, 5, and 6 (Left to Right) of Training Image ")


The 7th channel of the data is a binary mask that indicates the valid segmentation region. Display the
mask for the training, validation, and test images.

maskTrain = train_data(:,:,7);
maskVal = val_data(:,:,7);
maskTest = test_data(:,:,7);

montage({maskTrain,maskVal,maskTest},BorderSize=10,BackgroundColor="white")
title("Mask of Training, Validation, and Test Image (Left to Right)")


Visualize Ground Truth Labels

The labeled images contain the ground truth data for the segmentation, with each pixel assigned to
one of the 18 classes. Get a list of the classes with their corresponding IDs.
disp(classes)

0. Other Class/Image Border


1. Road Markings
2. Tree
3. Building
4. Vehicle (Car, Truck, or Bus)
5. Person
6. Lifeguard Chair
7. Picnic Table
8. Black Wood Panel
9. White Wood Panel
10. Orange Landing Pad
11. Water Buoy
12. Rocks
13. Other Vegetation
14. Grass
15. Sand
16. Water (Lake)
17. Water (Pond)
18. Asphalt (Parking Lot/Walkway)

This example aims to segment the images into two classes: vegetation and non-vegetation. Define the
target class names.
classNames = ["NotVegetation" "Vegetation"];


Group the 18 original classes into the two target classes for the training and validation data.
"Vegetation" is a combination of the original classes "Tree", "Other Vegetation", and "Grass", which
have class IDs 2, 13, and 14. The original class "Other Class/Image Border" with class ID 0 belongs to
the background class. All other original classes belong to the target label "NotVegetation".

vegetationClassIDs = [2 13 14];
nonvegetationClassIDs = setdiff(1:length(classes),vegetationClassIDs);

labelsTrain = zeros(size(train_labels),"uint8");
labelsTrain(ismember(train_labels,nonvegetationClassIDs)) = 1;
labelsTrain(ismember(train_labels,vegetationClassIDs)) = 2;

labelsVal = zeros(size(val_labels),"uint8");
labelsVal(ismember(val_labels,nonvegetationClassIDs)) = 1;
labelsVal(ismember(val_labels,vegetationClassIDs)) = 2;

Save the ground truth validation labels as a PNG file. The example uses this file to calculate accuracy
metrics.

imwrite(labelsVal,"gtruth.png");

Overlay the labels on the histogram-equalized RGB training image. Add a color bar to the image.

cmap = [1 0 1;0 1 0];


B = labeloverlay(rgbTrain,labelsTrain,Transparency=0.8,Colormap=cmap);
imshow(B,cmap)
title("Training Labels")
numClasses = numel(classNames);
ticks = 1/(numClasses*2):1/numClasses:1;
colorbar(TickLabels=cellstr(classNames),Ticks=ticks,TickLength=0,TickLabelInterpreter="none");


Perform Semantic Segmentation on Test Image

The size of the image prevents segmenting the entire image at once. Instead, segment the image
using a blocked image approach. This approach can scale to very large files because it loads and
processes one block of data at a time.

Create a blocked image containing the six spectral channels of the test data by using the
blockedImage function.

patchSize = [1024 1024];


bimTest = blockedImage(test_data(:,:,1:6),BlockSize=patchSize);

Segment a block of data by using the semanticseg (Computer Vision Toolbox) function. Call the
semanticseg function on all blocks in the blocked image by using the apply function.

bimSeg = apply(bimTest,@(bs)semanticseg(bs.Data,net,Outputtype="uint8"),...
PadPartialBlocks=true,PadMethod=0);

Assemble all of the segmented blocks into a single image in the workspace by using the gather
function.

segmentedImage = gather(bimSeg);

To extract only the valid portion of the segmentation, multiply the segmented image by the mask
channel of the test data.

segmentedImage = segmentedImage .* uint8(maskTest~=0);


imshow(segmentedImage,[])
title("Segmented Image")


The output of semantic segmentation is noisy. Perform post image processing to remove noise and
stray pixels. Remove salt-and-pepper noise from the segmentation by using the medfilt2 function.
Display the segmented image with the noise removed.

segmentedImage = medfilt2(segmentedImage,[7 7]);


imshow(segmentedImage,[]);
title("Segmented Image with Noise Removed")


Overlay the segmented image on the histogram-equalized RGB test image.

B = labeloverlay(rgbTest,segmentedImage,Transparency=0.8,Colormap=cmap);
imshow(B,cmap)
title("Labeled Segmented Image")
colorbar(TickLabels=cellstr(classNames),Ticks=ticks,TickLength=0,TickLabelInterpreter="none");


Calculate Extent of Vegetation Cover

The semantic segmentation results can be used to answer pertinent ecological questions. For
example, what percentage of land area is covered by vegetation? To answer this question, find the
number of pixels labeled vegetation in the segmented test image. Also find the total number of pixels
in the ROI by counting the number of nonzero pixels in the segmented image.
vegetationPixels = ismember(segmentedImage(:),vegetationClassIDs);
numVegetationPixels = sum(vegetationPixels(:));
numROIPixels = nnz(segmentedImage);

Calculate the percentage of vegetation cover by dividing the number of vegetation pixels by the
number of pixels in the ROI.
percentVegetationCover = (numVegetationPixels/numROIPixels)*100;
disp("The percentage of vegetation cover is "+percentVegetationCover+"%");

The percentage of vegetation cover is 65.8067%

The rest of the example shows how to train U-Net on the Hamlin Beach data set.

Create Blocked Image Datastores for Training

Use a blocked image datastore to feed the training data to the network. This datastore extracts
multiple corresponding patches from an image datastore and pixel label datastore that contain
ground truth images and pixel label data.

Read the training images, training labels, and mask as blocked images.
inputTileSize = [256 256];
bim = blockedImage(train_data(:,:,1:6),BlockSize=inputTileSize);
bLabels = blockedImage(labelsTrain,BlockSize=inputTileSize);
bmask = blockedImage(maskTrain,BlockSize=inputTileSize);

Select blocks of image data that overlap with the mask.


overlapPct = 0.185;
blockOffsets = round(inputTileSize.*overlapPct);
bls = selectBlockLocations(bLabels, ...
BlockSize=inputTileSize,BlockOffsets=blockOffsets, ...
Masks=bmask,InclusionThreshold=0.95);

One-hot encode the labels.


labelsTrain1hot = onehotencode(labelsTrain,3,ClassNames=1:2);
labelsTrain1hot(isnan(labelsTrain1hot)) = 0;
bLabels = blockedImage(labelsTrain1hot,BlockSize=inputTileSize);

Write the data to blocked image datastores by using the blockedImageDatastore function.
bimds = blockedImageDatastore(bim,BlockLocationSet=bls,PadMethod=0);
bimdsLabels = blockedImageDatastore(bLabels,BlockLocationSet=bls,PadMethod=0);

Create a CombinedDatastore from the two blocked image datastores.


dsTrain = combine(bimds,bimdsLabels);

The blocked image datastore dsTrain provides mini-batches of data to the network at each iteration
of the epoch. Preview the datastore to explore the data.


preview(dsTrain)

ans=1×2 cell array

    {256×256×6 uint16}    {256×256×2 double}

Create U-Net Network Layers

This example uses a variation of the U-Net network. In U-Net, the initial series of convolutional layers
are interspersed with max pooling layers, successively decreasing the resolution of the input image.
These layers are followed by a series of convolutional layers interspersed with upsampling operators,
successively increasing the resolution of the input image [2 on page 19-147]. The name U-Net comes
from the fact that the network can be drawn with a symmetric shape like the letter U.
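
As an illustration of the encoder pattern described above, one downsampling block typically pairs two
3-by-3 convolutions and ReLU activations with a 2-by-2 max pooling layer. The following sketch builds
such a generic block; it is an assumption about the usual U-Net layout, not the code of the
encoderBlockMultispectralUNet supporting file used later in this example.

% Generic U-Net encoder (downsampling) block, shown for illustration only.
% numFilters is a hypothetical choice; the supporting file may differ.
numFilters = 64;
encoderBlock = [
    convolution2dLayer(3,numFilters,Padding="same")
    reluLayer
    convolution2dLayer(3,numFilters,Padding="same")
    reluLayer
    maxPooling2dLayer(2,Stride=2)];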

Specify hyperparameters of the U-Net. The input depth is the number of hyperspectral channels, 6.

inputDepth = 6;
encoderDepth = 4;
convFilterSize = 3;
upconvFilterSize = 2;

Create the encoder module that consists of repeating blocks of layers by using the blockedNetwork
function. The encoderBlockMultispectralUNet helper function creates a block of layers for the
encoder and is attached to the example as a supporting file.

encoderBlockFcn = @(block) ...
    encoderBlockMultispectralUNet(block,inputDepth,convFilterSize,encoderDepth);
encoder = blockedNetwork(encoderBlockFcn,encoderDepth,NamePrefix="encoder_");

Create the decoder module that consists of repeating blocks of layers by using the blockedNetwork
function. The decoderBlockMultispectralUNet helper function creates a block of layers for the
decoder and is attached to the example as a supporting file.

decoderBlockFcn = @(block) ...
    decoderBlockMultispectralUNet(block,convFilterSize,upconvFilterSize);
decoder = blockedNetwork(decoderBlockFcn,encoderDepth,NamePrefix="decoder_");

Define the bridge layers by using the bridgeBlockMultispectralUNet helper function, which is
attached to the example as a supporting file.

bridge = bridgeBlockMultispectralUNet(convFilterSize,encoderDepth);

Define the output layers.

final = [
convolution2dLayer(1,numClasses,Padding="same")
softmaxLayer];

Connect the encoder module, bridge, decoder module, and final layers by using the
encoderDecoderNetwork function. Add skip connections.

skipConnectionNames = [
"encoder_Block1Layer5","decoder_Block4Layer2";
"encoder_Block2Layer5","decoder_Block3Layer2";
"encoder_Block3Layer5","decoder_Block2Layer2";
"encoder_Block4Layer5","decoder_Block1Layer2"];
unet = encoderDecoderNetwork([inputTileSize inputDepth],encoder,decoder, ...
OutputChannels=numClasses, ...
SkipConnectionNames=skipConnectionNames, ...
SkipConnections="concatenate", ...
LatentNetwork=bridge, ...
FinalNetwork=final);

Select Training Options

Train the network using stochastic gradient descent with momentum (SGDM) optimization. Specify
the hyperparameter settings for SGDM by using the trainingOptions (Deep Learning Toolbox)
function. To enable gradient clipping, specify the GradientThreshold name-value argument as
0.05 and specify the GradientThresholdMethod to use the L2-norm of the gradients.
maxEpochs = 150;
minibatchSize = 16;

options = trainingOptions("sgdm", ...


InitialLearnRate=0.05, ...
Momentum=0.9, ...
L2Regularization=0.001, ...
MaxEpochs=maxEpochs, ...
MiniBatchSize=minibatchSize, ...
LearnRateSchedule="piecewise", ...
Shuffle="every-epoch", ...
GradientThresholdMethod="l2norm", ...
GradientThreshold=0.05, ...
Plots="training-progress", ...
VerboseFrequency=20);

Train the Network

To train the network, set the doTraining variable in the following code to true. Train the model by
using the trainnet (Deep Learning Toolbox) function. Specify a custom loss function, modelLoss,
that calculates the cross entropy loss on only the unmasked pixels. This custom loss function is
defined at the end of the example.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox).

doTraining = false;
if doTraining
net = trainnet(dsTrain,unet,@modelLoss,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save(fullfile(dataDir,"multispectralUnet-"+modelDateTime+".mat"),"net");
end

Evaluate Segmentation Accuracy

Segment the validation data.

Create a blocked image containing the six spectral channels of the validation data by using the
blockedImage function.
bimVal = blockedImage(val_data(:,:,1:6),BlockSize=patchSize);

Segment a block of data by using the semanticseg (Computer Vision Toolbox) function. Call the
semanticseg function on all blocks in the blocked image by using the apply function.


bimSeg = apply(bimVal,@(bs)semanticseg(bs.Data,net,OutputType="uint8"),...
PadPartialBlocks=true,PadMethod=0);

Assemble all of the segmented blocks into a single image in the workspace by using the gather
function.

segmentedImage = gather(bimSeg);

Save the segmented image as a PNG file.

imwrite(segmentedImage,"results.png");

Load the segmentation results and ground truth labels by using the pixelLabelDatastore
(Computer Vision Toolbox) function.

pixelLabelIDs = [1 2];
pxdsResults = pixelLabelDatastore("results.png",classNames,pixelLabelIDs);
pxdsTruth = pixelLabelDatastore("gtruth.png",classNames,pixelLabelIDs);

Measure the accuracy of the semantic segmentation by using the
evaluateSemanticSegmentation (Computer Vision Toolbox) function. The global accuracy score
indicates that over 96% of the pixels are classified correctly.

ssm = evaluateSemanticSegmentation(pxdsResults,pxdsTruth);

Evaluating semantic segmentation results


----------------------------------------
* Selected metrics: global accuracy, class accuracy, IoU, weighted IoU, BF score.
* Processed 1 images.
* Finalizing... Done.
* Data set metrics:

    GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU    MeanBFScore
    ______________    ____________    _______    ___________    ___________

       0.96875          0.96762       0.93914      0.93931        0.79113

Helper Function

The modelLoss function calculates cross entropy loss over all unmasked pixels of an image.

function loss = modelLoss(y,targets)


mask = ~isnan(targets);
targets(isnan(targets)) = 0;
loss = crossentropy(y,targets,Mask=mask);
end

References

[1] Kemker, R., C. Salvaggio, and C. Kanan. "High-Resolution Multispectral Dataset for Semantic
Segmentation." CoRR, abs/1703.01918. 2017.

[2] Ronneberger, O., P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image
Segmentation." CoRR, abs/1505.04597. 2015.


[3] Kemker, Ronald, Carl Salvaggio, and Christopher Kanan. "Algorithms for Semantic Segmentation
of Multispectral Remote Sensing Imagery Using Deep Learning." ISPRS Journal of Photogrammetry
and Remote Sensing, Deep Learning RS Data, 145 (November 1, 2018): 60-77. https://doi.org/
10.1016/j.isprsjprs.2018.04.014.

See Also
blockedNetwork | encoderDecoderNetwork | trainingOptions | trainnet | blockedImage |
apply | selectBlockLocations | blockedImageDatastore | pixelLabelDatastore |
semanticseg | evaluateSemanticSegmentation

More About
• “Getting Started with Semantic Segmentation Using Deep Learning” (Computer Vision Toolbox)
• “Semantic Segmentation Using Deep Learning” (Computer Vision Toolbox)
• “Label Large Images in the Image Labeler” (Computer Vision Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)

External Websites
• https://github.com/rmkemker/RIT-18


3-D Brain Tumor Segmentation Using Deep Learning

This example shows how to perform semantic segmentation of brain tumors from 3-D medical images.

Semantic segmentation involves labeling each pixel in an image or voxel of a 3-D volume with a class.
This example illustrates the use of a 3-D U-Net deep learning network to perform binary semantic
segmentation of brain tumors in magnetic resonance imaging (MRI) scans. U-Net is a fast, efficient
and simple network that has become popular in the semantic segmentation domain [1 on page 19-
157].

One challenge of medical image segmentation is the amount of memory needed to store and process
3-D volumes. Training a network and performing segmentation on the full input volume is impractical
due to GPU resource constraints. This example solves the problem by dividing the image into smaller
patches, or blocks, for training and segmentation.

A second challenge of medical image segmentation is class imbalance in the data that hampers
training when using conventional cross entropy loss. This example solves the problem by using a
weighted multiclass Dice loss function [4 on page 19-158]. Weighting the classes helps to counter the
influence of larger regions on the Dice score, making it easier for the network to learn how to
segment smaller regions.

This example shows how to perform brain tumor segmentation using a pretrained 3-D U-Net
architecture, and how to evaluate the network performance using a set of test images. You can
optionally train a 3-D U-Net on the BraTS data set [2 on page 19-157].

Load Pretrained 3-D U-Net

Download a pretrained 3-D U-Net into a variable called net.

dataDir = fullfile(tempdir,"BraTS");
if ~exist(dataDir,'dir')
mkdir(dataDir);
end
trained3DUnetURL = "https://www.mathworks.com/supportfiles/"+ ...
"vision/data/brainTumor3DUNetValid.mat";
downloadTrainedNetwork(trained3DUnetURL,dataDir);
load(dataDir+filesep+"brainTumor3DUNetValid.mat");

Load BraTS Data

Download five sample test volumes and their corresponding labels from the BraTS data set using the
downloadBraTSSampleTestData helper function [3 on page 19-157]. The helper function is
attached to the example as a supporting file. The sample data enables you to perform segmentation
on test data without downloading the full data set.

downloadBraTSSampleTestData(dataDir);

Load one of the volume samples along with its pixel label ground truth.

testDir = dataDir+filesep+"sampleBraTSTestSetValid";
data = load(fullfile(testDir,"imagesTest","BraTS446.mat"));
labels = load(fullfile(testDir,"labelsTest","BraTS446.mat"));
volTest = data.cropVol;
volTestLabels = labels.cropLabel;


Segment Brain Tumors in Blocked Image

The example uses an overlap-tile strategy to process the large volume. The overlap-tile strategy
selects overlapping blocks, predicts the labels for each block by using the semanticseg (Computer
Vision Toolbox) function, and then recombines the blocks into a complete segmented test volume. The
strategy enables efficient processing on the GPU, which has limited memory resources. The strategy
also reduces border artifacts by using the valid part of the convolution in the neural network [5 on
page 19-158].

Implement the overlap-tile strategy by storing the volume data as a blockedImage object and
processing blocks using the apply function.

Create a blockedImage object for the sample volume downloaded in the previous section.

bim = blockedImage(volTest);

The apply function executes a custom function for each block within the blockedImage. Define
semanticsegBlock as the function to execute for each block.

semanticsegBlock = @(bstruct)semanticseg(bstruct.Data,net);

Specify the block size as the network output size. To create overlapping blocks, specify a nonzero
border size. This example uses a border size such that the block plus the border match the network
input size. For example, for a network input size of 132-by-132-by-132 voxels and an output size of
44-by-44-by-44 voxels, the border size is (132 - 44)/2 = 44 voxels on each side.

networkInputSize = net.Layers(1).InputSize;
networkOutputSize = net.Layers(end).OutputSize;
blockSize = [networkOutputSize(1:3) networkInputSize(end)];
borderSize = (networkInputSize(1:3) - blockSize(1:3))/2;

Perform semantic segmentation using blockedImage apply with partial block padding set to true.
The default padding method, "replicate", is appropriate because the volume data contains
multiple modalities. The batch size is specified as 1 to prevent out-of-memory errors on GPUs with
constrained memory resources. However, if your GPU has sufficient memory, then you can increase
the processing speed by increasing the block size.

batchSize = 1;
results = apply(bim, ...
semanticsegBlock, ...
BlockSize=blockSize, ...
BorderSize=borderSize,...
PadPartialBlocks=true, ...
BatchSize=batchSize);
predictedLabels = results.Source;

Display a montage showing the center slice of the ground truth and predicted labels along the depth
direction.

zID = size(volTest,3)/2;
zSliceGT = labeloverlay(volTest(:,:,zID),volTestLabels(:,:,zID));
zSlicePred = labeloverlay(volTest(:,:,zID),predictedLabels(:,:,zID));

figure
montage({zSliceGT,zSlicePred},Size=[1 2],BorderSize=5)
title("Labeled Ground Truth (Left) vs. Network Prediction (Right)")


The following image shows the result of displaying slices sequentially across one of the volumes.
The labeled ground truth is on the left and the network prediction is on the right.

Download BraTS Data Set

If you do not want to download the training data set or train the network, then you can skip to the
Evaluate 3-D U-Net on page 19-155 section of this example.

This example uses the BraTS data set [2 on page 19-157]. The BraTS data set contains MRI scans of
brain tumors, namely gliomas, which are the most common primary brain malignancies. The size of
the data file is ~7 GB.

To download the BraTS data, go to the Medical Segmentation Decathlon website and click the
"Download Data" link. Download the "Task01_BrainTumour.tar" file [3 on page 19-157]. Unzip the
TAR file into the directory specified by the imageDir variable. When unzipped successfully,
imageDir will contain a directory named Task01_BrainTumour that has three subdirectories:
imagesTr, imagesTs, and labelsTr.


The data set contains 750 4-D volumes, each representing a stack of 3-D images. Each 4-D volume
has size 240-by-240-by-155-by-4, where the first three dimensions correspond to height, width, and
depth of a 3-D volumetric image. The fourth dimension corresponds to different scan modalities. The
data set is divided into 484 training volumes with voxel labels and 266 test volumes. The test volumes
do not have labels so this example does not use the test data. Instead, the example splits the 484
training volumes into three independent sets that are used for training, validation, and testing.
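
As a quick sanity check after unzipping the data, you can read one training volume and confirm its
dimensions. The file name below is an assumption about the naming convention used in the imagesTr
folder; adjust it to match the files on disk.

% Inspect the size of one raw training volume (sketch; assumes a file
% named BRATS_001.nii.gz exists in the imagesTr folder).
volFile = fullfile(imageDir,"Task01_BrainTumour","imagesTr","BRATS_001.nii.gz");
vol = niftiread(volFile);
size(vol)   % expected to be 240-by-240-by-155-by-4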

Prepare Data for Training

To train the 3-D U-Net network more efficiently, preprocess the MRI data using the helper function
preprocessBraTSDataset. This function is attached to the example as a supporting file. The helper
function performs these operations:

• Crop the data to a region containing primarily the brain and tumor. Cropping the data reduces the
size of data while retaining the most critical part of each MRI volume and its corresponding labels.
• Normalize each modality of each volume independently by subtracting the mean and dividing by
the standard deviation of the cropped brain region.
• Split the 484 training volumes into 400 training, 29 validation, and 55 test sets.

Preprocessing the data can take about 30 minutes to complete.

sourceDataLoc = dataDir+filesep+"Task01_BrainTumour";
preprocessDataLoc = dataDir+filesep+"preprocessedDataset";
preprocessBraTSDataset(preprocessDataLoc,sourceDataLoc);
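
The per-modality normalization step can be sketched as follows for a single cropped volume. This is a
minimal illustration of the operation described in the second list item, not the code of the
preprocessBraTSDataset helper, and cropVol is a hypothetical 4-D volume created only for the sketch.

% Sketch of per-modality z-score normalization (illustrative only).
cropVol = rand(160,192,128,4,"single");      % hypothetical cropped volume
normVol = zeros(size(cropVol),"single");
for c = 1:size(cropVol,4)
    chan = cropVol(:,:,:,c);
    normVol(:,:,:,c) = (chan - mean(chan(:)))/std(chan(:));
end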

Create Random Patch Extraction Datastore for Training and Validation

Create an imageDatastore to store the 3-D image data. Because the MAT file format is a
nonstandard image format, you must use a MAT file reader to enable reading the image data. You can
use the helper MAT file reader, matRead. This function is attached to the example as a supporting
file.

volLoc = fullfile(preprocessDataLoc,"imagesTr");
volds = imageDatastore(volLoc,FileExtensions=".mat",ReadFcn=@matRead);

Create a pixelLabelDatastore (Computer Vision Toolbox) to store the labels.

lblLoc = fullfile(preprocessDataLoc,"labelsTr");
classNames = ["background","tumor"];
pixelLabelID = [0 1];
pxds = pixelLabelDatastore(lblLoc,classNames,pixelLabelID, ...
FileExtensions=".mat",ReadFcn=@matRead);

Create a randomPatchExtractionDatastore that extracts random patches from ground truth
images and corresponding pixel label data. Specify a patch size of 132-by-132-by-132 voxels. Specify
"PatchesPerImage" to extract 16 randomly positioned patches from each pair of volumes and labels
during training. Specify a mini-batch size of 8.

patchSize = [132 132 132];


patchPerImage = 16;
miniBatchSize = 8;
patchds = randomPatchExtractionDatastore(volds,pxds,patchSize, ...
PatchesPerImage=patchPerImage);
patchds.MiniBatchSize = miniBatchSize;


Create a randomPatchExtractionDatastore that extracts patches from the validation image and
pixel label data. You can use validation data to evaluate whether the network is continuously learning,
underfitting, or overfitting as time progresses.

volLocVal = fullfile(preprocessDataLoc,"imagesVal");
voldsVal = imageDatastore(volLocVal,FileExtensions=".mat", ...
ReadFcn=@matRead);

lblLocVal = fullfile(preprocessDataLoc,"labelsVal");
pxdsVal = pixelLabelDatastore(lblLocVal,classNames,pixelLabelID, ...
FileExtensions=".mat",ReadFcn=@matRead);

dsVal = randomPatchExtractionDatastore(voldsVal,pxdsVal,patchSize, ...
    PatchesPerImage=patchPerImage);
dsVal.MiniBatchSize = miniBatchSize;

Configure 3-D U-Net

This example uses the 3-D U-Net network [1 on page 19-157]. In U-Net, the initial series of
convolutional layers are interspersed with max pooling layers, successively decreasing the resolution
of the input image. These layers are followed by a series of convolutional layers interspersed with
upsampling operators, successively increasing the resolution of the input image. A batch
normalization layer is introduced before each ReLU layer. The name U-Net comes from the fact that
the network can be drawn with a symmetric shape like the letter U.

Create a default 3-D U-Net network by using the unet3dLayers (Computer Vision Toolbox) function.
Specify two class segmentation. Also specify valid convolution padding to avoid border artifacts when
using the overlap-tile strategy for prediction of the test volumes.

numChannels = 4;
inputPatchSize = [patchSize numChannels];
numClasses = 2;
[lgraph,outPatchSize] = unet3dLayers(inputPatchSize, ...
numClasses,ConvolutionPadding="valid");

Augment the training and validation data by using the transform function with custom
preprocessing operations specified by the helper function augmentAndCrop3dPatch. This function
is attached to the example as a supporting file. The augmentAndCrop3dPatch function performs
these operations:

1 Randomly rotate and reflect training data to make the training more robust. The function does
not rotate or reflect validation data.
2 Crop response patches to the output size of the network, 44-by-44-by-44 voxels.

dsTrain = transform(patchds, ...
    @(patchIn)augmentAndCrop3dPatch(patchIn,outPatchSize,"Training"));
dsVal = transform(dsVal, ...
@(patchIn)augmentAndCrop3dPatch(patchIn,outPatchSize,"Validation"));

To better segment smaller tumor regions and reduce the influence of larger background regions, this
example uses a dicePixelClassificationLayer (Computer Vision Toolbox). Replace the pixel
classification layer with the Dice pixel classification layer.
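
For reference, the generalized Dice loss described in [4 on page 19-158] weights each class by the
inverse of its squared reference volume, so that small regions contribute to the loss as much as large
ones. A sketch of the loss over classes k and voxels n, with ground truth r and predictions p, is

$$L_{Dice} = 1 - \frac{2\sum_{k} w_{k} \sum_{n} r_{kn} p_{kn}}{\sum_{k} w_{k} \sum_{n} \left( r_{kn} + p_{kn} \right)}, \qquad w_{k} = \frac{1}{\left( \sum_{n} r_{kn} \right)^{2}}$$

The dicePixelClassificationLayer reference page defines the exact form that the layer uses; the
expression above is shown only as background for the weighting idea.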

outputLayer = dicePixelClassificationLayer(Name="Output");
lgraph = replaceLayer(lgraph,"Segmentation-Layer",outputLayer);


The data has already been normalized in the Prepare Data for Training on page 19-152 section of this
example. Data normalization in the image3dInputLayer (Deep Learning Toolbox) is unnecessary, so
replace the input layer with an input layer that does not have data normalization.
inputLayer = image3dInputLayer(inputPatchSize, ...
Normalization="none",Name="ImageInputLayer");
lgraph = replaceLayer(lgraph,"ImageInputLayer",inputLayer);

Alternatively, you can modify the 3-D U-Net network by using the Deep Network Designer app.
deepNetworkDesigner(lgraph)

Train 3-D U-Net

Specify Training Options

Train the network using the adam optimization solver. Specify the hyperparameter settings using the
trainingOptions (Deep Learning Toolbox) function. The initial learning rate is set to 5e-4 and
gradually decreases over the span of training. You can experiment with the MiniBatchSize property
based on your GPU memory. To maximize GPU memory utilization, favor large input patches over a
large batch size. Note that batch normalization layers are less effective for smaller values of
MiniBatchSize. Tune the initial learning rate based on the MiniBatchSize.
options = trainingOptions("adam", ...
MaxEpochs=50, ...
InitialLearnRate=5e-4, ...
LearnRateSchedule="piecewise", ...
LearnRateDropPeriod=5, ...
LearnRateDropFactor=0.95, ...
ValidationData=dsVal, ...
ValidationFrequency=400, ...
Plots="training-progress", ...
Verbose=false, ...
MiniBatchSize=miniBatchSize);

Train Network

By default, the example uses the downloaded pretrained 3-D U-Net network. The pretrained network
enables you to perform semantic segmentation and evaluate the segmentation results without waiting
for training to complete.

To train the network, set the doTraining variable in the following code to true. Train the network
using the trainNetwork (Deep Learning Toolbox) function.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). Training takes about 30 hours on a multi-GPU system with 4 NVIDIA™ Titan Xp
GPUs and can take even longer depending on your GPU hardware.

doTraining = false;
if doTraining
[net,info] = trainNetwork(dsTrain,lgraph,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save("trained3DUNet-"+modelDateTime+".mat","net");
end

Evaluate 3-D U-Net

Select the source of test data that contains ground truth volumes and labels for testing. If you keep
the useFullTestSet variable in the following code as false, then the example uses five sample
volumes for testing. If you set the useFullTestSet variable to true, then the example uses 55 test
images selected from the full data set.

useFullTestSet = false;
if useFullTestSet
volLocTest = fullfile(preprocessDataLoc,"imagesTest");
lblLocTest = fullfile(preprocessDataLoc,"labelsTest");
else
volLocTest = fullfile(testDir,"imagesTest");
lblLocTest = fullfile(testDir,"labelsTest");
end

The voldsTest variable stores the ground truth test images. The pxdsTest variable stores the
ground truth labels.
voldsTest = imageDatastore(volLocTest,FileExtensions=".mat", ...
ReadFcn=@matRead);
pxdsTest = pixelLabelDatastore(lblLocTest,classNames,pixelLabelID, ...
FileExtensions=".mat",ReadFcn=@matRead);

For each test volume, process each block using the apply function. The apply function performs the
operations specified by the helper function calculateBlockMetrics, which is defined at the end of
this example. The calculateBlockMetrics function performs semantic segmentation of each block
and calculates the confusion matrix between the predicted and ground truth labels.
imageIdx = 1;
datasetConfMat = table;
while hasdata(voldsTest)

% Read volume and label data


vol = read(voldsTest);
volLabels = read(pxdsTest);

% Create blockedImage for volume and label data


testVolume = blockedImage(vol);
testLabels = blockedImage(volLabels{1});

% Calculate block metrics


blockConfMatOneImage = apply(testVolume, ...
@(block,labeledBlock) ...
calculateBlockMetrics(block,labeledBlock,net), ...
ExtraImages=testLabels, ...
PadPartialBlocks=true, ...
BlockSize=blockSize, ...
BorderSize=borderSize, ...
UseParallel=false);

% Read all the block results of an image and update the image number
blockConfMatOneImageDS = blockedImageDatastore(blockConfMatOneImage);
blockConfMat = readall(blockConfMatOneImageDS);
blockConfMat = struct2table([blockConfMat{:}]);
blockConfMat.ImageNumber = imageIdx.*ones(height(blockConfMat),1);
datasetConfMat = [datasetConfMat;blockConfMat];

imageIdx = imageIdx + 1;
end

Evaluate the data set metrics and block metrics for the segmentation using the
evaluateSemanticSegmentation (Computer Vision Toolbox) function.
[metrics,blockMetrics] = evaluateSemanticSegmentation( ...
datasetConfMat,classNames,Metrics="all");

Evaluating semantic segmentation results


----------------------------------------
* Selected metrics: global accuracy, class accuracy, IoU, weighted IoU.
* Processed 5 images.
* Finalizing... Done.
* Data set metrics:

    GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU
    ______________    ____________    _______    ___________

       0.99902          0.97955       0.95978      0.99808

Display the Jaccard score calculated for each image.


metrics.ImageMetrics.MeanIoU

ans = 5×1


0.9613
0.9570
0.9551
0.9656
0.9594

Supporting Function

The calculateBlockMetrics helper function performs semantic segmentation of a block and
calculates the confusion matrix between the predicted and ground truth labels. The function returns
a structure with fields containing the confusion matrix and metadata about the block. You can use the
structure with the evaluateSemanticSegmentation function to calculate metrics and aggregate
block-based results.

function blockMetrics = calculateBlockMetrics(bstruct,gtBlockLabels,net)

% Segment block
predBlockLabels = semanticseg(bstruct.Data,net);

% Trim away border region from gtBlockLabels


blockStart = bstruct.BorderSize + 1;
blockEnd = blockStart + bstruct.BlockSize - 1;
gtBlockLabels = gtBlockLabels( ...
blockStart(1):blockEnd(1), ...
blockStart(2):blockEnd(2), ...
blockStart(3):blockEnd(3));

% Evaluate segmentation results against ground truth


confusionMat = segmentationConfusionMatrix(predBlockLabels,gtBlockLabels);

% blockMetrics is a struct with confusion matrices, image number,


% and block information.
blockMetrics.ConfusionMatrix = confusionMat;
blockMetrics.ImageNumber = bstruct.ImageNumber;
blockInfo.Start = bstruct.Start;
blockInfo.End = bstruct.End;
blockMetrics.BlockInfo = blockInfo;

end

References

[1] Çiçek, Ö., A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. "3D U-Net: Learning Dense
Volumetric Segmentation from Sparse Annotation." In Proceedings of the International Conference on
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016. Athens, Greece, Oct.
2016, pp. 424-432.

[2] Isensee, F., P. Kickingereder, W. Wick, M. Bendszus, and K. H. Maier-Hein. "Brain Tumor
Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge." In
Proceedings of BrainLes: International MICCAI Brainlesion Workshop. Quebec City, Canada, Sept.
2017, pp. 287-297.


[3] "Brain Tumours". Medical Segmentation Decathlon. http://medicaldecathlon.com/

The BraTS dataset is provided by Medical Segmentation Decathlon under the CC-BY-SA 4.0 license.
All warranties and representations are disclaimed; see the license for details. MathWorks® has
modified the data set linked in the Load BraTS Data on page 19-149 section of this
example. The modified sample data set has been cropped to a region containing primarily the brain
and tumor and each channel has been normalized independently by subtracting the mean and
dividing by the standard deviation of the cropped brain region.

[4] Sudre, C. H., W. Li, T. Vercauteren, S. Ourselin, and M. J. Cardoso. "Generalised Dice Overlap as a
Deep Learning Loss Function for Highly Unbalanced Segmentations." Deep Learning in Medical
Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop.
Quebec City, Canada, Sept. 2017, pp. 240-248.

[5] Ronneberger, O., P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image
Segmentation." In Proceedings of the International Conference on Medical Image Computing and
Computer-Assisted Intervention - MICCAI 2015. Munich, Germany, Oct. 2015, pp. 234-241. Available
at arXiv:1505.04597.

See Also
randomPatchExtractionDatastore | trainNetwork | trainingOptions | transform |
pixelLabelDatastore | imageDatastore | semanticseg | dicePixelClassificationLayer

More About
• “Preprocess Volumes for Deep Learning” (Deep Learning Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)

External Websites
• 3-D Brain Tumor Segmentation Using Deep Learning Video Tutorial


Neural Style Transfer Using Deep Learning

This example shows how to apply the stylistic appearance of one image to the scene content of a
second image using a pretrained VGG-19 network.

Load Data

Load the style image and content image. This example uses the distinctive Van Gogh painting "Starry
Night" as the style image and a photograph of a lighthouse as the content image.
styleImage = im2double(imread("starryNight.jpg"));
contentImage = imread("lighthouse.png");

Display the style image and content image as a montage.


imshow(imtile({styleImage,contentImage},BackgroundColor="w"));

Load Feature Extraction Network

In this example, you use a modified pretrained VGG-19 deep neural network to extract the features of
the content and style image at various layers. These multilayer features are used to calculate
respective content and style losses. The network generates the stylized transfer image using the
combined loss.

To get a pretrained VGG-19 network, install vgg19 (Deep Learning Toolbox). If you do not have the
required support packages installed, then the software provides a download link.
net = vgg19;

To make the VGG-19 network suitable for feature extraction, remove all of the fully connected layers
from the network.
lastFeatureLayerIdx = 38;
layers = net.Layers;
layers = layers(1:lastFeatureLayerIdx);


The max pooling layers of the VGG-19 network cause a fading effect. To decrease the fading effect
and increase the gradient flow, replace all max pooling layers with average pooling layers [1] on page
19-167.

for l = 1:lastFeatureLayerIdx
layer = layers(l);
if isa(layer,"nnet.cnn.layer.MaxPooling2DLayer")
layers(l) = averagePooling2dLayer( ...
layer.PoolSize,Stride=layer.Stride,Name=layer.Name);
end
end

Create a layer graph with the modified layers.

lgraph = layerGraph(layers);

Visualize the feature extraction network in a plot.

plot(lgraph)
title("Feature Extraction Network")

To train the network with a custom training loop and enable automatic differentiation, convert the
layer graph to a dlnetwork object.

dlnet = dlnetwork(lgraph);

Preprocess Data

Resize the style image and content image to a smaller size for faster processing.

imageSize = [384,512];
styleImg = imresize(styleImage,imageSize);
contentImg = imresize(contentImage,imageSize);

The pretrained VGG-19 network performs classification on a channel-wise mean subtracted image.
Get the channel-wise mean from the image input layer, which is the first layer in the network.


imgInputLayer = lgraph.Layers(1);
meanVggNet = imgInputLayer.Mean(1,1,:);

The values of the channel-wise mean are appropriate for images of floating point data type with pixel
values in the range [0, 255]. Convert the style image and content image to data type single with
range [0, 255]. Then, subtract the channel-wise mean from the style image and content image.

styleImg = rescale(single(styleImg),0,255) - meanVggNet;


contentImg = rescale(single(contentImg),0,255) - meanVggNet;

Initialize Transfer Image

The transfer image is the output image as a result of style transfer. You can initialize the transfer
image with a style image, content image, or any random image. Initialization with a style image or
content image biases the style transfer process and produces a transfer image more similar to the
input image. In contrast, initialization with white noise removes the bias but takes longer to converge
on the stylized image. For better stylization and faster convergence, this example initializes the
output transfer image as a weighted combination of the content image and a white noise image.

noiseRatio = 0.7;
randImage = randi([-20,20],[imageSize 3]);
transferImage = noiseRatio.*randImage + (1-noiseRatio).*contentImg;

Define Loss Functions and Style Transfer Parameters

Content Loss

The objective of content loss is to make the features of the transfer image match the features of the
content image. The content loss is calculated as the mean squared difference between content image
features and transfer image features for each content feature layer [1] on page 19-167. Ŷ is the
predicted feature map for the transfer image and Y is the predicted feature map for the content
image. W_c^l is the content layer weight for the l-th layer. H, W, and C are the height, width, and
number of channels of the feature maps, respectively.

$$L_{content} = \sum_{l} W_{c}^{l} \times \frac{1}{HWC} \sum_{i,j} \left( \hat{Y}_{i,j}^{l} - Y_{i,j}^{l} \right)^{2}$$

Specify the content feature extraction layer names. The features extracted from these layers are used
to calculate the content loss. In the VGG-19 network, training is more effective using features from
deeper layers rather than features from shallow layers. Therefore, specify the content feature
extraction layer as the fourth convolutional layer.

styleTransferOptions.contentFeatureLayerNames = "conv4_2";

Specify the weights of the content feature extraction layers.

styleTransferOptions.contentFeatureLayerWeights = 1;

Style Loss

The objective of style loss is to make the texture of the transfer image match the texture of the style
image. The style representation of an image is represented as a Gram matrix. Therefore, the style
loss is calculated as the mean squared difference between the Gram matrix of the style image and the
Gram matrix of the transfer image [1] on page 19-167. Z and Ẑ are the predicted feature maps for the
style and transfer image, respectively. G_Z and G_Ẑ are the Gram matrices of the style features and
transfer features, respectively. W_s^l is the style layer weight for the l-th style layer.

$$G_{Z} = \sum_{i,j} Z_{i,j} \times Z_{j,i}$$

$$G_{\hat{Z}} = \sum_{i,j} \hat{Z}_{i,j} \times \hat{Z}_{j,i}$$

$$L_{style} = \sum_{l} W_{s}^{l} \times \frac{1}{(2HWC)^{2}} \sum \left( G_{\hat{Z}}^{l} - G_{Z}^{l} \right)^{2}$$

Specify the names of the style feature extraction layers. The features extracted from these layers are
used to calculate style loss.

styleTransferOptions.styleFeatureLayerNames = [ ...
"conv1_1","conv2_1","conv3_1","conv4_1","conv5_1"];

Specify the weights of the style feature extraction layers. Specify small weights for simple style
images and increase the weights for complex style images.

styleTransferOptions.styleFeatureLayerWeights = [0.5,1.0,1.5,3.0,4.0];

Total Loss

The total loss is a weighted combination of content loss and style loss. α and β are weight factors for
content loss and style loss, respectively.

$$L_{total} = \alpha \times L_{content} + \beta \times L_{style}$$

Specify the weight factors alpha and beta for content loss and style loss. The ratio of alpha to
beta should be around 1e-3 or 1e-4 [1] on page 19-167.

styleTransferOptions.alpha = 1;
styleTransferOptions.beta = 1e3;

Specify Training Options

Train for 2500 iterations.

numIterations = 2500;

Specify options for Adam optimization. Set the learning rate to 2 for faster convergence. You can
experiment with the learning rate by observing your output image and losses. Initialize the trailing
average gradient and trailing average gradient-square decay rates with [].

learningRate = 2;
trailingAvg = [];
trailingAvgSq = [];

Train the Network

Convert the style image, content image, and transfer image to dlarray (Deep Learning Toolbox)
objects with underlying type single and dimension labels "SSC".


dlStyle = dlarray(styleImg,"SSC");
dlContent = dlarray(contentImg,"SSC");
dlTransfer = dlarray(transferImage,"SSC");

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). For GPU training, convert the data into a gpuArray.

if canUseGPU
dlContent = gpuArray(dlContent);
dlStyle = gpuArray(dlStyle);
dlTransfer = gpuArray(dlTransfer);
end

Extract the content features from the content image.

numContentFeatureLayers = numel(styleTransferOptions.contentFeatureLayerNames);
contentFeatures = cell(1,numContentFeatureLayers);
[contentFeatures{:}] = forward(dlnet,dlContent,Outputs=styleTransferOptions.contentFeatureLayerNames);

Extract the style features from the style image.

numStyleFeatureLayers = numel(styleTransferOptions.styleFeatureLayerNames);
styleFeatures = cell(1,numStyleFeatureLayers);
[styleFeatures{:}] = forward(dlnet,dlStyle,Outputs=styleTransferOptions.styleFeatureLayerNames);

Train the model using a custom training loop. For each iteration:

• Calculate the content loss and style loss using the features of the content image, style image, and
transfer image. To calculate the loss and gradients, use the helper function imageGradients
(defined in the Supporting Functions on page 19-165 section of this example).
• Update the transfer image using the adamupdate (Deep Learning Toolbox) function.
• Select the best style transfer image as the final output image.

figure

minimumLoss = inf;

for iteration = 1:numIterations


% Evaluate the transfer image gradients and state using dlfeval and the
% imageGradients function listed at the end of the example
[grad,losses] = dlfeval(@imageGradients,dlnet,dlTransfer, ...
contentFeatures,styleFeatures,styleTransferOptions);
[dlTransfer,trailingAvg,trailingAvgSq] = adamupdate( ...
dlTransfer,grad,trailingAvg,trailingAvgSq,iteration,learningRate);

if losses.totalLoss < minimumLoss


minimumLoss = losses.totalLoss;
dlOutput = dlTransfer;
end

% Display the transfer image on the first iteration and after every 50
% iterations. The postprocessing steps are described in the "Postprocess
% Transfer Image for Display" section of this example
if mod(iteration,50) == 0 || (iteration == 1)

transferImage = gather(extractdata(dlTransfer));

transferImage = transferImage + meanVggNet;


transferImage = uint8(transferImage);
transferImage = imresize(transferImage,size(contentImage,[1 2]));

image(transferImage)
title(["Transfer Image After Iteration ",num2str(iteration)])
axis off image
drawnow
end

end

Postprocess Transfer Image for Display

Get the updated transfer image.

transferImage = gather(extractdata(dlOutput));

Add the network-trained mean to the transfer image.

transferImage = transferImage + meanVggNet;


Some pixel values can exceed the original range [0, 255] of the content and style image. You can clip
the values to the range [0, 255] by converting the data type to uint8.

transferImage = uint8(transferImage);

Resize the transfer image to the original size of the content image.

transferImage = imresize(transferImage,size(contentImage,[1 2]));

Display the content image, transfer image, and style image in a montage.

imshow(imtile({contentImage,transferImage,styleImage}, ...
GridSize=[1 3],BackgroundColor="w"));

Supporting Functions

Calculate Image Loss and Gradients

The imageGradients helper function returns the loss and gradients using features of the content
image, style image, and transfer image.

function [gradients,losses] = imageGradients(dlnet,dlTransfer, ...
    contentFeatures,styleFeatures,params)

% Initialize transfer image feature containers


numContentFeatureLayers = numel(params.contentFeatureLayerNames);
numStyleFeatureLayers = numel(params.styleFeatureLayerNames);

transferContentFeatures = cell(1,numContentFeatureLayers);
transferStyleFeatures = cell(1,numStyleFeatureLayers);

% Extract content features of transfer image


[transferContentFeatures{:}] = forward(dlnet,dlTransfer, ...
Outputs=params.contentFeatureLayerNames);


% Extract style features of transfer image


[transferStyleFeatures{:}] = forward(dlnet,dlTransfer, ...
Outputs=params.styleFeatureLayerNames);

% Calculate content loss


cLoss = contentLoss(transferContentFeatures,contentFeatures, ...
params.contentFeatureLayerWeights);

% Calculate style loss


sLoss = styleLoss(transferStyleFeatures,styleFeatures, ...
params.styleFeatureLayerWeights);

% Calculate final loss as weighted combination of content and style loss


loss = (params.alpha * cLoss) + (params.beta * sLoss);

% Calculate gradient with respect to transfer image


gradients = dlgradient(loss,dlTransfer);

% Extract various losses


losses.totalLoss = gather(extractdata(loss));
losses.contentLoss = gather(extractdata(cLoss));
losses.styleLoss = gather(extractdata(sLoss));

end

Calculate Content Loss

The contentLoss helper function calculates the weighted mean squared difference between the
content image features and the transfer image features.

function loss = contentLoss(transferContentFeatures,contentFeatures,contentWeights)

loss = 0;
for i=1:numel(contentFeatures)
temp = 0.5 .* mean((transferContentFeatures{1,i}-contentFeatures{1,i}).^2,"all");
loss = loss + (contentWeights(i)*temp);
end
end

Calculate Style Loss

The styleLoss helper function calculates the weighted mean squared difference between the Gram
matrix of the style image features and the Gram matrix of the transfer image features.

function loss = styleLoss(transferStyleFeatures,styleFeatures,styleWeights)

loss = 0;
for i=1:numel(styleFeatures)

tsf = transferStyleFeatures{1,i};
sf = styleFeatures{1,i};
[h,w,c] = size(sf);

gramStyle = calculateGramMatrix(sf);
gramTransfer = calculateGramMatrix(tsf);
sLoss = mean((gramTransfer - gramStyle).^2,"all") / ((h*w*c)^2);

loss = loss + (styleWeights(i)*sLoss);


end
end

Calculate Gram Matrix

The calculateGramMatrix helper function is used by the styleLoss helper function to calculate
the Gram matrix of a feature map.

function gramMatrix = calculateGramMatrix(featureMap)


[H,W,C] = size(featureMap);
reshapedFeatures = reshape(featureMap,H*W,C);
gramMatrix = reshapedFeatures' * reshapedFeatures;
end

References

[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. "A Neural Algorithm of Artistic Style."
Preprint, submitted September 2, 2015. https://arxiv.org/abs/1508.06576

See Also
vgg19 | trainNetwork | trainingOptions | dlarray

More About
• “Get Started with GANs for Image-to-Image Translation” on page 19-39
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
• “Train Network Using Custom Training Loop” (Deep Learning Toolbox)
• “List of Functions with dlarray Support” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)


Unsupervised Day-to-Dusk Image Translation Using UNIT

This example shows how to translate images between daytime and dusk lighting conditions using an
unsupervised image-to-image translation network (UNIT).

Domain translation is the task of transferring styles and characteristics from one image domain to
another. This technique can be extended to other image-to-image learning operations, such as image
enhancement, image colorization, defect generation, and medical image analysis.

UNIT [1] on page 19-177 is a type of generative adversarial network (GAN) that consists of one
generator network and two discriminator networks that you train simultaneously to maximize the
overall performance. For more information about UNIT, see “Get Started with GANs for Image-to-
Image Translation” on page 19-39.

Download Data Set

This example uses the CamVid data set [2] on page 19-177 from the University of Cambridge for
training. This data set is a collection of 701 images containing street-level views obtained while
driving.

Specify dataDir as the desired location of the data. Download the CamVid data set using the helper
function downloadCamVidImageData. This function is attached to the example as a supporting file.

dataDir = fullfile(tempdir,"CamVid");
downloadCamVidImageData(dataDir);
imgDir = fullfile(dataDir,"images","701_StillsRaw_full");

Load Day and Dusk Data

The CamVid image data set includes 497 images acquired in daytime and 124 images acquired at
dusk. The performance of the trained UNIT network is limited because the number of CamVid
training images is relatively small, which limits the performance of the trained network. Further,
some images belong to an image sequence and therefore are correlated with other images in the data
set. To minimize the impact of these limitations, this example manually partitions the data into
training and test data sets in a way that maximizes the variability of the training data.

Get the file names of the day and dusk images for training and testing by loading the file
camvidDayDuskDatasetFileNames.mat. The training data sets consist of 263 day images and 107
dusk images. The test data sets consist of 234 day images and 17 dusk images.

load("camvidDayDuskDatasetFileNames.mat");

Create imageDatastore objects that manage the day and dusk images for training and testing.

imdsDayTrain = imageDatastore(fullfile(imgDir,trainDayNames));
imdsDuskTrain = imageDatastore(fullfile(imgDir,trainDuskNames));
imdsDayTest = imageDatastore(fullfile(imgDir,testDayNames));
imdsDuskTest = imageDatastore(fullfile(imgDir,testDuskNames));

Preview a training image from the day and dusk training data sets.

day = preview(imdsDayTrain);
dusk = preview(imdsDuskTrain);
montage({day,dusk})


Preprocess and Augment Training Data

Specify the image input size for the source and target images.

inputSize = [256,256,3];

Augment and preprocess the training data by using the transform function with custom
preprocessing operations specified by the helper function augmentDataForDayToDusk. This
function is attached to the example as a supporting file.

The augmentDataForDayToDusk function performs these operations:

1 Resize the image to the specified input size using bicubic interpolation.
2 Randomly flip the image in the horizontal direction.
3 Scale the image to the range [-1, 1]. This range matches the range of the final tanhLayer (Deep
Learning Toolbox) used in the generator.

imdsDayTrain = transform(imdsDayTrain, @(x)augmentDataForDayToDusk(x,inputSize));


imdsDuskTrain = transform(imdsDuskTrain, @(x)augmentDataForDayToDusk(x,inputSize));
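
The three operations listed above can be sketched for a single image as follows. This is an illustration
under stated assumptions, not the code of the augmentDataForDayToDusk supporting file; peppers.png
is used only as a stand-in input image.

% Sketch of the resize, random flip, and rescale steps for one image.
im = imread("peppers.png");                     % stand-in RGB image
im = imresize(im,inputSize(1:2),"bicubic");     % step 1: resize
if rand > 0.5
    im = flip(im,2);                            % step 2: random horizontal flip
end
im = rescale(im2single(im),-1,1);               % step 3: scale to [-1, 1]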

Create Generator Network

Create a UNIT generator network using the unitGenerator function. The source and target
encoder sections of the generator each consist of two downsampling blocks and five residual blocks.
The encoder sections share two of the five residual blocks. Similarly, the source and target decoder
sections of the generator each consist of two upsampling blocks and five residual blocks, and the
decoder sections share two of the five residual blocks.

gen = unitGenerator(inputSize,NumResidualBlocks=5,NumSharedBlocks=2);

Visualize the generator network.

analyzeNetwork(gen)


Create Discriminator Networks

Create two discriminator networks, one for each of the source and target domains, using the
patchGANDiscriminator function. Day is the source domain and dusk is the target domain.
discDay = patchGANDiscriminator(inputSize,NumDownsamplingBlocks=4,FilterSize=3, ...
ConvolutionWeightsInitializer="narrow-normal",NormalizationLayer="none");
discDusk = patchGANDiscriminator(inputSize,NumDownsamplingBlocks=4,FilterSize=3, ...
ConvolutionWeightsInitializer="narrow-normal",NormalizationLayer="none");

Visualize the discriminator networks.


analyzeNetwork(discDay);
analyzeNetwork(discDusk);

Define Model Gradients and Loss Functions

The modelGradientDisc and modelGradientGen helper functions calculate the gradients and
losses for the discriminators and generator, respectively. These functions are defined in the
Supporting Functions on page 19-175 section of this example.

The objective of each discriminator is to correctly distinguish between real images (1) and translated
images (0) for images in its domain. Each discriminator has a single loss function.

The objective of the generator is to generate translated images that the discriminators classify as
real. The generator loss is a weighted sum of five types of losses: self-reconstruction loss, cycle
consistency loss, hidden KL loss, cycle hidden KL loss, and adversarial loss.
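
As a sketch, using the weight factors defined in the following code, the total generator objective has
the form

$$L_{gen} = w_{selfRecon} L_{selfRecon} + w_{cycleConsis} L_{cycleConsis} + w_{hiddenKL} L_{hiddenKL} + w_{cycleHiddenKL} L_{cycleHiddenKL} + w_{adv} L_{adv}$$

The individual loss terms are computed inside the modelGradientGen helper function; the equation is
shown only to make the weighting explicit.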

Specify the weight factors for the various losses.


lossWeights.selfReconLossWeight = 10;
lossWeights.hiddenKLLossWeight = 0.01;
lossWeights.cycleConsisLossWeight = 10;
lossWeights.cycleHiddenKLLossWeight = 0.01;
lossWeights.advLossWeight = 1;
lossWeights.discLossWeight = 0.5;

Specify Training Options

Specify the options for Adam optimization. Train the network for 35 epochs. Specify identical options
for the generator and discriminator networks.

• Specify an equal learning rate of 0.0001.


• Initialize the trailing average gradient and trailing average gradient-square decay rates with [].
• Use a gradient decay factor of 0.5 and a squared gradient decay factor of 0.999.
• Use weight decay regularization with a factor of 0.0001.
• Use a mini-batch size of 1 for training.

learnRate = 0.0001;
gradDecay = 0.5;
sqGradDecay = 0.999;
weightDecay = 0.0001;

genAvgGradient = [];
genAvgGradientSq = [];


discDayAvgGradient = [];
discDayAvgGradientSq = [];

discDuskAvgGradient = [];
discDuskAvgGradientSq = [];

miniBatchSize = 1;
numEpochs = 35;

Batch Training Data

Create a minibatchqueue (Deep Learning Toolbox) object that manages the mini-batching of
observations in a custom training loop. The minibatchqueue object also casts data to a dlarray
(Deep Learning Toolbox) object that enables automatic differentiation in deep learning applications.

Specify the mini-batch data extraction format as "SSCB" (spatial, spatial, channel, batch). Set the
DispatchInBackground name-value argument as the logical value returned by canUseGPU. If a
supported GPU is available for computation, then the minibatchqueue object preprocesses mini-
batches in the background in a parallel pool during training.
mbqDayTrain = minibatchqueue(imdsDayTrain,MiniBatchSize=miniBatchSize, ...
MiniBatchFormat="SSCB",DispatchInBackground=canUseGPU);
mbqDuskTrain = minibatchqueue(imdsDuskTrain,MiniBatchSize=miniBatchSize, ...
MiniBatchFormat="SSCB",DispatchInBackground=canUseGPU);

Train Network

By default, the example downloads a pretrained version of the UNIT generator for the CamVid data
set. The pretrained network enables you to run the entire example without waiting for training to
complete.

To train the network, set the doTraining variable in the following code to true. Train the model in
a custom training loop. For each iteration:

• Read the data for the current mini-batch using the next (Deep Learning Toolbox) function.
• Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function and the
modelGradientDisc and modelGradientGen helper functions.
• Update the network parameters using the adamupdate (Deep Learning Toolbox) function.
• Display the input and translated images for both the source and target domains after each epoch.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). Training takes about 88 hours on an NVIDIA Titan RTX.
doTraining = false;
if doTraining
% Create a figure to show the results
figure(Units="Normalized");
for iPlot = 1:4
ax(iPlot) = subplot(2,2,iPlot);
end

iteration = 0;

% Loop over epochs


for epoch = 1:numEpochs


% Shuffle data every epoch


reset(mbqDayTrain);
shuffle(mbqDayTrain);
reset(mbqDuskTrain);
shuffle(mbqDuskTrain);

% Run the loop until all the images in the mini-batch queue
% mbqDayTrain are processed
while hasdata(mbqDayTrain)
iteration = iteration + 1;

% Read data from the day domain


imDay = next(mbqDayTrain);

% Read data from the dusk domain


if hasdata(mbqDuskTrain) == 0
reset(mbqDuskTrain);
shuffle(mbqDuskTrain);
end
imDusk = next(mbqDuskTrain);

% Calculate discriminator gradients and losses


[discDayGrads,discDuskGrads,discDayLoss,disDuskLoss] = dlfeval( ...
@modelGradientDisc,gen,discDay,discDusk,imDay,imDusk, ...
lossWeights.discLossWeight);

% Apply weight decay regularization on day discriminator gradients


discDayGrads = dlupdate(@(g,w) g+weightDecay*w, ...
discDayGrads,discDay.Learnables);

% Update parameters of day discriminator


[discDay,discDayAvgGradient,discDayAvgGradientSq] = adamupdate( ...
discDay,discDayGrads,discDayAvgGradient,discDayAvgGradientSq, ...
iteration,learnRate,gradDecay,sqGradDecay);

% Apply weight decay regularization on dusk discriminator gradients


discDuskGrads = dlupdate(@(g,w) g+weightDecay*w, ...
discDuskGrads,discDusk.Learnables);

% Update parameters of dusk discriminator


[discDusk,discDuskAvgGradient,discDuskAvgGradientSq] = adamupdate( ...
discDusk,discDuskGrads,discDuskAvgGradient,discDuskAvgGradientSq, ...
iteration,learnRate,gradDecay,sqGradDecay);

% Calculate generator gradient and loss


[genGrad,genLoss,images] = dlfeval( ...
@modelGradientGen,gen,discDay,discDusk,imDay,imDusk,lossWeights);

% Apply weight decay regularization on generator gradients


genGrad = dlupdate(@(g,w) g+weightDecay*w,genGrad,gen.Learnables);

% Update parameters of generator


[gen,genAvgGradient,genAvgGradientSq] = adamupdate( ...
gen,genGrad,genAvgGradient,genAvgGradientSq, ...
iteration,learnRate,gradDecay,sqGradDecay);
end


% Display the results


updateTrainingPlotDayToDusk(ax,images{:});
end

% Save the trained network


modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save(fullfile(dataDir,"trainedDayDuskUNITGeneratorNet-"+modelDateTime+".mat"),"gen");

else
net_url = "https://ssd.mathworks.com/supportfiles/"+ ...
"vision/data/trainedDayDuskUNITGeneratorNet.zip";
downloadTrainedNetwork(net_url,dataDir);
load(fullfile(dataDir,"trainedDayDuskUNITGeneratorNet.mat"));
end

Evaluate Source-to-Target Translation

Source-to-target image translation uses the UNIT generator to generate an image in the target
domain (dusk) from an image in the source domain (day).

Read an image from the datastore of day test images.

idxToTest = 1;
dayTestImage = readimage(imdsDayTest,idxToTest);

Convert the image to data type single and normalize the image to the range [-1, 1].

dayTestImage = im2single(dayTestImage);
dayTestImage = (dayTestImage-0.5)/0.5;

Create a dlarray object that inputs data to the generator. If a supported GPU is available for
computation, then perform inference on a GPU by converting the data to a gpuArray object.

dlDayImage = dlarray(dayTestImage,"SSCB");
if canUseGPU
dlDayImage = gpuArray(dlDayImage);
end

Translate the input day image to the dusk domain using the unitPredict function.

dlDayToDuskImage = unitPredict(gen,dlDayImage);
dayToDuskImage = extractdata(gather(dlDayToDuskImage));

The final layer of the generator network produces activations in the range [-1, 1]. For display, rescale
the activations to the range [0, 1]. Also, rescale the input day image before display.

dayToDuskImage = rescale(dayToDuskImage);
dayTestImage = rescale(dayTestImage);

Display the input day image and its translated dusk version in a montage.

figure
montage({dayTestImage dayToDuskImage})
title("Day Test Image "+num2str(idxToTest)+" with Translated Dusk Image")


Evaluate Target-to-Source Translation

Target-to-source image translation uses the UNIT generator to generate an image in the source
domain (day) from an image in the target domain (dusk).

Read an image from the datastore of dusk test images.


idxToTest = 1;
duskTestImage = readimage(imdsDuskTest,idxToTest);

Convert the image to data type single and normalize the image to the range [-1, 1].
duskTestImage = im2single(duskTestImage);
duskTestImage = (duskTestImage-0.5)/0.5;

Create a dlarray object that inputs data to the generator. If a supported GPU is available for
computation, then perform inference on a GPU by converting the data to a gpuArray object.
dlDuskImage = dlarray(duskTestImage,"SSCB");
if canUseGPU
dlDuskImage = gpuArray(dlDuskImage);
end

Translate the input dusk image to the day domain using the unitPredict function.
dlDuskToDayImage = unitPredict(gen,dlDuskImage,OutputType="TargetToSource");
duskToDayImage = extractdata(gather(dlDuskToDayImage));

For display, rescale the activations to the range [0, 1]. Also, rescale the input dusk image before
display.
duskToDayImage = rescale(duskToDayImage);
duskTestImage = rescale(duskTestImage);

Display the input dusk image and its translated day version in a montage.
montage({duskTestImage duskToDayImage})
title("Test Dusk Image "+num2str(idxToTest)+" with Translated Day Image")


Supporting Functions

Model Gradients Functions

The modelGradientDisc helper function calculates the gradients and loss for the two
discriminators.

function [discAGrads,discBGrads,discALoss,discBLoss] = modelGradientDisc(gen, ...
    discA,discB,ImageA,ImageB,discLossWeight)

[~,fakeA,fakeB,~] = forward(gen,ImageA,ImageB);

% Calculate loss of the discriminator for X_A


outA = forward(discA,ImageA);
outfA = forward(discA,fakeA);
discALoss = discLossWeight*computeDiscLoss(outA,outfA);

% Calculate gradients for the discriminator for X_A


discAGrads = dlgradient(discALoss,discA.Learnables);

% Calculate loss of the discriminator for X_B


outB = forward(discB,ImageB);
outfB = forward(discB,fakeB);
discBLoss = discLossWeight*computeDiscLoss(outB,outfB);

% Calculate gradients for the discriminator for X_B


discBGrads = dlgradient(discBLoss,discB.Learnables);

% Convert the data type from dlarray to single


discALoss = extractdata(discALoss);
discBLoss = extractdata(discBLoss);
end

The modelGradientGen helper function calculates the gradients and loss for the generator.

function [genGrad,genLoss,images] = modelGradientGen(gen, ...
    discA,discB,ImageA,ImageB,lossWeights)


[ImageAA,ImageBA,ImageAB,ImageBB] = forward(gen,ImageA,ImageB);
hidden = forward(gen,ImageA,ImageB,Outputs="encoderSharedBlock");

[~,ImageABA,ImageBAB,~] = forward(gen,ImageBA,ImageAB);
cycle_hidden = forward(gen,ImageBA,ImageAB,Outputs="encoderSharedBlock");

% Calculate different losses


selfReconLoss = computeReconLoss(ImageA,ImageAA) + computeReconLoss(ImageB,ImageBB);
hiddenKLLoss = computeKLLoss(hidden);
cycleReconLoss = computeReconLoss(ImageA,ImageABA) + computeReconLoss(ImageB,ImageBAB);
cycleHiddenKLLoss = computeKLLoss(cycle_hidden);

outA = forward(discA,ImageBA);
outB = forward(discB,ImageAB);
advLoss = computeAdvLoss(outA) + computeAdvLoss(outB);

% Calculate the total loss of generator as a weighted sum of five losses


genTotalLoss = ...
selfReconLoss*lossWeights.selfReconLossWeight + ...
hiddenKLLoss*lossWeights.hiddenKLLossWeight + ...
cycleReconLoss*lossWeights.cycleConsisLossWeight + ...
cycleHiddenKLLoss*lossWeights.cycleHiddenKLLossWeight + ...
advLoss*lossWeights.advLossWeight;

% Calculate gradients of the generator


genGrad = dlgradient(genTotalLoss,gen.Learnables);

% Convert the data type from dlarray to single


genLoss = extractdata(genTotalLoss);
images = {ImageA,ImageAB,ImageB,ImageBA};
end

Loss Functions

The computeDiscLoss helper function calculates the discriminator loss. Each discriminator loss is a
sum of two components:

• The squared difference between a vector of ones and the predictions of the discriminator on real
  images, Y_real
• The squared difference between a vector of zeros and the predictions of the discriminator on
  generated images, Y_translated

discriminatorLoss = (1 − Y_real)^2 + (0 − Y_translated)^2

function discLoss = computeDiscLoss(Yreal,Ytranslated)


discLoss = mean(((1-Yreal).^2),"all") + ...
mean(((0-Ytranslated).^2),"all");
end

The computeAdvLoss helper function calculates the adversarial loss for the generator. Adversarial
loss is the squared difference between a vector of ones and the discriminator predictions on the
translated image.

adversarialLoss = (1 − Y_translated)^2


function advLoss = computeAdvLoss(Ytranslated)


advLoss = mean(((Ytranslated-1).^2),"all");
end

The computeReconLoss helper function calculates the self-reconstruction loss and cycle-consistency
loss for the generator. Self-reconstruction loss is the L1 distance between the input images and their
self-reconstructed versions. Cycle-consistency loss is the L1 distance between the input images and
their cycle-reconstructed versions.

selfReconstructionLoss = ||Y_real − Y_self-reconstructed||_1

cycleConsistencyLoss = ||Y_real − Y_cycle-reconstructed||_1

function reconLoss = computeReconLoss(Yreal,Yrecon)


reconLoss = mean(abs(Yreal-Yrecon),"all");
end

The computeKLLoss helper function calculates the hidden KL loss and cycle-hidden KL loss for the
generator. Hidden KL loss is the squared difference between a vector of zeros and the
encoderSharedBlock activation for the self-reconstruction stream. Cycle-hidden KL loss is the
squared difference between a vector of zeros and the encoderSharedBlock activation for the cycle-
reconstruction stream.

hiddenKLLoss = (0 − Y_encoderSharedBlockActivation)^2

cycleHiddenKLLoss = (0 − Y_encoderSharedBlockActivation)^2

function klLoss = computeKLLoss(hidden)


klLoss = mean(abs(hidden.^2),"all");
end

References

[1] Liu, Ming-Yu, Thomas Breuel, and Jan Kautz. "Unsupervised Image-to-Image Translation
Networks." In Advances in Neural Information Processing Systems, 2017. https://arxiv.org/abs/1703.00848.

[2] Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic Object Classes in Video: A
High-Definition Ground Truth Database." Pattern Recognition Letters 30, no. 2 (2009): 88–97.

See Also
transform | unitGenerator | unitPredict | dlarray | dlfeval | adamupdate |
minibatchqueue | patchGANDiscriminator

More About
• “Get Started with GANs for Image-to-Image Translation” on page 19-39
• “Datastores for Deep Learning” (Deep Learning Toolbox)


• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
• “Define Model Loss Function for Custom Training Loop” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
• “Train Network Using Custom Training Loop” (Deep Learning Toolbox)


Quantify Image Quality Using Neural Image Assessment

This example shows how to analyze the aesthetic quality of images using a Neural Image Assessment
(NIMA) convolutional neural network (CNN).

Image quality metrics provide an objective measure of image quality. An effective metric provides
quantitative scores that correlate well with a subjective perception of quality by a human observer.
Quality metrics enable the comparison of image processing algorithms.

NIMA [1] on page 19-190 is a no-reference technique that predicts the quality of an image without
relying on a pristine reference image, which is frequently unavailable. NIMA uses a CNN to predict a
distribution of quality scores for each image.

Evaluate Image Quality Using Trained NIMA Model

Set dataDir as the desired location of the data set.

dataDir = fullfile(tempdir,"LIVEInTheWild");
if ~exist(dataDir,"dir")
mkdir(dataDir);
end

Download a pretrained NIMA neural network by using the helper function downloadTrainedNetwork.
The helper function is attached to the example as a supporting file. This
model predicts a distribution of quality scores for each image in the range [1, 10], where 1 and 10 are
the lowest and the highest possible values for the score, respectively. A high score indicates good
image quality.

trainedNet_url = "https://ssd.mathworks.com/supportfiles/image/data/trainedNIMA.zip";
downloadTrainedNetwork(trainedNet_url,dataDir);
load(fullfile(dataDir,"trainedNIMA.mat"));

You can evaluate the effectiveness of the NIMA model by comparing the predicted scores for a
high-quality image and a lower-quality image.

Read a high-quality image into the workspace.

imOriginal = imread("kobi.png");

Reduce the aesthetic quality of the image by applying a Gaussian blur. Display the original image and
the blurred image in a montage. Subjectively, the aesthetic quality of the blurred image is worse than
the quality of the original image.

imBlur = imgaussfilt(imOriginal,5);
montage({imOriginal,imBlur})


Predict the NIMA quality score distribution for the two images using the predictNIMAScore helper
function. This function is attached to the example as a supporting file.

The predictNIMAScore function returns the mean and standard deviation of the NIMA score
distribution for an image. The predicted mean score is a measure of the quality of the image. The
standard deviation of scores can be considered a measure of the confidence level of the predicted
mean score.
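For reference, the mean and standard deviation follow directly from the predicted score distribution. The following is a minimal sketch, assuming a hypothetical 10-by-1 probability vector probPred for a single image (for example, the softmax output of the network); it is not the shipped predictNIMAScore implementation.

scoreValues = (1:10)';                                        % possible quality scores
meanScore = sum(probPred.*scoreValues);                       % expected score
stdScore = sqrt(sum(probPred.*scoreValues.^2) - meanScore^2); % spread of the distribution

The Evaluate NIMA Model section of this example recovers the mean and standard deviation of the predicted distributions in the same way.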

[meanOriginal,stdOriginal] = predictNIMAScore(dlnet,imOriginal);
[meanBlur,stdBlur] = predictNIMAScore(dlnet,imBlur);

Display the images along with the mean and standard deviation of the score distributions predicted
by the NIMA model. The NIMA model correctly predicts scores for these images that agree with the
subjective visual assessment.

figure
t = tiledlayout(1,2);
displayImageAndScoresForNIMA(t,imOriginal,meanOriginal,stdOriginal,"Original Image")
displayImageAndScoresForNIMA(t,imBlur,meanBlur,stdBlur,"Blurred Image")


The rest of this example shows how to train and evaluate a NIMA model.

Download LIVE In the Wild Data Set

This example uses the LIVE In the Wild data set [2] on page 19-190, which is a public-domain
subjective image quality challenge database. The data set contains 1162 photos captured by mobile
devices, with 7 additional images provided to train the human scorers. Each image is rated by an
average of 175 individuals on a scale of [1, 100]. The data set provides the mean and standard
deviation of the subjective scores for each image.

Download the data set by following the instructions outlined in LIVE In the Wild Image Quality
Challenge Database. Extract the data into the directory specified by the dataDir variable. When
extraction is successful, dataDir contains two directories: Data and Images.

Load LIVE In the Wild Data

Get the file paths to the images.

imageData = load(fullfile(dataDir,"Data","AllImages_release.mat"));
imageData = imageData.AllImages_release;
nImg = length(imageData);
imageList(1:7) = fullfile(dataDir,"Images","trainingImages",imageData(1:7));
imageList(8:nImg) = fullfile(dataDir,"Images",imageData(8:end));

Create an image datastore that manages the image data.


imds = imageDatastore(imageList);

Load the mean and standard deviation data corresponding to the images.
meanData = load(fullfile(dataDir,"Data","AllMOS_release.mat"));
meanData = meanData.AllMOS_release;
stdData = load(fullfile(dataDir,"Data","AllStdDev_release.mat"));
stdData = stdData.AllStdDev_release;

Optionally, display a few sample images from the data set with the corresponding mean and standard
deviation values.
figure
t = tiledlayout(1,3);
idx1 = 785;
displayImageAndScoresForNIMA(t,readimage(imds,idx1), ...
meanData(idx1),stdData(idx1),"Image "+imageData(idx1))
idx2 = 203;
displayImageAndScoresForNIMA(t,readimage(imds,idx2), ...
meanData(idx2),stdData(idx2),"Image "+imageData(idx2))
idx3 = 777;
displayImageAndScoresForNIMA(t,readimage(imds,idx3), ...
meanData(idx3),stdData(idx3),"Image "+imageData(idx3))

Preprocess and Augment Data

Preprocess the images by resizing them to 256-by-256 pixels.


rescaleSize = [256 256];


imds = transform(imds,@(x)imresize(x,rescaleSize));

The NIMA model requires a distribution of human scores, but the LIVE data set provides only the
mean and standard deviation of the distribution. Approximate an underlying distribution for each
image in the LIVE data set using the createNIMAScoreDistribution helper function. This
function is attached to the example as a supporting file.

The createNIMAScoreDistribution function rescales the scores to the range [1, 10], then generates a
maximum entropy distribution of scores from the mean and standard deviation values.
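If you do not have access to the supporting file, a simpler stand-in is to discretize a Gaussian with the given mean and standard deviation over the scores 1 through 10 and renormalize. This is an approximation, not the maximum entropy construction that the shipped helper uses, and the [1, 100] to [1, 10] rescaling shown here is an assumption.

scoreValues = 1:10;
meanRescaled = rescale(meanData,1,10,InputMin=1,InputMax=100); % map [1, 100] ratings to [1, 10]
stdRescaled = max(stdData*(10-1)/(100-1),0.1);                 % scale spreads; avoid degenerate rows
probApprox = exp(-(scoreValues-meanRescaled').^2./(2*stdRescaled'.^2));
probApprox = probApprox./sum(probApprox,2);                    % one normalized distribution per image

If you use this stand-in, substitute probApprox for prob in the next step.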

newMaxScore = 10;
prob = createNIMAScoreDistribution(meanData,stdData);
cumProb = cumsum(prob,2);

Create an arrayDatastore that manages the score distributions.

probDS = arrayDatastore(cumProb',IterationDimension=2);

Combine the datastores containing the image data and score distribution data.

dsCombined = combine(imds,probDS);

Preview the output of reading from the combined datastore.

sampleRead = preview(dsCombined)

sampleRead=1×2 cell array


{256×256×3 uint8} {10×1 double}

figure
tiledlayout(1,2)
nexttile
imshow(sampleRead{1})
title("Sample Image from Data Set")
nexttile
plot(sampleRead{2})
title("Cumulative Score Distribution")


Split Data for Training, Validation, and Testing

Partition the data into training, validation, and test sets. Allocate 70% of the data for training, 15%
for validation, and the remainder for testing.

numTrain = floor(0.70 * nImg);


numVal = floor(0.15 * nImg);

Idx = randperm(nImg);
idxTrain = Idx(1:numTrain);
idxVal = Idx(numTrain+1:numTrain+numVal);
idxTest = Idx(numTrain+numVal+1:nImg);

dsTrain = subset(dsCombined,idxTrain);
dsVal = subset(dsCombined,idxVal);
dsTest = subset(dsCombined,idxTest);

Augment Training Data

Augment the training data using the augmentDataForNIMA helper function. This function is
attached to the example as a supporting file. The augmentDataForNIMA function performs these
augmentation operations on each training image:

• Crop the image to 224-by-224 pixels to reduce overfitting.
• Flip the image horizontally with 50% probability.
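A minimal sketch of an augmentation function with this behavior follows. It is a hypothetical stand-in, assuming the combined datastore returns a 1-by-2 cell array of {image, score distribution}; the shipped augmentDataForNIMA supporting file may differ.

function dataOut = augmentDataForNIMASketch(dataIn,targetSize)
% Randomly crop the image to targetSize and flip it horizontally with
% 50% probability. The score distribution passes through unchanged.
img = dataIn{1};
win = randomCropWindow2d(size(img,1:2),targetSize);
img = imcrop(img,win);
if rand > 0.5
    img = fliplr(img);
end
dataOut = {img,dataIn{2}};
end

With this stand-in, the transform call that follows would use @(x)augmentDataForNIMASketch(x,inputSize) instead of the shipped helper.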


inputSize = [224 224];


dsTrain = transform(dsTrain,@(x)augmentDataForNIMA(x,inputSize));

Calculate Training Set Statistics for Input Normalization

The input layer of the network performs z-score normalization of the training images. Calculate the
mean and standard deviation of the training images for use in z-score normalization.
meanImage = zeros([inputSize 3]);
meanImageSq = zeros([inputSize 3]);
while hasdata(dsTrain)
dat = read(dsTrain);
img = double(dat{1});
meanImage = meanImage + img;
meanImageSq = meanImageSq + img.^2;
end
meanImage = meanImage/numTrain;
meanImageSq = meanImageSq/numTrain;
varImage = meanImageSq - meanImage.^2;
stdImage = sqrt(varImage);

Reset the datastore to its initial state.


reset(dsTrain);

Batch Training Data

Create a minibatchqueue (Deep Learning Toolbox) object that manages the mini-batching of
observations in a custom training loop. The minibatchqueue object also casts data to a dlarray
(Deep Learning Toolbox) object that enables automatic differentiation in deep learning applications.

Specify the mini-batch data extraction format as "SSCB" (spatial, spatial, channel, batch). Set the
"DispatchInBackground" name-value argument to the boolean returned by canUseGPU. If a
supported GPU is available for computation, then the minibatchqueue object preprocesses mini-
batches in the background in a parallel pool during training.
miniBatchSize = 128;
mbqTrain = minibatchqueue(dsTrain,MiniBatchSize=miniBatchSize, ...
PartialMiniBatch="discard",MiniBatchFormat=["SSCB",""], ...
DispatchInBackground=canUseGPU);
mbqVal = minibatchqueue(dsVal,MiniBatchSize=miniBatchSize, ...
MiniBatchFormat=["SSCB",""],DispatchInBackground=canUseGPU);

Load and Modify MobileNet-v2 Network

This example starts with a MobileNet-v2 [3] on page 19-190 CNN trained on ImageNet [4] on page
19-190. The example modifies the network by replacing the last layer of the MobileNet-v2 network
with a fully connected layer with 10 neurons, each representing a discrete score from 1 through 10.
The network predicts the probability of each score for each image. The example normalizes the
outputs of the fully connected layer using a softmax activation layer.

The mobilenetv2 (Deep Learning Toolbox) function returns a pretrained MobileNet-v2 network.
This function requires the Deep Learning Toolbox™ Model for MobileNet-v2 Network support
package. If this support package is not installed, then the function provides a download link.
net = mobilenetv2;

Convert the network into a layerGraph (Deep Learning Toolbox) object.


lgraph = layerGraph(net);

The network has an image input size of 224-by-224 pixels. Replace the input layer with an image
input layer that performs z-score normalization on the image data using the mean and standard
deviation of the training images.

inLayer = imageInputLayer([inputSize 3],Name="input", ...
    Normalization="zscore",Mean=meanImage,StandardDeviation=stdImage);
lgraph = replaceLayer(lgraph,"input_1",inLayer);

Replace the original final classification layer with a fully connected layer with 10 neurons. Add a
softmax layer to normalize the outputs. Set the learning rate of the fully connected layer to 10 times
the learning rate of the baseline CNN layers. Apply a dropout of 75%.

lgraph = removeLayers(lgraph,["ClassificationLayer_Logits","Logits_softmax","Logits"]);
newFinalLayers = [
dropoutLayer(0.75,Name="drop")
fullyConnectedLayer(newMaxScore,Name="fc",WeightLearnRateFactor=10,BiasLearnRateFactor=10)
softmaxLayer(Name="prob")];
lgraph = addLayers(lgraph,newFinalLayers);
lgraph = connectLayers(lgraph,"global_average_pooling2d_1","drop");
dlnet = dlnetwork(lgraph);

Visualize the network using the Deep Network Designer (Deep Learning Toolbox) app.

deepNetworkDesigner(lgraph)

Define Model Gradients and Loss Functions

The modelGradients helper function calculates the gradients and losses for each iteration of
training the network. This function is defined in the Supporting Functions on page 19-189 section of
this example.

The objective of the NIMA network is to minimize the earth mover's distance (EMD) between the
ground truth and predicted score distributions. EMD loss considers the distance between classes
when penalizing misclassification. Therefore, EMD loss performs better than a typical softmax cross-
entropy loss used in classification tasks [5] on page 19-190. This example calculates the EMD loss
using the earthMoverDistance helper function, which is defined in the Supporting Functions on
page 19-189 section of this example.

For the EMD loss function, use an r-norm distance with r = 2. This distance allows for easy
optimization when you work with gradient descent.
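Written out, the per-image EMD computed by the earthMoverDistance helper function for N score bins is

EMD(p, q) = ( (1/N) * Σ_{k=1}^{N} | CDF_p(k) − CDF_q(k) |^r )^(1/r)

where p and q are the ground truth and predicted score distributions and CDF_p(k) and CDF_q(k) are their cumulative probabilities up to score k. The training loss is the mean of this quantity over the mini-batch.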

Specify Training Options

Specify the options for SGDM optimization. Train the network for 150 epochs.

numEpochs = 150;
momentum = 0.9;
initialLearnRate = 3e-3;
decay = 0.95;

Train Network

By default, the example loads a pretrained version of the NIMA network. The pretrained network
enables you to run the entire example without waiting for training to complete.


To train the network, set the doTraining variable in the following code to true. Train the model in
a custom training loop. For each iteration:

• Read the data for the current mini-batch using the next (Deep Learning Toolbox) function.
• Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function and the
modelGradients helper function.
• Update the network parameters using the sgdmupdate (Deep Learning Toolbox) function.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox).
doTraining = false;
if doTraining
iteration = 0;
velocity = [];
start = tic;

[hFig,lineLossTrain,lineLossVal] = initializeTrainingPlotNIMA;

for epoch = 1:numEpochs

shuffle(mbqTrain);
learnRate = initialLearnRate/(1+decay*floor(epoch/10));

while hasdata(mbqTrain)
iteration = iteration + 1;
[dlX,cdfY] = next(mbqTrain);
[grad,loss] = dlfeval(@modelGradients,dlnet,dlX,cdfY);
[dlnet,velocity] = sgdmupdate(dlnet,grad,velocity,learnRate,momentum);

updateTrainingPlotNIMA(lineLossTrain,loss,epoch,iteration,start)
end

% Add validation data to plot


[~,lossVal,~] = modelPredictions(dlnet,mbqVal);
updateTrainingPlotNIMA(lineLossVal,lossVal,epoch,iteration,start)

end

% Save the trained network


modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save(fullfile(dataDir,"trainedNIMA-"+modelDateTime+".mat"),"dlnet");

else
load(fullfile(dataDir,"trainedNIMA.mat"));
end

Evaluate NIMA Model

Evaluate the performance of the model on the test data set using three metrics: EMD, binary
classification accuracy, and correlation coefficients. The performance of the NIMA network on the
test data set is in agreement with the performance of the reference NIMA model reported by Talebi
and Milanfar [1] on page 19-190.

Create a minibatchqueue (Deep Learning Toolbox) object that manages the mini-batching of test
data.


mbqTest = minibatchqueue(dsTest,MiniBatchSize=miniBatchSize,MiniBatchFormat=["SSCB",""]);

Calculate the predicted probabilities and ground truth cumulative probabilities of mini-batches of test
data using the modelPredictions function. This function is defined in the Supporting Functions on
page 19-189 section of this example.
[YPredTest,~,cdfYTest] = modelPredictions(dlnet,mbqTest);

Calculate the mean and standard deviation values of the ground truth and predicted distributions.
meanPred = extractdata(YPredTest)' * (1:10)';
stdPred = sqrt(extractdata(YPredTest)'*((1:10).^2)' - meanPred.^2);
origCdf = extractdata(cdfYTest);
origPdf = [origCdf(1,:); diff(origCdf)];
meanOrig = origPdf' * (1:10)';
stdOrig = sqrt(origPdf'*((1:10).^2)' - meanOrig.^2);

Calculate EMD

Calculate the EMD of the ground truth and predicted score distributions. For prediction, use an r-
norm distance with r = 1. The EMD value indicates the closeness of the predicted and ground truth
rating distributions.
EMDTest = earthMoverDistance(YPredTest,cdfYTest,1)

EMDTest =
1×1 single gpuArray dlarray

0.0974

Calculate Binary Classification Accuracy

For binary classification accuracy, convert the distributions to two classifications: high-quality and
low-quality. Classify images with a mean score greater than a threshold as high-quality.
qualityThreshold = 5;
binaryPred = meanPred > qualityThreshold;
binaryOrig = meanOrig > qualityThreshold;

Calculate the binary classification accuracy.


binaryAccuracy = 100 * sum(binaryPred==binaryOrig)/length(binaryPred)

binaryAccuracy =

81.8182

Calculate Correlation Coefficients

Large correlation values indicate a large positive correlation between the ground truth and predicted
scores. Calculate the linear correlation coefficient (LCC) and Spearman’s rank correlation coefficient
(SRCC) for the mean scores.
meanLCC = corr(meanOrig,meanPred)

meanLCC =

gpuArray single


0.8270

meanSRCC = corr(meanOrig,meanPred,type="Spearman")

meanSRCC =

gpuArray single

0.8133

Supporting Functions

Model Gradients Function

The modelGradients function takes as input a dlnetwork object dlnet and a mini-batch of input
data dlX with corresponding target cumulative probabilities cdfY. The function returns the gradients
of the loss with respect to the learnable parameters in dlnet as well as the loss. To compute the
gradients automatically, use the dlgradient function.

function [gradients,loss] = modelGradients(dlnet,dlX,cdfY)


dlYPred = forward(dlnet,dlX);
loss = earthMoverDistance(dlYPred,cdfY,2);
gradients = dlgradient(loss,dlnet.Learnables);
end

Loss Function

The earthMoverDistance function calculates the EMD between the ground truth and predicted
distributions for a specified r-norm value. The earthMoverDistance function uses the computeCDF helper
function to calculate the cumulative probabilities of the predicted distribution.

function loss = earthMoverDistance(YPred,cdfY,r)


N = size(cdfY,1);
cdfYPred = computeCDF(YPred);
cdfDiff = (1/N) * (abs(cdfY - cdfYPred).^r);
lossArray = sum(cdfDiff,1).^(1/r);
loss = mean(lossArray);

end
function cdfY = computeCDF(Y)
% Given a probability mass function Y, compute the cumulative probabilities
[N,miniBatchSize] = size(Y);
L = repmat(triu(ones(N)),1,1,miniBatchSize);
L3d = permute(L,[1 3 2]);
prod = Y.*L3d;
prodSum = sum(prod,1);
cdfY = reshape(prodSum(:)',miniBatchSize,N)';
end

Model Predictions Function

The modelPredictions function calculates the estimated probabilities, loss, and ground truth
cumulative probabilities of mini-batches of data.

function [dlYPred,loss,cdfYOrig] = modelPredictions(dlnet,mbq)


reset(mbq);
loss = 0;


numObservations = 0;
dlYPred = [];
cdfYOrig = [];

while hasdata(mbq)
[dlX,cdfY] = next(mbq);
miniBatchSize = size(dlX,4);

dlY = predict(dlnet,dlX);
loss = loss + earthMoverDistance(dlY,cdfY,2)*miniBatchSize;
dlYPred = [dlYPred dlY];
cdfYOrig = [cdfYOrig cdfY];

numObservations = numObservations + miniBatchSize;

end
loss = loss / numObservations;
end

References

[1] Talebi, Hossein, and Peyman Milanfar. “NIMA: Neural Image Assessment.” IEEE Transactions on
Image Processing 27, no. 8 (August 2018): 3998–4011. https://doi.org/10.1109/TIP.2018.2831899.

[2] LIVE: Laboratory for Image and Video Engineering. "LIVE In the Wild Image Quality Challenge
Database." https://live.ece.utexas.edu/research/ChallengeDB/index.html.

[3] Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen.
“MobileNetV2: Inverted Residuals and Linear Bottlenecks.” In 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, 4510–20. Salt Lake City, UT: IEEE, 2018. https://doi.org/10.1109/CVPR.2018.00474.

[4] ImageNet. https://www.image-net.org.

[5] Hou, Le, Chen-Ping Yu, and Dimitris Samaras. “Squared Earth Mover’s Distance-Based Loss for
Training Deep Neural Networks.” Preprint, submitted November 30, 2016. https://arxiv.org/abs/
1611.05916.

See Also
mobilenetv2 | transform | layerGraph | dlnetwork | minibatchqueue | predict | dlfeval |
sgdmupdate

More About
• “Image Quality Metrics” on page 14-2
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)


• “Define Model Loss Function for Custom Training Loop” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
• “Train Network Using Custom Training Loop” (Deep Learning Toolbox)


Unsupervised Medical Image Denoising Using CycleGAN

This example shows how to generate high-quality high-dose computed tomography (CT) images from
noisy low-dose CT images using a CycleGAN neural network.

Note: This example references the Low Dose CT Grand Challenge data set, as accessed on May 1,
2021. The example uses chest images from the data set that are now under restricted access. To run
this example, you must have a compatible data set with low-dose and high-dose CT images, and adapt
the data preprocessing and training options to suit your data.

X-ray CT is a popular imaging modality used in clinical and industrial applications because it
produces high-quality images and offers superior diagnostic capabilities. To protect the safety of
patients, clinicians recommend a low radiation dose. However, a low radiation dose results in a lower
signal-to-noise ratio (SNR) in the images, and therefore reduces the diagnostic accuracy.

Deep learning techniques can improve the image quality for low-dose CT (LDCT) images. Using a
generative adversarial network (GAN) for image-to-image translation, you can convert noisy LDCT
images to images of the same quality as regular-dose CT images. For this application, the source
domain consists of LDCT images and the target domain consists of regular-dose images. For more
information, see “Get Started with GANs for Image-to-Image Translation” on page 19-39.

CT image denoising requires a GAN that performs unsupervised training because clinicians do not
typically acquire matching pairs of low-dose and regular-dose CT images of the same patient in the
same session. This example uses a cycle-consistent GAN (CycleGAN) trained on patches of image
data from a large sample of data. For a similar approach using a UNIT neural network trained on full
images from a limited sample of data, see “Unsupervised Medical Image Denoising Using UNIT” on
page 19-206.


Download LDCT Data Set

This example uses data from the Low Dose CT Grand Challenge [2, 3, 4]. The data includes pairs of
regular-dose CT images and simulated low-dose CT images for 99 head scans (labeled N for neuro),
100 chest scans (labeled C for chest), and 100 abdomen scans (labeled L for liver). The size of the
data set is 1.2 TB.

Specify dataDir as the desired location of the data set.

dataDir = fullfile(tempdir,"LDCT","LDCT-and-Projection-data");

To download the data, go to the Cancer Imaging Archive website. This example uses only images from
the chest. Download the chest files from the "Images (DICOM, 952 GB)" data set into the directory
specified by dataDir using the NBIA Data Retriever. When the download is successful, dataDir
contains 50 subfolders with names such as "C002" and "C004", ending with "C296".

Create Datastores for Training, Validation, and Testing

The LDCT data set provides pairs of low-dose and high-dose CT images. However, the CycleGAN
architecture requires unpaired data for unsupervised learning. This example simulates unpaired
training and validation data by partitioning images such that the patients used to obtain low-dose CT
and high-dose CT images do not overlap. The example retains pairs of low-dose and regular-dose
images for testing.

Split the data into training, validation, and test data sets using the createLDCTFolderList helper
function. This function is attached to the example as a supporting file. The helper function splits the
data so that the two types of images are well represented in each group.
Approximately 80% of the data is used for training, 15% is used for testing, and 5% is used for
validation.

maxDirsForABodyPart = 25;
[filesTrainLD,filesTrainHD,filesTestLD,filesTestHD,filesValLD,filesValHD] = ...
createLDCTFolderList(dataDir,maxDirsForABodyPart);

Create image datastores that contain training and validation images for both domains, namely low-
dose CT images and high-dose CT images. The data set consists of DICOM images, so use the custom
ReadFcn name-value argument in imageDatastore to enable reading the data.

exts = ".dcm";
readFcn = @(x)dicomread(x);
imdsTrainLD = imageDatastore(filesTrainLD,FileExtensions=exts,ReadFcn=readFcn);
imdsTrainHD = imageDatastore(filesTrainHD,FileExtensions=exts,ReadFcn=readFcn);
imdsValLD = imageDatastore(filesValLD,FileExtensions=exts,ReadFcn=readFcn);
imdsValHD = imageDatastore(filesValHD,FileExtensions=exts,ReadFcn=readFcn);
imdsTestLD = imageDatastore(filesTestLD,FileExtensions=exts,ReadFcn=readFcn);
imdsTestHD = imageDatastore(filesTestHD,FileExtensions=exts,ReadFcn=readFcn);

The number of low-dose and high-dose images can differ. Select a subset of the files such that the
number of images is equal.

numTrain = min(numel(imdsTrainLD.Files),numel(imdsTrainHD.Files));
imdsTrainLD = subset(imdsTrainLD,1:numTrain);
imdsTrainHD = subset(imdsTrainHD,1:numTrain);

numVal = min(numel(imdsValLD.Files),numel(imdsValHD.Files));
imdsValLD = subset(imdsValLD,1:numVal);
imdsValHD = subset(imdsValHD,1:numVal);

numTest = min(numel(imdsTestLD.Files),numel(imdsTestHD.Files));
imdsTestLD = subset(imdsTestLD,1:numTest);
imdsTestHD = subset(imdsTestHD,1:numTest);

Preprocess and Augment Data

Preprocess the data by using the transform function with custom preprocessing operations
specified by the normalizeCTImages helper function. This function is attached to the example as a
supporting file. The normalizeCTImages function rescales the data to the range [-1, 1].

timdsTrainLD = transform(imdsTrainLD,@(x){normalizeCTImages(x)});
timdsTrainHD = transform(imdsTrainHD,@(x){normalizeCTImages(x)});
timdsValLD = transform(imdsValLD,@(x){normalizeCTImages(x)});
timdsValHD = transform(imdsValHD,@(x){normalizeCTImages(x)});
timdsTestLD = transform(imdsTestLD,@(x){normalizeCTImages(x)});
timdsTestHD = transform(imdsTestHD,@(x){normalizeCTImages(x)});
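The normalizeCTImages helper is attached to the example as a supporting file. A minimal stand-in that produces the stated [-1, 1] range (an assumption; the shipped helper may also apply CT-specific preprocessing) is:

function out = normalizeCTImagesSketch(im)
% Cast the CT data to single precision and map its full range to [-1, 1].
out = 2*rescale(single(im)) - 1;
end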

Combine the low-dose and high-dose training data by using a randomPatchExtractionDatastore.
When reading from this datastore, augment the data using random rotation and horizontal reflection.

inputSize = [128,128,1];
augmenter = imageDataAugmenter(RandRotation=@()90*(randi([0,1],1)),RandXReflection=true);
dsTrain = randomPatchExtractionDatastore(timdsTrainLD,timdsTrainHD, ...
inputSize(1:2),PatchesPerImage=16,DataAugmentation=augmenter);


Combine the validation data by using a randomPatchExtractionDatastore. You do not need to
perform augmentation when reading validation data.

dsVal = randomPatchExtractionDatastore(timdsValLD,timdsValHD,inputSize(1:2));

Visualize Data Set

Look at a few low-dose and high-dose image patch pairs from the training set. Notice that the image
pairs of low-dose (left) and high-dose (right) images are unpaired, as they are from different patients.

numImagePairs = 6;
imagePairsTrain = [];
for i = 1:numImagePairs
imLowAndHighDose = read(dsTrain);
inputImage = imLowAndHighDose.InputImage{1};
inputImage = rescale(im2single(inputImage));
responseImage = imLowAndHighDose.ResponseImage{1};
responseImage = rescale(im2single(responseImage));
imagePairsTrain = cat(4,imagePairsTrain,inputImage,responseImage);
end
montage(imagePairsTrain,Size=[numImagePairs 2],BorderSize=4,BackgroundColor="w")


Batch Training and Validation Data During Training

This example uses a custom training loop. The minibatchqueue (Deep Learning Toolbox) object is
useful for managing the mini-batching of observations in custom training loops. The
minibatchqueue object also casts data to a dlarray object that enables auto differentiation in
deep learning applications.


Process the mini-batches by concatenating image patches along the batch dimension using the helper
function concatenateMiniBatchLD2HDCT. This function is attached to the example as a supporting
file. Specify the mini-batch data extraction format as "SSCB" (spatial, spatial, channel, batch).
Discard any partial mini-batches with fewer than miniBatchSize observations.
miniBatchSize = 32;

mbqTrain = minibatchqueue(dsTrain, ...


MiniBatchSize=miniBatchSize, ...
MiniBatchFcn=@concatenateMiniBatchLD2HDCT, ...
PartialMiniBatch="discard", ...
MiniBatchFormat="SSCB");
mbqVal = minibatchqueue(dsVal, ...
MiniBatchSize=miniBatchSize, ...
MiniBatchFcn=@concatenateMiniBatchLD2HDCT, ...
PartialMiniBatch="discard", ...
MiniBatchFormat="SSCB");
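The concatenateMiniBatchLD2HDCT helper is attached to the example as a supporting file. A minimal sketch of a mini-batch function in this role follows; the signature is an assumption (each input is taken to be a cell array with one patch per observation), and the shipped helper may differ.

function [imLD,imHD] = concatenateMiniBatchLD2HDCTSketch(patchesLD,patchesHD)
% Concatenate the low-dose and high-dose patches along the fourth
% (batch) dimension so each output is an H-by-W-by-C-by-B array.
imLD = cat(4,patchesLD{:});
imHD = cat(4,patchesHD{:});
end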

Create Generator and Discriminator Networks

The CycleGAN consists of two generators and two discriminators. The generators perform image-to-
image translation from low-dose to high-dose and vice versa. The discriminators are PatchGAN
networks that return the patch-wise probability that the input data is real or generated. One
discriminator distinguishes between the real and generated low-dose images and the other
discriminator distinguishes between real and generated high-dose images.

Create each generator network using the cycleGANGenerator function. For an input size of 256-
by-256 pixels, a typical choice is 9 residual blocks; because this example trains on 128-by-128
patches, it specifies the NumResidualBlocks argument as 6. By default, the function has 3 encoder
modules and uses 64 filters in the first convolutional layer.
numResiduals = 6;
genHD2LD = cycleGANGenerator(inputSize,NumResidualBlocks=numResiduals,NumOutputChannels=1);
genLD2HD = cycleGANGenerator(inputSize,NumResidualBlocks=numResiduals,NumOutputChannels=1);

Create each discriminator network using the patchGANDiscriminator function. Use the default
settings for the number of downsampling blocks and number of filters in the first convolutional layer
in the discriminators.
discLD = patchGANDiscriminator(inputSize);
discHD = patchGANDiscriminator(inputSize);

Define Loss Functions and Scores

The modelGradients helper function calculates the gradients and losses for the discriminators and
generators. This function is defined in the Supporting Functions on page 19-203 section of this
example.

The objective of the generator is to generate translated images that the discriminators classify as
real. The generator loss is a weighted sum of three types of losses: adversarial loss, cycle consistency
loss, and fidelity loss. Fidelity loss is based on structural similarity (SSIM) loss.

L_Total = L_Adversarial + λ * L_CycleConsistency + L_Fidelity

Specify the weighting factor λ that controls the relative significance of the cycle consistency loss with
the adversarial and fidelity losses.
lambda = 10;


The objective of each discriminator is to correctly distinguish between real images (1) and translated
images (0) for images in its domain. Each discriminator has a single loss function that relies on the
mean squared error (MSE) between the expected and predicted output.
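Combining the lossReal and lossGenerated helper functions defined in the Supporting Functions on page 19-203 section of this example, each discriminator loss has the form

discriminatorLoss = 0.5 * ( mean( (1 − D(real)).^2 ) + mean( D(translated).^2 ) )

where D(·) denotes the patch-wise discriminator predictions.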

Specify Training Options

Train with a mini-batch size of 32 for 3 epochs.

numEpochs = 3;
miniBatchSize = 32;

Specify the options for Adam optimization. For both generator and discriminator networks, use:

• A learning rate of 0.0002
• A gradient decay factor of 0.5
• A squared gradient decay factor of 0.999

learnRate = 0.0002;
gradientDecay = 0.5;
squaredGradientDecayFactor = 0.999;

Initialize Adam parameters for the generators and discriminators.

avgGradGenLD2HD = [];
avgSqGradGenLD2HD = [];
avgGradGenHD2LD = [];
avgSqGradGenHD2LD = [];
avgGradDiscLD = [];
avgSqGradDiscLD = [];
avgGradDiscHD = [];
avgSqGradDiscHD = [];

Display the generated validation images every 100 iterations.

validationFrequency = 100;

Train or Download Model

By default, the example downloads a pretrained version of the CycleGAN generator for low-dose to
high-dose CT. The pretrained network enables you to run the entire example without waiting for
training to complete.

To train the network, set the doTraining variable in the following code to true. Train the model in
a custom training loop. For each iteration:

• Read the data for the current mini-batch using the next (Deep Learning Toolbox) function.
• Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function and the
modelGradients helper function.
• Update the network parameters using the adamupdate (Deep Learning Toolbox) function.
• Display the input and translated images for both the source and target domains after each epoch.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). Training takes about 30 hours on an NVIDIA™ Titan X with 24 GB of GPU
memory.


doTraining = false;
if doTraining

iteration = 0;
start = tic;

% Create a directory to store checkpoints


checkpointDir = fullfile(dataDir,"checkpoints");
if ~exist(checkpointDir,"dir")
mkdir(checkpointDir);
end

% Initialize plots for training progress


[figureHandle,tileHandle,imageAxes,scoreAxesX,scoreAxesY, ...
lineScoreGenLD2HD,lineScoreGenD2LD,lineScoreDiscHD,lineScoreDiscLD] = ...
initializeTrainingPlotLD2HDCT_CycleGAN;

for epoch = 1:numEpochs

shuffle(mbqTrain);

% Loop over mini-batches


while hasdata(mbqTrain)
iteration = iteration + 1;

% Read mini-batch of data


[imageLD,imageHD] = next(mbqTrain);

% Convert mini-batch of data to dlarray and specify the dimension labels


% "SSCB" (spatial, spatial, channel, batch)
imageLD = dlarray(imageLD,"SSCB");
imageHD = dlarray(imageHD,"SSCB");

% If training on a GPU, then convert data to gpuArray


if canUseGPU
imageLD = gpuArray(imageLD);
imageHD = gpuArray(imageHD);
end

% Calculate the loss and gradients


[genHD2LDGrad,genLD2HDGrad,discrXGrad,discYGrad, ...
genHD2LDState,genLD2HDState,scores,imagesOutLD2HD,imagesOutHD2LD] = ...
dlfeval(@modelGradients,genLD2HD,genHD2LD, ...
discLD,discHD,imageHD,imageLD,lambda);
genHD2LD.State = genHD2LDState;
genLD2HD.State = genLD2HDState;

% Update parameters of discLD, which distinguishes


% the generated low-dose CT images from real low-dose CT images
[discLD.Learnables,avgGradDiscLD,avgSqGradDiscLD] = ...
adamupdate(discLD.Learnables,discrXGrad,avgGradDiscLD, ...
avgSqGradDiscLD,iteration,learnRate,gradientDecay,squaredGradientDecayFactor);

% Update parameters of discHD, which distinguishes


% the generated high-dose CT images from real high-dose CT images
[discHD.Learnables,avgGradDiscHD,avgSqGradDiscHD] = ...
adamupdate(discHD.Learnables,discYGrad,avgGradDiscHD, ...
avgSqGradDiscHD,iteration,learnRate,gradientDecay,squaredGradientDecayFactor);


% Update parameters of genHD2LD, which


% generates low-dose CT images from high-dose CT images
[genHD2LD.Learnables,avgGradGenHD2LD,avgSqGradGenHD2LD] = ...
adamupdate(genHD2LD.Learnables,genHD2LDGrad,avgGradGenHD2LD, ...
avgSqGradGenHD2LD,iteration,learnRate,gradientDecay,squaredGradientDecayFactor);

% Update parameters of genLD2HD, which


% generates high-dose CT images from low-dose CT images
[genLD2HD.Learnables,avgGradGenLD2HD,avgSqGradGenLD2HD] = ...
adamupdate(genLD2HD.Learnables,genLD2HDGrad,avgGradGenLD2HD, ...
avgSqGradGenLD2HD,iteration,learnRate,gradientDecay,squaredGradientDecayFactor);

% Update the plots of network scores


updateTrainingPlotLD2HDCT_CycleGAN(scores,iteration,epoch,start,scoreAxesX,scoreAxesY, ...
    lineScoreGenLD2HD,lineScoreGenD2LD, ...
    lineScoreDiscHD,lineScoreDiscLD)

% Every validationFrequency iterations, display a batch of


% generated images using the held-out generator input
if mod(iteration,validationFrequency) == 0 || iteration == 1
displayGeneratedLD2HDCTImages(mbqVal,imageAxes,genLD2HD,genHD2LD);
end
end

% Save the model after each epoch


if canUseGPU
[genLD2HD,genHD2LD,discLD,discHD] = ...
gather(genLD2HD,genHD2LD,discLD,discHD);
end
generatorHighDoseToLowDose = genHD2LD;
generatorLowDoseToHighDose = genLD2HD;
discriminatorLowDose = discLD;
discriminatorHighDose = discHD;
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save(checkpointDir+filesep+"LD2HDCTCycleGAN-"+modelDateTime+"-Epoch-"+epoch+".mat", ...
'generatorLowDoseToHighDose','generatorHighDoseToLowDose', ...
'discriminatorLowDose','discriminatorHighDose');
end

% Save the final model


modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save(checkpointDir+filesep+"trainedLD2HDCTCycleGANNet-"+modelDateTime+".mat", ...
'generatorLowDoseToHighDose','generatorHighDoseToLowDose', ...
'discriminatorLowDose','discriminatorHighDose');

else
net_url = "https://www.mathworks.com/supportfiles/vision/data/trainedLD2HDCTCycleGANNet.mat";
downloadTrainedNetwork(net_url,dataDir);
load(fullfile(dataDir,"trainedLD2HDCTCycleGANNet.mat"));
end

Generate New Images Using Test Data

Define the number of test images to use for calculating quality metrics. Randomly select two test
images to display.


numTest = timdsTestLD.numpartitions;
numImagesToDisplay = 2;
idxImagesToDisplay = randi(numTest,1,numImagesToDisplay);

Initialize variables to calculate PSNR and SSIM.

origPSNR = zeros(numTest,1);
generatedPSNR = zeros(numTest,1);
origSSIM = zeros(numTest,1);
generatedSSIM = zeros(numTest,1);

To generate new translated images, use the predict (Deep Learning Toolbox) function. Read images
from the test data set and use the trained generators to generate new images.

for idx = 1:numTest


imageTestLD = read(timdsTestLD);
imageTestHD = read(timdsTestHD);

imageTestLD = cat(4,imageTestLD{1});
imageTestHD = cat(4,imageTestHD{1});

% Convert mini-batch of data to dlarray and specify the dimension labels


% "SSCB" (spatial, spatial, channel, batch)
imageTestLD = dlarray(imageTestLD,"SSCB");
imageTestHD = dlarray(imageTestHD,"SSCB");

% If running on a GPU, then convert data to gpuArray


if canUseGPU
imageTestLD = gpuArray(imageTestLD);
imageTestHD = gpuArray(imageTestHD);
end

% Generate translated images


generatedImageHD = predict(generatorLowDoseToHighDose,imageTestLD);
generatedImageLD = predict(generatorHighDoseToLowDose,imageTestHD);

% Display a few images to visualize the network responses


if ismember(idx,idxImagesToDisplay)
figure
origImLD = rescale(extractdata(imageTestLD));
genImHD = rescale(extractdata(generatedImageHD));
montage({origImLD,genImHD},Size=[1 2],BorderSize=5)
title("Original LDCT Test Image "+idx+" (Left), Generated HDCT Image (Right)")
end

origPSNR(idx) = psnr(imageTestLD,imageTestHD);
generatedPSNR(idx) = psnr(generatedImageHD,imageTestHD);

origSSIM(idx) = multissim(imageTestLD,imageTestHD);
generatedSSIM(idx) = multissim(generatedImageHD,imageTestHD);
end


Calculate the average PSNR of the original and generated images. A larger PSNR value indicates
better image quality.

disp("Average PSNR of original images: "+mean(origPSNR,"all"));

Average PSNR of original images: 20.4045


disp("Average PSNR of generated images: "+mean(generatedPSNR,"all"));

Average PSNR of generated images: 27.9155

Calculate the average SSIM of the original and generated images. An SSIM value closer to 1 indicates
better image quality.

disp("Average SSIM of original images: "+mean(origSSIM,"all"));

Average SSIM of original images: 0.76651

disp("Average SSIM of generated images: "+mean(generatedSSIM,"all"));

Average SSIM of generated images: 0.90194

Supporting Functions

Model Gradients Function

The function modelGradients takes as input the two generator and discriminator dlnetwork
objects and a mini-batch of input data. The function returns the gradients of the loss with respect to
the learnable parameters in the networks and the scores of the four networks. Because the
discriminator outputs are not in the range [0, 1], the modelGradients function applies the sigmoid
function to convert discriminator outputs into probability scores.

function [genHD2LDGrad,genLD2HDGrad,discLDGrad,discHDGrad, ...
    genHD2LDState,genLD2HDState,scores,imagesOutLDAndHDGenerated,imagesOutHDAndLDGenerated] = ...
    modelGradients(genLD2HD,genHD2LD,discLD,discHD,imageHD,imageLD,lambda)

% Translate images from one domain to another: low-dose to high-dose and


% vice versa
[imageLDGenerated,genHD2LDState] = forward(genHD2LD,imageHD);
[imageHDGenerated,genLD2HDState] = forward(genLD2HD,imageLD);

% Calculate predictions for real images in each domain by the corresponding


% discriminator networks
predRealLD = forward(discLD,imageLD);
predRealHD = forward(discHD,imageHD);

% Calculate predictions for generated images in each domain by the


% corresponding discriminator networks
predGeneratedLD = forward(discLD,imageLDGenerated);
predGeneratedHD = forward(discHD,imageHDGenerated);

% Calculate discriminator losses for real images


discLDLossReal = lossReal(predRealLD);
discHDLossReal = lossReal(predRealHD);

% Calculate discriminator losses for generated images


discLDLossGenerated = lossGenerated(predGeneratedLD);
discHDLossGenerated = lossGenerated(predGeneratedHD);

% Calculate total discriminator loss for each discriminator network


discLDLossTotal = 0.5*(discLDLossReal + discLDLossGenerated);
discHDLossTotal = 0.5*(discHDLossReal + discHDLossGenerated);

% Calculate generator loss for generated images


genLossHD2LD = lossReal(predGeneratedLD);


genLossLD2HD = lossReal(predGeneratedHD);

% Complete the round-trip (cycle consistency) outputs by applying the


% generator to each generated image to get the images in the corresponding
% original domains
cycleImageLD2HD2LD = forward(genHD2LD,imageHDGenerated);
cycleImageHD2LD2HD = forward(genLD2HD,imageLDGenerated);

% Calculate cycle consistency loss between real and generated images


cycleLossLD2HD2LD = cycleConsistencyLoss(imageLD,cycleImageLD2HD2LD,lambda);
cycleLossHD2LD2HD = cycleConsistencyLoss(imageHD,cycleImageHD2LD2HD,lambda);

% Calculate identity outputs


identityImageLD = forward(genHD2LD,imageLD);
identityImageHD = forward(genLD2HD,imageHD);

% Calculate fidelity loss (SSIM) between the identity outputs


fidelityLossLD = mean(1-multissim(identityImageLD,imageLD),"all");
fidelityLossHD = mean(1-multissim(identityImageHD,imageHD),"all");

% Calculate total generator loss


genLossTotal = genLossHD2LD + cycleLossHD2LD2HD + ...
genLossLD2HD + cycleLossLD2HD2LD + fidelityLossLD + fidelityLossHD;

% Calculate scores of generators


genHD2LDScore = mean(sigmoid(predGeneratedLD),"all");
genLD2HDScore = mean(sigmoid(predGeneratedHD),"all");

% Calculate scores of discriminators


discLDScore = 0.5*mean(sigmoid(predRealLD),"all") + ...
0.5*mean(1-sigmoid(predGeneratedLD),"all");
discHDScore = 0.5*mean(sigmoid(predRealHD),"all") + ...
0.5*mean(1-sigmoid(predGeneratedHD),"all");

% Combine scores into cell array


scores = {genHD2LDScore,genLD2HDScore,discLDScore,discHDScore};

% Calculate gradients of generators


genLD2HDGrad = dlgradient(genLossTotal,genLD2HD.Learnables,RetainData=true);
genHD2LDGrad = dlgradient(genLossTotal,genHD2LD.Learnables,RetainData=true);

% Calculate gradients of discriminators


discLDGrad = dlgradient(discLDLossTotal,discLD.Learnables,RetainData=true);
discHDGrad = dlgradient(discHDLossTotal,discHD.Learnables);

% Return mini-batch of images transforming low-dose CT into high-dose CT


imagesOutLDAndHDGenerated = {imageLD,imageHDGenerated};

% Return mini-batch of images transforming high-dose CT into low-dose CT


imagesOutHDAndLDGenerated = {imageHD,imageLDGenerated};
end

Loss Functions

Specify MSE loss functions for real and generated images.

function loss = lossReal(predictions)


loss = mean((1-predictions).^2,"all");


end

function loss = lossGenerated(predictions)


loss = mean((predictions).^2,"all");
end

Specify cycle consistency loss functions for real and generated images.

function loss = cycleConsistencyLoss(imageReal,imageGenerated,lambda)


loss = mean(abs(imageReal-imageGenerated),"all") * lambda;
end

References

[1] Zhu, Jun-Yan, Taesung Park, Phillip Isola, and Alexei A. Efros. “Unpaired Image-to-Image
Translation Using Cycle-Consistent Adversarial Networks.” In 2017 IEEE International Conference on
Computer Vision (ICCV), 2242–51. Venice: IEEE, 2017. https://doi.org/10.1109/ICCV.2017.244.

[2] McCollough, Cynthia, Baiyu Chen, David R Holmes III, Xinhui Duan, Zhicong Yu, Lifeng Yu, Shuai
Leng, and Joel Fletcher. “Low Dose CT Image and Projection Data (LDCT-and-Projection-Data).” The
Cancer Imaging Archive, 2020. https://doi.org/10.7937/9NPB-2637.

[3] Grants EB017095 and EB017185 (Cynthia McCollough, PI) from the National Institute of
Biomedical Imaging and Bioengineering.

[4] Clark, Kenneth, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen
Moore, et al. “The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information
Repository.” Journal of Digital Imaging 26, no. 6 (December 2013): 1045–57. https://doi.org/10.1007/
s10278-013-9622-7.

[5] You, Chenyu, Qingsong Yang, Hongming Shan, Lars Gjesteby, Guang Li, Shenghong Ju, Zhuiyang
Zhang, et al. “Structurally-Sensitive Multi-Scale Deep Neural Network for Low-Dose CT Denoising.”
IEEE Access 6 (2018): 41839–55. https://doi.org/10.1109/ACCESS.2018.2858196.

See Also
cycleGANGenerator | patchGANDiscriminator | transform | combine | minibatchqueue |
dlarray | dlfeval | adamupdate

Related Examples
• “Unsupervised Medical Image Denoising Using UNIT” on page 19-206

More About
• “Get Started with GANs for Image-to-Image Translation” on page 19-39
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
• “Define Model Loss Function for Custom Training Loop” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
• “Train Network Using Custom Training Loop” (Deep Learning Toolbox)


Unsupervised Medical Image Denoising Using UNIT

This example shows how to generate high-quality computed tomography (CT) images from noisy low-
dose CT images using a UNIT neural network.

Note: This example references the Low Dose CT Grand Challenge data set, as accessed on May 1,
2021. The example uses chest images from the data set that are now under restricted access. To run
this example, you must have a compatible data set with low-dose and high-dose CT images, and adapt
the data preprocessing and training options to suit your data.

This example uses an unsupervised image-to-image translation (UNIT) neural network trained on full
images from a limited sample of data. For a similar approach using a CycleGAN neural network
trained on patches of image data from a large sample of data, see “Unsupervised Medical Image
Denoising Using CycleGAN” on page 19-192.

X-ray CT is a popular imaging modality used in clinical and industrial applications because it
produces high-quality images and offers superior diagnostic capabilities. To protect the safety of
patients, clinicians recommend a low radiation dose. However, a low radiation dose results in a lower
signal-to-noise ratio (SNR) in the images, and therefore reduces the diagnostic accuracy.

Deep learning techniques offer solutions to improve the image quality for low-dose CT (LDCT)
images. Using a generative adversarial network (GAN) for image-to-image translation, you can
convert noisy LDCT images to images of the same quality as regular-dose CT images. For this
application, the source domain consists of LDCT images and the target domain consists of regular-
dose images.

CT image denoising requires a GAN that performs unsupervised training because clinicians do not
typically acquire matching pairs of low-dose and regular-dose CT images of the same patient in the
same session. This example uses a UNIT architecture that supports unsupervised training. For more
information, see “Get Started with GANs for Image-to-Image Translation” on page 19-39.

Download LDCT Data Set

This example uses data from the Low Dose CT Grand Challenge [2, 3, 4]. The data includes pairs of
regular-dose CT images and simulated low-dose CT images for 99 head scans (labeled N for neuro),
100 chest scans (labeled C for chest), and 100 abdomen scans (labeled L for liver).

Specify dataDir as the desired location of the data set. The data for this example requires 52 GB of
memory.
dataDir = fullfile(tempdir,"LDCT","LDCT-and-Projection-data");

To download the data, go to the Cancer Imaging Archive website. This example uses only two patient
scans from the chest. Download the chest files "C081" and "C120" from the "Images (DICOM, 952
GB)" data set using the NBIA Data Retriever. Specify the dataFolder variable as the location of the
downloaded data. When the download is successful, dataFolder contains two subfolders named
"C081" and "C120".

Create Datastores for Training, Validation, and Testing

Specify the patient scans that are the source of each data set.
scanDirTrain = fullfile(dataDir,"C120","08-30-2018-97899");
scanDirTest = fullfile(dataDir,"C081","08-29-2018-10762");


Create imageDatastore objects that manage the low-dose and high-dose CT images for training and
testing. The data set consists of DICOM images, so use the custom ReadFcn name-value argument in
imageDatastore to enable reading the data.

exts = ".dcm";
readFcn = @(x)rescale(dicomread(x));
imdsLDTrain = imageDatastore(fullfile(scanDirTrain,"1.000000-Low Dose Images-71581"), ...
FileExtensions=exts,ReadFcn=readFcn);
imdsHDTrain = imageDatastore(fullfile(scanDirTrain,"1.000000-Full dose images-34601"), ...
FileExtensions=exts,ReadFcn=readFcn);
imdsLDTest = imageDatastore(fullfile(scanDirTest,"1.000000-Low Dose Images-32837"), ...
FileExtensions=exts,ReadFcn=readFcn);
imdsHDTest = imageDatastore(fullfile(scanDirTest,"1.000000-Full dose images-95670"), ...
FileExtensions=exts,ReadFcn=readFcn);

Preview a training image from the low-dose and high-dose CT training data sets.

lowDose = preview(imdsLDTrain);
highDose = preview(imdsHDTrain);
montage({lowDose,highDose})

Preprocess and Augment Training Data

Specify the image input size for the source and target images.

inputSize = [256,256,1];

Augment and preprocess the training data by using the transform function with custom
preprocessing operations specified by the augmentDataForLD2HDCT helper function. This function
is attached to the example as a supporting file.

The augmentDataForLD2HDCT function performs these operations:


1 Resize the image to the specified input size using bicubic interpolation.
2 Randomly flip the image in the horizontal direction.
3 Scale the image to the range [-1, 1]. This range matches the range of the final tanhLayer (Deep
Learning Toolbox) used in the generator.

imdsLDTrain = transform(imdsLDTrain, @(x)augmentDataForLD2HDCT(x,inputSize));
imdsHDTrain = transform(imdsHDTrain, @(x)augmentDataForLD2HDCT(x,inputSize));
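
The augmentDataForLD2HDCT helper function is attached to the example as a supporting file and is not listed here. As a rough, hypothetical sketch of an equivalent preprocessing function (the function name, and any behavior beyond the three operations listed above, are assumptions):

function out = augmentImageForLD2HDCTSketch(img,inputSize)
% Hypothetical sketch of the preprocessing steps described above:
% resize, random horizontal flip, and rescale to the range [-1, 1].
    img = im2single(img);                          % input is already in [0, 1] from the read function
    img = imresize(img,inputSize(1:2),"bicubic");  % resize with bicubic interpolation
    if rand > 0.5
        img = fliplr(img);                         % random horizontal flip
    end
    out = 2*img - 1;                               % map [0, 1] to [-1, 1] for the tanh output layer
end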

The LDCT data set provides pairs of low-dose and high-dose CT images. However, the UNIT
architecture requires unpaired data for unsupervised learning. This example simulates unpaired
training and validation data by shuffling the data in each iteration of the training loop.

Batch Training and Validation Data During Training

This example uses a custom training loop. The minibatchqueue (Deep Learning Toolbox) object is
useful for managing the mini-batching of observations in custom training loops. The
minibatchqueue object also casts data to a dlarray object that enables auto differentiation in
deep learning applications.

Specify the mini-batch data extraction format as SSCB (spatial, spatial, channel, batch). Set the
DispatchInBackground name-value argument as the boolean returned by canUseGPU. If a
supported GPU is available for computation, then the minibatchqueue object preprocesses mini-
batches in the background in a parallel pool during training.

miniBatchSize = 1;
mbqLDTrain = minibatchqueue(imdsLDTrain,MiniBatchSize=miniBatchSize, ...
MiniBatchFormat="SSCB",DispatchInBackground=canUseGPU);
mbqHDTrain = minibatchqueue(imdsHDTrain,MiniBatchSize=miniBatchSize, ...
MiniBatchFormat="SSCB",DispatchInBackground=canUseGPU);

Create Generator Network

The UNIT consists of one generator and two discriminators. The generator performs image-to-image
translation from low dose to high dose. The discriminators are PatchGAN networks that return the
patch-wise probability that the input data is real or generated. One discriminator distinguishes
between the real and generated low-dose images and the other discriminator distinguishes between
real and generated high-dose images.

Create a UNIT generator network using the unitGenerator function. The source and target
encoder sections of the generator each consist of two downsampling blocks and five residual blocks.
The encoder sections share two of the five residual blocks. Likewise, the source and target decoder
sections of the generator each consist of two downsampling blocks and five residual blocks, and the
decoder sections share two of the five residual blocks.

gen = unitGenerator(inputSize);

Visualize the generator network.

analyzeNetwork(gen)

Create Discriminator Networks

There are two discriminator networks, one for each of the image domains (low-dose CT and high-dose
CT). Create the discriminators for the source and target domains using the
patchGANDiscriminator function.


discLD = patchGANDiscriminator(inputSize,NumDownsamplingBlocks=4,FilterSize=3, ...
    ConvolutionWeightsInitializer="narrow-normal",NormalizationLayer="none");
discHD = patchGANDiscriminator(inputSize,NumDownsamplingBlocks=4,FilterSize=3, ...
    ConvolutionWeightsInitializer="narrow-normal",NormalizationLayer="none");

Visualize the discriminator networks.


analyzeNetwork(discLD);
analyzeNetwork(discHD);

Define Model Gradients and Loss Functions

The modelGradientDisc and modelGradientGen helper functions calculate the gradients and
losses for the discriminators and generator, respectively. These functions are defined in the
Supporting Functions on page 19-215 section of this example.

The objective of each discriminator is to correctly distinguish between real images (1) and translated
images (0) for images in its domain. Each discriminator has a single loss function.

The objective of the generator is to generate translated images that the discriminators classify as
real. The generator loss is a weighted sum of five types of losses: self-reconstruction loss, cycle
consistency loss, hidden KL loss, cycle hidden KL loss, and adversarial loss.

Specify the weight factors for the various losses.


lossWeights.selfReconLossWeight = 10;
lossWeights.hiddenKLLossWeight = 0.01;
lossWeights.cycleConsisLossWeight = 10;
lossWeights.cycleHiddenKLLossWeight = 0.01;
lossWeights.advLossWeight = 1;
lossWeights.discLossWeight = 0.5;

Specify Training Options

Specify the options for Adam optimization. Train the network for 26 epochs.
numEpochs = 26;

Specify identical options for the generator and discriminator networks.

• Specify a learning rate of 0.0001.
• Initialize the trailing average gradient and trailing average gradient-square decay rates with [].
• Use a gradient decay factor of 0.5 and a squared gradient decay factor of 0.999.
• Use weight decay regularization with a factor of 0.0001.
• Use a mini-batch size of 1 for training.
learnRate = 0.0001;
gradDecay = 0.5;
sqGradDecay = 0.999;
weightDecay = 0.0001;

genAvgGradient = [];
genAvgGradientSq = [];
discLDAvgGradient = [];
discLDAvgGradientSq = [];
discHDAvgGradient = [];
discHDAvgGradientSq = [];


Train Model or Download Pretrained UNIT Network

By default, the example downloads a pretrained version of the UNIT generator for the NIH-AAPM-
Mayo Clinic Low-Dose CT data set. The pretrained network enables you to run the entire example
without waiting for training to complete.

To train the network, set the doTraining variable in the following code to true. Train the model in
a custom training loop. For each iteration:

• Read the data for the current mini-batch using the next (Deep Learning Toolbox) function.
• Evaluate the discriminator model gradients using the dlfeval (Deep Learning Toolbox) function
and the modelGradientDisc helper function.
• Update the parameters of the discriminator networks using the adamupdate (Deep Learning
Toolbox) function.
• Evaluate the generator model gradients using the dlfeval function and the modelGradientGen
helper function.
• Update the parameters of the generator network using the adamupdate function.
• Display the input and translated images for both the source and target domains after each epoch.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). Training takes about 58 hours on an NVIDIA Titan RTX.
doTraining = false;
if doTraining

    % Create a figure to show the results
    figure(Units="Normalized");
    for iPlot = 1:4
        ax(iPlot) = subplot(2,2,iPlot);
    end

    iteration = 0;

    % Loop over epochs
    for epoch = 1:numEpochs

        % Shuffle data every epoch
        reset(mbqLDTrain);
        shuffle(mbqLDTrain);
        reset(mbqHDTrain);
        shuffle(mbqHDTrain);

        % Run the loop until all the images in the mini-batch queue
        % mbqLDTrain are processed
        while hasdata(mbqLDTrain)
            iteration = iteration + 1;

            % Read data from the low-dose domain
            imLowDose = next(mbqLDTrain);

            % Read data from the high-dose domain
            if hasdata(mbqHDTrain) == 0
                reset(mbqHDTrain);
                shuffle(mbqHDTrain);
            end
            imHighDose = next(mbqHDTrain);

            % Calculate discriminator gradients and losses
            [discLDGrads,discHDGrads,discLDLoss,discHDLoss] = dlfeval(@modelGradientDisc, ...
                gen,discLD,discHD,imLowDose,imHighDose,lossWeights.discLossWeight);

            % Apply weight decay regularization on low-dose discriminator gradients
            discLDGrads = dlupdate(@(g,w) g+weightDecay*w,discLDGrads,discLD.Learnables);

            % Update parameters of low-dose discriminator
            [discLD,discLDAvgGradient,discLDAvgGradientSq] = adamupdate(discLD,discLDGrads, ...
                discLDAvgGradient,discLDAvgGradientSq,iteration,learnRate,gradDecay,sqGradDecay);

            % Apply weight decay regularization on high-dose discriminator gradients
            discHDGrads = dlupdate(@(g,w) g+weightDecay*w,discHDGrads,discHD.Learnables);

            % Update parameters of high-dose discriminator
            [discHD,discHDAvgGradient,discHDAvgGradientSq] = adamupdate(discHD,discHDGrads, ...
                discHDAvgGradient,discHDAvgGradientSq,iteration,learnRate,gradDecay,sqGradDecay);

            % Calculate generator gradient and loss
            [genGrad,genLoss,images] = dlfeval(@modelGradientGen,gen,discLD,discHD, ...
                imLowDose,imHighDose,lossWeights);

            % Apply weight decay regularization on generator gradients
            genGrad = dlupdate(@(g,w) g+weightDecay*w,genGrad,gen.Learnables);

            % Update parameters of generator
            [gen,genAvgGradient,genAvgGradientSq] = adamupdate(gen,genGrad,genAvgGradient, ...
                genAvgGradientSq,iteration,learnRate,gradDecay,sqGradDecay);
        end

        % Display the results
        updateTrainingPlotLD2HDCT_UNIT(ax,images{:});
    end

    % Save the trained network
    modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
    save(fullfile(dataDir,"trainedLowDoseHighDoseUNITGeneratorNet-"+modelDateTime+".mat"),"gen");

else
    net_url = "https://ssd.mathworks.com/supportfiles/vision/data/trainedLowDoseHighDoseUNITGener
    downloadTrainedNetwork(net_url,dataDir);
    load(fullfile(dataDir,"trainedLowDoseHighDoseUNITGeneratorNet.mat"));
end

Generate High-Dose Image Using Trained Network

Read and display an image from the datastore of low-dose test images.

idxToTest = 1;
imLowDoseTest = readimage(imdsLDTest,idxToTest);
figure
imshow(imLowDoseTest)


Convert the image to data type single. Rescale the image data to the range [-1, 1] as expected by
the final layer of the generator network.

imLowDoseTest = im2single(imLowDoseTest);
imLowDoseTestRescaled = (imLowDoseTest-0.5)/0.5;

Create a dlarray object that inputs data to the generator. If a supported GPU is available for
computation, then perform inference on a GPU by converting the data to a gpuArray object.

dlLowDoseImage = dlarray(imLowDoseTestRescaled,'SSCB');
if canUseGPU
dlLowDoseImage = gpuArray(dlLowDoseImage);
end


Translate the input low-dose image to the high-dose domain using the unitPredict function. The
generated image has pixel values in the range [-1, 1]. For display, rescale the activations to the range
[0, 1].

dlImLowDoseToHighDose = unitPredict(gen,dlLowDoseImage);
imHighDoseGenerated = extractdata(gather(dlImLowDoseToHighDose));
imHighDoseGenerated = rescale(imHighDoseGenerated);
imshow(imHighDoseGenerated)

Read and display the ground truth high-dose image. The high-dose and low-dose test datastores are
not shuffled, so the ground truth high-dose image corresponds directly to the low-dose test image.

imHighDoseGroundTruth = readimage(imdsHDTest,idxToTest);
imshow(imHighDoseGroundTruth)


Display the input low-dose CT, the generated high-dose version, and the ground truth high-dose
image in a montage. Although the network is trained on data from a single patient scan, the network
generalizes well to test images from other patient scans.

imshow([imLowDoseTest imHighDoseGenerated imHighDoseGroundTruth])
title("Test Image "+num2str(idxToTest)+": Low-Dose, Generated High-dose, and Ground Truth High-dose")


Supporting Functions

Model Gradients Function

The modelGradientGen helper function calculates the gradients and loss for the generator.

function [genGrad,genLoss,images] = modelGradientGen(gen,discLD,discHD,imLD,imHD,lossWeights)

    [imLD2LD,imHD2LD,imLD2HD,imHD2HD] = forward(gen,imLD,imHD);
    hidden = forward(gen,imLD,imHD,Outputs="encoderSharedBlock");

    [~,imLD2HD2LD,imHD2LD2HD,~] = forward(gen,imHD2LD,imLD2HD);
    cycle_hidden = forward(gen,imHD2LD,imLD2HD,Outputs="encoderSharedBlock");

    % Calculate the different losses
    selfReconLoss = computeReconLoss(imLD,imLD2LD) + computeReconLoss(imHD,imHD2HD);
    hiddenKLLoss = computeKLLoss(hidden);
    cycleReconLoss = computeReconLoss(imLD,imLD2HD2LD) + computeReconLoss(imHD,imHD2LD2HD);
    cycleHiddenKLLoss = computeKLLoss(cycle_hidden);

    outA = forward(discLD,imHD2LD);
    outB = forward(discHD,imLD2HD);
    advLoss = computeAdvLoss(outA) + computeAdvLoss(outB);

    % Calculate the total loss of the generator as a weighted sum of five losses
    genTotalLoss = ...
        selfReconLoss*lossWeights.selfReconLossWeight + ...
        hiddenKLLoss*lossWeights.hiddenKLLossWeight + ...
        cycleReconLoss*lossWeights.cycleConsisLossWeight + ...
        cycleHiddenKLLoss*lossWeights.cycleHiddenKLLossWeight + ...
        advLoss*lossWeights.advLossWeight;

    % Calculate the gradients of the generator with respect to the total loss
    genGrad = dlgradient(genTotalLoss,gen.Learnables);

    % Convert the data type from dlarray to single
    genLoss = extractdata(genTotalLoss);
    images = {imLD,imLD2HD,imHD,imHD2LD};
end


The modelGradientDisc helper function calculates the gradients and loss for the two
discriminators.

function [discLDGrads,discHDGrads,discLDLoss,discHDLoss] = modelGradientDisc(gen, ...
        discLD,discHD,imRealLD,imRealHD,discLossWeight)

    [~,imFakeLD,imFakeHD,~] = forward(gen,imRealLD,imRealHD);

    % Calculate loss of the discriminator for low-dose images
    outRealLD = forward(discLD,imRealLD);
    outFakeLD = forward(discLD,imFakeLD);
    discLDLoss = discLossWeight*computeDiscLoss(outRealLD,outFakeLD);

    % Calculate the gradients of the discriminator for low-dose images
    discLDGrads = dlgradient(discLDLoss,discLD.Learnables);

    % Calculate loss of the discriminator for high-dose images
    outRealHD = forward(discHD,imRealHD);
    outFakeHD = forward(discHD,imFakeHD);
    discHDLoss = discLossWeight*computeDiscLoss(outRealHD,outFakeHD);

    % Calculate the gradients of the discriminator for high-dose images
    discHDGrads = dlgradient(discHDLoss,discHD.Learnables);

    % Convert the data type from dlarray to single
    discLDLoss = extractdata(discLDLoss);
    discHDLoss = extractdata(discHDLoss);
end

Loss Functions

The computeDiscLoss helper function calculates discriminator loss. Each discriminator loss is a
sum of two components:

• The squared difference between a vector of ones and the predictions of the discriminator on real
  images, Y_real
• The squared difference between a vector of zeros and the predictions of the discriminator on
  generated images, Y_translated

discriminatorLoss = (1 − Y_real)^2 + (0 − Y_translated)^2

function discLoss = computeDiscLoss(Yreal,Ytranslated)
    discLoss = mean(((1-Yreal).^2),"all") + ...
        mean(((0-Ytranslated).^2),"all");
end

The computeAdvLoss helper function calculates adversarial loss for the generator. Adversarial loss
is the squared difference between a vector of ones and the discriminator predictions on the translated
image.

adversarialLoss = (1 − Y_translated)^2

function advLoss = computeAdvLoss(Ytranslated)
    advLoss = mean(((Ytranslated-1).^2),"all");
end


The computeReconLoss helper function calculates self-reconstruction loss and cycle consistency
loss for the generator. Self reconstruction loss is the L1 distance between the input images and their
self-reconstructed versions. Cycle consistency loss is the L1 distance between the input images and
their cycle-reconstructed versions.

selfReconstructionLoss = ‖Y_real − Y_self-reconstructed‖_1

cycleConsistencyLoss = ‖Y_real − Y_cycle-reconstructed‖_1

function reconLoss = computeReconLoss(Yreal,Yrecon)
    reconLoss = mean(abs(Yreal-Yrecon),"all");
end

The computeKLLoss helper function calculates hidden KL loss and cycle-hidden KL loss for the
generator. Hidden KL loss is the squared difference between a vector of zeros and the
'encoderSharedBlock' activation for the self-reconstruction stream. Cycle-hidden KL loss is the
squared difference between a vector of zeros and the 'encoderSharedBlock' activation for the
cycle-reconstruction stream.

hiddenKLLoss = (0 − Y_encoderSharedBlockActivation)^2

cycleHiddenKLLoss = (0 − Y_encoderSharedBlockActivation)^2

function klLoss = computeKLLoss(hidden)
    klLoss = mean(abs(hidden.^2),"all");
end

References

[1] Liu, Ming-Yu, Thomas Breuel, and Jan Kautz. "Unsupervised Image-to-Image Translation Networks."
In Advances in Neural Information Processing Systems, 2017. https://arxiv.org/pdf/1703.00848.pdf.

[2] McCollough, C.H., Chen, B., Holmes, D., III, Duan, X., Yu, Z., Yu, L., Leng, S., Fletcher, J. (2020).
Data from Low Dose CT Image and Projection Data [Data set]. The Cancer Imaging Archive.
https://doi.org/10.7937/9npb-2637.

[3] Grants EB017095 and EB017185 (Cynthia McCollough, PI) from the National Institute of
Biomedical Imaging and Bioengineering.

[4] Clark, Kenneth, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen
Moore, et al. “The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information
Repository.” Journal of Digital Imaging 26, no. 6 (December 2013): 1045–57. https://doi.org/10.1007/s10278-013-9622-7.

See Also
unitGenerator | unitPredict | patchGANDiscriminator | minibatchqueue | dlarray |
dlfeval | adamupdate

Related Examples
• “Unsupervised Medical Image Denoising Using CycleGAN” on page 19-192


More About
• “Get Started with GANs for Image-to-Image Translation” on page 19-39
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Define Custom Training Loops, Loss Functions, and Networks” (Deep Learning Toolbox)
• “Define Model Loss Function for Custom Training Loop” (Deep Learning Toolbox)
• “Specify Training Options in Custom Training Loop” (Deep Learning Toolbox)
• “Train Network Using Custom Training Loop” (Deep Learning Toolbox)


Preprocess Multiresolution Images for Training Classification Network

This example shows how to prepare datastores that read and preprocess multiresolution whole slide
images (WSIs) that might not fit in memory.

Deep learning approaches to tumor classification rely on digital pathology, in which whole tissue
slides are imaged and digitized. The resulting WSIs have high resolution, on the order of 200,000-
by-100,000 pixels. WSIs are frequently stored in a multiresolution format to facilitate efficient display,
navigation, and processing of images.

Read and process WSI data using blockedImage and blockedImageDatastore objects. These
objects facilitate working with multiple resolution levels and do not require the image to be loaded
into core memory. This example shows how to use lower resolution image data to efficiently prepare
data from the finer levels. You can use the processed data to train classification deep learning
networks. For an example, see “Classify Tumors in Multiresolution Blocked Images” on page 19-235.

Download Camelyon16 Data Set

This example uses WSIs from the Camelyon16 challenge [1 on page 19-233]. The data from this
challenge contains a total of 400 WSIs of lymph nodes from two independent sources, separated into
270 training images and 130 test images. The WSIs are stored as TIF files in a stripped format with
an 11-level pyramid structure.

The training data set consists of 159 WSIs of normal lymph nodes and 111 WSIs of lymph nodes with
tumor and healthy tissue. Ground truth coordinates of the lesion boundaries accompany the tumor
images. The size of the data set is approximately 451 GB.

Specify dataDir as the target location of the data.

dataDir = fullfile(tempdir,"Camelyon16");

Create folders to store the Camelyon16 data set.

trainingDir = fullfile(dataDir,"training");
trainNormalDir = fullfile(trainingDir,"normal");
trainTumorDir = fullfile(trainingDir,"tumor");
trainAnnotationDir = fullfile(trainingDir,"lesion_annotations");
if ~exist(dataDir,"dir")
mkdir(trainingDir)
mkdir(trainNormalDir)
mkdir(trainTumorDir)
mkdir(trainAnnotationDir)
end

To download the entire training data set for running this example, see http://gigadb.org/dataset/100439.
Navigate to the Files tab and click (FTPSite), which is the FTP server location of the data
set. Use an FTP client or call the ftp function to download the training data folder /pub/
gigadb/pub/10.5524/100001_101000/100439/CAMELYON16/training from the host
ftp.cngb.org. Download the contents of this training data folder by following these steps:

1 Download the lesion_annotations.zip file. Extract the files to the folder specified by the
  trainAnnotationDir variable.
2 Download the images from the normal subfolder to the folder specified by the trainNormalDir
  variable.
3 Download the images from the tumor subfolder to the folder specified by the trainTumorDir
  variable.

To download and access each WSI file individually, select files to download at the bottom of the page
in the Files table.
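
If you prefer to script the download, the sketch below uses the MATLAB ftp interface with the host and folder named above. It assumes that anonymous FTP access is permitted and that recursive downloads of the normal and tumor folders are acceptable; note that mget recreates the remote folder under the target location, so you might need to move files afterward to match the folder layout described in the steps above.

% Sketch of a scripted download over FTP (assumes anonymous access).
ftpObj = ftp("ftp.cngb.org");
cd(ftpObj,"/pub/gigadb/pub/10.5524/100001_101000/100439/CAMELYON16/training");

% Annotation archive, extracted into the annotation folder
mget(ftpObj,"lesion_annotations.zip",trainingDir);
unzip(fullfile(trainingDir,"lesion_annotations.zip"),trainAnnotationDir);

% Image subfolders (very large downloads)
mget(ftpObj,"normal",trainNormalDir);
mget(ftpObj,"tumor",trainTumorDir);
close(ftpObj);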

Create blockedImage Objects to Manage WSI Images

Get the file names of the normal and tumor training images.

normalFileSet = matlab.io.datastore.FileSet(fullfile(trainNormalDir,"normal*"));
normalFilenames = normalFileSet.FileInfo.Filename;
tumorFileSet = matlab.io.datastore.FileSet(fullfile(trainTumorDir,"tumor*"));
tumorFilenames = tumorFileSet.FileInfo.Filename;

One of the training images of normal tissue, "normal_144.tif," is missing important metadata required
to deduce the physical extents of the image. Exclude this file.

normalFilenames(contains(normalFilenames,"normal_144")) = [];

Create two arrays of blockedImage objects, one for normal images and one for the tumor images.
Each blockedImage object points to the corresponding image file on disk.

normalImages = blockedImage(normalFilenames);
tumorImages = blockedImage(tumorFilenames);

Display WSI Images

To get a better understanding of the training data, display the blocked images of normal tissue. The
images at a coarse resolution level are small enough to fit in memory, so you can visually inspect the
blockedImage data in the Image Browser app using the browseBlockedImages helper function.
This helper function is attached to the example as a supporting file. Note that the images contain a
lot of empty white space and that the tissue occupies only a small portion of the image. These
characteristics are typical of WSIs.

browseBlockedImages(normalImages,8)


Explore a single blockedImage in more depth. Select a sample tumor image to visualize, then
display the image using the bigimageshow function. The function automatically selects the
resolution level based on the screen size and the current zoom level.

idxSampleTumorImage = 10;
tumorImage = tumorImages(idxSampleTumorImage);
h = bigimageshow(tumorImage);
title("Resolution Level: "+h.ResolutionLevel)


Zoom in to an area of interest. The resolution level automatically increases to show more detail.

xlim([33600, 35000])
ylim([104600, 106000])
title("Resolution Level: "+h.ResolutionLevel)


Align Spatial Extents

When you work with multiple resolution levels of a multiresolution image, the spatial extents of each
level must match. If spatial extents are aligned, then information such as masks can be extracted at
coarse resolution levels and correctly applied to matching locations at finer resolution levels. For
more information, see “Set Up Spatial Referencing for Blocked Images” on page 17-2.

Inspect the dimensions of the sample tumor image at each resolution level. Level 1 has the most
pixels and is the finest resolution level. Level 10 has the fewest pixels and is the coarsest resolution
level. The aspect ratio is not consistent, which usually indicates that the levels do not span the same
world area.
levelSizeInfo = table((1:length(tumorImage.Size))', ...
tumorImage.Size(:,1), ...
tumorImage.Size(:,2), ...
tumorImage.Size(:,1)./tumorImage.Size(:,2), ...
VariableNames=["Resolution Level" "Image Width" "Image Height" "Aspect Ratio"]);
disp(levelSizeInfo)

    Resolution Level    Image Width    Image Height    Aspect Ratio
    ________________    ___________    ____________    ____________

            1           2.1555e+05        97792           2.2042
            2           1.0803e+05        49152           2.1979
            3              54272          24576           2.2083
            4              27136          12288           2.2083
            5              13824           6144           2.25
            6               7168           3072           2.3333
            7               1577           3629           0.43455
            8               3584           1536           2.3333
            9               2048           1024           2
           10               1024            512           2
           11                512            512           1

Set the spatial referencing for all training data by using the
setSpatialReferencingForCamelyon16 helper function. This function is attached to the example
as a supporting file. The setSpatialReferencingForCamelyon16 function sets the WorldStart
and WorldEnd properties of each blockedImage object using the spatial referencing information
from the TIF file metadata.

normalImages = setSpatialReferencingForCamelyon16(normalImages);
tumorImages = setSpatialReferencingForCamelyon16(tumorImages);
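
The setSpatialReferencingForCamelyon16 helper derives the world extents from the TIF file metadata; that code is not listed here. As a rough illustration of the idea only, the hypothetical sketch below forces every resolution level of a blockedImage to span the same world extent as level 1 (the finest level). The actual helper uses the file metadata instead, and the property shapes assumed here are not taken from the supporting file.

function bims = alignLevelsToFinestLevelSketch(bims)
% Hypothetical sketch: recreate each blockedImage so that all resolution
% levels cover the same world area as the finest level.
for k = 1:numel(bims)
    sz = bims(k).Size;                            % NumLevels-by-NumDims
    worldStart = 0.5*ones(size(sz));
    worldEnd = repmat(sz(1,:)+0.5,size(sz,1),1);  % extent of level 1 applied to every level
    bims(k) = blockedImage(bims(k).Source,WorldStart=worldStart,WorldEnd=worldEnd);
end
end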

Create Tissue Masks

The majority of a typical WSI consists of background pixels. To process WSI data efficiently, you can
create a region of interest (ROI) mask from a coarse resolution level, and then limit computations at
finer resolution levels to regions within the ROI. For more information, see “Process Blocked Images
Efficiently Using Mask” on page 17-22.

Consider these two factors when picking a mask level:

• The image size at the chosen mask level. To process the masks more quickly, use a lower
resolution level.
• The fidelity of ROI boundaries at the chosen level. To capture the boundary accurately, use a
higher resolution level.

This example uses a coarse resolution level to create masks for the training images of normal tissue.

normalMaskLevel = 8;

The background is uniformly light. You can segment out the tissue through a simple thresholding
operation. Apply a threshold in a block-wise manner to all normal training images using the apply
function. Save the results to disk by specifying the OutputLocation name-value argument.

trainNormalMaskDir = fullfile(trainingDir,"normal_mask_level"+num2str(normalMaskLevel));
if ~isfolder(trainNormalMaskDir)
normalMasks = apply(normalImages, @(bs)imclose(im2gray(bs.Data)<150, ones(5)), ...
BlockSize=[512 512],...
Level=normalMaskLevel, ...
UseParallel=canUseGPU, ...
OutputLocation=trainNormalMaskDir);
save(fullfile(trainNormalMaskDir,"normalMasks.mat"),"normalMasks")
else
load(fullfile(trainNormalMaskDir,"normalMasks.mat"),"normalMasks");
end


Display Tissue Masks

The tissue masks have only one level and are small enough to fit in memory. Display the tissue masks
in the Image Browser app using the browseBlockedImages helper function. This helper function
is attached to the example as a supporting file.

browseBlockedImages(normalMasks,1)

Based on the overview, select one blockedImage to further assess the accuracy of the tissue mask.
Display the blockedImage using the bigimageshow function. Display the mask as an overlay on the
blockedImage using the showlabels function. Set the transparency of each pixel using the
AlphaData name-value argument. For pixels within the ROI, the labels are fully transparent. For
pixels outside the ROI, the label is partially transparent and appears with a green tint.

idxSampleNormalImage = 42;
normalImage = normalImages(idxSampleNormalImage);
normalMask = normalMasks(idxSampleNormalImage);

hNormal = bigimageshow(normalImage);
showlabels(hNormal,normalMask,AlphaData=normalMask,Alphamap=[0.3 0])
title("Tissue; Background Has Green Tint")


Zoom in to inspect an area of interest.

xlim([47540 62563])
ylim([140557 155581])


Create Tumor Masks

In tumor images, the ROI consists of tumor tissue. The color of tumor tissue is similar to the color of
healthy tissue, so you cannot use color segmentation techniques. Instead, create ROIs by using the
ground truth coordinates of the lesion boundaries that accompany the tumor images. These regions
are hand drawn at the finest level.

Display Tumor Boundaries

To get a better understanding of the tumor training data, read the ground truth boundary coordinates
and display the coordinates as freehand ROIs using the showCamelyon16TumorAnnotations
helper function. This helper function is attached to the example as a supporting file. Notice that
normal regions (shown with a green boundary) can occur inside tumor regions.

idxSampleTumorImage = 64;
tumorImage = tumorImages(idxSampleTumorImage);
showCamelyon16TumorAnnotations(tumorImage,trainAnnotationDir);
xlim([77810 83602])
ylim([139971 145763])


Convert Polygon Coordinates to Binary Blocked Images

Specify the resolution level of the tumor masks.


tumorMaskLevel = 8;

Create a tumor mask for each image using the createMaskForCamelyon16TumorTissue helper
function. This helper function is attached to the example as a supporting file. For each image, the
function performs these operations.

• Read the (x, y) boundary coordinates for all ROIs in the annotated XML file.
• Separate the boundary coordinates for tumor and normal tissue ROIs into separate cell arrays.
• Convert the cell arrays of boundary coordinates to a binary blocked image using the
polyToBlockedImage function. In the binary image, the ROI indicates tumor pixels and the
background indicates normal tissue pixels. Pixels that are within both tumor and normal tissue
ROIs are classified as background.
trainTumorMaskDir = fullfile(trainingDir,"tumor_mask_level"+num2str(tumorMaskLevel));
if ~isfolder(trainTumorMaskDir)
mkdir(trainTumorMaskDir)

    tumorMasks = createMaskForCamelyon16TumorTissue( ...
        tumorImages,trainAnnotationDir,trainTumorMaskDir,tumorMaskLevel);
save(fullfile(trainTumorMaskDir,"tumorMasks.mat"),"tumorMasks")
else
load(fullfile(trainTumorMaskDir,"tumorMasks.mat"),"tumorMasks");
end
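
The createMaskForCamelyon16TumorTissue helper is attached to the example as a supporting file and is not listed here. Its core step is the polyToBlockedImage call; the hypothetical sketch below shows that call for a single image, assuming tumorXY and normalXY are cell arrays of [x y] boundary polygons already read from the XML file and scaled to the pixel grid of the mask level. Those variable names, and the assumption that later ROIs take precedence where they overlap, are not taken from the supporting file.

% Sketch: rasterize tumor and normal-tissue polygons into one binary mask.
maskSize = tumorImage.Size(tumorMaskLevel,1:2);              % mask-level size in pixels
roiPositions = [tumorXY(:); normalXY(:)];                    % tumor ROIs first, normal ROIs last
roiLabels = logical([ones(numel(tumorXY),1); zeros(numel(normalXY),1)]);
tumorMaskSketch = polyToBlockedImage(roiPositions,roiLabels,maskSize, ...
    BlockSize=[512 512]);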

The tumor masks have only one resolution level and are small enough to fit in memory. Display the
tumor masks in the Image Browser app using the browseBlockedImages helper function. This
helper function is attached to the example as a supporting file.
browseBlockedImages(tumorMasks,1)

Confirm Fidelity of Mask to ROI Boundaries

Select a sample tumor image that has intricate regions, and display the blockedImage using the
bigimageshow function. Display the mask as an overlay on the blockedImage using the
showlabels function. Set the transparency of each pixel using the AlphaData name-value
argument. For pixels within the ROI, the labels are fully transparent. For pixels outside the ROI, the
label is partially transparent and appears with a green tint.
idxSampleTumorImage = 64;
tumorImage = tumorImages(idxSampleTumorImage);
tumorMask = tumorMasks(idxSampleTumorImage);
hTumor = bigimageshow(tumorImage);
showlabels(hTumor,tumorMask,AlphaData=tumorMask,Alphamap=[0.3 0]);
title("Tumor; Background Has Green Tint")
xlim([77810 83602])
ylim([139971 145763])


Select Blocks For Training

You can train a network using data from any resolution level. Finer resolution levels provide more
homogenous blocks for either class. Coarser levels, which cover a larger spatial region for the same
block size, have more surrounding context. For this example, select blocks at the finest resolution
level.
trainingLevel = 1;

Specify the block size to match the input size of the network. This example uses a block size suitable
for the Inception-v3 network.
networkBlockSize = [299 299 3];

Create sets of normal and tumor blocks using the selectBlockLocations function. This function
selects blocks that are within the specified mask area. You can refine the number of selected blocks
for each class by specifying the BlockOffsets and InclusionThreshold name-value arguments.
Consider these two factors when calling the selectBlockLocations function.

• The amount of training data. Using as much of the training data as possible helps to generalize the
  network during training and ensures a good class representation balance in the selected block set.
  Increase the number of selected blocks by decreasing the BlockOffsets and InclusionThreshold
  name-value arguments.
• The hardware and time available to train. Using more blocks requires more training time or more
powerful hardware. Decrease the number of selected blocks by increasing the BlockOffsets and
InclusionThreshold name-value arguments.

Select blocks within the normal images using the tissue masks. This example specifies values of
BlockOffsets and InclusionThreshold that result in a relatively small number of blocks.

normalOffsetFactor = 3.5;
normalInclusionThreshold = 0.97;
blsNormalData = selectBlockLocations(normalImages, ...
BlockSize=networkBlockSize(1:2), ...
BlockOffsets=round(networkBlockSize(1:2)*normalOffsetFactor), ...
Levels=trainingLevel, ...
Masks=normalMasks, ...
InclusionThreshold=normalInclusionThreshold, ...
ExcludeIncompleteBlocks=true, ...
UseParallel=canUseGPU);
disp(blsNormalData)

blockLocationSet with properties:

    ImageNumber: [190577×1 double]
    BlockOrigin: [190577×3 double]
      BlockSize: [299 299 3]
         Levels: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 … ]

Select blocks within the tumor images using the tumor masks. This example specifies values of
BlockOffsets and InclusionThreshold that result in a relatively small number of blocks.

tumorOffsetFactor = 1;
tumorInclusionThreshold = 0.90;
blsTumorData = selectBlockLocations(tumorImages, ...
BlockSize=networkBlockSize(1:2), ...
BlockOffsets=round(networkBlockSize(1:2)*tumorOffsetFactor), ...
Levels=trainingLevel, ...
Masks=tumorMasks, ...
InclusionThreshold=tumorInclusionThreshold, ...
ExcludeIncompleteBlocks=true, ...
UseParallel=canUseGPU);
disp(blsTumorData)

blockLocationSet with properties:

    ImageNumber: [181679×1 double]
    BlockOrigin: [181679×3 double]
      BlockSize: [299 299 3]
         Levels: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 … ]

Create Blocked Image Datastores for Training and Validation

Create a single blockedImageDatastore object by combining the sets of normal and tumor blocks.

[blsAll,allImages] = mergeBlockLocationSets(blsNormalData,normalImages,blsTumorData,tumorImages);
dsAllBlocks = blockedImageDatastore(allImages,BlockLocationSet=blsAll);


Shuffle the blocks to reduce chances of overfitting and to increase generalization in the training
process.

dsAllBlocks = shuffle(dsAllBlocks);

Partition the blocks into training and validation data sets. Allocate 99% of the blocks for training and
use the remaining 1% for validation.

numericBlockLabels = [zeros(size(blsNormalData.ImageNumber)); ones(size(blsTumorData.ImageNumber))];
blockLabels = categorical(numericBlockLabels,[0,1],["normal","tumor"]);
idx = splitlabels(blockLabels,0.99,"randomized");
dsTrain = subset(dsAllBlocks,idx{1});
dsVal = subset(dsAllBlocks,idx{2});

Training a classification network requires labeled training data. Label each block as normal or
tumor based on the image containing the block. Specify the block labels as normal and tumor.

numericImageLabels = [zeros(size(normalImages)),ones(size(tumorImages))];
imageLabels = categorical(numericImageLabels,[0,1],["normal","tumor"]);

Transform the blockedImageDatastore such that the datastore returns both blocks and
corresponding labels using the transform function and the labelCamelyon16Blocks helper
function. The helper function is attached to the example as a supporting file.

The labelCamelyon16Blocks helper function derives the block label from the image index, which
is stored in the ImageNumber field of the block metadata. The helper function returns the block data
and label as a two-element cell array, suitable for training a classification network.

dsTrainLabeled = transform(dsTrain, ...
    @(block,blockInfo) labelCamelyon16Blocks(block,blockInfo,imageLabels),IncludeInfo=true);
dsValLabeled = transform(dsVal, ...
    @(block,blockInfo) labelCamelyon16Blocks(block,blockInfo,imageLabels),IncludeInfo=true);
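
The labelCamelyon16Blocks helper is attached to the example as a supporting file. A minimal, hypothetical sketch of a transform function with the same role, assuming one block per read and that the info structure carries the ImageNumber field described above:

function [dataOut,infoOut] = labelBlocksByImageSketch(blockData,blockInfo,imageLabels)
% Hypothetical sketch: pair each block with the label of the image it came from.
    if iscell(blockData)
        blockData = blockData{1};
    end
    label = imageLabels(blockInfo.ImageNumber(1));
    dataOut = {blockData,label};   % {predictor, response} pair for a classification network
    infoOut = blockInfo;
end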

Augment Training Data

Augment the training data using the transform function and the
augmentBlocksReflectionRotation helper function. The helper function is attached to the
example as a supporting file.

The augmentBlocksReflectionRotation helper function increases the size of the training data
by creating three variations of each input block with reflections and 90 degree rotations.

dsTrainLabeled = transform(dsTrainLabeled,@augmentBlocksReflectionRotation);
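
The augmentBlocksReflectionRotation helper is also a supporting file. One possible implementation, sketched under the assumption that each observation is a {block, label} row as produced by the labeling transform, returns the original block plus three reflected or rotated copies:

function dataOut = augmentBlocksReflectionRotationSketch(dataIn)
% Hypothetical sketch: emit each block plus three variations (a left-right
% reflection and two 90-degree rotations), all with the original label.
    dataOut = cell(4*size(dataIn,1),2);
    row = 1;
    for k = 1:size(dataIn,1)
        block = dataIn{k,1};
        label = dataIn{k,2};
        variants = {block,fliplr(block),rot90(block),rot90(block,3)};
        for v = 1:numel(variants)
            dataOut(row,:) = {variants{v},label};
            row = row + 1;
        end
    end
end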

Preview a batch of training data.

batch = preview(dsTrainLabeled);
montage(batch(:,1),BorderSize=5,Size=[1 4])


You can improve the ability of the network to generalize to other data by performing additional
randomized augmentation operations. For example, stain normalization is a common augmentation
technique for WSI images. Stain normalization reduces the variability in color and intensity present in
stained images from different sources [4 on page 19-233].
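
Stain normalization itself (for example, the Macenko method in [4]) is beyond the scope of this example. As a lightweight, illustrative substitute rather than an implementation of stain normalization, the sketch below randomly jitters the hue, saturation, and contrast of each block using jitterColorHSV; the jitter ranges are arbitrary choices, not tuned values.

function dataOut = jitterBlockColorSketch(dataIn)
% Hypothetical sketch: randomized color jitter of each {block, label} observation.
    dataOut = dataIn;
    for k = 1:size(dataIn,1)
        dataOut{k,1} = jitterColorHSV(dataIn{k,1}, ...
            Hue=[0 0.05],Saturation=[-0.1 0.1],Contrast=[0.9 1.1]);
    end
end

% Optionally chain the jitter as a further transform before saving the datastore:
% dsTrainLabeled = transform(dsTrainLabeled,@jitterBlockColorSketch);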

Save the training and validation datastores to disk.

save(fullfile(dataDir,"trainingAndValidationDatastores.mat"),"dsTrainLabeled","dsValLabeled");

You can now use the datastores to train and validate a classification network. For an example, see
“Classify Tumors in Multiresolution Blocked Images” on page 19-235.

References

[1] Ehteshami Bejnordi, Babak, Mitko Veta, Paul Johannes van Diest, Bram van Ginneken, Nico
Karssemeijer, Geert Litjens, Jeroen A. W. M. van der Laak, et al. “Diagnostic Assessment of Deep
Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer.” JAMA
318, no. 22 (December 12, 2017): 2199–2210. https://doi.org/10.1001/jama.2017.14585.

[2] Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna.
“Rethinking the Inception Architecture for Computer Vision,” December 2, 2015. https://arxiv.org/abs/1512.00567v3.

[3] ImageNet. https://www.image-net.org.

[4] Macenko, Marc, Marc Niethammer, J. S. Marron, David Borland, John T. Woosley, Xiaojun Guan,
Charles Schmitt, and Nancy E. Thomas. “A Method for Normalizing Histology Slides for Quantitative
Analysis.” In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro,
1107–10, 2009. https://doi.org/10.1109/ISBI.2009.5193250.

See Also
blockedImageDatastore | blockedImage | selectBlockLocations |
mergeBlockLocationSets | blockLocationSet | bigimageshow | transform


Related Examples
• “Classify Tumors in Multiresolution Blocked Images” on page 19-235

More About
• “Set Up Spatial Referencing for Blocked Images” on page 17-2
• “Process Blocked Images Efficiently Using Partial Images or Lower Resolutions” on page 17-13
• “Process Blocked Images Efficiently Using Mask” on page 17-22
• “Create Labeled Blocked Image from ROIs and Masks” on page 17-47
• “Datastores for Deep Learning” (Deep Learning Toolbox)


Classify Tumors in Multiresolution Blocked Images

This example shows how to classify multiresolution whole slide images (WSIs) that might not fit in
memory using an Inception-v3 deep neural network.

Deep learning methods for tumor classification rely on digital pathology, in which whole tissue slides
are imaged and digitized. The resulting WSIs have high resolution, on the order of 200,000-
by-100,000 pixels. WSIs are frequently stored in a multiresolution format to facilitate efficient display,
navigation, and processing of images.

The example outlines an architecture to use block based processing to train large WSIs. The example
trains an Inception-v3 based network using transfer learning techniques to classify individual blocks
as normal or tumor.

If you do not want to download the training data and train the network, then continue to the Train
Network or Download Pretrained Network on page 19-236 section of this example.

Prepare Training Data

Prepare the training and validation data by following the instructions in “Preprocess Multiresolution
Images for Training Classification Network” on page 19-219. The preprocessing example saves the
preprocessed training and validation datastores in a file called
trainingAndValidationDatastores.mat.

Set the value of the dataDir variable as the location where the
trainingAndValidationDatastores.mat file is located. Load the training and validation
datastores into variables called dsTrainLabeled and dsValLabeled.
dataDir = fullfile(tempdir,"Camelyon16");
load(fullfile(dataDir,"trainingAndValidationDatastores.mat"))

Set Up Inception-v3 Network Layers For Transfer Learning

This example uses an Inception-v3 network [2], a convolutional neural network that is trained on
more than a million images from the ImageNet database [3]. The network is 48 layers deep and can
classify images into 1,000 object categories, such as keyboard, mouse, pencil, and many animals.

The inceptionv3 (Deep Learning Toolbox) function returns a pretrained Inception-v3 network.
Inception-v3 requires the Deep Learning Toolbox™ Model for Inception-v3 Network support package.
If this support package is not installed, then the function provides a download link.
net = inceptionv3;
lgraph = layerGraph(net);

The convolutional layers of the network extract image features. The last learnable layer and the final
classification layer classify an input image using the image features. These two layers contain
information on how to combine the features into class probabilities, a loss value, and predicted labels.
To retrain a pretrained network to classify new images, replace these two layers with new layers
adapted to the new data set. For more information, see “Train Deep Learning Network to Classify
New Images” (Deep Learning Toolbox).

Find the names of the two layers to replace using the helper function findLayersToReplace. This
function is attached to the example as a supporting file. In Inception-v3, these two layers are named
"predictions" and "ClassificationLayer_predictions".


[learnableLayer,classLayer] = findLayersToReplace(lgraph);
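
The findLayersToReplace helper is attached to the example as a supporting file. A minimal sketch of the same idea, assuming that the last learnable layer is a fully connected or convolution layer (as it is in Inception-v3) and that the final layer of the layer graph is the classification output layer:

function [learnableLayer,classLayer] = findLayersToReplaceSketch(lgraph)
% Hypothetical sketch: locate the last learnable layer and the classification
% output layer of a layer graph for transfer learning.
    layers = lgraph.Layers;
    classLayer = layers(end);
    isLearnable = arrayfun(@(l) isa(l,"nnet.cnn.layer.FullyConnectedLayer") || ...
        isa(l,"nnet.cnn.layer.Convolution2DLayer"),layers);
    learnableLayer = layers(find(isLearnable,1,"last"));
end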

The goal of this example is to perform binary segmentation between two classes, tumor and normal.
Create a new fully connected layer for two classes. Replace the final fully connected layer with the
new layer.

numClasses = 2;
newLearnableLayer = fullyConnectedLayer(numClasses,Name="predictions");
lgraph = replaceLayer(lgraph,learnableLayer.Name,newLearnableLayer);

Create a new classification layer for two classes. Replace the final classification layer with the new
layer.

newClassLayer = classificationLayer(Name="ClassificationLayer_predictions");
lgraph = replaceLayer(lgraph,classLayer.Name,newClassLayer);

Specify Training Options

Train the network using root mean squared propagation (RMSProp) optimization. Specify the
hyperparameter settings for RMSProp by using the trainingOptions (Deep Learning Toolbox)
function.

Reduce MaxEpochs to a small number because the large amount of training data enables the network
to reach convergence sooner. Specify a MiniBatchSize according to your available GPU memory.
While larger mini-batch sizes can make the training faster, larger sizes can reduce the ability of the
network to generalize. Set ResetInputNormalization to false to prevent a full read of the
training data to compute normalization stats.

options = trainingOptions("rmsprop", ...
    MaxEpochs=1, ...
    MiniBatchSize=256, ...
    Shuffle="every-epoch", ...
    ValidationFrequency=250, ...
    InitialLearnRate=1e-4, ...
    SquaredGradientDecayFactor=0.99, ...
    ResetInputNormalization=false, ...
    Plots="training-progress");

Train Network or Download Pretrained Network

By default, this example downloads a pretrained version of the trained classification network using
the helper function downloadTrainedCamelyonNet. The pretrained network can be used to run the
entire example without waiting for training to complete.

To train the network, set the doTraining variable in the following code to true. Train the network
using the trainNetwork (Deep Learning Toolbox) function.

Train on one or more GPUs, if available. Using a GPU requires Parallel Computing Toolbox™ and a
CUDA® enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox).

doTraining = false;
if doTraining
checkpointsDir = fullfile(dataDir,"checkpoints");
if ~exist(checkpointsDir,"dir")
mkdir(checkpointsDir);

end
options.CheckpointPath=checkpointsDir;
options.ValidationData=dsValLabeled;
trainedNet = trainNetwork(dsTrainLabeled,lgraph,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
    save(fullfile(dataDir,"trainedCamelyonNet-"+modelDateTime+".mat"),"trainedNet");

else
    trainedCamelyonNet_url = "https://www.mathworks.com/supportfiles/vision/data/trainedCamelyonN
dataDir = fullfile(tempdir,"Camelyon16");
downloadTrainedNetwork(trainedCamelyonNet_url,dataDir);
load(fullfile(dataDir,"trainedCamelyonNet.mat"));
end

Download Test Data

The Camelyon16 test data set consists of 130 WSIs. These images have both normal and tumor tissue.
The size of each file is approximately 2 GB.

To download the test data, go to the Camelyon17 website and click the first "CAMELYON16 data set"
link. Open the "testing" directory, then follow these steps.

• Download the "lesion_annotations.zip" file. Extract all files to the directory specified by the
testAnnotationDir variable.
• Open the "images" directory. Download the files to the directory specified by the testImageDir
variable.

testDir = fullfile(dataDir,"testing");
testImageDir = fullfile(testDir,"images");
testAnnotationDir = fullfile(testDir,"lesion_annotations");
if ~exist(testDir,"dir")
mkdir(testDir);
mkdir(fullfile(testDir,"images"));
mkdir(fullfile(testDir,"lesion_annotations"));
end

Preprocess Test Data

Create blockedImage Objects to Manage Test Images

Get the file names of the test images. Then, create an array of blockedImage objects that manage
the test images. Each blockedImage object points to the corresponding image file on disk.

testFileSet = matlab.io.datastore.FileSet(testImageDir+filesep+"test*");
testImages = blockedImage(testFileSet);

Set the spatial referencing for all training data by using the
setSpatialReferencingForCamelyon16 helper function. This function is attached to the example
as a supporting file. The setSpatialReferencingForCamelyon16 function sets the WorldStart
and WorldEnd properties of each blockedImage object using the spatial referencing information
from the TIF file metadata.

testImages = setSpatialReferencingForCamelyon16(testImages);


Create Tissue Masks

To process the WSI data efficiently, create a tissue mask for each test image. This process is the same
as the one used for the preprocessing the normal training images. For more information, see
“Preprocess Multiresolution Images for Training Classification Network” on page 19-219.

normalMaskLevel = 8;
testDir = fullfile(dataDir,"testing");
testTissueMaskDir = fullfile(testDir,"test_tissue_mask_level"+num2str(normalMaskLevel));

if ~isfolder(testTissueMaskDir)
testTissueMasks = apply(testImages, @(bs)im2gray(bs.Data)<150, ...
BlockSize=[512 512], ...
Level=normalMaskLevel, ...
UseParallel=canUseGPU, ...
DisplayWaitbar=false, ...
OutputLocation=testTissueMaskDir);
save(fullfile(testTissueMaskDir,"testTissueMasks.mat"),"testTissueMasks")
else
% Load previously saved data
load(fullfile(testTissueMaskDir,"testTissueMasks.mat"),"testTissueMasks");
end

The tissue masks have only one level and are small enough to fit in memory. Display the tissue masks
in the Image Browser app using the browseBlockedImages helper function. This helper function
is attached to the example as a supporting file.

browseBlockedImages(testTissueMasks,1);


Preprocess Tumor Ground Truth Images

Specify the resolution level of the tumor masks.

tumorMaskLevel = 8;

Create a tumor mask for each ground truth tumor image using the
createMaskForCamelyon16TumorTissue helper function. This helper function is attached to the
example as a supporting file. The function performs these operations for each image:

• Read the (x, y) boundary coordinates for all ROIs in the annotated XML file.
• Separate the boundary coordinates for tumor and normal tissue ROIs into separate cell arrays.
• Convert the cell arrays of boundary coordinates to a binary blocked image using the
polyToBlockedImage function. In the binary image, the ROI indicates tumor pixels and the
background indicates normal tissue pixels. Pixels that are within both tumor and normal tissue
ROIs are classified as background.

testTumorMaskDir = fullfile(testDir,"test_tumor_mask_level"+num2str(tumorMaskLevel));

if ~isfolder(testTumorMaskDir)
    testTumorMasks = createMaskForCamelyon16TumorTissue(testImages,testAnnotationDir, ...
        testTumorMaskDir,tumorMaskLevel);
save(fullfile(testTumorMaskDir,"testTumorMasks.mat"),"testTumorMasks")
else
load(fullfile(testTumorMaskDir,"testTumorMasks.mat"),"testTumorMasks");
end

Predict Heatmaps of Tumor Probability

Use the trained classification network to predict a heatmap for each test image. The heatmap gives a
probability score that each block is of the tumor class. The example performs these operations for
each test image to create a heatmap:

• Select blocks using the selectBlockLocations function. Include all blocks that have at least
one tissue pixel by specifying the InclusionThreshold name-value argument as 0.
• Process batches of blocks using the apply function with the processing operations defined by the
predictBlock helper function. The helper function is attached to the example as a supporting
file. The predictBlock helper function calls the predict (Deep Learning Toolbox) function on a
block of data and returns the probability score that the block is tumor.
• Write the heatmap data to a TIF file using the write function. The final output after processing all
blocks is a heatmap showing the probability of finding tumors over the entire WSI.

numTest = numel(testImages);
outputHeatmapsDir = fullfile(testDir,"heatmaps");
networkBlockSize = [299,299,3];
tic
for ind = 1:numTest
    % Check if the TIF file already exists
    [~,id] = fileparts(testImages(ind).Source);
    outFile = fullfile(outputHeatmapsDir,id+".tif");
    if ~exist(outFile,"file")
        bls = selectBlockLocations(testImages(ind),Levels=1, ...
            BlockSize=networkBlockSize, ...
            Masks=testTissueMasks(ind),InclusionThreshold=0);

        % Resulting heatmaps are in-memory blockedImage objects
        bhm = apply(testImages(ind),@(x)predictBlockForCamelyon16(x,trainedNet), ...
            Level=1,BlockLocationSet=bls,BatchSize=128, ...
            PadPartialBlocks=true,DisplayWaitBar=false);

        % Write results to a TIF file
        write(bhm,outFile,BlockSize=[512 512]);
end
end
toc
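
The predictBlockForCamelyon16 helper is attached to the example as a supporting file. A simplified, hypothetical sketch of a compatible block function for a batch size of 1 is shown below; the actual helper also handles the batched case used above (BatchSize=128), and the assumption that the tumor score can be selected by comparing the output classes to "tumor" is not taken from the supporting file.

function out = predictBlockSketch(bstruct,trainedNet)
% Hypothetical sketch: score one block and return the tumor-class probability.
    scores = predict(trainedNet,bstruct.Data);        % 1-by-numClasses scores
    classes = trainedNet.Layers(end).Classes;
    out = single(scores(:,classes == "tumor"));       % scalar probability for this block
end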

Collect all of the written heatmaps as an array of blockedImage objects.

heatMapFileSet = matlab.io.datastore.FileSet(outputHeatmapsDir,FileExtensions=".tif");
bheatMapImages = blockedImage(heatMapFileSet);

Visualize Heatmap

Select a test image to display. On the left side of a figure, display the ground truth boundary
coordinates as freehand ROIs using the showCamelyon16TumorAnnotations helper function. This
helper function is attached to the example as a supporting file. Normal regions (shown with a green
boundary) can occur inside tumor regions (shown with a red boundary).

idx = 27;
figure
tiledlayout(1,2)
nexttile
hBim1 = showCamelyon16TumorAnnotations(testImages(idx),testAnnotationDir);
title("Ground Truth")

On the right side of the figure, display the heatmap for the test image.

nexttile
hBim2 = bigimageshow(bheatMapImages(idx),Interpolation="nearest");
colormap(jet)

Link the axes and zoom in to an area of interest.

linkaxes([hBim1.Parent,hBim2.Parent])
xlim([53982, 65269])
ylim([122475, 133762])
title("Predicted Heatmap")


Classify Test Images at Specific Threshold

To classify blocks as tumor or normal, apply a threshold to the heatmap probability values.

Pick a threshold probability above which blocks are classified as tumor. Ideally, you would calculate
this threshold value using receiver operating characteristic (ROC) or precision-recall curves on the
validation data set.

thresh = 0.8;

Classify the blocks in each test image and calculate the confusion matrix using the apply function
with the processing operations defined by the computeBlockConfusionMatrixForCamelyon16
helper function. The helper function is attached to the example as a supporting file.

The computeBlockConfusionMatrixForCamelyon16 helper function performs these operations on each heatmap:

• Resize and refine the ground truth mask to match the size of the heatmap.
• Apply the threshold on the heatmap.

• Calculate a confusion matrix for all of the blocks at the finest resolution level. The confusion
matrix gives the number of true positive (TP), false positive (FP), true negative (TN), and false
negative (FN) classification predictions.
• Save the total counts of TP, FP, TN, and FN blocks as a structure in a blocked image. The blocked
image is returned as an element in the array of blocked images, bcmatrix.
• Save a numeric labeled image of the classification predictions in a blocked image. The values 0, 1,
2, and 3 correspond to TN, FP, FN, and TP results, respectively. The blocked image is returned as
an element in the array of blocked images, bcmatrixImage.
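
The computeBlockConfusionMatrixForCamelyon16 helper function is a supporting file and is not listed here. As a rough sketch of its counting and labeling steps, the following code thresholds a small hypothetical heatmap block against a matching hypothetical ground truth mask; only the thresh variable comes from this example.

% Hypothetical same-size blocks for illustration: heatmapBlock holds tumor
% probabilities and tumorMaskBlock is the matching ground truth mask
heatmapBlock = rand(16);
tumorMaskBlock = rand(16) > 0.7;
predictedTumor = heatmapBlock >= thresh;
tp = nnz(predictedTumor & tumorMaskBlock);      % true positives
fp = nnz(predictedTumor & ~tumorMaskBlock);     % false positives
fn = nnz(~predictedTumor & tumorMaskBlock);     % false negatives
tn = nnz(~predictedTumor & ~tumorMaskBlock);    % true negatives
% Labeled image using the same convention as the helper function:
% 0 = TN, 1 = FP, 2 = FN, 3 = TP
labelBlock = zeros(size(predictedTumor),"uint8");
labelBlock(predictedTumor & ~tumorMaskBlock) = 1;
labelBlock(~predictedTumor & tumorMaskBlock) = 2;
labelBlock(predictedTumor & tumorMaskBlock) = 3;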

for ind = 1:numTest


[bcmatrix(ind),bcmatrixImage{ind}] = apply(bheatMapImages(ind), ...
@(bs,tumorMask,tissueMask)computeBlockConfusionMatrixForCamelyon16(bs,tumorMask,tissueMask,thresh), ...
ExtraImages=[testTumorMasks(ind),testTissueMasks(ind)]);
end

Calculate the global confusion matrix over all test images.

cmArray = arrayfun(@(c)gather(c),bcmatrix);
cm = [sum([cmArray.tp]),sum([cmArray.fn]);
sum([cmArray.fp]),sum([cmArray.tn])];
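
If you also want scalar summary metrics, you can derive them directly from this 2-by-2 matrix, whose first row corresponds to the tumor class and whose second row corresponds to the normal class. A minimal sketch using the cm variable computed above:

tp = cm(1,1); fn = cm(1,2); fp = cm(2,1); tn = cm(2,2);
blockAccuracy = (tp + tn) / sum(cm,"all");
blockSensitivity = tp / (tp + fn);   % fraction of tumor blocks detected
blockSpecificity = tn / (tn + fp);   % fraction of normal blocks correctly rejected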

Display the confusion chart of the normalized global confusion matrix. The majority of blocks in the
WSI images are of normal tissue, resulting in a high percentage of true negative predictions.

figure
confusionchart(cm,["Tumor","Normal"],Normalization="total-normalized")

Visualize Classification Results

Compare the ground truth ROI boundary coordinates with the classification results. On the left side
of a figure, display the ground truth boundary coordinates as freehand ROIs. On the right side of the
figure, display the test image and overlay a color on each block based on the confusion matrix.
Display true positives as red, false positives as cyan, false negatives as yellow, and true negatives
with no color.

False negatives and false positives appear around the edges of the tumor region, which indicates that
the network has difficulty classifying blocks with partial classes.

idx = 27;
figure
tiledlayout(1,2)
nexttile
hBim1 = showCamelyon16TumorAnnotations(testImages(idx),testAnnotationDir);
title("Ground Truth")
nexttile
hBim2 = bigimageshow(testImages(idx));
cmColormap = [0 0 0; 0 1 1; 1 1 0; 1 0 0];
showlabels(hBim2,bcmatrixImage{idx}, ...
Colormap=cmColormap,AlphaData=bcmatrixImage{idx})
title("Classified Blocks")
linkaxes([hBim1.Parent,hBim2.Parent])
xlim([56000 63000])
ylim([125000 132600])

Note: To reduce the classification error around the perimeter of the tumor, you can retrain the
network with less homogeneous blocks. When preprocessing the Tumor blocks of the training data set,
reduce the value of the InclusionThreshold name-value argument.

Quantify Network Prediction with AUC-ROC Curve

Calculate the ROC curve values at different thresholds by using the
computeROCCurvesForCamelyon16 helper function. This helper function is attached to the
example as a supporting file.

threshs = [1 0.99 0.9:-.1:.1 0.05 0];


[tpr,fpr,ppv] = computeROCCurvesForCamelyon16(bheatMapImages,testTumorMasks,testTissueMasks,threshs);

Calculate the area under the curve (AUC) metric using the trapz function. The metric returns a
value in the range [0, 1], where 1 indicates perfect model performance. The AUC for this data set is
close to 1. You can use the AUC to fine-tune the training process.

figure
stairs(fpr,tpr,"-");
ROCAUC = trapz(fpr,tpr);
title(["Area Under Curve: " num2str(ROCAUC)]);
xlabel("False Positive Rate")
ylabel("True Positive Rate")

References

[1] Ehteshami Bejnordi, Babak, Mitko Veta, Paul Johannes van Diest, Bram van Ginneken, Nico
Karssemeijer, Geert Litjens, Jeroen A. W. M. van der Laak, et al. “Diagnostic Assessment of Deep
Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer.” JAMA
318, no. 22 (December 12, 2017): 2199–2210. https://doi.org/10.1001/jama.2017.14585.

[2] Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna.
“Rethinking the Inception Architecture for Computer Vision.” Preprint, submitted December 2, 2015.
https://arxiv.org/abs/1512.00567v3.

[3] ImageNet. https://www.image-net.org.

See Also
blockedImageDatastore | blockedImage | blockLocationSet | selectBlockLocations |
bigimageshow | trainingOptions | trainNetwork

Related Examples
• “Preprocess Multiresolution Images for Training Classification Network” on page 19-219

More About
• “Set Up Spatial Referencing for Blocked Images” on page 17-2
• “Process Blocked Images Efficiently Using Partial Images or Lower Resolutions” on page 17-13
• “Process Blocked Images Efficiently Using Mask” on page 17-22
• “Create Labeled Blocked Image from ROIs and Masks” on page 17-47
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)

Detect Image Anomalies Using Explainable FCDD Network

This example shows how to detect defects on pill images using a one-class fully convolutional data
description (FCDD) anomaly detection network.

A crucial goal of anomaly detection is for a human observer to be able to understand why a trained
network classifies images as anomalies. FCDD enables explainable classification, which supplements
the class prediction with information that justifies how the neural network reached its classification
decision [1 on page 19-259]. The FCDD network returns a heatmap with the probability that each
pixel is anomalous. The classifier labels images as normal or anomalous based on the mean value of
the anomaly score heatmap.

Download Pill Images for Classification Data Set

This example uses the PillQC data set. The data set contains images from three classes: normal
images without defects, chip images with chip defects in the pills, and dirt images with dirt
contamination. The data set provides 149 normal images, 43 chip images, and 138 dirt images.
The size of the data set is 3.57 MB.

Set dataDir as the desired location of the data set. Download the data set using the
downloadPillQCData helper function. This function is attached to the example as a supporting file.
The function downloads a ZIP file and extracts the data into the subdirectories chip, dirt, and
normal.
dataDir = fullfile(tempdir,"PillDefects");
downloadPillQCData(dataDir)

This image shows an example image from each class. A normal pill with no defects is on the left, a pill
contaminated with dirt is in the middle, and a pill with a chip defect is on the right. While the images
in this data set contain instances of shadows, focus blurring, and background color variation, the
approach used in this example is robust to these image acquisition artifacts.

Load and Preprocess Data

Create an imageDatastore that reads and manages the image data. Label each image as chip,
dirt, or normal according to the name of its directory.
imageDir = fullfile(dataDir,"pillQC-main","images");
imds = imageDatastore(imageDir,IncludeSubfolders=true,LabelSource="foldernames");

Partition Data into Training, Calibration, and Test Sets

Create training, calibration, and test sets using the splitAnomalyData (Computer Vision Toolbox)
function. This example implements an FCDD approach that uses outlier exposure, in which the
training data consists primarily of normal images with the addition of a small number of anomalous
images. Despite training primarily on samples only of normal scenes, the model learns how to
distinguish between normal and anomalous scenes.

Allocate 50% of the normal images and a small percentage (5%) of each anomaly class in the training
data set. Allocate 10% of the normal images and 20% of each anomaly class to the calibration set.
Allocate the remaining images to the test set.
normalTrainRatio = 0.5;
anomalyTrainRatio = 0.05;
normalCalRatio = 0.10;
anomalyCalRatio = 0.20;
normalTestRatio = 1 - (normalTrainRatio + normalCalRatio);
anomalyTestRatio = 1 - (anomalyTrainRatio + anomalyCalRatio);

anomalyClasses = ["chip","dirt"];
[imdsTrain,imdsCal,imdsTest] = splitAnomalyData(imds,anomalyClasses, ...
NormalLabelsRatio=[normalTrainRatio normalCalRatio normalTestRatio], ...
AnomalyLabelsRatio=[anomalyTrainRatio anomalyCalRatio anomalyTestRatio]);

Splitting anomaly dataset
-------------------------
* Finalizing... Done.
* Number of files and proportions per class in all the datasets:

                   Input                  Train               Validation              Test
              NumFiles    Ratio     NumFiles    Ratio      NumFiles    Ratio      NumFiles
              ________    _______   ________    ________   ________    _______    ________

    chip          43      0.1303        2       0.02381        9       0.17647       32
    dirt         138      0.41818       7       0.083333      28       0.54902      103
    normal       149      0.45152      75       0.89286       14       0.27451       60

Further split the training data into two datastores, one containing only normal data and another
containing only anomaly data.

[imdsNormalTrain,imdsAnomalyTrain] = splitAnomalyData(imdsTrain,anomalyClasses, ...
NormalLabelsRatio=[1 0 0],AnomalyLabelsRatio=[0 1 0],Verbose=false);

Augment Training Data

Augment the training data by using the transform function with custom preprocessing operations
specified by the helper function augmentDataForPillAnomalyDetector. The helper function is
attached to the example as a supporting file.

The augmentDataForPillAnomalyDetector function randomly applies 90 degree rotation and
horizontal and vertical reflection to each input image.

imdsNormalTrain = transform(imdsNormalTrain,@augmentDataForPillAnomalyDetector);
imdsAnomalyTrain = transform(imdsAnomalyTrain,@augmentDataForPillAnomalyDetector);

Add binary labels to the calibration and test data sets by using the transform function with the
operations specified by the addLabelData helper function. The helper function is defined at the end
of this example, and assigns images in the normal class a binary label 0 and images in the chip or
dirt classes a binary label 1.

dsCal = transform(imdsCal,@addLabelData,IncludeInfo=true);
dsTest = transform(imdsTest,@addLabelData,IncludeInfo=true);

Visualize a sample of nine augmented training images.

exampleData = readall(subset(imdsNormalTrain,1:9));
montage(exampleData(:,1));

Create FCDD Model

This example uses a fully convolutional data description (FCDD) model [1 on page 19-259]. The basic
idea of FCDD is to train a network to produce an anomaly score map that describes the probability
that each region in the input image contains anomaly content.

The pretrainedEncoderNetwork function returns the first three downsampling stages of an
ImageNet pretrained Inception-v3 network for use as a pretrained backbone.

backbone = pretrainedEncoderNetwork("inceptionv3",3);

Create an FCDD anomaly detector network by using the fcddAnomalyDetector (Computer Vision
Toolbox) function with the Inception-v3 backbone.

net = fcddAnomalyDetector(backbone);

Train Network or Download Pretrained Network

By default, this example downloads a pretrained version of the FCDD anomaly detector using the
helper function downloadTrainedNetwork. The helper function is attached to this example as a
supporting file. You can use the pretrained network to run the entire example without waiting for
training to complete.

To train the network, set the doTraining variable in the following code to true. Specify the number
of epochs to use for training, numEpochs, by entering a value in the field. Train the model by using
the trainFCDDAnomalyDetector (Computer Vision Toolbox) function.

Train on one or more GPUs, if available. Using a GPU requires Parallel Computing Toolbox™ and a
CUDA® enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox). Training takes about 3 minutes on an NVIDIA Titan RTX™.

doTraining =

numEpochs = ;
if doTraining
options = trainingOptions("adam", ...
Shuffle="every-epoch",...
MaxEpochs=numEpochs,InitialLearnRate=1e-4, ...
MiniBatchSize=32,...
BatchNormalizationStatistics="moving");
detector = trainFCDDAnomalyDetector(imdsNormalTrain,imdsAnomalyTrain,net,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save(fullfile(dataDir,"trainedPillAnomalyDetector-"+modelDateTime+".mat"),"detector");
else
trainedPillAnomalyDetectorNet_url = "https://ssd.mathworks.com/supportfiles/"+ ...
"vision/data/trainedFCDDPillAnomalyDetectorSpkg.zip";
downloadTrainedNetwork(trainedPillAnomalyDetectorNet_url,dataDir);
load(fullfile(dataDir,"folderForSupportFilesInceptionModel", ...
"trainedPillFCDDNet.mat"));
end

Set Anomaly Threshold

Select an anomaly score threshold for the anomaly detector, which classifies images based on
whether their scores are above or below the threshold value. This example uses a calibration data set
that contains both normal and anomalous images to select the threshold.

Obtain the mean anomaly score and ground truth label for each image in the calibration set.

scores = predict(detector,dsCal);
labels = imdsCal.Labels ~= "normal";

Plot a histogram of the mean anomaly scores for the normal and anomaly classes. The distributions
are well separated by the model-predicted anomaly score.

numBins = 20;
[~,edges] = histcounts(scores,numBins);

figure
hold on
hNormal = histogram(scores(labels==0),edges);
hAnomaly = histogram(scores(labels==1),edges);
hold off
legend([hNormal,hAnomaly],"Normal","Anomaly")
xlabel("Mean Anomaly Score")
ylabel("Counts")

Calculate the optimal anomaly threshold by using the anomalyThreshold (Computer Vision
Toolbox) function. Specify the first two input arguments as the ground truth labels, labels, and
predicted anomaly scores, scores, for the calibration data set. Specify the third input argument as
true because true positive anomaly images have a labels value of true. The anomalyThreshold
function returns the optimal threshold and the receiver operating characteristic (ROC) curve for the
detector, stored as an rocmetrics (Deep Learning Toolbox) object.
[thresh,roc] = anomalyThreshold(labels,scores,true);

Set the Threshold property of the anomaly detector to the optimal value.
detector.Threshold = thresh;

Plot the ROC by using the plot (Deep Learning Toolbox) object function of rocmetrics. The ROC
curve illustrates the performance of the classifier for a range of possible threshold values. Each point
on the ROC curve represents the false positive rate (x-coordinate) and true positive rate (y-
coordinate) when the calibration set images are classified using a different threshold value. The solid
blue line represents the ROC curve. The red dashed line represents a no-skill classifier corresponding
to a 50% success rate. The ROC area under the curve (AUC) metric indicates classifier performance,
and the maximum ROC AUC corresponding to a perfect classifier is 1.0.
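
To make the per-threshold points concrete, the following sketch computes the true positive rate and false positive rate of the calibration scores at a single, arbitrarily chosen threshold of 0.5. The rocmetrics object performs the equivalent computation over the full range of thresholds.

t = 0.5;                                      % arbitrary example threshold
predictedAnomaly = scores >= t;
tprAtT = sum(predictedAnomaly & labels)/sum(labels);
fprAtT = sum(predictedAnomaly & ~labels)/sum(~labels);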

plot(roc)
title("ROC AUC: "+ roc.AUC)

Evaluate Classification Model

Classify each image in the test set as either normal or anomalous.

testSetOutputLabels = classify(detector,dsTest);

Get the ground truth labels of each test image.

testSetTargetLabels = dsTest.UnderlyingDatastores{1}.Labels;

Evaluate the anomaly detector by calculating performance metrics using the
evaluateAnomalyDetection (Computer Vision Toolbox) function. The function calculates several
metrics that evaluate the accuracy, precision, sensitivity, and specificity of the detector for the test
data set.

metrics = evaluateAnomalyDetection(testSetOutputLabels,testSetTargetLabels,anomalyClasses);

Evaluating anomaly detection results
------------------------------------
* Finalizing... Done.
* Data set metrics:

    GlobalAccuracy    MeanAccuracy    Precision    Recall     Specificity    F1Score    FalsePositiveRate
    ______________    ____________    _________    _______    ___________    _______    _________________

       0.96923           0.97778          1        0.95556         1         0.97727            0

The ConfusionMatrix property of metrics contains the confusion matrix for the test set. Extract
the confusion matrix and display a confusion plot. The classification model in this example is very
accurate and predicts a small percentage of false positives and false negatives.

M = metrics.ConfusionMatrix{:,:};
confusionchart(M,["Normal","Anomaly"])
acc = sum(diag(M)) / sum(M,"all");
title("Accuracy: "+acc)

If you specify multiple anomaly class labels, such as dirt and chip in this example, the
evaluateAnomalyDetection function calculates metrics for the whole data set and for each
anomaly class. The per-class metrics are returned in the ClassMetrics property of the
anomalyDetectionMetrics (Computer Vision Toolbox) object, metrics.
metrics.ClassMetrics

ans=2×2 table
               Accuracy    AccuracyPerSubClass
               ________    ___________________

    Normal           1         {1×1 table}
    Anomaly    0.95556         {2×1 table}

metrics.ClassMetrics(2,"AccuracyPerSubClass").AccuracyPerSubClass{1}

ans=2×1 table
AccuracyPerSubClass
___________________

chip 0.84375
dirt 0.99029

Explain Classification Decisions

You can use the anomaly heatmap predicted by the anomaly detector to help explain why an image is
classified as normal or anomalous. This approach is useful for identifying patterns in false negatives
and false positives. You can use these patterns to identify strategies for increasing class balancing of
the training data or improving the network performance.

Calculate Anomaly Heat Map Display Range

Calculate a display range that reflects the range of anomaly scores observed across the entire
calibration set, including normal and anomalous images. Using the same display range across images
allows you to compare images more easily than if you scale each image to its own minimum and
maximum. Apply the display range for all heatmaps in this example.
minMapVal = inf;
maxMapVal = -inf;
reset(dsCal)
while hasdata(dsCal)
img = read(dsCal);
map = anomalyMap(detector,img{1});
minMapVal = min(min(map,[],"all"),minMapVal);
maxMapVal = max(max(map,[],"all"),maxMapVal);
end
displayRange = [minMapVal,maxMapVal];

View Heatmap of Anomaly Image

Select an image of a correctly classified anomaly. This result is a true positive classification. Display
the image.
testSetAnomalyLabels = testSetTargetLabels ~= "normal";
idxTruePositive = find(testSetAnomalyLabels' & testSetOutputLabels,1,"last");
dsExample = subset(dsTest,idxTruePositive);
img = read(dsExample);
img = img{1};
map = anomalyMap(detector,img);
imshow(anomalyMapOverlay(img,map,MapRange=displayRange,Blend="equal"))

View Heatmap of Normal Image

Select and display an image of a correctly classified normal image. This result is a true negative
classification.

idxTrueNegative = find(~(testSetAnomalyLabels' | testSetOutputLabels));


dsExample = subset(dsTest,idxTrueNegative);
img = read(dsExample);
img = img{1};
map = anomalyMap(detector,img);
imshow(anomalyMapOverlay(img,map,MapRange=displayRange,Blend="equal"))

View Heatmaps of False Negative Images

False negatives are images with pill defect anomalies that the network classifies as normal. Use the
explanation from the network to gain insights into the misclassifications.

Find any false negative images from the test set. Obtain heatmap overlays of the false negative
images by using the transform function. The operations of the transform are specified by an
anonymous function that applies the anomalyMapOverlay (Computer Vision Toolbox) function to
obtain heatmap overlays for each false negative in the test set.

falseNegativeIdx = find(testSetAnomalyLabels' & ~testSetOutputLabels);


if ~isempty(falseNegativeIdx)
fnExamples = subset(dsTest,falseNegativeIdx);
fnExamplesWithHeatmapOverlays = transform(fnExamples,@(x) {...
anomalyMapOverlay(x{1},anomalyMap(detector,x{1}), ...
MapRange=displayRange,Blend="equal")});
fnExamples = readall(fnExamples);
fnExamples = fnExamples(:,1);
fnExamplesWithHeatmapOverlays = readall(fnExamplesWithHeatmapOverlays);
montage(fnExamples)
montage(fnExamplesWithHeatmapOverlays)
else
disp("No false negatives detected.")
end

View Heatmaps of False Positive Images

False positives are images without pill defect anomalies that the network classifies as anomalous.
Find any false positives in the test set. Use the explanation from the network to gain insights into the
misclassifications. For example, if anomalous scores are localized to the image background, you can
explore suppressing the background during preprocessing.

falsePositiveIdx = find(~testSetAnomalyLabels' & testSetOutputLabels);


if ~isempty(falsePositiveIdx)
fpExamples = subset(dsTest,falsePositiveIdx);
fpExamplesWithHeatmapOverlays = transform(fpExamples,@(x) { ...
anomalyMapOverlay(x{1},anomalyMap(detector,x{1}), ...
MapRange=displayRange,Blend="equal")});
fpExamples = readall(fpExamples);
fpExamples = fpExamples(:,1);
fpExamplesWithHeatmapOverlays = readall(fpExamplesWithHeatmapOverlays);
montage(fpExamples)
montage(fpExamplesWithHeatmapOverlays)
else
disp("No false positives detected.")
end

No false positives detected.

Supporting Functions

The addLabelData helper function creates a binary representation of the label information in
data, assigning 0 to normal images and 1 to anomaly images.

function [data,info] = addLabelData(data,info)
if info.Label == categorical("normal")
onehotencoding = 0;
else
onehotencoding = 1;
end
data = {data,onehotencoding};
end

References

[1] Liznerski, Philipp, Lukas Ruff, Robert A. Vandermeulen, Billy Joe Franks, Marius Kloft, and Klaus-
Robert Müller. "Explainable Deep One-Class Classification." Preprint, submitted March 18, 2021.
https://arxiv.org/abs/2007.01760.

[2] Ruff, Lukas, Robert A. Vandermeulen, Billy Joe Franks, Klaus-Robert Müller, and Marius Kloft.
"Rethinking Assumptions in Deep Anomaly Detection." Preprint, submitted May 30, 2020.
https://arxiv.org/abs/2006.00339.

[3] Simonyan, Karen, and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale
Image Recognition." Preprint, submitted April 10, 2015. https://arxiv.org/abs/1409.1556.

[4] ImageNet. https://www.image-net.org.

See Also
transform | pretrainedEncoderNetwork | fcddAnomalyDetector |
trainFCDDAnomalyDetector | predict | anomalyThreshold | anomalyMapOverlay |
evaluateAnomalyDetection | anomalyDetectionMetrics | rocmetrics | confusionchart

Related Examples
• “Classify Defects on Wafer Maps Using Deep Learning” on page 19-260
• “Detect Image Anomalies Using Pretrained ResNet-18 Feature Embeddings” on page 19-276

More About
• “Getting Started with Anomaly Detection Using Deep Learning” (Computer Vision Toolbox)
• “Datastores for Deep Learning” (Deep Learning Toolbox)

Classify Defects on Wafer Maps Using Deep Learning

This example shows how to classify eight types of manufacturing defects on wafer maps using a
simple convolutional neural network (CNN).

Wafers are thin disks of semiconducting material, typically silicon, that serve as the foundation for
integrated circuits. Each wafer yields several individual circuits (ICs), separated into dies. Automated
inspection machines test the performance of ICs on the wafer. The machines produce images, called
wafer maps, that indicate which dies perform correctly (pass) and which dies do not meet
performance standards (fail).

The spatial pattern of the passing and failing dies on a wafer map can indicate specific issues in the
manufacturing process. Deep learning approaches can efficiently classify the defect pattern on a
large number of wafers. Therefore, by using deep learning, you can quickly identify manufacturing
issues, enabling prompt repair of the manufacturing process and reducing waste.

This example shows how to train a classification network that detects and classifies eight types of
manufacturing defect patterns. The example also shows how to evaluate the performance of the
network.

Download WM-811K Wafer Defect Map Data

This example uses the WM-811K Wafer Defect Map data set [1 on page 19-274] [2 on page 19-274].
The data set consists of 811,457 wafer map images, including 172,950 labeled images. Each image
has only three pixel values. The value 0 indicates the background, the value 1 represents correctly
behaving dies, and the value 2 represents defective dies. The labeled images have one of nine labels
based on the spatial pattern of the defective dies. The size of the data set is 3.5 GB.

Set dataDir as the desired location of the data set. Download the data set using the
downloadWaferMapData helper function. This function is attached to the example as a supporting
file.

dataDir = fullfile(tempdir,"WaferDefects");
downloadWaferMapData(dataDir)

Preprocess and Augment Data

The data is stored in a MAT file as an array of structures. Load the data set into the workspace.

dataMatFile = fullfile(dataDir,"MIR-WM811K","MATLAB","WM811K.mat");
waferData = load(dataMatFile);
waferData = waferData.data;

Explore the data by displaying the first element of the structure. The waferMap field contains the
image data. The failureType field contains the label of the defect.

disp(waferData(1))

          waferMap: [45×48 uint8]
           dieSize: 1683
           lotName: 'lot1'
        waferIndex: 1
    trainTestLabel: 'Training'
       failureType: 'none'

Reformat Data

This example uses only labeled images. Remove the unlabeled images from the structure.

unlabeledImages = zeros(size(waferData),"logical");
for idx = 1:size(unlabeledImages,1)
unlabeledImages(idx) = isempty(waferData(idx).trainTestLabel);
end
waferData(unlabeledImages) = [];

The dieSize, lotName, and waferIndex fields are not relevant to the classification of the images.
The example partitions data into training, validation, and test sets using a different convention than
the one specified by the trainTestLabel field. Remove these fields from the structure using the rmfield
function.

fieldsToRemove = ["dieSize","lotName","waferIndex","trainTestLabel"];
waferData = rmfield(waferData,fieldsToRemove);

Specify the image classes.

defectClasses = ["Center","Donut","Edge-Loc","Edge-Ring","Loc","Near-full","Random","Scratch","none"];
numClasses = numel(defectClasses);

To apply additional preprocessing operations on the data, such as resizing the images to match the
network input size or applying random augmentations when you train the network for classification,
you can use an augmented image datastore. You cannot create an augmented image datastore from
data in a structure, but you can create the datastore from data in a table. Convert the data into a
table with two variables:

• WaferImage - Wafer defect map images
• FailureType - Categorical label for each image

waferData = struct2table(waferData);
waferData.Properties.VariableNames = ["WaferImage","FailureType"];
waferData.FailureType = categorical(waferData.FailureType,defectClasses);

Display a sample image from each input image class using the displaySampleWaferMaps helper
function. This function is attached to the example as a supporting file.

displaySampleWaferMaps(waferData)

Balance Data By Oversampling

Display the number of images of each class. The data set is heavily unbalanced, with significantly
fewer images of each defect class than the number of images without defects.

summary(waferData.FailureType)

Center 4294
Donut 555
Edge-Loc 5189
Edge-Ring 9680
Loc 3593
Near-full 149
Random 866
Scratch 1193
none 147431

To improve the class balancing, oversample the defect classes using the
oversampleWaferDefectClasses helper function. This function is attached to the example as a
supporting file. The helper function appends the data set with five modified copies of each defect
image. Each copy has one of these modifications: horizontal reflection, vertical reflection, or rotation
by a multiple of 90 degrees.

waferData = oversampleWaferDefectClasses(waferData);
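
The oversampleWaferDefectClasses helper function is a supporting file and is not listed here. As a rough sketch, the following code produces one plausible set of five modified copies consistent with the description above; the img variable is a hypothetical wafer map created only for illustration.

img = randi([0 2],45,48,"uint8");   % hypothetical wafer map for illustration
copies = {fliplr(img), ...          % horizontal reflection
          flipud(img), ...          % vertical reflection
          rot90(img,1), ...         % rotation by 90 degrees
          rot90(img,2), ...         % rotation by 180 degrees
          rot90(img,3)};            % rotation by 270 degrees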

Display the number of images of each class after class balancing.

summary(waferData.FailureType)

Center 25764
Donut 3330
Edge-Loc 31134
Edge-Ring 58080
Loc 21558
Near-full 894
Random 5196
Scratch 7158
none 147431

Partition Data into Training, Validation, and Test Sets

Split the oversampled data set into training, validation, and test sets using the splitlabels
(Computer Vision Toolbox) function. Approximately 90% of the data is used for training, 5% is used
for validation, and 5% is used for testing.

labelIdx = splitlabels(waferData,[0.9 0.05 0.05],"randomized",TableVariable="FailureType");


trainingData = waferData(labelIdx{1},:);
validationData = waferData(labelIdx{2},:);
testingData = waferData(labelIdx{3},:);

Augment Training Data

Specify a set of random augmentations to apply to the training data using an imageDataAugmenter
(Deep Learning Toolbox) object. Adding random augmentations to the training images helps prevent the
network from overfitting to the training data.

aug = imageDataAugmenter(FillValue=0,RandXReflection=true,RandYReflection=true,RandRotation=[0 360]);

Specify the input size for the network. Create an augmentedImageDatastore (Deep Learning
Toolbox) that reads the training data, resizes the data to the network input size, and applies random
augmentations.

inputSize = [48 48];


dsTrain = augmentedImageDatastore(inputSize,trainingData,"FailureType",DataAugmentation=aug);

Create datastores that read validation and test data and resize the data to the network input size. You
do not need to apply random augmentations to validation or test data.

dsVal = augmentedImageDatastore(inputSize,validationData,"FailureType");
dsVal.MiniBatchSize = 64;
dsTest = augmentedImageDatastore(inputSize,testingData,"FailureType");

Create Network

Define the convolutional neural network architecture. The range of the image input layer reflects the
fact that the wafer maps have only three levels.

layers = [
imageInputLayer([inputSize 1], ...
Normalization="rescale-zero-one",Min=0,Max=2);

convolution2dLayer(3,8,Padding="same")
batchNormalizationLayer
reluLayer

maxPooling2dLayer(2,Stride=2)

convolution2dLayer(3,16,Padding="same")
batchNormalizationLayer
reluLayer

maxPooling2dLayer(2,Stride=2)

convolution2dLayer(3,32,Padding="same")
batchNormalizationLayer
reluLayer

maxPooling2dLayer(2,Stride=2)

convolution2dLayer(3,64,Padding="same")
batchNormalizationLayer
reluLayer

dropoutLayer

fullyConnectedLayer(numClasses)
softmaxLayer
classificationLayer];

Specify Training Options

Specify the training options for Adam optimization. Train the network for 30 epochs.

options = trainingOptions("adam", ...
ResetInputNormalization=true, ...
MaxEpochs=30, ...
InitialLearnRate=0.001, ...
L2Regularization=0.001, ...
MiniBatchSize=64, ...
Shuffle="every-epoch", ...
Verbose=false, ...
Plots="training-progress", ...
ValidationData=dsVal, ...
ValidationFrequency=20);

Train Network or Download Pretrained Network

By default, the example loads a pretrained wafer defect classification network. The pretrained
network enables you to run the entire example without waiting for training to complete.

To train the network, set the doTraining variable in the following code to true. Train the model
using the trainNetwork (Deep Learning Toolbox) function.

Train on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®
enabled NVIDIA® GPU. For more information, see “GPU Computing Requirements” (Parallel
Computing Toolbox).

doTraining = ;
if doTraining
trainedNet = trainNetwork(dsTrain,layers,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save(fullfile(dataDir,"trained-WM811K-"+modelDateTime+".mat"),"trainedNet");

else
downloadTrainedWaferNet(dataDir);
trainedNet = load(fullfile(dataDir,"CNN-WM811K.mat"));
trainedNet = trainedNet.preTrainedNetwork;
end

Quantify Network Performance on Test Data

Classify each test image using the classify (Deep Learning Toolbox) function.

defectPredicted = classify(trainedNet,dsTest);

Calculate the performance of the network compared to the ground truth classifications as a confusion
matrix using the confusionmat (Deep Learning Toolbox) function. Visualize the confusion matrix
using the confusionchart (Deep Learning Toolbox) function. The values across the diagonal of this
matrix indicate correct classifications. The confusion matrix for a perfect classifier has values only on
the diagonal.

defectTruth = testingData.FailureType;

cmTest = confusionmat(defectTruth,defectPredicted);
figure
confusionchart(cmTest,categories(defectTruth),Normalization="row-normalized", ...
Title="Test Data Confusion Matrix");

Precision, Recall, and F1 Scores

This example evaluates the network performance using several metrics: precision, recall, and F1
scores. These metrics are defined for a binary classification. To overcome the limitation for this
multiclass problem, you can consider the prediction as a set of binary classifications, one for each
class.

Precision is the proportion of images predicted to belong to a class that actually belong to that class.
Given the counts of true positive (TP) and false positive (FP) classifications, you can calculate
precision as:

precision = TP / (TP + FP)

Recall is the proportion of images belonging to a specific class that are predicted to belong to the
class. Given the counts of TP and false negative (FN) classifications, you can calculate recall as:

recall = TP / (TP + FN)

The F1 score is the harmonic mean of the precision and recall values:

F1 = (2 * precision * recall) / (precision + recall)
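
A quick check with made-up counts can make these definitions concrete. For example, for a hypothetical class with TP = 90, FP = 10, and FN = 30:

TP = 90; FP = 10; FN = 30;                        % hypothetical counts
precisionExample = TP/(TP + FP);                  % 0.90
recallExample = TP/(TP + FN);                     % 0.75
F1Example = 2*precisionExample*recallExample/(precisionExample + recallExample);   % approximately 0.82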

For each class, calculate the precision, recall, and F1 score using the counts of TP, FP, and FN results
available in the confusion matrix.
prTable = table(Size=[numClasses 3],VariableTypes=["cell","cell","double"], ...
VariableNames=["Recall","Precision","F1"],RowNames=defectClasses);

for idx = 1:numClasses


numTP = cmTest(idx,idx);
numFP = sum(cmTest(:,idx)) - numTP;
numFN = sum(cmTest(idx,:),2) - numTP;

precision = numTP / (numTP + numFP);


recall = numTP / (numTP + numFN);

defectClass = defectClasses(idx);
prTable.Recall{defectClass} = recall;
prTable.Precision{defectClass} = precision;
prTable.F1(defectClass) = 2*precision*recall/(precision + recall);
end

Display the metrics for each class. Scores closer to 1 indicate better network performance.
prTable

prTable=9×3 table
                   Recall       Precision       F1
                 __________    __________    _______

    Center       {[0.9169]}    {[0.9578]}    0.93693
    Donut        {[0.8193]}    {[0.9067]}    0.86076
    Edge-Loc     {[0.7900]}    {[0.8384]}    0.81349
    Edge-Ring    {[0.9859]}    {[0.9060]}    0.94426
    Loc          {[0.6642]}    {[0.8775]}    0.75607
    Near-full    {[0.7556]}    {[     1]}    0.86076
    Random       {[0.9692]}    {[0.7683]}    0.85714
    Scratch      {[0.4609]}    {[0.8639]}    0.60109
    none         {[0.9696]}    {[0.9345]}    0.95173

Precision-Recall Curves and Area-Under-Curve (AUC)

In addition to returning a classification of each test image, the network can also predict the
probability that a test image is each of the defect classes. In this case, precision-recall curves provide
an alternative way to evaluate the network performance.

To calculate precision-recall curves, start by performing a binary classification for each defect class
by comparing the probability against an arbitrary threshold. When the probability exceeds the
threshold, you can assign the image to the target class. The choice of threshold impacts the number
of TP, FP, and FN results and the precision and recall scores. To evaluate the network performance,
you must consider the performance at a range of thresholds. Precision-recall curves plot the tradeoff
between precision and recall values as you adjust the threshold for the binary classification. The AUC
metric summarizes the precision-recall curve for a class as a single number in the range [0, 1], where
1 indicates a perfect classification regardless of threshold.

Calculate the probability that each test image belongs to each of the defect classes using the
predict (Deep Learning Toolbox) function.
defectProbabilities = predict(trainedNet,dsTest);
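
Before sweeping every threshold with rocmetrics, it can help to see the calculation for one class at a single threshold. The following sketch scores the Center class at an arbitrary threshold of 0.5 and assumes that the columns of defectProbabilities follow the class order in defectClasses:

classIdx = find(defectClasses == "Center");
t = 0.5;                                                 % arbitrary example threshold
predictedCenter = defectProbabilities(:,classIdx) >= t;
actualCenter = (defectTruth == "Center");
precisionAtT = sum(predictedCenter & actualCenter)/sum(predictedCenter);
recallAtT = sum(predictedCenter & actualCenter)/sum(actualCenter);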

Use the rocmetrics function to calculate the precision, recall, and AUC for each class over a range
of thresholds. Plot the precision-recall curves.
roc = rocmetrics(defectTruth,defectProbabilities,defectClasses,AdditionalMetrics="prec");
figure
plot(roc,XAxisMetric="reca",YAxisMetric="prec");
xlabel("Recall")
ylabel("Precision")
grid on
title("Precision-Recall Curves for All Classes")

The precision-recall curve for an ideal classifier passes through the point (1, 1). The classes that have
precision-recall curves that tend towards (1, 1), such as Edge-Ring and Center, are the classes for
which the network has the best performance. The network has the worst performance for the
Scratch class.

Compute and display the AUC values of the precision/recall curves for each class.
prAUC = zeros(numClasses, 1);
for idx = 1:numClasses
defectClass = defectClasses(idx);
currClassIdx = strcmpi(roc.Metrics.ClassName, defectClass);
reca = roc.Metrics.TruePositiveRate(currClassIdx);
prec = roc.Metrics.PositivePredictiveValue(currClassIdx);
prAUC(idx) = trapz(reca(2:end),prec(2:end)); % prec(1) is always NaN
end
prTable.AUC = prAUC;
prTable

prTable=9×4 table
                   Recall       Precision       F1         AUC
                 __________    __________    _______    _______

    Center       {[0.9169]}    {[0.9578]}    0.93693    0.97314
    Donut        {[0.8193]}    {[0.9067]}    0.86076    0.89514
    Edge-Loc     {[0.7900]}    {[0.8384]}    0.81349    0.88453
    Edge-Ring    {[0.9859]}    {[0.9060]}    0.94426    0.73498
    Loc          {[0.6642]}    {[0.8775]}    0.75607    0.82643
    Near-full    {[0.7556]}    {[     1]}    0.86076    0.79863
    Random       {[0.9692]}    {[0.7683]}    0.85714    0.95798
    Scratch      {[0.4609]}    {[0.8639]}    0.60109    0.65661
    none         {[0.9696]}    {[0.9345]}    0.95173    0.99031

Visualize Network Decisions Using GradCAM

Gradient-weighted class activation mapping (Grad-CAM) produces a visual explanation of decisions
made by the network. You can use the gradCAM (Deep Learning Toolbox) function to identify parts of
the image that most influenced the network prediction.

Donut Defect Class

The Donut defect is characterized by an image having defective pixels clustered in a concentric circle
around the center of the die. Most images of the Donut defect class do not have defective pixels
around the edge of the die.

These two images both show data with the Donut defect. The network correctly classified the image
on the left as a Donut defect. The network misclassified the image on the right as an Edge-Ring
defect. The images have a color overlay that corresponds to the output of the gradCAM function. The
regions of the image that most influenced the network classification appear with bright colors on the
overlay. For the image classified as an Edge-Ring defect, the defects at the boundary of the die were
treated as important. A possible reason for this is that there are far more Edge-Ring images than
Donut images in the training set.

Loc Defect Class

The Loc defect is characterized by an image having defective pixels clustered in a blob away from the
edges of the die. These two images both show data with the Loc defect. The network correctly
classified the image on the left as a Loc defect. The network misclassified the image on the right and
classified the defect as an Edge-Loc defect. For the image classified as an Edge-Loc defect, the
defects at the boundary of the die are most influential in the network prediction. The Edge-Loc
defect differs from the Loc defect primarily in the location of the cluster of defects.

Compare Correct Classifications and Misclassifications

You can explore other instances of correctly classified and misclassified images. Specify a class to
evaluate.

defectClass = ;

Find the index of all images with the specified defect type as the ground truth or predicted label.

idxTrue = find(testingData.FailureType == defectClass);


idxPred = find(defectPredicted == defectClass);

Find the indices of correctly classified images. Then, select one of the images to evaluate. By default,
this example evaluates the first correctly classified image.

idxCorrect = intersect(idxTrue,idxPred);

idxToEvaluateCorrect = ;
imCorrect = testingData.WaferImage{idxCorrect(idxToEvaluateCorrect)};

Find the indices of misclassified images. Then, select one of the images to evaluate and get the
predicted class of that image. By default, this example evaluates the first misclassified image.

idxIncorrect = setdiff(idxTrue,idxPred);

idxToEvaluateIncorrect = ;
imIncorrect = testingData.WaferImage{idxIncorrect(idxToEvaluateIncorrect)};
labelIncorrect = defectPredicted(idxIncorrect(idxToEvaluateIncorrect));

Resize the test images to match the input size of the network.

imCorrect = imresize(imCorrect,inputSize);
imIncorrect = imresize(imIncorrect,inputSize);

Generate the score maps using the gradCAM (Deep Learning Toolbox) function.

scoreCorrect = gradCAM(trainedNet,imCorrect,defectClass);
scoreIncorrect = gradCAM(trainedNet,imIncorrect,labelIncorrect);

Display the score maps over the original wafer maps using the displayWaferScoreMap helper
function. This function is attached to the example as a supporting file.

figure
tiledlayout(1,2)
t = nexttile;
displayWaferScoreMap(imCorrect,scoreCorrect,t)
title("Correct Classification ("+defectClass+")")
t = nexttile;
displayWaferScoreMap(imIncorrect,scoreIncorrect,t)
title("Misclassification ("+string(labelIncorrect)+")")

References

[1] Wu, Ming-Ju, Jyh-Shing R. Jang, and Jui-Long Chen. “Wafer Map Failure Pattern Recognition and
Similarity Ranking for Large-Scale Data Sets.” IEEE Transactions on Semiconductor Manufacturing
28, no. 1 (February 2015): 1–12. https://doi.org/10.1109/TSM.2014.2364237.

[2] Jang, Roger. "MIR Corpora." http://mirlab.org/dataset/public/.

[3] Selvaraju, Ramprasaath R., Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh,
and Dhruv Batra. “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based
Localization.” In 2017 IEEE International Conference on Computer Vision (ICCV), 618–26. Venice:
IEEE, 2017. https://doi.org/10.1109/ICCV.2017.74.

[4] T., Bex. “Comprehensive Guide on Multiclass Classification Metrics.” October 14, 2021.
https://towardsdatascience.com/comprehensive-guide-on-multiclass-classification-metrics-af94cfb83fbd.

See Also
trainingOptions | trainNetwork | augmentedImageDatastore | imageDataAugmenter |
imageDatastore | classify | predict | confusionmat | confusionchart

Related Examples
• “Detect Image Anomalies Using Explainable FCDD Network” on page 19-247
• “Detect Image Anomalies Using Pretrained ResNet-18 Feature Embeddings” on page 19-276

More About
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)
• “List of Deep Learning Layers” (Deep Learning Toolbox)

Detect Image Anomalies Using Pretrained ResNet-18 Feature Embeddings

This example shows how to train a similarity-based anomaly detector using one-class learning of
feature embeddings extracted from a pretrained ResNet-18 convolutional neural network.

This example applies patch distribution modeling (PaDiM) [1 on page 19-295] to train an anomaly
detection classifier. During training, you fit a Gaussian distribution that models the mean and
covariance of normal image features. During testing, the classifier labels images whose features
deviate from the Gaussian distribution by more than a certain threshold as anomalous. PaDiM is a
similarity-based method because the similarity between test images and the normal image
distribution drives classification. The PaDiM method has several practical advantages.

• PaDiM extracts features from a pretrained CNN without requiring that you retrain the network.
Therefore, you can run the example efficiently without special hardware requirements such as a
GPU.
• PaDiM is a one-class learning approach. The classification model is trained using only normal
images. Training does not require images with anomalies, which can be rare, expensive, or unsafe
to obtain for certain applications.
• PaDiM is an explainable classification method. The PaDiM classifier generates an anomaly score
for each spatial patch. You can visualize the scores as a heatmap to localize anomalies and gain
insight into the model.

The PaDiM method is suitable for image data sets that can be cropped to match the input size of the
pretrained CNN. The input size of the CNN depends on the data used to train the network. For
applications requiring more flexibility in image size, an alternative approach might be more
appropriate. For an example of such an approach, see “Detect Image Anomalies Using Explainable
FCDD Network” on page 19-247.

Download Concrete Crack Images for Classification Data Set

This example uses the Concrete Crack Images for Classification data set [4 on page 19-295] [5 on
page 19-295]. The data set contains images of two classes: Negative images (or normal images)
without cracks present in the road and Positive images (or anomaly images) with cracks. The data
set provides 20,000 images of each class. The size of the data set is 235 MB.

Set dataDir as the desired location of the data set.


dataDir = fullfile(tempdir,"ConcreteCrackDataset");
if ~exist(dataDir,"dir")
mkdir(dataDir);
end

To download the data set, go to this link: https://prod-dcd-datasets-cache-zipfiles.s3.eu-west-1.amazonaws.com/5y9wdsg2zt-2.zip.
Extract the ZIP file to obtain a RAR file, then extract the contents of the RAR file into the directory
specified by the dataDir variable. When extracted successfully, dataDir contains two subdirectories:
Negative and Positive.

Load and Preprocess Data

Create an imageDatastore that reads and manages the image data. Label each image as Positive
or Negative according to the name of its directory.
imdsPositive = imageDatastore(fullfile(dataDir,"Positive"),LabelSource="foldernames");
imdsNegative = imageDatastore(fullfile(dataDir,"Negative"),LabelSource="foldernames");

Display an example of each class. Display a negative, or good, image without crack anomalies on the
left. In the good image, imperfections and deviations in texture are small. Display a positive, or
anomalous, image on the right. The anomalous image shows a large black crack oriented vertically.
samplePositive = preview(imdsPositive);
sampleNegative = preview(imdsNegative);
montage({sampleNegative,samplePositive})
title("Road Images Without (Left) and with (Right) Cracks")

Partition Data into Training, Calibration, and Test Sets

To simulate a more typical semisupervised workflow, create a training set of 250 images from the
Negative class only. Allocate 100 Negative images and 100 Positive images to a calibration set.

This example uses a calibration set to pick a threshold for the classifier. The classifier labels images
with anomaly scores above the threshold as anomalous. Using separate calibration and test sets
avoids information leaking from the test set into the design of the classifier. Allocate 1000 Negative
images and 1000 Positive images to a test set.

numTrainNormal = 250;
numCal = 100;
numTest = 1000;

[imdsTestPos,imdsCalPos] = splitEachLabel(imdsPositive,numTest,numCal);
[imdsTrainNeg,imdsTestNeg,imdsCalNeg] = splitEachLabel(imdsNegative,numTrainNormal,numTest,numCal);

trainFiles = imdsTrainNeg.Files;
calibrationFiles = cat(1,imdsCalPos.Files,imdsCalNeg.Files);
testFiles = cat(1,imdsTestPos.Files,imdsTestNeg.Files);

imdsTrain = imageDatastore(trainFiles,LabelSource="foldernames");
imdsCal = imageDatastore(calibrationFiles,LabelSource="foldernames");
imdsTest = imageDatastore(testFiles,LabelSource="foldernames");

Define an anonymous function, addLabelFcn, that creates a one-hot encoded representation of label
information from an input image. Then, transform the datastores by using the transform function
such that the datastores return a cell array of image data and a corresponding one-hot encoded array.
The transform function applies the operations specified by addLabelFcn.

addLabelFcn = @(x,info) deal({x,onehotencode(info.Label,1)},info);


tdsTrain = transform(imdsTrain,addLabelFcn,IncludeInfo=true);
tdsCal = transform(imdsCal,addLabelFcn,IncludeInfo=true);
tdsTest = transform(imdsTest,addLabelFcn,IncludeInfo=true);

Resize and Crop Images

Define an anonymous function, resizeAndCropImageFcn, that applies the
resizeAndCropForConcreteAnomalyDetector helper function to the input images. The
resizeAndCropForConcreteAnomalyDetector helper function resizes and center crops input
images, and is attached to the example as a supporting file. Transform the datastores by using the
transform function with the operations specified by resizeAndCropImageFcn. This operation
crops each image in the training, calibration, and test datastores to a size of 224-by-224 to match the
input size of the pretrained CNN.

resizeImageSize = [256 256];


targetImageSize = [224 224];
resizeAndCropImageFcn = @(x,info) deal({resizeAndCropForConcreteAnomalyDetector(x{1},resizeImageSize,targetImageSize),x{2}},info);
tdsTrain = transform(tdsTrain,resizeAndCropImageFcn);
tdsCal = transform(tdsCal,resizeAndCropImageFcn);
tdsTest = transform(tdsTest,resizeAndCropImageFcn);

Batch Training Data

Create a minibatchqueue (Deep Learning Toolbox) object that manages the mini-batches of training
data. The minibatchqueue object automatically converts data to a dlarray (Deep Learning
Toolbox) object that enables automatic differentiation in deep learning applications.

Specify the mini-batch data extraction format as "SSCB" (spatial, spatial, channel, batch).

minibatchSize = 128;
trainQueue = minibatchqueue(tdsTrain, ...
PartialMiniBatch="return", ...
MiniBatchFormat=["SSCB","CB"], ...
MiniBatchSize=minibatchSize);

Create PaDiM Model

This example applies the PaDiM method described in [1 on page 19-295]. The basic idea of PaDiM is
to simplify 2-D images into a lower resolution grid of embedding vectors that encode features
extracted from a subset of layers of a pretrained CNN. Each embedding vector generated from the
lower resolution CNN layers corresponds to a spatial patch of pixels in the original resolution image.
The training step generates feature embedding vectors for all training set images and fits a statistical
Gaussian distribution to the training data. A trained PaDiM classifier model consists of the mean and
covariance matrix describing the learned Gaussian distribution for normal training images.

Extract Image Features from Pretrained CNN

This example uses the ResNet-18 network [2 on page 19-295] to extract features of input images.
ResNet-18 is a convolutional neural network with 18 layers and is pretrained on ImageNet [3 on page
19-295].

Extract features from three layers of ResNet-18 located at the end of the first, second, and third
blocks. For an input image of size 224-by-224, these layers correspond to activations with spatial
resolutions of 56-by-56, 28-by-28, and 14-by-14, respectively. For example, the XTrainFeatures1
variable contains 56-by-56 feature vectors from the bn2b_branch2b layer for each training set
image. The layer activations with higher and lower spatial resolutions provide a balance between
greater visual detail and global context, respectively.

net = resnet18("Weights","imagenet");

feature1LayerName = "bn2b_branch2b";
feature2LayerName = "bn3b_branch2b";
feature3LayerName = "bn4b_branch2b";

XTrainFeatures1 = [];
XTrainFeatures2 = [];
XTrainFeatures3 = [];

reset(trainQueue);
shuffle(trainQueue);
idx = 1;
while hasdata(trainQueue)
[X,T] = next(trainQueue);

XTrainFeatures1 = cat(4,XTrainFeatures1,activations(net,extractdata(X),feature1LayerName));
XTrainFeatures2 = cat(4,XTrainFeatures2,activations(net,extractdata(X),feature2LayerName));
XTrainFeatures3 = cat(4,XTrainFeatures3,activations(net,extractdata(X),feature3LayerName));
idx = idx+size(X,4);
end

Concatenate Feature Embeddings

Combine the features extracted from the three ResNet-18 layers by using the
concatenateEmbeddings on page 19-294 helper function defined at the end of this example. The
concatenateEmbeddings helper function upsamples the feature vectors extracted from the second
and third blocks of ResNet-18 to match the spatial resolution of the first block and concatenates the
three feature vectors.
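
The concatenateEmbeddings helper function is defined at the end of the example and is not repeated here. Conceptually, the combination step might look like the following sketch, in which f1, f2, and f3 are hypothetical single-image activations with the spatial sizes and channel counts described above:

% Hypothetical single-image activations for illustration
f1 = rand(56,56,64,"single");
f2 = rand(28,28,128,"single");
f3 = rand(14,14,256,"single");
% Upsample the coarser feature maps to the 56-by-56 grid of the first layer,
% then stack along the channel dimension
f2up = imresize(f2,[56 56],"nearest");
f3up = imresize(f3,[56 56],"nearest");
embedding = cat(3,f1,f2up,f3up);    % 56-by-56-by-448 for these channel counts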

XTrainFeatures1 = gather(XTrainFeatures1);
XTrainFeatures2 = gather(XTrainFeatures2);
XTrainFeatures3 = gather(XTrainFeatures3);
XTrainEmbeddings = concatenateEmbeddings(XTrainFeatures1,XTrainFeatures2,XTrainFeatures3);

The variable XTrainEmbeddings is a numeric array containing feature embedding vectors for the
training image set. The first two spatial dimensions correspond to the number of spatial patches in
each image. The 56-by-56 spatial patches match the size of the bn2b_branch2b layer of ResNet-18.
The third dimension corresponds to the channel data, or the length of the feature embedding vector
for each patch. The fourth dimension corresponds to the number of training images.

whos XTrainEmbeddings

Name Size Bytes Class Attributes

XTrainEmbeddings 56x56x448x250 1404928000 single

Randomly Downsample Feature Embedding Channel Dimension

Reduce the dimensionality of the embedding vector by randomly selecting a subset of 100 out of 448
elements in the channel dimension to keep. As shown in [1 on page 19-295], this random
dimensionality reduction step increases classification efficiency without decreasing accuracy.

selectedChannels = 100;
totalChannels = 448;
rIdx = randi(totalChannels,[1 selectedChannels]);
XTrainEmbeddings = XTrainEmbeddings(:,:,rIdx,:);

Compute Mean and Covariance of Gaussian Distribution

Model the training image patch embedding vectors as a Gaussian distribution by calculating the
mean and covariance matrix across training images.

Reshape the embedding vector to have a single spatial dimension of length H*W.

[H, W, C, B] = size(XTrainEmbeddings);
XTrainEmbeddings = reshape(XTrainEmbeddings,[H*W C B]);

Calculate the mean of the embedding vector along the third dimension, corresponding to the average
of the 250 training set images. In this example, the means variable is a 3136-by-100 matrix, with
average feature values for each of the 56-by-56 spatial patches and 100 channel elements.

means = mean(XTrainEmbeddings,3);

For each embedding vector, calculate the covariance matrix between the 100 channel elements.
Include a regularization constant based on the identity matrix to make covars a full rank and
invertible matrix. In this example, the covars variable is a 3136-by-100-by-100 matrix.

covars = zeros([H*W C C]);


identityMatrix = eye(C);
for idx = 1:H*W
covars(idx,:,:) = cov(squeeze(XTrainEmbeddings(idx,:,:))') + 0.01* identityMatrix;
end

Choose Anomaly Score Threshold for Classification

An important part of the semisupervised anomaly detection workflow is deciding on an anomaly score
threshold for separating normal images from anomaly images. This example uses the calibration set
to calculate the threshold.

In this example, the anomaly score metric is the Mahalanobis distance between the feature
embedding vector and the learned Gaussian distribution for normal images. The anomaly score for
each calibration image patch forms an anomaly score map that localizes predicted anomalies.
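
The calculateDistance helper function used below is attached to the example and is not listed here. As a sketch of the underlying computation for one patch, the Mahalanobis distance between a patch embedding x and that patch's learned mean mu and covariance Sigma might be computed as follows; all of the variables in this sketch are hypothetical stand-ins.

C = 100;                                % channel count used in this example
x = rand(1,C);                          % hypothetical patch embedding
mu = rand(1,C);                         % hypothetical learned mean for this patch
Sigma = cov(rand(500,C)) + 0.01*eye(C); % hypothetical regularized covariance
d = x - mu;
mahalDist = sqrt(d*(Sigma\d'));         % equivalent to sqrt(d*inv(Sigma)*d')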

Calculate Anomaly Scores for Calibration Set

Calculate feature embedding vectors for the calibration set images. First, create a minibatchqueue
(Deep Learning Toolbox) object to manage the mini-batches of calibration observations. Specify the
mini-batch data extraction format as "SSCB" (spatial, spatial, channel, batch). Use a larger mini-
batch size to improve throughput and reduce computation time.

minibatchSize = 1;
calibrationQueue = minibatchqueue(tdsCal, ...
MiniBatchFormat=["SSCB","CB"], ...
MiniBatchSize=minibatchSize, ...
OutputEnvironment="auto");

Perform the following steps to compute the anomaly scores for the calibration set images.

• Extract features of the calibration images from the same three layers of ResNet-18 used in
training.
• Combine the features from the three layers into an overall embedding variable XCalEmbeddings
by using the concatenateEmbeddings helper function. The helper function is defined at the end
of this example.
• Downsample the embedding vectors to the same 100 channel elements used during training,
specified by rIdx.
• Reshape the embedding vectors into an H*W-by-C-by-B array, where B is the number of images in
the mini-batch.
• Calculate the Mahalanobis distance between each embedding feature vector and the learned
Gaussian distribution by using the calculateDistance helper function. The helper function is
defined at the end of this example.
• Create an anomaly score map for each image by using the createAnomalyScoreMap helper
function. The helper function is defined at the end of this example.

maxScoresCal = zeros(tdsCal.numpartitions,1);
minScoresCal = zeros(tdsCal.numpartitions,1);
meanScoresCal = zeros(tdsCal.numpartitions,1);
idx = 1;

while hasdata(calibrationQueue)
    XCal = next(calibrationQueue);

    XCalFeatures1 = activations(net,extractdata(XCal),feature1LayerName);
    XCalFeatures2 = activations(net,extractdata(XCal),feature2LayerName);
    XCalFeatures3 = activations(net,extractdata(XCal),feature3LayerName);

    XCalFeatures1 = gather(XCalFeatures1);
    XCalFeatures2 = gather(XCalFeatures2);
    XCalFeatures3 = gather(XCalFeatures3);
    XCalEmbeddings = concatenateEmbeddings(XCalFeatures1,XCalFeatures2,XCalFeatures3);

    XCalEmbeddings = XCalEmbeddings(:,:,rIdx,:);
    [H, W, C, B] = size(XCalEmbeddings);
    XCalEmbeddings = reshape(XCalEmbeddings,[H*W C B]);

    distances = calculateDistance(XCalEmbeddings,H,W,B,means,covars);

    anomalyScoreMap = createAnomalyScoreMap(distances,H,W,B,targetImageSize);

    % Calculate max, min, and mean values of the anomaly score map
    maxScoresCal(idx:idx+size(XCal,4)-1) = squeeze(max(anomalyScoreMap,[],[1 2 3]));
    minScoresCal(idx:idx+size(XCal,4)-1) = squeeze(min(anomalyScoreMap,[],[1 2 3]));
    meanScoresCal(idx:idx+size(XCal,4)-1) = squeeze(mean(anomalyScoreMap,[1 2 3]));

    idx = idx+size(XCal,4);
    clear XCalFeatures1 XCalFeatures2 XCalFeatures3 anomalyScoreMap distances XCalEmbeddings XCal
end

Create Anomaly Score Histograms

Assign the known ground truth labels "Positive" and "Negative" to the calibration set images.

labelsCal = tdsCal.UnderlyingDatastores{1}.Labels ~= "Negative";

Use the minimum and maximum values of the calibration data set to normalize the mean scores to the
range [0, 1].

maxScore = max(maxScoresCal,[],"all");
minScore = min(minScoresCal,[],"all");

scoresCal = mat2gray(meanScoresCal, [minScore maxScore]);

Plot a histogram of the mean anomaly scores for the normal and anomaly classes. The distributions
are well separated by the model-predicted anomaly score.

[~,edges] = histcounts(scoresCal,20);
hGood = histogram(scoresCal(labelsCal==0),edges);
hold on
hBad = histogram(scoresCal(labelsCal==1),edges);
hold off
legend([hGood,hBad],"Normal (Negative)","Anomaly (Positive)")
xlabel("Mean Anomaly Score");
ylabel("Counts");


Calculate Threshold Value

Create a receiver operating characteristic (ROC) curve to calculate the anomaly threshold. Each point
on the ROC curve represents the false positive rate (x-coordinate) and true positive rate (y-
coordinate) when the calibration set images are classified using a different threshold value. An
optimal threshold maximizes the true positive rate and minimizes the false positive rate. Using ROC
curves and related metrics allows you to select a threshold based on the tradeoff between false
positives and false negatives. These tradeoffs depend on the application-specific implications of
misclassifying images as false positives versus false negatives.

Create the ROC curve by using the perfcurve (Statistics and Machine Learning Toolbox) function.
The solid blue line represents the ROC curve. The red dashed line represents a random classifier
corresponding to a 50% success rate. Display the area under the curve (AUC) metric for the
calibration set in the title of the figure. A perfect classifier has an ROC curve with a maximum AUC of
1.

[xroc,yroc,troc,auc] = perfcurve(labelsCal,scoresCal,true);
figure
lroc = plot(xroc,yroc);
hold on
lchance = plot([0 1],[0 1],"r--");
hold off
xlabel("False Positive Rate")
ylabel("True Positive Rate")
title("ROC Curve AUC: "+auc);
legend([lroc,lchance],"ROC curve","Random Chance")


This example uses the maximum Youden Index metric to select the anomaly score threshold from the
ROC curve. This corresponds to the threshold value that maximizes the distance between the blue
model ROC curve and the red random chance ROC curve.
[~,ind] = max(yroc-xroc);
anomalyThreshold = troc(ind)

anomalyThreshold = 0.2082

Evaluate Classification Model

Calculate Anomaly Score Map for Test Set

Calculate feature embedding vectors for the test set images. First, create a minibatchqueue (Deep
Learning Toolbox) object to manage the mini-batches of test observations. Specify the mini-batch data
extraction format as "SSCB" (spatial, spatial, channel, batch). Use a larger mini-batch size to improve
throughput and reduce computation time.

minibatchSize = 1;
testQueue = minibatchqueue(tdsTest, ...
    MiniBatchFormat=["SSCB","CB"], ...
    MiniBatchSize=minibatchSize, ...
    OutputEnvironment="auto");

Perform the following steps to compute the anomaly scores for the test set images.

• Extract features of the test images from the same three layers of ResNet-18 used in training.
• Combine the features from the three layers into an overall embedding variable
XTestEmbeddings by using the concatenateEmbeddings helper function. The helper function
is defined at the end of this example.
• Downsample the embedding vectors to the same 100 channel elements used during training,
specified by rIdx.
• Reshape the embedding vectors into an H*W-by-C-by-B array, where B is the number of images in
the mini-batch.
• Calculate the Mahalanobis distance between each embedding feature vector and the learned
Gaussian distribution by using the calculateDistance helper function. The helper function is
defined at the end of this example.
• Create an anomaly score map for each image by using the createAnomalyScoreMap helper
function. The helper function is defined at the end of this example.
• Concatenate the anomaly score maps across mini-batches. The anomalyScoreMapsTest variable
specifies score maps for all test set images.

idx = 1;

XTestImages = [];
anomalyScoreMapsTest = [];

while hasdata(testQueue)
    XTest = next(testQueue);

    XTestFeatures1 = activations(net,extractdata(XTest),feature1LayerName);
    XTestFeatures2 = activations(net,extractdata(XTest),feature2LayerName);
    XTestFeatures3 = activations(net,extractdata(XTest),feature3LayerName);

    XTestFeatures1 = gather(XTestFeatures1);
    XTestFeatures2 = gather(XTestFeatures2);
    XTestFeatures3 = gather(XTestFeatures3);
    XTestEmbeddings = concatenateEmbeddings(XTestFeatures1,XTestFeatures2,XTestFeatures3);

    XTestEmbeddings = XTestEmbeddings(:,:,rIdx,:);
    [H, W, C, B] = size(XTestEmbeddings);
    XTestEmbeddings = reshape(XTestEmbeddings,[H*W C B]);

    distances = calculateDistance(XTestEmbeddings,H,W,B,means,covars);

    anomalyScoreMap = createAnomalyScoreMap(distances,H,W,B,targetImageSize);
    XTestImages = cat(4,XTestImages,gather(XTest));
    anomalyScoreMapsTest = cat(4,anomalyScoreMapsTest,gather(anomalyScoreMap));

    idx = idx+size(XTest,4);
    clear XTestFeatures1 XTestFeatures2 XTestFeatures3 anomalyScoreMap distances XTestEmbeddings
end


Classify Test Images

Calculate an overall mean anomaly score for each test image. Normalize the anomaly scores to the
same range used to pick the threshold, defined by minScore and maxScore.

scoresTest = squeeze(mean(anomalyScoreMapsTest,[1 2 3]));


scoresTest = mat2gray(scoresTest, [minScore maxScore]);

Predict class labels for each test set image by comparing the mean anomaly score map value to the
anomalyThreshold value.

predictedLabels = scoresTest > anomalyThreshold;

Calculate Classification Accuracy

Assign the known ground truth labels "Positive" or "Negative" to the test set images.

labelsTest = tdsTest.UnderlyingDatastores{1}.Labels ~= "Negative";

Calculate the confusion matrix and the classification accuracy for the test set. The classification
model in this example is accurate and predicts a small percentage of false positives and false
negatives.

targetLabels = logical(labelsTest);
M = confusionmat(targetLabels,predictedLabels);
confusionchart(M,["Negative","Positive"])
acc = sum(diag(M)) / sum(M,"all");
title("Accuracy: "+acc);


Explain Classification Decisions

You can visualize the anomaly score map predicted by the PaDiM model as a heatmap overlaid on the
image. You can use this localization of predicted anomalies to help explain why an image is classified
as normal or anomalous. This approach is useful for identifying patterns in false negatives and false
positives. You can use these patterns to identify strategies to improve the classifier performance.

Calculate Heatmap Display Range

Instead of scaling the heatmap for each image individually, visualize heatmap data using the same
display range for all images in a data set. Doing so yields uniformly cool heatmaps for normal images
and warm colors in anomalous regions for anomaly images.

Calculate a display range that reflects the range of anomaly score values observed in the calibration
set. Apply the display range for all heatmaps in this example. Set the minimum value of the
displayRange to 0. Set the maximum value of the display range by calculating the maximum score
for each of the 200 calibration images, then selecting the 80th percentile of the maximums. Calculate
the percentile value by using the prctile function.


maxScoresCal = mat2gray(maxScoresCal);
scoreMapRange = [0 prctile(maxScoresCal,80,"all")];

View Heatmap of Anomaly

Select an image of a correctly classified anomaly. This result is a true positive classification. Display
the image.

idxTruePositive = find(targetLabels & predictedLabels);


dsTruePositive = subset(tdsTest,idxTruePositive);
dataTruePositive = preview(dsTruePositive);
imgTruePositive = dataTruePositive{1};
imshow(imgTruePositive)
title("True Positive Test Image")

Obtain an anomaly score map of the true positive anomaly image. Normalize the anomaly scores to
the minimum and maximum values of the calibration data set to match the range used to pick the
threshold.

anomalyTestMapsRescaled = mat2gray(anomalyScoreMapsTest, [minScore maxScore]);


scoreMapTruePositive = anomalyTestMapsRescaled(:,:,1,idxTruePositive(1));

Display the heatmap as an overlay over the image by using the
anomalyMapOverlayForConcreteAnomalyDetector helper function. This function is attached to
the example as a supporting file.

imshow(anomalyMapOverlayForConcreteAnomalyDetector(imgTruePositive,scoreMapTruePositive,ScoreMapRange=scoreMapRange))
title("Heatmap Overlay of True Positive Result")


To quantitatively confirm the result, display the mean anomaly score of the true positive test image as
predicted by the classifier. The value is greater than the anomaly score threshold.

disp("Mean anomaly score of test image: "+scoresTest(idxTruePositive(1)))

Mean anomaly score of test image: 0.25415

View Heatmap of Normal Image

Select and display an image of a correctly classified normal image. This result is a true negative
classification.

idxTrueNegative = find(~(targetLabels | predictedLabels));


dsTrueNegative = subset(tdsTest,idxTrueNegative);
dataTrueNegative = preview(dsTrueNegative);
imgTrueNegative = dataTrueNegative{1};
imshow(imgTrueNegative)
title("True Negative Test Image")


Obtain a heatmap of the normal image. Display the heatmap as an overlay over the image by using
the anomalyMapOverlayForConcreteAnomalyDetector helper function. This function is attached
to the example as a supporting file. Many true negative test images, such as this test image, have
either small anomaly scores across the entire image or large anomaly scores in a localized portion of
the image.

scoreMapTrueNegative = anomalyTestMapsRescaled(:,:,1,idxTrueNegative(1));
imshow(anomalyMapOverlayForConcreteAnomalyDetector(imgTrueNegative,scoreMapTrueNegative,ScoreMapRange=scoreMapRange))
title("Heatmap Overlay of True Negative Result")


To quantitatively confirm the result, display the mean anomaly score of the true negative test image as
predicted by the classifier. The value is less than the anomaly score threshold.

disp("Mean anomaly score of test image: "+scoresTest(idxTrueNegative(1)))

Mean anomaly score of test image: 0.12314

View Heatmaps of False Positive Images

False positives are images without crack anomalies that the network classifies as anomalous. Use the
explanation from the PaDiM model to gain insight into the misclassifications.

Find false positive images from the test set. Display three false positive images as a montage.

idxFalsePositive = find(~targetLabels & predictedLabels);


dataFalsePositive = readall(subset(tdsTest,idxFalsePositive));
numelFalsePositive = length(idxFalsePositive);
numImages = min(numelFalsePositive,3);
if numelFalsePositive>0
    montage(dataFalsePositive(1:numImages,1),Size=[1,numImages],BorderSize=10);
    title("False Positives in Test Set")
end


Obtain heatmaps of the false positive images.

hmapOverlay = cell(1,numImages);
for idx = 1:numImages
    img = dataFalsePositive{idx,1};
    scoreMapFalsePositive = anomalyTestMapsRescaled(:,:,1,idxFalsePositive(idx));
    hmapOverlay{idx} = anomalyMapOverlayForConcreteAnomalyDetector(img,scoreMapFalsePositive,ScoreMapRange=scoreMapRange);
end

Display the heatmap overlays as a montage. The false positive images show features such as rocks
that have similar visual characteristics to cracks. The anomaly scores are high in these localized
regions. However, the training data set only labels images with cracks as anomalous, so the ground
truth label for these images is Negative. Training a classifier that recognizes rocks and other non-
crack defects as anomalous requires training data with non-crack defects labeled as anomalous.

if numelFalsePositive>0
    montage(hmapOverlay,Size=[1,numImages],BorderSize=10)
    title("Heatmap Overlays of False Positive Results")
end


Display the mean anomaly scores of the false positive test images as predicted by the PaDiM model.
The mean scores are greater than the anomaly score threshold, resulting in misclassifications.

disp("Mean anomaly scores:"); scoresTest(idxFalsePositive(1:numImages))

Mean anomaly scores:

ans = 3×1

0.2125
0.2395
0.2651

View Heatmaps of False Negative Images

False negatives are images with crack anomalies that the network classifies as normal. Use the
explanation from the PaDiM model to gain insights into the misclassifications.

Find any false negative images from the test set. Display three false negative images as a montage.

idxFalseNegative = find(targetLabels & ~predictedLabels);


dataFalseNegative = readall(subset(tdsTest,idxFalseNegative));
numelFalseNegative = length(idxFalseNegative);
numImages = min(numelFalseNegative,3);
if numelFalseNegative>0
    montage(dataFalseNegative(1:numImages,1),Size=[1,numImages],BorderSize=10);
    title("False Negatives in Test Set")
end

Obtain heatmaps of the false negative images.

hmapOverlay = cell(1,numImages);
for idx = 1:numImages
    img = dataFalseNegative{idx,1};
    scoreMapFalseNegative = anomalyTestMapsRescaled(:,:,1,idxFalseNegative(idx));
    hmapOverlay{idx} = anomalyMapOverlayForConcreteAnomalyDetector(img,scoreMapFalseNegative,ScoreMapRange=scoreMapRange);
end


Display the heatmap overlays as a montage. The PaDiM model predicts large anomaly scores around
cracks, as expected.

if numelFalseNegative>0
    montage(hmapOverlay,Size=[1,numImages],BorderSize=10)
    title("Heatmap Overlays of False Negative Results")
end

Display the mean anomaly scores of the false negative test images as predicted by the PaDiM model.
The mean scores are less than the anomaly score threshold, resulting in misclassifications.

disp("Mean anomaly scores:"); scoresTest(idxFalsePositive(1:numImages))

Mean anomaly scores:

ans = 3×1

0.2125
0.2395
0.2651

Supporting Functions

The concatenateEmbeddings helper function combines features extracted from three layers of
ResNet-18 into one feature embedding vector. The features from the second and third blocks of
ResNet-18 are resized to match the spatial resolution of the first block.

function XEmbeddings = concatenateEmbeddings(XFeatures1,XFeatures2,XFeatures3)
    XFeatures2Resize = imresize(XFeatures2,2,"nearest");
    XFeatures3Resize = imresize(XFeatures3,4,"nearest");
    XEmbeddings = cat(3,XFeatures1,XFeatures2Resize,XFeatures3Resize);
end

The calculateDistance helper function calculates the Mahalanobis distance between each
embedding feature vector specified by XEmbeddings and the learned Gaussian distribution for the
corresponding patch with mean specified by means and covariance matrix specified by covars.


function distances = calculateDistance(XEmbeddings,H,W,B,means,covars)
    distances = zeros([H*W 1 B]);
    for dIdx = 1:H*W
        distances(dIdx,1,:) = pdist2((squeeze(means(dIdx,:))),(squeeze(XEmbeddings(dIdx,:,:))'),"mahalanobis",squeeze(covars(dIdx,:,:)));
    end
end

The createAnomalyScoreMap helper function creates an anomaly score map for each image from the
patch distances returned by the calculateDistance helper function. The createAnomalyScoreMap
function reshapes and resizes the anomaly score map to match the size and resolution of the original
input images.

function anomalyScoreMap = createAnomalyScoreMap(distances,H,W,B,targetImageSize)
    anomalyScoreMap = reshape(distances,[H W 1 B]);
    anomalyScoreMap = imresize(anomalyScoreMap,targetImageSize,"bilinear");
    for mIdx = 1:size(anomalyScoreMap,4)
        anomalyScoreMap(:,:,1,mIdx) = imgaussfilt(anomalyScoreMap(:,:,1,mIdx),4,FilterSize=33);
    end
end

References

[1] Defard, Thomas, Aleksandr Setkov, Angelique Loesch, and Romaric Audigier. “PaDiM: A Patch
Distribution Modeling Framework for Anomaly Detection and Localization.” In Pattern Recognition.
ICPR International Workshops and Challenges, 475–89. Lecture Notes in Computer Science. Cham,
Switzerland: Springer International Publishing, 2021. https://doi.org/10.1007/978-3-030-68799-1_35.

[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image
Recognition.” In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78.
Las Vegas, NV, USA: IEEE, 2016. https://doi.org/10.1109/CVPR.2016.90.

[3] ImageNet. https://www.image-net.org.

[4] Özgenel, Ç. F., and Arzu Gönenç Sorguç. “Performance Comparison of Pretrained Convolutional
Neural Networks on Crack Detection in Buildings.” Taipei, Taiwan, 2018.
https://doi.org/10.22260/ISARC2018/0094.

[5] Zhang, Lei, Fan Yang, Yimin Daniel Zhang, and Ying Julie Zhu. “Road Crack Detection Using Deep
Convolutional Neural Network.” In 2016 IEEE International Conference on Image Processing (ICIP),
3708–12. Phoenix, AZ, USA: IEEE, 2016. https://doi.org/10.1109/ICIP.2016.7533052.

See Also
imageDatastore | activations | resnet18 | perfcurve | confusionmat | confusionchart

Related Examples
• “Detect Image Anomalies Using Explainable FCDD Network” on page 19-247
• “Classify Defects on Wafer Maps Using Deep Learning” on page 19-260


More About
• “Datastores for Deep Learning” (Deep Learning Toolbox)
• “Preprocess Images for Deep Learning” (Deep Learning Toolbox)

20 Hyperspectral Image Processing

This topic describes functions that enable hyperspectral image analysis and provides examples for
spectral classification and anomaly detection using endmembers and abundance maps.

• “Getting Started with Hyperspectral Image Processing” on page 20-2


• “Hyperspectral Data Correction” on page 20-9
• “Spectral Indices” on page 20-13
• “Support for Singleton Dimensions” on page 20-18
• “Identify Vegetation and Non-Vegetation Spectra” on page 20-20
• “Explore Hyperspectral Data in the Hyperspectral Viewer” on page 20-22
• “Hyperspectral Image Analysis Using Maximum Abundance Classification” on page 20-33
• “Classify Hyperspectral Image Using Library Signatures and SAM” on page 20-40
• “Endmember Material Identification Using Spectral Library” on page 20-46
• “Target Detection Using Spectral Signature Matching” on page 20-53
• “Identify Vegetation Regions Using Interactive NDVI Thresholding” on page 20-61
• “Classify Hyperspectral Images Using Deep Learning” on page 20-66
• “Find Regions in Spatially Referenced Multispectral Image” on page 20-72
• “Classify Hyperspectral Image Using Support Vector Machine Classifier” on page 20-78
• “Manually Label ROIs in Multispectral Image” on page 20-83
• “Change Detection in Hyperspectral Images” on page 20-88
• “Ship Detection from Sentinel-1 C Band SAR Data Using YOLO v2 Object Detection”
on page 20-93
• “Automate Pixel Labeling of Hyperspectral Images Using ECOSTRESS Spectral Signatures in
Image Labeler” on page 20-106

Getting Started with Hyperspectral Image Processing


Hyperspectral imaging measures the spatial and spectral characteristics of an object by imaging it at
different wavelengths. The wavelength range extends beyond the visible spectrum and covers from
ultraviolet (UV) to long wave infrared (LWIR) wavelengths. The most popular are the visible, near-
infrared, and mid-infrared wavelength bands. A hyperspectral imaging sensor acquires several
images with narrow and contiguous wavelengths within a specified spectral range. Each of these
images contains more subtle and detailed information. The different information in the various
wavelengths is particularly useful in remote sensing applications, such as identification of vegetation,
water bodies, and roads, as different landscapes have distinct spectral signatures.

Hyperspectral image processing involves representing, analyzing, and interpreting information
contained in the hyperspectral images.

Representing Hyperspectral Data


The values measured by a hyperspectral imaging sensor are stored to a binary data file by using band
sequential (BSQ), band-interleaved-by-pixel (BIP), or band-interleaved-by-line (BIL) encoding formats.
The data file is associated to a header file that contains ancillary information (metadata) like sensor
parameters, acquisition settings, spatial dimensions, spectral wavelengths, and encoding formats that
are required for proper representation of the values in the data file.

For hyperspectral image processing, the values read from the data file are arranged into a three-
dimensional (3-D) array of the form M-by-N-by-C, where M and N are the spatial dimensions of the
acquired data, and C is the spectral dimension specifying the number of spectral wavelengths used
during acquisition. Thus, you can consider the 3-D array as a set of two-dimensional (2-D)
monochromatic images captured at varying wavelengths. This set is known as the hyperspectral data
cube or data cube.

The hypercube function constructs the data cube by reading the data file and the metadata
information in the associated header file. The hypercube function creates a hypercube object and
stores the data cube, spectral wavelengths, and the metadata to its properties. You can use the
hypercube object as input to all other functions in the Image Processing Toolbox Hyperspectral
Imaging Library.
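
For example, a minimal sketch of this workflow, using a placeholder file name for an ENVI-format data file and header that you have on disk:

hcube = hypercube("myHyperspectralData.dat");  % placeholder file name; reads the data file and its header
size(hcube.DataCube)                           % M-by-N-by-C data cube
hcube.Wavelength(1:5)                          % center wavelengths of the first five spectral bands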


Color Representation of Data Cube

To visualize and understand the object being imaged, it is useful to represent the data cube as a 2-D
image by using color schemes. The color representation of the data cube enables you to visually
inspect the data and supports decision making. You can use the colorize function to compute the
Red-Green-Blue (RGB), false-color, and color-infrared (CIR) representations of the data cube.

• The RGB color scheme uses the red, green, and blue spectral band responses to generate the 2-D
image of the hyperspectral data cube. The RGB color scheme brings a natural appearance, but
results in a significant loss of subtle information.
• The false-color scheme uses a combination of any number of bands other than the visible red,
green, and blue spectral bands. Use false-color representation to visualize the spectral responses
of bands outside the visible spectrum. The false-color scheme efficiently captures distinct
information across all spectral bands of hyperspectral data.
• The CIR color scheme uses spectral bands in the NIR range. The CIR representation of a
hyperspectral data cube is particularly useful in displaying and analyzing vegetation areas of the
data cube.
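
For instance, a minimal sketch that computes and displays an RGB composite of a hypercube object hcube; the Method name-value shown here is an assumption based on the color schemes described above:

rgbImg = colorize(hcube,Method="rgb");  % assumed name-value; false-color and CIR schemes follow the same pattern
imshow(rgbImg)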

Preprocessing

Hyperspectral imaging sensors typically have high spectral resolution and low spatial resolution. The
spatial and the spectral characteristics of the acquired hyperspectral data are characterized by its
pixels. Each pixel is a vector of values that specify the intensities at a location (x,y) in z different
bands. The vector is known as the pixel spectrum, and it defines the spectral signature of the pixel
located at (x,y). The pixel spectra are important features in hyperspectral data analysis, but they get
distorted due to factors such as sensor noise, atmospheric effects, and low resolution.


You can use the denoiseNGMeet function to remove noise from hyperspectral data by using the
non-local meets global approach.
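
A minimal usage sketch, assuming hcube is a hypercube object created as described earlier:

denoisedCube = denoiseNGMeet(hcube);  % returns the denoised hyperspectral data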

To enhance the spatial resolution of hyperspectral data, you can use image fusion methods. The
fusion approach combines information from the low-resolution hyperspectral data with high-resolution
multispectral data or a panchromatic image of the same scene. This approach is also known as
sharpening or pansharpening in hyperspectral image analysis. Pansharpening specifically refers to
fusion between hyperspectral and panchromatic data. You can use the sharpencnmf function to
sharpen hyperspectral data by using the coupled nonnegative matrix factorization (CNMF) method.

To compensate for the atmospheric effects, you must first calibrate the pixel values, which are digital
numbers (DNs). You must preprocess the data by calibrating the DNs using radiometric and atmospheric
correction methods. This process improves interpretation of the pixel spectra and provides better
results when you analyze multiple data sets, as in a classification problem. For information about
radiometric calibration and atmospheric correction methods, see “Hyperspectral Data Correction” on
page 20-9.

The other preprocessing step that is important in all hyperspectral imaging applications is
dimensionality reduction. The large number of bands in the hyperspectral data increases the
computational complexity of processing the data cube. The contiguous nature of the band images
results in redundant information across bands. Neighboring bands in a hyperspectral image have
high correlation, which results in spectral redundancy. You can remove the redundant bands by
decorrelating the band images. Popular approaches for reducing the spectral dimensionality of a data
cube include band selection and orthogonal transforms.

• The band selection approach uses orthogonal space projections to find the spectrally distinct and
most informative bands in the data cube. Use the selectBands and removeBands functions for
finding the most informative bands and removing one or more bands, respectively.
• Orthogonal transforms such as principal component analysis (PCA) and maximum noise fraction
(MNF), decorrelate the band information and find the principal component bands.


PCA transforms the data to a lower dimensional space and finds principal component vectors with
their directions along the maximum variances of the input bands. The principal components are in
descending order of the amount of total variance explained.

MNF computes the principal components that maximize the signal-noise-ratio, rather than the
variance. MNF transform is particularly efficient at deriving principal components from noisy
band images. The principal component bands are spectrally distinct bands with low interband
correlation.

The hyperpca and hypermnf functions reduce the spectral dimensionality of the data cube by
using the PCA and MNF transforms respectively. You can use the pixel spectra derived from the
reduced data cube for hyperspectral data analysis.
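
For example, a sketch that keeps the first 10 principal component bands; the number of components is an arbitrary choice for illustration:

reducedData = hyperpca(hcube,10);  % PCA-based reduction to 10 component bands
size(reducedData)                  % M-by-N-by-10
% hypermnf(hcube,10) follows the same calling pattern for the MNF transform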

Spectral Unmixing
In a hyperspectral image, the intensity values recorded at each pixel specify the spectral
characteristics of the region that the pixel belongs to. The region can be a homogeneous surface or
heterogeneous surface. The pixels that belong to a homogeneous surface are known as pure pixels.
These pure pixels constitute the endmembers of the hyperspectral data.

Heterogeneous surfaces are a combination of two or more distinct homogeneous surfaces. The pixels
belonging to heterogeneous surfaces are known as mixed pixels. The spectral signature of a mixed
pixel is a combination of two or more endmember signatures. This spatial heterogeneity is mainly due
to the low spatial resolution of the hyperspectral sensor.

Spectral unmixing is the process of decomposing the spectral signatures of mixed pixels into their
constituent endmembers. The spectral unmixing process involves two steps:

1 Endmember extraction — The spectra of the endmembers are prominent features in the
hyperspectral data and can be used for efficient spectral unmixing, segmentation, and
classification of hyperspectral images. Convex geometry based approaches, such as pixel purity
index (PPI), fast iterative pixel purity index (FIPPI), and N-finder (N-FINDR) are some of the
efficient approaches for endmember extraction.


• Use the ppi function to estimate the endmembers by using the PPI approach. The PPI
approach projects the pixel spectra to an orthogonal space and identifies extrema pixels in the
projected space as endmembers. This is a non-iterative approach, and the results depend on
the random unit vectors generated for orthogonal projection. To improve results, you must
increase the random unit vectors for projection, which can be computationally expensive.
• Use the fippi function to estimate the endmembers by using the FIPPI approach. The FIPPI
approach is an iterative approach, which uses an automatic target generation process to
estimate the initial set of unit vectors for orthogonal projection. The algorithm converges
faster than the PPI approach and identifies endmembers that are distinct from one another.
• Use the nfindr function to estimate the endmembers by using the N-FINDR method. N-
FINDR is an iterative approach that constructs a simplex by using the pixel spectra. The
approach assumes that the volume of a simplex formed by the endmembers is larger than the
volume defined by any other combination of pixels. The set of pixel signatures for which the
volume of the simplex is high are the endmembers.
2 Abundance map estimation — Given the endmember signatures, it is useful to estimate the
fractional amount of each endmember present in each pixel. You can generate the abundance
maps for each endmember, which represent the distribution of endmember spectra in the image.
You can label a pixel as belonging to an endmember spectra by comparing all of the abundance
map values obtained for that pixel.

Use the estimateAbundanceLS function to estimate the abundance maps for each endmember
spectrum. A minimal sketch that combines endmember extraction and abundance estimation follows
this list.
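
A minimal sketch of the two-step unmixing workflow for a hypercube object hcube; the number of endmembers is chosen arbitrarily here for illustration:

numEndmembers = 6;                                      % illustrative value
endmembers = ppi(hcube.DataCube,numEndmembers);         % endmember extraction (fippi and nfindr are called similarly)
abundanceMaps = estimateAbundanceLS(hcube,endmembers);  % one abundance map per endmember
imshow(abundanceMaps(:,:,1),[])                         % display the abundance map of the first endmember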

Spectral Matching
Interpret the pixel spectra by performing spectral matching. Spectral matching identifies the class of
an endmember material by comparing its spectra with one or more reference spectra. The reference
data consists of pure spectral signatures of materials, which are available as spectral libraries.

Use the readEcostressSig function to read the reference spectra files from the ECOSTRESS
spectral library. Then, you can compute the similarity between the files in the ECOSTRESS library
spectra and an endmember spectra by using the spectralMatch function.
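
For instance, a minimal sketch that scores every pixel of a hypercube object hcube against a reference signature; the ECOSTRESS file name is a placeholder for a signature file you have downloaded:

lib = readEcostressSig("myReferenceMaterial.spectrum.txt");  % placeholder ECOSTRESS signature file
scoreMap = spectralMatch(lib,hcube);                         % smaller scores indicate closer matches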

The geometrical characteristics and the probability distribution values of the pixel spectra are the
important features for spectral matching. You can improve the matching efficiency by combining both
the geometrical and probabilistic characteristics. Such combination measures have higher
discrimination capabilities than the individual approaches and are more suitable for discriminating
spectrally similar targets (intra-species). These functions are available for computing the spectral
matching score:

• sam: Spectral angle mapper (SAM) matches two spectra based on their geometrical characteristics.
The SAM measure computes the angle between two spectral signatures. A smaller angle represents a
better match between two spectra. This measure is insensitive to illumination changes.

• sid: Spectral information divergence (SID) matches two spectra based on their probability
distributions. This method is efficient in identifying mixed pixel spectra. A low SID value implies
higher similarity between two spectra.

• sidsam: Combination of SID and SAM. The SID-SAM approach has better discrimination capability
compared to SID and SAM individually. A minimum score implies higher similarity between two
spectra.

• jmsam: Combination of Jeffries–Matusita (JM) distance and SAM. Low distance values imply higher
similarity between two spectra. This method is particularly efficient in discriminating spectrally close
targets.

• ns3: Normalized spectra similarity score (NS3), which combines Euclidean distance and SAM. Low
distance values imply higher similarity between two spectra. This method has high discrimination
capability but requires extensive reference data for high accuracy.

Applications
Hyperspectral image processing applications include classification, target detection, anomaly
detection, and material analysis.

• Segment and classify each pixel in a hyperspectral image through unmixing and spectral
matching. For examples of classification, see “Hyperspectral Image Analysis Using Maximum
Abundance Classification” on page 20-33 and “Classify Hyperspectral Image Using Library
Signatures and SAM” on page 20-40.
• You can perform target detection by matching the known spectral signature of a target material to
the pixel spectra in hyperspectral data. For an example, see “Target Detection Using Spectral
Signature Matching” on page 20-53.
• You can also use hyperspectral image processing for anomaly detection and material analysis,
such as vegetation analysis.

• Use the anomalyRX function to detect anomalies in a hyperspectral image.

• Use the spectralIndices function to analyze the spectral characteristics of various materials
present in hyperspectral data, as shown in the sketch after this list.
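
A minimal sketch of both calls, assuming a hypercube object hcube; the index name is one of the indices listed in “Spectral Indices” on page 20-13:

rxScore = anomalyRX(hcube);               % per-pixel anomaly scores
ndviMap = spectralIndices(hcube,"NDVI");  % one map for the requested index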

See Also
Apps
Hyperspectral Viewer

Functions
hypercube | spectralMatch | anomalyRX | ndvi | ppi | estimateAbundanceLS


Related Examples
• “Classify Hyperspectral Image Using Library Signatures and SAM” on page 20-40
• “Hyperspectral Image Analysis Using Maximum Abundance Classification” on page 20-33
• “Target Detection Using Spectral Signature Matching” on page 20-53


Hyperspectral Data Correction


Hyperspectral sensors used for remote sensing applications acquire the spectral characteristics of
the Earth's surface in many narrow and contiguous bands. When solar radiation is incident on a
surface material, the material reflects the incident radiation. The amount of energy reflected signifies
the spectral characteristics of the surface material.

The incident radiation reflected by the surface is known as the surface reflectance. The reflected
radiation measured by the sensor positioned at the top of the atmosphere (TOA) is known as the TOA
radiance. Ideally, the TOA radiance is equal to the surface reflectance. But, in real conditions, the
incident and the reflected radiation are affected by atmospheric phenomena such as scattering and
absorption. As a result, the TOA radiance value is the sum of reflections from the surface, reflections
from clouds, and scattering from air molecules and aerosol particles in the atmosphere.

Along with the characteristics of the light source and the surface material, the radiation values
measured by the sensor are influenced by the sensor gain and bias (offset) at each spectral
wavelength. The raw data recorded by the hyperspectral sensors is known as the digital numbers
(DNs). To use the hyperspectral data for quantitative analysis, you must calibrate the data for TOA
radiance values, and estimate the actual surface reflectance values from the DNs.

The process of estimating TOA radiance values from the DNs is known as radiometric calibration. The
process of estimating the surface reflectance values by removing the atmospheric effects is known as
atmospheric correction.

You can perform radiometric calibration and atmospheric correction procedures as preprocessing
steps for thorough spectral analysis.

Radiometric Calibration
DN to TOA Radiance

To estimate TOA radiance values from DNs, calibrate the sensor gain and bias in each spectral band:

Radianceλ = Gainλ × DNλ + Biasλ

Gainλ and Biasλ are the gain and bias values for each spectral band (λ), respectively.


You can find the TOA radiance values for uncalibrated hyperspectral data by using the dn2radiance
function. The function reads the gain and the bias (offset) values for each spectral band from the
header file associated with the hyperspectral data.
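
A minimal sketch, assuming the uncalibrated data was read into a hypercube object hcube and its header file contains the per-band gain and offset values:

radianceCube = dn2radiance(hcube);  % DNs calibrated to TOA radiance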

TOA Radiance to TOA Reflectance

You can estimate the TOA reflectance values from TOA radiance values. TOA reflectance specifies the
ratio of TOA radiance to the radiation incident on the surface:

Reflectanceλ = (π × Radianceλ × d²) / (ESUNλ × sin(θE))

d is the Earth-sun distance in astronomical units, ESUNλ is the mean solar irradiance for each
spectral band, and θE is the sun elevation angle. You can estimate the TOA reflectance values from
TOA radiance values by using the radiance2Reflectance function.

DN to TOA Reflectance

You can directly compute TOA reflectance values from DNs if the reflectance gain (RGain) and
reflectance offset (ROffset) parameters of each spectral band are available:

Reflectanceλ = RGainλ × DNλ + ROffsetλ

The dn2reflectance function calibrates the DNs to TOA reflectance values by using the reflectance
gain and offset parameters available in the metadata.
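
A parallel sketch, assuming the reflectance gain and offset parameters are available in the metadata of the hypercube object hcube:

reflectanceCube = dn2reflectance(hcube);  % DNs calibrated to TOA reflectance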

Atmospheric Correction
Atmospheric correction methods estimate the surface reflectance values from TOA radiance or TOA
reflectance values. The atmospheric correction methods are classified as empirical methods and
model-based methods.

• Empirical methods are scene-based approaches that estimate relative surface reflectance values.
Empirical methods are computationally efficient and do not require a priori measurements.
• Model-based methods are dependent on in situ atmospheric data and are useful for accurate
estimation of surface reflectance values.

• subtractDarkPixel: Dark pixel subtraction, or dark object subtraction, is an empirical method
suitable for removing atmospheric haze from hyperspectral images. Atmospheric haze is characterized
by high DN values and results in unnatural brightening of the images. The dark pixels are the
minimum-value pixels in each band. Dark pixels are assumed to have zero surface reflectance, and
their values account for the additive effect of the atmospheric path radiance.

• empiricalLine: The empirical line calibration method assumes a linear relationship between the
surface reflectance and the measured reflectance values. This method assumes that the input
hyperspectral data has one or more known target pixels for which the surface reflectance values are
available. The calibration method consists of regressing the measured spectral values of the target
pixels against the a priori surface reflectance values. You can use the empirical line calibration
method if the data is acquired under uniform atmospheric conditions and the measurements related
to the target are time invariant.

• flatField: Flat field correction assumes that the surface being imaged includes a bright, uniform
area that has neutral spectral reflectance. The mean spectrum of such an area includes the combined
effects of solar irradiance, atmospheric scattering, and absorption. The relative surface reflectance
values are estimated by dividing each pixel spectrum by the mean spectrum.

• iarr: Internal average relative reflectance (IARR) is an empirical approach that computes relative
surface reflectance by normalizing each pixel spectrum with the mean spectrum. The method assumes
that the surface is heterogeneous and that the spectral reflectance characteristics cancel out. As a
result, the mean spectrum of the surface is similar to a flat field spectrum. This method is particularly
helpful in estimating the relative surface reflectance values for regions without vegetation.

• logResiduals: Logarithmic residual correction of hyperspectral data is performed by dividing each
pixel spectrum in the hyperspectral data by the spectral geometric mean and the spatial geometric
mean. This method is an empirical approach that relies on the statistics of the acquired hyperspectral
image. You can use this method to remove solar irradiance and atmospheric transmittance effects.

• sharc: The satellite hypercube atmospheric rapid correction (SHARC) method computes absolute
surface reflectance values based on analytical solutions of the radiative transfer equation. The surface
reflectance values are computed by considering the adjacency effect for each point on the surface and
the atmospheric effects. You can use this method if the atmospheric model parameters necessary to
compute accurate surface reflectance values are available.

• fastInScene: The fast in-scene method is an empirical approach that performs atmospheric
correction based on in-scene characteristics. The method determines correction parameters directly
from the pixel spectra of the acquired hyperspectral data. This method results in an approximate
correction, but it is computationally faster than model-based methods. Use the fast in-scene method
to correct atmospheric effects on hyperspectral data with diverse pixel spectra and a sufficient number
of dark pixels. The method estimates the baseline spectrum by using the dark pixels.

• rrs: Remote sensing reflectance (RRS) corrects atmospheric effects in hyperspectral data containing
large water bodies. The RRS method estimates the water-leaving radiance and is the atmospheric
correction method for hyperspectral ocean color data.

• correctOOB: Out-of-band correction method. This method removes out-of-band (OOB) effects from
multispectral data by using the measured radiance and the sensor spectral response values.


Spectral Indices
A spectral index is a ratio of broadband spectral bands, or a normalized difference between two
bands.

Various hyperspectral sensors capture hyperspectral data, and their band centers are usually slightly
different. Defining bands therefore gives you the freedom to apply spectral indices to a wide range of
sensors. The “Band Definition” on page 20-13 section provides details of the band definitions used in
the Image Processing Toolbox Hyperspectral Imaging Library.

Spectral indices characterize the specific features of interest of a target by exploiting its biophysical
and chemical properties. These features of interest enable you to identify plant, water, soil, and
various forms of built-up regions such as road, house, railway track, and parking lot. For more
information on supported spectral indices, see “List of Supported Spectral Indices” on page 20-15
section.

Band Definition
Image Processing Toolbox Hyperspectral Imaging Library uses various band definitions to compute
spectral indices. The toolbox selects the nearest wavelength to the center of each band available in
input hyperspectral data.


Depending on the range of the wavelengths for a band, the band definition can be of two types.

• Broadband — Bands generally have wider wavelength ranges.


• Narrowband — Bands generally have narrow wavelength ranges.

This table lists the broadband definitions used in the toolbox.

Band Minimum Center Maximum


B 400 nm 470 nm 500 nm
G 500 nm 550 nm 600 nm
R 600 nm 650 nm 700 nm
NIR 760 nm 860 nm 960 nm
SWIR1 1550 nm 1650 nm 1750 nm
SWIR2 2080 nm 2220 nm 2350 nm

This table lists the narrowband definitions used in the toolbox.

Band Minimum Center Maximum


B531 525 nm 531 nm 550 nm
B550 540 nm 550 nm 560 nm
B570 560 nm 570 nm 575 nm
B670 650 nm 670 nm 690 nm
B700 680 nm 700 nm 730 nm
B795 720 nm 795 nm 800 nm
B800 780 nm 800 nm 865 nm
B819 815 nm 819 nm 824 nm


B990 830 nm 990 nm 995 nm


B1510 1500 nm 1510 nm 1515 nm
B1599 1590 nm 1599 nm 1620 nm
B1680 1670 nm 1680 nm 1690 nm
B2000 1980 nm 2000 nm 2040 nm
B2100 2085 nm 2100 nm 2110 nm
B2200 2170 nm 2200 nm 2220 nm

List of Supported Spectral Indices


Image Processing Toolbox Hyperspectral Imaging Library supports various spectral indices used to
identify vegetation, minerals, burned areas, and built-up regions. The hyperspectral indices supported
by the spectralIndices function are listed below. The equations of these indices use the band
definitions specified in “Band Definition” on page 20-13.

• Cellulose absorption index (CAI): CAI = 0.5 × (B2000 + B2200) − B2100. CAI identifies dried plant
materials relative to the cellulose sensitive wavelengths in the range 2000 nm to 2200 nm. Use this
index to monitor crop residue, plant health, and fuel conditions in an ecosystem. The index value
varies in the range (-3, 4).

• Clay minerals ratio (CMR): CMR = SWIR1 / SWIR2. CMR identifies hydrothermally altered rocks
containing clay and alunite. Use this index to map minerals in rock surfaces.

• Enhanced vegetation index (EVI): EVI = (NIR − R) / (NIR + 6 × R − 7.5 × B + 1). EVI identifies
vegetation regions with a high leaf area index. It uses blue reflectance to correct the soil background
by including atmospheric influences. For vegetation pixels, the index value varies in the range (0, 1).

• Green vegetation index (GVI): GVI = −0.2848 × B − 0.2435 × G − 0.5436 × R + 0.7243 × NIR +
0.0840 × SWIR1 − 0.1800 × SWIR2. GVI identifies green vegetation by reducing the background soil
effect. The index value varies in the range (-1, 1).

• Modified chlorophyll absorption ratio index (MCARI): MCARI = ((B700 − B670) − 0.2 × (B700 −
B550)) × (B700 / B670). MCARI identifies the vegetation regions that contain chlorophyll by
minimizing the combined effects of soil and non-photosynthetic surfaces.

• Modified triangular vegetation index (MTVI): MTVI = 1.2 × (1.2 × (B800 − B550) − 2.5 × (B670 −
B550)). MTVI identifies vegetation regions. This index includes the 800 nm wavelength, which is
influenced by changes in leaf and canopy structure.

• Modified normalized difference water index (MNDWI): MNDWI = (G − SWIR1) / (G + SWIR1).
MNDWI identifies open water surfaces, reducing background noise from soil, vegetation, and built-up
areas.

• Moisture stress index (MSI): MSI = B1599 / B819. MSI maps the level of leaf water content in
vegetation canopies. The index value varies in the range (0, 3).

• Normalized burn ratio (NBR): NBR = (NIR − SWIR2) / (NIR + SWIR2). NBR identifies burned areas
in larger fire zones. You can obtain a burn sensitive image by subtracting the pre-fire and post-fire
NBR images.

• Normalized difference built-up index (NDBI): NDBI = (SWIR1 − NIR) / (SWIR1 + NIR). NDBI
identifies urban areas with high reflectance in the SWIR region as compared to the NIR region.

• Normalized difference mud index (NDMI): NDMI = (B795 − B990) / (B795 + B990). NDMI identifies
shallow water or muddy surfaces.

• Normalized difference nitrogen index (NDNI): NDNI = (log(1/B1510) − log(1/B1680)) /
(log(1/B1510) + log(1/B1680)). NDNI maps the amount of nitrogen content in vegetation canopies.
Because NDNI is a logarithmic index, use data values in the range 0 to 1 for accurate results.

• Normalized difference vegetation index (NDVI): NDVI = (NIR − R) / (NIR + R). NDVI identifies
vegetation canopies. The index value varies in the range (-1, 1). A value close to 1 indicates healthy
vegetation, 0 indicates unhealthy vegetation, and -1 indicates no vegetation.

• Optimized soil adjusted vegetation index (OSAVI): OSAVI = (NIR − R) / (NIR + R + 0.16). OSAVI
identifies sparse vegetation, where soil is visible through the canopy.

• Photochemical reflectance index (PRI): PRI = (B531 − B570) / (B531 + B570). PRI maps the
photosynthetic efficiency of a region by detecting the changes of carotenoid pigments in live foliage.
The index value varies in the range (-1, 1).

• Simple ratio (SR): SR = NIR / R. SR finds the ratio of vegetation and chlorophyll absorption
wavelength features.
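
For example, a sketch that computes two of the indices listed above for a hypercube object hcube; the choice of indices is arbitrary:

ndviMap = spectralIndices(hcube,"NDVI");
msiMap = spectralIndices(hcube,"MSI");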

References
[1] Bannari, A., D. Morin, F. Bonn, and A. R. Huete. “A Review of Vegetation Indices.” Remote Sensing
Reviews 13, no. 1–2 (August 1995): 95–120. https://doi.org/10.1080/02757259509532298.

[2] Xue, Jinru, and Baofeng Su. “Significant Remote Sensing Vegetation Indices: A Review of
Developments and Applications.” Journal of Sensors 2017 (2017): 1–17.
https://doi.org/10.1155/2017/1353691.

[3] Haboudane, D. “Hyperspectral Vegetation Indices and Novel Algorithms for Predicting Green LAI
of Crop Canopies: Modeling and Validation in the Context of Precision Agriculture.” Remote Sensing
of Environment 90, no. 3 (April 15, 2004): 337–52. https://doi.org/10.1016/j.rse.2003.12.013.

[4] Thenkabail, Prasad S., John G. Lyon, and Alfredo Huete, eds. Hyperspectral Indices and Image
Classifications for Agriculture and Vegetation. Boca Raton: CRC Press, 2018.

See Also
Functions
ndvi | spectralIndices

Related Examples
• “Compute Spectral Indices for Hyperspectral Data”
• “Detect Water Regions Using MNDWI”
• “Measure Vegetation Cover in Hyperspectral Data Using NDVI Image”


Support for Singleton Dimensions


Image Processing Toolbox Hyperspectral Imaging Library represents hyperspectral images as three-
dimensional (3-D) arrays of the form M-by-N-by-P, where M and N are the spatial dimensions of the
acquired data, and P is the number of spectral wavelengths used during acquisition. However,
hyperspectral data can also be spectral reflectance curves obtained from ground spectrometers, and
are arranged sequentially without a spatial dimension. This spectral response can be stored as a
vector or a matrix. A vector of spectral data contains a single spectral reflectance curve, and a matrix
of spectral data contains multiple spectral reflectance curves in sequence.

To process spectral data, use these functions in the Image Processing Toolbox Hyperspectral Imaging
Library, which also support hyperspectral images.

• Explore, Analyze, and Visualize: hypercube

• Data Correction: dn2radiance, dn2reflectance, radiance2Reflectance, empiricalLine, flatField, iarr,
logResiduals, subtractDarkPixel

• Dimensionality Reduction: hyperpca, hypermnf, inverseProjection

• Spectral Unmixing: ppi, fippi, nfindr, estimateAbundanceLS

• Spectral Matching and Target Detection: sam, sid, jmsam, sidsam, ns3, spectralMatch, ndvi,
anomalyRX

To use 1-D or 2-D spectral data for the hyperspectral functions, you must reshape it into 3-D volume
data:

• Reshape 1-D spectral data of size 1-by-P into a 3-D hypercube of size 1-by-1-by-P, where P is the
spectral dimension. For example,

spectralDim = size(spectralData1D,2);
dataCube = reshape(spectralData1D,[1 1 spectralDim]);
hCube = hypercube(dataCube,wavelength);

reshapes the 1-D spectral data spectralData1D of size 1-by-P into the 3-D volume data
dataCube of size 1-by-1-by-P by using the reshape function. The hypercube function then
creates a 3-D hypercube object by adding the wavelength information wavelength to the 3-D
volume data dataCube.
• Reshape 2-D spectral data of size M-by-P into a 3-D hypercube of size M-by-1-by-P or 1-by-M-by-P,
where M represents the number of spectral reflectance curves and P is the spectral dimension.
For example,


[numSpectra,spectralDim] = size(spectralData2D);
dataCube = reshape(spectralData2D,[numSpectra 1 spectralDim]);
hCube = hypercube(dataCube,wavelength);

reshapes the 2-D spectral data spectralData2D of size M-by-P into the 3-D volume data
dataCube of size M-by-1-by-P by using the reshape function. The hypercube function then
creates a 3-D hypercube object by adding the wavelength information wavelength to the 3-D
volume data dataCube.

For more information on how to use 2-D spectral data for hyperspectral function, see “Identify
Vegetation and Non-Vegetation Spectra” on page 20-20 example.

See Also
hypercube | spectralMatch | ndvi

More About
• “Identify Vegetation and Non-Vegetation Spectra” on page 20-20
• “Getting Started with Hyperspectral Image Processing” on page 20-2


Identify Vegetation and Non-Vegetation Spectra

This example shows you how to:

• Use 2-D spectral data as a hypercube for the hyperspectral functions.


• Separate vegetation and non-vegetation spectra by using the ndvi function.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Load 2-D Spectral Data

Load 2-D spectral data containing 20 endmembers of the Indian Pines data set into the workspace.

load("indian_pines_endmembers_20.mat")

Load the wavelength values for each band of the Indian Pines data set into the workspace.

load("indian_pines_wavelength.mat")

Prepare Test Data to Use for Hyperspectral Functions

Reshape the 2-D spectral data into a 3-D volume data using the reshape function.

[numSpectra,spectralDim] = size(endmembers);
dataCube = reshape(endmembers,[numSpectra 1 spectralDim]);

Create a 3-D hypercube object, with a singleton dimension, by specifying the 3-D volume data
dataCube and wavelength information wavelength to the hypercube function.

hCube = hypercube(dataCube,wavelength);

Compute NDVI to Separate Vegetation and Non-Vegetation Spectra

Compute the NDVI value for each spectrum in the hypercube object.

ndviVal = ndvi(hCube);

Vegetation spectra typically have NDVI values greater than zero and non-vegetation spectra typically
have NDVI values less than zero. Perform thresholding to separate the vegetation and non-vegetation
spectra.

index = ndviVal > 0;

Plot the vegetation and non-vegetation endmembers.

subplot(2,1,1)
plot(endmembers(index,:)')
title("Vegetation endmembers")
xlabel("Bands")
ylabel("Reflectance Values")
axis tight
subplot(2,1,2)


plot(endmembers(~index,:)')
title("Non-Vegetation endmembers")
xlabel("Bands")
ylabel("Reflectance Values")
axis tight

See Also
hypercube | spectralMatch | ndvi

More About
• “Support for Singleton Dimensions” on page 20-18
• “Getting Started with Hyperspectral Image Processing” on page 20-2


Explore Hyperspectral Data in the Hyperspectral Viewer

This example shows how to explore hyperspectral data using the Hyperspectral Viewer app. Using
the capabilities of the app, you can view the individual bands of a hyperspectral data set as grayscale
images. You can also view color composite representations of the data set as RGB, color infrared
(CIR), and false-color images. You can also visualize hyperspectral indices of the data. In addition to
exploring these visual representations of the spatial dimensions of the data, you can create plots of
individual points or small regions of the data along the spectral dimension. These plots, called
spectral profiles, can identify elements in the hyperspectral data.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Load Hyperspectral Data into Workspace

For this example, load an aerial hyperspectral data set of an area called Jasper Ridge, captured via
the airborne visible/infrared imaging spectrometer (AVIRIS). The data set contains areas of water,
land, road, and vegetation. Load the hyperspectral data set into a hypercube object in the MATLAB
workspace.

hcube = hypercube('jasperRidge2_R198.img');

This command creates a hypercube object in the workspace called hcube. The hcube object
contains a 100-by-100-by-198 cube of hyperspectral data.

View Hyperspectral Data in Hyperspectral Viewer

Open the Hyperspectral Viewer app. First, click the Apps tab on the MATLAB toolstrip. Then, in the
Image Processing and Computer Vision section, click the Hyperspectral Viewer button.

With the app open, load the hyperspectral data into the app. On the app toolstrip, click Import and
select Hypercube Object. In the Import from Workspace dialog box, select the hypercube object
you loaded into the workspace, hcube. Alternatively, you can specify a data set when you open
the app by using this command:

hyperspectralViewer(hcube)

The app displays several views of the Jasper Ridge hyperspectral data. The Bands pane displays the
bands of the hyperspectral data as a stack of grayscale images. A second pane includes color
composite representations of the hyperspectral data, displaying the False Color tab by default. The
Histogram pane displays a histogram of the band currently displayed in the Bands pane. The
Spectral Plot pane displays a plot of the spectral dimension of the data by wavelength or by band.
(You can rearrange these panes by clicking and dragging them inside the app. To return to the
standard pane arrangement, click Default Layout on the app toolstrip.)


Explore the Spectral Bands

Explore the spectral bands of the Jasper Ridge data set as a stack of grayscale images in the Bands
pane. Use the slider at the bottom of the pane to navigate through the images. Because each band
isolates a specific range of wavelengths, aspects of the scene might be clearer in some bands than
others.
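
If you prefer to inspect a band programmatically outside the app, you can index directly into the data cube. The following is a minimal sketch, assuming the hcube object loaded earlier in this example; the band number is chosen arbitrarily.

bandNum = 30;                               % arbitrary band index for illustration
band = hcube.DataCube(:,:,bandNum);
figure
imshow(rescale(band))                       % rescale maps the band values to [0 1] for display
title("Band " + bandNum + " (" + hcube.Wavelength(bandNum) + " nm)")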


To get a closer look at a band, click Zoom In or Zoom Out in the axes toolbar that appears when you
point the cursor over the image.

To improve the contrast of a band image, click Adjust Contrast on the app toolstrip. When you do,
the app overlays a contrast adjustment window on the histogram of the image, displayed in the
Histogram pane. To adjust the contrast, move the window over the histogram or resize the window
by clicking and dragging the handles. The app adjusts the contrast using a technique called contrast
stretching. In this process, pixel values below a specified value are displayed as black, pixel values
above a specified value are displayed as white, and pixel values in between these two values are
displayed as shades of gray. The result is a linear mapping of a subset of pixel values to the entire
range of grays, from black to white, producing an image of higher contrast. To return to the default
view, click Snap Data Range. To remove the contrast adjustment window from the histogram, click
Adjust Contrast.
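
The app performs this contrast stretching interactively. For a rough programmatic equivalent on a single band, you can use the stretchlim and imadjust functions. This is a sketch only, not the app's exact implementation, and it assumes the hcube object from this example.

band = rescale(hcube.DataCube(:,:,30));     % normalize an arbitrary band to [0 1]
limits = stretchlim(band,0.01);             % saturate the lowest and highest 1% of values
stretched = imadjust(band,limits,[]);       % linearly map that range to the full display range
figure
montage({band,stretched},'Size',[1 2])
title("Original Band (left) and Contrast-Stretched Band (right)")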

Explore Color Representations of Hyperspectral Data

Explore the Jasper Ridge hyperspectral data as a color composite image. To create these color
images, the Hyperspectral Viewer automatically chooses three of the bands in the hyperspectral
dataset to use for the red, green, and blue channels of a color image. The choice of which bands the
app uses depends on the type of color representation. The app supports three types of color
composite renditions: False Color, RGB, and Color Infrared (CIR). It can be useful to view all of the
color composite images because each one uses different bands and can highlight different spectral
details, thus increasing the interpretability of the data.

By default, the app displays a false-color representation of the data. False-color composites visualize
wavelengths that the human eye cannot see. The tab of the pane identifies the type of the color
image, False Color, and the bands that the app used to form it, (145,99,19), in red-green-blue order.
The Spectral Plot pane in the app indicates which bands are used. To change these band selections,
click and drag the handle of the band indicator in the Spectral Plot. If you choose a different band,
the app updates the text in the tab with the new bands and adds the word "Custom", such as False
Color-Custom.


To create the RGB color composite image, the app chooses bands in the visible part of the
electromagnetic spectrum. The resulting composite image resembles what the human eye observes
naturally. For example, vegetation appears green and water is blue. While RGB composites can
appear natural to our eyes, it can be difficult to distinguish subtle differences in features. Natural
color images can be low in contrast.

To create the CIR color composite image, the app chooses red, green, and near-infrared wavelengths.
Near infrared wavelengths are slightly longer than red, and they are outside of the range visible to
the human eye.
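
Outside the app, the colorize function can produce similar composites from a hypercube object. This sketch assumes that colorize supports the 'rgb', 'cir', and 'falsecolored' method names together with the ContrastStretching option; check the colorize reference page for the exact values available in your release.

rgbImg   = colorize(hcube,'Method','rgb','ContrastStretching',true);
cirImg   = colorize(hcube,'Method','cir','ContrastStretching',true);
falseImg = colorize(hcube,'Method','falsecolored');
figure
montage({rgbImg,cirImg,falseImg},'Size',[1 3])
title("RGB, CIR, and False-Color Composites")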


Create Spectral Profile Plots of Pixels and Regions

After exploring the grayscale and color visualizations of the hyperspectral data, you can plot points or
small regions of the data along the spectral dimension to create spectral profiles. You can plot a
single pixel or a region up to 10-by-10 pixels square. Use the Neighborhood Size parameter to
specify the region size. When you select a region, the app uses the mean of all the pixels in the region
to plot the data. Plotting a region, rather than an individual pixel, can smooth out spectral profiles.
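
You can reproduce a spectral profile outside the app by plotting a spectrum directly from the data cube. This minimal sketch, with an arbitrarily chosen pixel location, plots both a single-pixel spectrum and the mean spectrum of its 3-by-3 neighborhood.

row = 40; col = 60;                                        % arbitrary pixel location
pixelSpectrum = double(squeeze(hcube.DataCube(row,col,:)));
nbhd = double(hcube.DataCube(row-1:row+1,col-1:col+1,:));
meanSpectrum = squeeze(mean(mean(nbhd,1),2));              % mean over the 3-by-3 neighborhood
figure
plot(hcube.Wavelength,pixelSpectrum,hcube.Wavelength,meanSpectrum)
xlabel("Wavelength (nm)")
ylabel("Data Value")
legend("Single pixel","3-by-3 neighborhood mean")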

To create spectral plots, click Add Spectral Plot on the app toolstrip, move the cursor over a
visualization in the app, and click to select the points or regions. You can make your selections on any
of the visualizations provided by the app. Your choice of which visualization to use can depend on
which one provides the best view of the particular feature of the data you are interested in. When you
make a selection, the app puts a point icon at that position on all of the visualizations. After selecting
all the points, click Add Spectral Plot again to stop adding spectral plots. To delete a point, click on
the cross next to the point in the Spectral Plot pane. You can view the statistical information of each
spectral plot by selecting the information symbol next to each point in the Spectral Plot pane.

For example, the following figure shows four points selected in each visualization, each point
representing a particular type of data: water, vegetation, road, and land.


As you select each point, the app plots the data on the Spectral Plot, using a different color to
identify each plot. By default, the Spectral Plot also includes a legend identifying the plot for each
point. To toggle off inclusion of the legend, click Show Legend.

Visualize Spectral Indices of Hyperspectral Data

Since R2023a

You can visualize spectral indices of the hyperspectral data by selecting the desired spectral index
from the Spectral Indices section of the app toolstrip. Only the spectral indices that are applicable
to the imported hyperspectral data are active.


For example, to visualize the simple ratio (SR) index, select SR from the spectral indices. The app
opens a separate pane to visualize the selected spectral indices.


You can create a mask from the spectral index image by using the slider below the image to specify
lower and upper thresholds. To go back to the spectral index image without thresholds, first slide the
lower threshold to the minimum and then the upper threshold to the maximum.

You can also select the Custom index, and define a custom spectral index for the imported
hyperspectral data. You must define a custom index compatible with the customSpectralIndex
function. Specify the custom index formula as a function handle and the wavelengths for the custom
index computation as a numeric vector. The wavelengths must be unique, must be specified in
nanometers, and must lie within the range of wavelengths within the hyperspectral data cube.
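
The SR index is the ratio of a near-infrared band to a red band. As a reference for what the app computes, this sketch evaluates a similar ratio directly on the data cube; the 800 nm and 670 nm band centers are illustrative choices, not the exact wavelengths the app uses.

[~,nirBand] = min(abs(hcube.Wavelength - 800));   % band closest to 800 nm (NIR)
[~,redBand] = min(abs(hcube.Wavelength - 670));   % band closest to 670 nm (red)
nir = double(hcube.DataCube(:,:,nirBand));
red = double(hcube.DataCube(:,:,redBand));
srIndex = nir./red;
figure
imagesc(srIndex)
colorbar
title("Simple Ratio (SR) Index")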


Export to Workspace

Since R2023b

From the app toolstrip, select Export to Workspace. You can choose to export one of these options.

• FalseColor, RGB, and CIR Color Bands


• Spectral Indices
• Spectral Signatures


If you select the FalseColor, RGB, CIR Color Bands option, in the Export to Workspace dialog box,
select and name the color representations that you want to export to the workspace. The app exports
the selected color representations to the MATLAB workspace as numeric arrays.

If you select the Spectral Indices option, in the Export to Workspace dialog box, select and name the
spectral indices and thresholded spectral index masks that you want to export to the workspace. The
app exports the selected spectral indices and thresholded spectral index masks to the MATLAB
workspace as numeric arrays.

If you select the Spectral Signatures option, in the Export to Workspace dialog box, select and
name the spectral signatures, plotted in the spectral plots, that you want to export to the workspace.
The app exports the selected spectral signatures to the MATLAB workspace as numeric vectors.


Hyperspectral Image Analysis Using Maximum Abundance Classification

This example shows how to identify different regions in a hyperspectral image by performing
maximum abundance classification (MAC). An abundance map characterizes the distribution of an
endmember across a hyperspectral image. Each pixel in the image is either a pure pixel or a mixed
pixel. The set of abundance values obtained for each pixel represents the percentage of each
endmember present in that pixel. In this example, you will classify the pixels in a hyperspectral
image by finding the maximum abundance value for each pixel and assigning it to the associated
endmember class.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

This example uses a data sample from the Pavia University dataset as test data. The test data
contains nine endmembers that represent these ground truth classes: Asphalt, Meadows, Gravel,
Trees, Painted metal sheets, Bare soil, Bitumen, Self blocking bricks, and Shadows.

Load and Visualize Data

Load the .mat file containing the test data into the workspace. The .mat file contains an array
paviaU, representing the hyperspectral data cube and a matrix signatures, representing the nine
endmember signatures taken from the hyperspectral data. The data cube has 103 spectral bands with
wavelengths ranging from 430 nm to 860 nm. The geometric resolution is 1.3 meters and the spatial
resolution of each band image is 610-by-340.

load('paviaU.mat');
img = paviaU;
sig = signatures;

Compute the central wavelength for each spectral band by evenly spacing the wavelength range
across the number of spectral bands.

wavelengthRange = [430 860];


numBands = 103;
wavelength = linspace(wavelengthRange(1),wavelengthRange(2),numBands);

Create a hypercube object using the hyperspectral data cube and the central wavelengths. Then
estimate an RGB image from the hyperspectral data. Set the ContrastStretching parameter value
to true in order to improve the contrast of the RGB output. Visualize the RGB image.

hcube = hypercube(img,wavelength);
rgbImg = colorize(hcube,'Method','RGB','ContrastStretching',true);
figure
imshow(rgbImg)


The test data contains the endmember signatures of nine ground truth classes. Each column of sig
contains the endmember signature of a ground truth class. Create a table that lists the class name for
each endmember and the corresponding column of sig.

num = 1:size(sig,2);
endmemberCol = num2str(num');
classNames = {'Asphalt';'Meadows';'Gravel';'Trees';'Painted metal sheets';'Bare soil';...

'Bitumen';'Self blocking bricks';'Shadows'};


table(endmemberCol,classNames,'VariableNames',{'Column of sig';'Endmember Class Name'})

ans=9×2 table
Column of sig Endmember Class Name
_____________ ________________________

1 {'Asphalt' }
2 {'Meadows' }
3 {'Gravel' }
4 {'Trees' }
5 {'Painted metal sheets'}
6 {'Bare soil' }
7 {'Bitumen' }
8 {'Self blocking bricks'}
9 {'Shadows' }

Plot the endmember signatures.

figure
plot(sig)
xlabel('Band Number')
ylabel('Data Values')
ylim([400 2700])
title('Endmember Signatures')
legend(classNames,'Location','NorthWest')


Estimate Abundance Maps

Create abundance maps for the endmembers by using the estimateAbundanceLS function and select
the method as fully constrained least squares (FCLS). The function outputs the abundance maps as a
3-D array with the same spatial dimensions as the input data. Each channel is the abundance map of the
endmember from the corresponding column of signatures. In this example, the spatial dimension of
the input data is 610-by-340 and the number of endmembers is 9. So, the size of the output
abundance map is 610-by-340-by-9.

abundanceMap = estimateAbundanceLS(hcube,sig,'Method','fcls');
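
Because the FCLS method constrains the abundances of each pixel to be nonnegative and to sum to one, a quick sanity check on the output is possible; this short sketch only verifies the sum-to-one constraint.

abundanceSum = sum(abundanceMap,3);     % should be approximately 1 at every pixel
disp("Abundance sums range from " + min(abundanceSum(:)) + " to " + max(abundanceSum(:)))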

Display the abundance maps.

fig = figure('Position',[0 0 1100 900]);


n = ceil(sqrt(size(abundanceMap,3)));
for cnt = 1:size(abundanceMap,3)
subplot(n,n,cnt)
imagesc(abundanceMap(:,:,cnt))
title(['Abundance of ' classNames{cnt}])
hold on
end
hold off


Perform Maximum Abundance Classification

Find the channel number of the largest abundance value for each pixel. The channel number returned
for each pixel corresponds to the column in sig that contains the endmember signature associated
with the maximum abundance value of that pixel. Display a color coded image of the pixels classified
by maximum abundance value.

[~,matchIdx] = max(abundanceMap,[],3);
figure
imagesc(matchIdx)
colormap(jet(numel(classNames)))
colorbar('TickLabels',classNames)


Segment the classified regions and overlay each of them on the RGB image estimated from the
hyperspectral data cube.

segmentImg = zeros(size(matchIdx));
overlayImg = zeros(size(abundanceMap,1),size(abundanceMap,2),3,size(abundanceMap,3));
for i = 1:size(abundanceMap,3)
segmentImg(matchIdx==i) = 1;
overlayImg(:,:,:,i) = imoverlay(rgbImg,segmentImg);
segmentImg = zeros(size(matchIdx));
end

Display the classified and the overlaid hyperspectral image regions along with their class names.
From the images, you can see that the asphalt, trees, bare soil, and brick regions have been
accurately classified.

figure('Position',[0 0 1100 900]);


n = ceil(sqrt(size(abundanceMap,3)));
for cnt = 1:size(abundanceMap,3)
subplot(n,n,cnt);
imagesc(uint8(overlayImg(:,:,:,cnt)));
title(['Regions Classified as ' classNames{cnt}])
hold on
end
hold off


See Also
estimateAbundanceLS | hypercube | colorize

More About
• “Getting Started with Hyperspectral Image Processing” on page 20-2


Classify Hyperspectral Image Using Library Signatures and SAM

This example shows how to classify pixels in a hyperspectral image by using the spectral angle
mapper (SAM) classification algorithm. This algorithm classifies each pixel in the test data by
computing the spectral match score between the spectrum of a pixel and the pure spectral signatures
read from the ECOSTRESS spectral library. This example uses a data sample from the Jasper Ridge
dataset as the test data. The test data contains four latent endmembers, consisting of roads, soil,
water, and trees. In this example, you will:

1 Generate a score map for different regions present in the test data by computing the SAM
spectral match score between the spectrum of each test pixel and a pure spectrum. The pure
spectra are from the ECOSTRESS spectral library.
2 Classify the regions by using minimum score criteria, and assign a class label for each pixel in
the test data.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Read Test Data

Read test data from the Jasper Ridge dataset by using the hypercube function. The function returns
a hypercube object, which stores the hyperspectral data cube and the corresponding wavelength
and metadata information read from the test data. The test data has 198 spectral bands and their
wavelengths range from 399.4 nm to 2457 nm. The spectral resolution is up to 9.9 nm and the spatial
resolution of each band image is 100-by-100.

hcube = hypercube('jasperRidge2_R198.img')

hcube =
hypercube with properties:

DataCube: [100×100×198 int16]


Wavelength: [198×1 double]
Metadata: [1×1 struct]

Estimate an RGB image from the data cube. Apply contrast stretching to enhance the contrast of the
output RGB image.

rgbImg = colorize(hcube,'Method','rgb','ContrastStretching',true);

Display the RGB image of the test data.

figure
imagesc(rgbImg);
axis image off
title('RGB Image of Data Cube')


Read Signatures from ECOSTRESS Spectral Library

The ECOSTRESS spectral library consists of pure spectral signatures for individual surface materials.
If the spectrum of a pixel matches a signature from the ECOSTRESS library, the pixel consists
entirely of that single surface material. The library is a compilation of over 3400 spectral signatures
for both natural and manmade materials. Since you know the latent endmembers in the test data,
choose the ECOSTRESS spectral library files related to those four endmembers.

Read spectral files related to water, vegetation, soil, and concrete from the ECOSTRESS spectral
library. Use the spectral signatures of these types:

• Manmade to classify roads and highway structures


• Soil to classify sand, silt, and clay regions
• Vegetation to classify tree regions
• Water to classify water regions
fileroot = matlabshared.supportpkg.getSupportPackageRoot();
addpath(fullfile(fileroot,'toolbox','images','supportpackages','hyperspectral',...
'hyperdata','ECOSTRESSSpectraFiles'));

filenames = ["water.seawater.none.liquid.tir.seafoam.jhu.becknic.spectrum.txt",...
"vegetation.tree.eucalyptus.maculata.vswir.jpl087.jpl.asd.spectrum.txt",...
"soil.utisol.hapludult.none.all.87p707.jhu.becknic.spectrum.txt",...
"soil.mollisol.cryoboroll.none.all.85p4663.jhu.becknic.spectrum.txt",...
"manmade.concrete.pavingconcrete.solid.all.0092uuu_cnc.jhu.becknic.spectrum.txt"];
lib = readEcostressSig(filenames)


lib=1×5 struct array with fields:


Name
Type
Class
SubClass
ParticleSize
Genus
Species
SampleNo
Owner
WavelengthRange
Origin
CollectionDate
Description
Measurement
FirstColumn
SecondColumn
WavelengthUnit
DataUnit
FirstXValue
LastXValue
NumberOfXValues
AdditionalInformation
Wavelength
Reflectance

Extract the class names from the library structure.

classNames = [lib.Class];

Plot the pure spectral signatures read from the ECOSTRESS spectral library.

figure
hold on
for idx = 1:numel(lib)
plot(lib(idx).Wavelength,lib(idx).Reflectance,'LineWidth',2)
end
axis tight
box on
title('Pure Spectral Signatures from ECOSTRESS Library')
xlabel('Wavelength (\mum)')
ylabel('Reflectance (%)')
legend(classNames,'Location','northeast')
title(legend,'Class Names')
hold off


Compute Score Map for Pixels in Test Data

Find the spectral match score between each pixel spectrum and the library signatures by using the
spectralMatch function. By default, the spectralMatch function computes the degree of
similarity between two spectra by using the SAM classification algorithm. The function returns an
array with the same spatial dimensions as the hyperspectral data cube and channels equal to the
number of library signatures specified. Each channel contains the score map for a single library
signature. In this example, there are five ECOSTRESS spectral library files specified for comparison,
and each band of the hyperspectral data cube has spatial dimensions of 100-by-100 pixels. The size of
the output array of score maps thus is 100-by-100-by-5.

scoreMap = spectralMatch(lib,hcube);
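
For intuition, the SAM score is the angle, in radians, between two spectra treated as vectors. The following sketch computes that angle for two arbitrarily chosen pixel spectra from the data cube; it illustrates the formula only and is not the internal implementation of spectralMatch.

s1 = double(squeeze(hcube.DataCube(20,20,:)));
s2 = double(squeeze(hcube.DataCube(80,80,:)));
samAngle = acos(dot(s1,s2)/(norm(s1)*norm(s2)));   % spectral angle in radians
disp("SAM angle between the two pixel spectra: " + samAngle + " radians")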

Display the score maps.

figure
montage(scoreMap,'Size',[1 numel(lib)],'BorderSize',10)
title('Score Map Obtained for Each Pure Spectrum','FontSize',14)
colormap(jet);
colorbar


Classify Pixels Using Minimum Score Criteria

Lower SAM values indicate higher spectral similarity. Use the minimum score criteria to classify the
test pixels by finding the best match for each pixel among the library signatures. The result is a pixel-
wise classification map in which the value of each pixel is the index of the library signature in lib for
which that pixel exhibits the lowest SAM value. For example, if the value of a pixel in the
classification map is 1, the pixel exhibits high similarity to the first library signature in lib.

[~,classMap] = min(scoreMap,[],3);

Create a class table that maps the classification map values to the ECOSTRESS library signatures
used for spectral matching.

classTable = table((min(classMap(:)):max(classMap(:)))',classNames',...
'VariableNames',{'Classification map value','Matching library signature'})

classTable=5×2 table
Classification map value Matching library signature
________________________ __________________________

1 "Sea Water"
2 "Tree"
3 "Utisol"
4 "Mollisol"
5 "Concrete"

Display the RGB image of the hyperspectral data and the classification results. Visual inspection
shows that spectral matching classifies each pixel effectively.

fig = figure('Position',[0 0 700 300]);


axes1 = axes('Parent',fig,'Position',[0.04 0 0.4 0.9]);
imagesc(rgbImg,'Parent',axes1);
axis off
title('RGB Image of Data Cube')
axes2 = axes('Parent',fig,'Position',[0.47 0 0.45 0.9]);
imagesc(classMap,'Parent',axes2)
axis off
colormap(jet(numel(lib)))
title('Pixel-wise Classification Map')
ticks = linspace(1.4,4.8,numel(lib));
colorbar('Ticks',ticks,'TickLabels',classNames)


References

[1] Kruse, F.A., A.B. Lefkoff, J.W. Boardman, K.B. Heidebrecht, A.T. Shapiro, P.J. Barloon, and A.F.H.
Goetz. “The Spectral Image Processing System (SIPS)—Interactive Visualization and Analysis of
Imaging Spectrometer Data.” Remote Sensing of Environment 44, no. 2–3 (May 1993): 145–63.
https://doi.org/10.1016/0034-4257(93)90013-N.

See Also
hypercube | colorize | readEcostressSig | spectralMatch

More About
• “Getting Started with Hyperspectral Image Processing” on page 20-2


Endmember Material Identification Using Spectral Library

This example shows how to identify the classes of endmember materials present in a hyperspectral
image. Endmembers are pure spectral signatures that signify the reflectance characteristics of
pixels belonging to a single surface material. Existing endmember extraction or identification
algorithms extract or identify the pure pixels in a hyperspectral image. However, these techniques
do not identify the material name or class to which the endmember spectrum belongs. In this
example, you will extract the endmember signatures and then classify or identify the class of an
endmember material in the hyperspectral image by using spectral matching.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

This example uses 1) the spectral signatures in the ECOSTRESS spectral library as the reference
spectra and 2) a data sample from the Jasper Ridge dataset as the test data, for endmember material
identification.

Read Reference Data from ECOSTRESS Spectral Library

Add the full file path containing the ECOSTRESS library files and specify the names of the files to be
read from the library.

fileroot = matlabshared.supportpkg.getSupportPackageRoot();
addpath(fullfile(fileroot,'toolbox','images','supportpackages','hyperspectral','hyperdata','ECOSTRESSSpectraFiles'));
filenames = ["water.seawater.none.liquid.tir.seafoam.jhu.becknic.spectrum.txt",...
"water.tapwater.none.liquid.all.tapwater.jhu.becknic.spectrum.txt",...
"water.ice.none.solid.all.ice_dat_.jhu.becknic.spectrum.txt",...
"vegetation.tree.eucalyptus.maculata.vswir.jpl087.jpl.asd.spectrum.txt",...
"soil.utisol.hapludult.none.all.87p707.jhu.becknic.spectrum.txt",...
"soil.mollisol.cryoboroll.none.all.85p4663.jhu.becknic.spectrum.txt",...
"manmade.road.tar.solid.all.0099uuutar.jhu.becknic.spectrum.txt",...
"manmade.concrete.pavingconcrete.solid.all.0092uuu_cnc.jhu.becknic.spectrum.txt"];
lib = readEcostressSig(filenames);

Display the lib data and inspect its values. The data is a struct variable specifying the class,
subclass, wavelength, and reflectance-related information.

lib

lib=1×8 struct array with fields:


Name
Type
Class
SubClass
ParticleSize
Genus
Species
SampleNo
Owner
WavelengthRange
Origin
CollectionDate


Description
Measurement
FirstColumn
SecondColumn
WavelengthUnit
DataUnit
FirstXValue
LastXValue
NumberOfXValues
AdditionalInformation
Wavelength
Reflectance

Plot the spectral signatures read from the ECOSTRESS spectral library.

figure
hold on
for idx = 1:numel(lib)
plot(lib(idx).Wavelength,lib(idx).Reflectance,'LineWidth',2);
end
axis tight
box on
xlabel('Wavelength (\mum)');
ylabel('Reflectance (%)');
classNames = {lib.Class};
legend(classNames,'Location','northeast')
title('Reference Spectra from ECOSTRESS Library');
hold off


Read Test Data

Read test data from the Jasper Ridge dataset by using the hypercube function. The function returns a
hypercube object that stores the data cube and the metadata information read from the test data.
The test data has 198 spectral bands and their wavelengths range from 399.4 nm to 2457 nm. The
spectral resolution is up to 9.9 nm and the spatial resolution of each band image is 100-by-100. The
test data contains four latent endmembers: road, soil, water, and trees.

hcube = hypercube('jasperRidge2_R198.hdr');

Extract Endmember Spectra

To compute the total number of spectrally distinct endmembers present in the test data, use the
countEndmembersHFC function. This function finds the number of endmembers by using the
Harsanyi–Farrand–Chang (HFC) method. Set the probability of false alarm (PFA) to a low value in
order to avoid false detections.

numEndmembers = countEndmembersHFC(hcube,'PFA',10^-27);

Extract the endmembers of the test data by using the N-FINDR method.

endMembers = nfindr(hcube,numEndmembers);

Read the wavelength values from the hypercube object hcube. Plot the extracted endmember
signatures. The test data comprises 4 endmember materials, and the class names of these materials
can be identified through spectral matching.


figure
plot(hcube.Wavelength,endMembers,'LineWidth',2)
axis tight
xlabel('Wavelength (nm)')
ylabel('Data Values')
title('Endmembers Extracted using N-FINDR')
num = 1:numEndmembers;
legendName = strcat('Endmember',{' '},num2str(num'));
legend(legendName)

Identify Endmember Material

To identify the name of an endmember material, use the spectralMatch function. The function
computes the spectral similarity between the library signatures and an endmember spectrum to be
classified. Select the spectral information divergence (SID) method for computing the matching score.
Typically, a low SID score means a better match between the test and the reference spectra. The test
spectrum is then classified as belonging to the class of the best matching reference spectrum.
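
For reference, SID treats each spectrum as a probability distribution and sums the relative entropy measured in both directions. The following sketch evaluates that measure for two of the extracted endmember spectra; it is illustrative only, assumes strictly positive spectra, and is not the internal implementation of spectralMatch.

p = double(endMembers(:,1)); p = p/sum(p);   % normalize the spectra to unit sum
q = double(endMembers(:,2)); q = q/sum(q);
sidScore = sum(p.*log(p./q)) + sum(q.*log(q./p));
disp("SID score between endmembers 1 and 2: " + sidScore)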

For example, to identify the class of the third and fourth endmember material, find the spectral
similarity between the library signatures and the respective endmember spectrum. The index of the
minimum SID score value specifies the class name in the spectral library. The third endmember
spectrum is identified as Sea Water and the fourth endmember spectrum is identified as Tree.

wavelength = hcube.Wavelength;
detection = cell(1,1);
cnt = 1;
queryEndmember = [3 4];


for num = 1:numel(queryEndmember)


spectra = endMembers(:,queryEndmember(num));
scoreValues = spectralMatch(lib,spectra,wavelength,'Method','sid');
[~, matchIdx] = min(scoreValues);
detection{cnt} = lib(matchIdx).Class;
disp(strcat('Endmember spectrum ',{' '},num2str(queryEndmember(num)),' is identified as ',{' '},detection{cnt}))
cnt=cnt+1;
end

Endmember spectrum 3 is identified as Sea Water


Endmember spectrum 4 is identified as Tree

Segment Endmember Regions in Test Data

To visually inspect the identification results, localize and segment the image regions specific to the
endmember materials in the test data. Use the sid function to compute pixel-wise spectral similarity
between each pixel spectrum and the extracted endmember spectrum. Then, perform thresholding to
segment the desired endmember regions in the test data and generate the segmented image. Set
the threshold value to 15 to select the best matching pixels.

For visualization, generate the RGB version of the test data by using the colorize function and
then overlay the segmented image onto the test image.

threshold = 15;
rgbImg = colorize(hcube,'method','rgb','ContrastStretching',true);
overlayImg = rgbImg;
labelColor = {'Blue','Green'};
segmentedImg = cell(size(hcube.DataCube,1),size(hcube.DataCube,2),numel(queryEndmember));
for num = 1:numel(queryEndmember)
scoreMap = sid(hcube,endMembers(:,queryEndmember(num)));
segmentedImg{num} = scoreMap <= threshold;
overlayImg = imoverlay(overlayImg,segmentedImg{num},labelColor{num});
end

Display Results

Visually inspect the identification results by displaying the segmented images and the overlaid
image that highlights the Sea Water and Tree endmember regions in the test data.

figure('Position',[0 0 900 400])


plotdim = [0.02 0.2 0.3 0.7;0.35 0.2 0.3 0.7];
for num = 1:numel(queryEndmember)
subplot('Position',plotdim(num,:))
imagesc(segmentedImg{num})
title(strcat('Segmented Endmember region :',{' '},detection{num}));
colormap([0 0 0;1 1 1])
axis off
end


figure('Position',[0 0 900 400])


subplot('Position',[0 0.2 0.3 0.7])
imagesc(rgbImg)
title('RGB Transformation of Test Data');
axis off
subplot('Position',[0.35 0.2 0.3 0.7])
imagesc(overlayImg)
title('Overlay Segmented Regions')
hold on
dim = [0.66 0.6 0.3 0.3];
annotation('textbox',dim,'String','Sea Water','Color',[1 1 1],'BackgroundColor',[0 0 1],'FitBoxToText','on');
dim = [0.66 0.5 0.3 0.3];
annotation('textbox',dim,'String','Tree','BackgroundColor',[0 1 0],'FitBoxToText','on');
hold off
axis off


References

[1] Kruse, F.A., A.B. Lefkoff, J.W. Boardman, K.B. Heidebrecht, A.T. Shapiro, P.J. Barloon, and A.F.H.
Goetz. “The Spectral Image Processing System (SIPS)—Interactive Visualization and Analysis of
Imaging Spectrometer Data.” Remote Sensing of Environment 44, no. 2–3 (May 1993): 145–63.
https://doi.org/10.1016/0034-4257(93)90013-N.

See Also
hypercube | colorize | spectralMatch | readEcostressSig | countEndmembersHFC

More About
• “Getting Started with Hyperspectral Image Processing” on page 20-2


Target Detection Using Spectral Signature Matching

This example shows how to detect a known target in a hyperspectral image by using a spectral
matching method. The pure spectral signature of the known target material is used to detect and
locate the target in a hyperspectral image. In this example, you will use the spectral angle mapper
(SAM) spectral matching method to detect man-made roofing materials (known target) in a
hyperspectral image. The pure spectral signature of the roofing material is read from the
ECOSTRESS spectral library and is used as the reference spectrum for spectral matching. The
spectral signatures of all the pixels in the data cube are compared with the reference spectrum and
the best matching pixel spectrum is classified as belonging to the target material.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

This example uses a data sample taken from the Pavia University dataset as the test data. The
dataset contains endmember signatures for 9 ground truth classes, and each signature is a vector of
length 103. The ground truth classes include Asphalt, Meadows, Gravel, Trees, Painted metal sheets,
Bare soil, Bitumen, Self blocking bricks, and Shadows. Of these classes, the Painted metal sheets
class typically belongs to the roofing materials type and is the desired target to be located.

Read Test Data

Read the test data from Pavia University dataset by using the hypercube function. The function
returns a hypercube object that stores the data cube and the metadata information read from the
test data. The test data has 103 spectral bands and their wavelengths range from 430 nm to 860 nm.
The geometric resolution is 1.3 meters and the spatial resolution of each band image is 610-by-340.

hcube = hypercube("paviaU.hdr");

Estimate an RGB color image from the data cube by using the colorize function. Set the
ContrastStretching parameter value to true in order to improve the contrast of RGB color
image. Display the RGB image.

rgbImg = colorize(hcube,Method="rgb",ContrastStretching=true);
imshow(rgbImg)
title("RGB Image")


Read Reference Spectrum

Read the spectral information corresponding to a roofing material from the ECOSTRESS spectral
library by using the readEcostressSig function.
lib = readEcostressSig("manmade.roofingmaterial.metal.solid.all.0692uuucop.jhu.becknic.spectrum.txt");

Inspect the properties of the reference spectrum read from the ECOSTRESS library. The output
structure lib stores the metadata and the data values read from the ECOSTRESS library.


lib

lib = struct with fields:


Name: "Copper Metal"
Type: "manmade"
Class: "Roofing Material"
SubClass: "Metal"
ParticleSize: "Solid"
Genus: [0x0 string]
Species: [0x0 string]
SampleNo: "0692UUUCOP"
Owner: "National Photographic Interpretation Center"
WavelengthRange: "All"
Origin: "Spectra obtained from the Noncoventional Exploitation FactorsData Sys
CollectionDate: "N/A"
Description: "Extremely weathered bare copper metal from government building roof f
Measurement: "Directional (10 Degree) Hemispherical Reflectance"
FirstColumn: "X"
SecondColumn: "Y"
WavelengthUnit: "micrometer"
DataUnit: "Reflectance (percent)"
FirstXValue: "0.3000"
LastXValue: "12.5000"
NumberOfXValues: "536"
AdditionalInformation: "none"
Wavelength: [536x1 double]
Reflectance: [536x1 double]

Read the wavelength and the reflectance values stored in lib. The wavelength and the reflectance
pair comprises the reference spectrum or the reference spectral signature.

wavelength = lib.Wavelength;
reflectance = lib.Reflectance;

Plot the reference spectrum read from the ECOSTRESS library.

plot(wavelength,reflectance,LineWidth=2)
axis tight
xlabel("Wavelength (\mum)")
ylabel("Reflectance (%)")
title("Reference Spectrum")


Perform Spectral Matching

Find the spectral similarity between the reference spectrum and the data cube by using the
spectralMatch function. By default, the function uses the spectral angle mapper (SAM) method for
finding the spectral match. The output is a score map that signifies the matching between each pixel
spectrum and the reference spectrum. Thus, the score map is a matrix with the same spatial dimensions
as the test data. In this case, the size of the score map is 610-by-340. SAM is insensitive to gain
factors and hence can be used to match pixel spectra that inherently have an unknown gain factor
due to topographic illumination effects.

scoreMap = spectralMatch(lib,hcube);

Display the score map.

figure(Position=[0 0 500 600])


imagesc(scoreMap)
colormap parula
colorbar
title("Score Map")


Classify and Detect Target

Typical values for the SAM score lie in the range [0, 3.142] and the measurement unit is radians. A
lower value of the SAM score represents better matching between the pixel spectrum and the
reference spectrum. Use a thresholding approach to spatially localize the target region in the input
data. To determine the threshold, inspect the histogram of the score map. The minimum SAM score
value with a prominent number of occurrences can be used to select the threshold for detecting the
target region.
figure
imhist(scoreMap);
title("Histogram Plot of Score Map")
xlabel("Score Map Values")
ylabel("Number of Occurrences")

From the histogram plot, you can infer that the minimum score value with a prominent number of
occurrences is approximately 0.22. Accordingly, you can set a value around this local maximum as the
threshold. For this example, select 0.25 as the threshold for detecting the target. The pixels with
values less than this threshold are classified as the target region.
maxthreshold = 0.25;

Perform thresholding to detect the target region with maximum spectral similarity. Overlay the
thresholded image on the RGB image of the hyperspectral data.
thresholdedImg = scoreMap <= maxthreshold;
overlaidImg = imoverlay(rgbImg,thresholdedImg,"green");


Display the results.

fig = figure(Position=[0 0 900 500]);


axes1 = axes(Parent=fig,Position=[0.04 0.11 0.4 0.82]);
imagesc(thresholdedImg,Parent=axes1);
colormap([0 0 0;1 1 1])
title("Detected Target Region")
axis off
axes2 = axes(Parent=fig,Position=[0.47 0.11 0.4 0.82]);
imagesc(overlaidImg,Parent=axes2)
axis off
title("Overlaid Detection Results")

Validate Detection Results

You can validate the obtained target detection results by using the ground truth data taken from
Pavia University dataset.

Load a MAT file containing the ground truth data. To validate the result quantitatively, calculate the
mean squared error between the ground truth and the output. The error value is small if the obtained
results are close to the ground truth.

load("paviauRoofingGT.mat");
err = immse(im2double(paviauRoofingGT), im2double(thresholdedImg));
disp("The mean squared error is "+err)

The mean squared error is 0.004026

fig = figure(Position=[0 0 900 500]);


axes1 = axes(Parent=fig,Position=[0.04 0.11 0.4 0.82]);


imagesc(thresholdedImg,Parent=axes1);
colormap([0 0 0;1 1 1]);
title("Result Obtained")
axis off
axes2 = axes(Parent=fig,Position=[0.47 0.11 0.4 0.82]);
imagesc(paviauRoofingGT,Parent=axes2)
colormap([0 0 0;1 1 1]);
axis off
title("Ground Truth")

References

[1] Kruse, F.A., A.B. Lefkoff, J.W. Boardman, K.B. Heidebrecht, A.T. Shapiro, P.J. Barloon, and A.F.H.
Goetz. “The Spectral Image Processing System (SIPS)—Interactive Visualization and Analysis of
Imaging Spectrometer Data.” Remote Sensing of Environment 44, no. 2–3 (May 1993): 145–63.
https://doi.org/10.1016/0034-4257(93)90013-N.

[2] Chein-I Chang. “An Information-Theoretic Approach to Spectral Variability, Similarity, and
Discrimination for Hyperspectral Image Analysis.” IEEE Transactions on Information Theory 46, no. 5
(August 2000): 1927–32. https://doi.org/10.1109/18.857802.

See Also
hypercube | colorize | spectralMatch | readEcostressSig

More About
• “Getting Started with Hyperspectral Image Processing” on page 20-2


Identify Vegetation Regions Using Interactive NDVI Thresholding

This example shows how to identify the types of vegetation regions in a hyperspectral image through
interactive thresholding of a normalized difference vegetation index (NDVI) map. The NDVI map of a
hyperspectral dataset indicates the density of vegetation in various regions of the hyperspectral data.
The NDVI value is computed using the near-infrared (NIR) and visible red (R) spectral band images
from the hyperspectral data cube.

NDVI = (NIR − R) / (NIR + R)

The NDVI value of a pixel is a scalar from -1 to 1. The pixels in regions with healthy or dense
vegetation reflect more NIR light, resulting in high NDVI values. The pixels in regions with unhealthy
vegetation or barren land absorb more NIR light, resulting in low or negative NDVI values. Based on
its NDVI value, you can identify vegetation in a region as dense vegetation, moderate vegetation,
sparse vegetation, or no vegetation. These are the typical NDVI value ranges for each type of region:

• Dense vegetation - Greater than or equal to 0.6


• Moderate vegetation - In the range [0.4, 0.6)
• Sparse vegetation - In the range [0.2, 0.4)
• No vegetation - Below 0.2

You can segment the desired vegetation regions by performing thresholding using the NDVI values.
In this example, you will interactively select and change the threshold values to identify different
vegetation regions in a hyperspectral data cube based on their NDVI values.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Read Hyperspectral Data

Read hyperspectral data from an ENVI format file into the workspace. This example uses a data
sample from the Pavia dataset, which contains both vegetation and barren regions.
hcube = hypercube('paviaU.dat','paviaU.hdr');

Compute NDVI

Compute the NDVI value for each pixel in the data cube by using the ndvi function. The function
outputs a 2-D image in which the value of each pixel is the NDVI value for the corresponding pixel in
the hyperspectral data cube.
ndviImg = ndvi(hcube);
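
For reference, you can compute the same index directly from NIR and red band images taken from the data cube. This sketch uses 800 nm and 670 nm as illustrative NIR and red band centers; the ndvi function may select slightly different bands internally.

[~,nirBand] = min(abs(hcube.Wavelength - 800));   % band closest to 800 nm (NIR)
[~,redBand] = min(abs(hcube.Wavelength - 670));   % band closest to 670 nm (red)
nir = double(hcube.DataCube(:,:,nirBand));
red = double(hcube.DataCube(:,:,redBand));
ndviManual = (nir - red)./(nir + red);            % same formula as above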

Identify Vegetation Regions Using Thresholding

Identify different regions in the hyperspectral data using multilevel thresholding. Define a label
matrix to assign label values to pixels based on specified threshold values. You can set the thresholds
based on the computed NDVI values.


• Label value 1 - Specify the threshold value as 0.6, and find the pixels with NDVI values greater
than or equal to the threshold. These are dense vegetation pixels.
• Label value 2 - Specify a lower threshold limit of 0.4 and an upper threshold limit of 0.6. Find the
pixels with NDVI values greater than or equal to 0.4 and less than 0.6. These are moderate
vegetation pixels.
• Label value 3 - Specify a lower threshold limit of 0.2 and an upper threshold limit of 0.4. Find the
pixels with NDVI values greater than or equal to 0.2 and less than 0.4. These are the sparse
vegetation pixels.
• Label value 4 - Specify the threshold value as 0.2, and find the pixels with NDVI values less than
the threshold. These are no vegetation pixels.

L = zeros(size(ndviImg));
L(ndviImg >= 0.6) = 1;
L(ndviImg >= 0.4 & ndviImg < 0.6) = 2;
L(ndviImg >= 0.2 & ndviImg < 0.4) = 3;
L(ndviImg < 0.2) = 4;

Estimate a contrast-stretched RGB image from the original data cube by using the colorize
function.

rgbImg = colorize(hcube,'Method','rgb','ContrastStretching',true);

Define a colormap to display each value in the label matrix in a different color. Overlay the label
matrix on the RGB image.

cmap = [0 1 0; 0 0 1; 1 1 0; 1 0 0];
overlayImg = labeloverlay(rgbImg,L,'Colormap',cmap);

Create Interactive Interface for NDVI Thresholding

To build an interactive interface, first create a figure window using the uifigure function. Then add
two panels to the figure window, for displaying the input image and the overlaid image side by side.

h = uifigure('Name','Interactive NDVI Thresholding','Position',[200,50,1000,700]);

viewPanel1 = uipanel(h,'Position',[2 220 400 450],'Title','Input Image');


ax1 = axes(viewPanel1);
image(rgbImg,'Parent',ax1)

viewPanel2 = uipanel(h,'Position',[400 220 400 450],'Title','Types of Vegetation Regions in Input Image');


ax2 = axes(viewPanel2);
image(overlayImg,'Parent',ax2)

Annotate the figure window with the color for each label and its associated vegetation density. The
colormap value for dense vegetation is green, moderate vegetation is blue, sparse vegetation is
yellow, and no vegetation is red.

annotation(h,'rectangle',[0.82 0.82 0.03 0.03],'Color',[0 1 0],'FaceColor',[0 1 0]);


annotation(h,'rectangle',[0.82 0.77 0.03 0.03],'Color',[0 0 1],'FaceColor',[0 0 1]);
annotation(h,'rectangle',[0.82 0.72 0.03 0.03],'Color',[1 1 0],'FaceColor',[1 1 0]);
annotation(h,'rectangle',[0.82 0.67 0.03 0.03],'Color',[1 0 0],'FaceColor',[1 0 0]);
annotation(h,'textbox',[0.85 0.80 0.9 0.05],'EdgeColor','None','String','Dense Vegetation');
annotation(h,'textbox',[0.85 0.75 0.9 0.05],'EdgeColor','None','String','Moderate Vegetation');
annotation(h,'textbox',[0.85 0.70 0.9 0.05],'EdgeColor','None','String','Sparse Vegetation');
annotation(h,'textbox',[0.85 0.65 0.9 0.05],'EdgeColor','None','String','No Vegetation');


Create sliders for interactively changing the thresholds. Use the uislider function to add a slider for
adjusting the minimum threshold value and a slider for adjusting the maximum threshold value.

slidePanel1 = uipanel(h,'Position',[400,120,400,70],'Title','Minimum Threshold Value');


minsld = uislider(slidePanel1,'Position',[30,40,350,3],'Value',-1,'Limits',[-1 1],'MajorTicks',-1:0.2:1); % MajorTicks range assumed
slidePanel2 = uipanel(h,'Position',[400,30,400,70],'Title','Maximum Threshold Value');
maxsld = uislider(slidePanel2,'Position',[30,35,350,3],'Value',1,'Limits',[-1 1],'MajorTicks',-1:0.2:1); % MajorTicks range assumed

Change Threshold Interactively

Use the function ndviThreshold to change the minimum and maximum threshold limits. When you
move the slider thumb and release the mouse button, the ValueChangedFcn callback updates the
slider value and sets the slider value as the new threshold. You must call the ndviThreshold
function separately for the minimum threshold slider and maximum threshold slider. Change the
threshold limits by adjusting the sliders. This enables you to inspect the types of vegetation regions
within your specified threshold limits.

minsld.ValueChangedFcn = @(es,ed) ndviThreshold(minsld,maxsld,ndviImg,rgbImg,ax2,cmap);


maxsld.ValueChangedFcn = @(es,ed) ndviThreshold(minsld,maxsld,ndviImg,rgbImg,ax2,cmap);

The ndviThreshold function generates a new label matrix using the updated threshold values and
dynamically updates the overlaid image in the figure window.


Create Callback Function

Create a callback function to interactively change the threshold limits and dynamically update the
results.
function ndviThreshold(minsld,maxsld,ndviImg,rgbImg,ax2,cmap)
L = zeros(size(ndviImg));
minth = round(minsld.Value,2);
maxth = round(maxsld.Value,2);

if minth > maxth


error('Minimum threshold value must be less than the maximum threshold value')
end

if minth >= 0.6


% Label 1 for Dense Vegetation
L(ndviImg >= minth & ndviImg <= maxth) = 1;
overlayImg = labeloverlay(rgbImg,L,'Colormap',cmap);
elseif minth >= 0.4 && minth < 0.6
% Label 1 for Dense Vegetation
% Label 2 for Moderate Vegetation
if maxth >= 0.6
L(ndviImg >= minth & ndviImg < 0.6) = 2;
L(ndviImg >= 0.6 & ndviImg <= maxth) = 1;
else
L(ndviImg >= minth & ndviImg < maxth) = 2;
end
overlayImg = labeloverlay(rgbImg,L,'Colormap',cmap);
elseif minth >= 0.2 && minth <0.4
% Label 1 for Dense Vegetation
% Label 2 for Moderate Vegetation
% Label 3 for Sparse vegetation
if maxth < 0.4
L(ndviImg >= minth & ndviImg <= maxth) = 3;
elseif maxth >=0.4 && maxth < 0.6
L(ndviImg >= minth & ndviImg < 0.4) = 3;
L(ndviImg >= 0.4 & ndviImg <= maxth) = 2;
elseif maxth >= 0.6
L(ndviImg >= minth & ndviImg < 0.4) = 3;
L(ndviImg >= 0.4 & ndviImg < 0.6) = 2;
L(ndviImg >= 0.6 & ndviImg <= maxth) = 1;
end
overlayImg = labeloverlay(rgbImg,L,'Colormap',cmap);
elseif minth < 0.2
% Label 1 for Dense Vegetation
% Label 2 for Moderate Vegetation
% Label 3 for Sparse vegetation
% Label 4 for No Vegetation
L(ndviImg >= minth & ndviImg < 0.2) = 4;

if maxth >= 0.6


L(ndviImg >= 0.6 & ndviImg <= maxth) = 1;
L(ndviImg >= 0.4 & ndviImg < 0.6) = 2;
L(ndviImg >= 0.2 & ndviImg < 0.4) = 3;
elseif maxth >=0.4 && maxth < 0.6
L(ndviImg >= 0.4 & ndviImg <= maxth) = 2;
L(ndviImg >= 0.2 & ndviImg < 0.4) = 3;
elseif maxth >=0.2 && maxth < 0.4


L(ndviImg >= maxth & ndviImg < 0.4) = 3;


end
overlayImg = labeloverlay(rgbImg,L,'Colormap',cmap);
elseif minth == maxth
if maxth < 0.2
L(ndviImg == minth) = 4;
elseif maxth >=0.4 && maxth < 0.6
L(ndviImg == maxth) = 2;
elseif maxth >=0.2 && maxth < 0.4
L(ndviImg == maxth) = 3;
elseif maxth >= 0.6
L(ndviImg == maxth) = 1;
end
overlayImg = labeloverlay(rgbImg,L,'Colormap',cmap);
end
% Display the overlay image.
image(overlayImg,'parent',ax2);
% Store updated labelled image
minsld.UserData = L;
maxsld.UserData = L;
end

References
[1] J.W. Rouse, R.H. Hass, J.A. Schell, and D.W. Deering. “Monitoring Vegetation Systems in the Great
Plains with ERTS.” In Proceedings of the Third Earth Resources Technology Satellite- 1
Symposium, 1:309–17. Greenbelt, NASA SP-351, Washington, DC, 1973.

[2] Haboudane, D. “Hyperspectral Vegetation Indices and Novel Algorithms for Predicting Green LAI
of Crop Canopies: Modeling and Validation in the Context of Precision Agriculture.” Remote
Sensing of Environment 90, no. 3 (April 15, 2004): 337–52. https://doi.org/10.1016/j.rse.2003.12.013.

See Also
hypercube | colorize | uifigure | uipanel | uislider | labeloverlay

More About
• “Getting Started with Hyperspectral Image Processing” on page 20-2
• “Measure Vegetation Cover in Hyperspectral Data Using NDVI Image”


Classify Hyperspectral Images Using Deep Learning

This example shows how to classify hyperspectral images using a custom spectral convolution neural
network (CSCNN).

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Hyperspectral imaging measures the spatial and spectral features of an object at different
wavelengths ranging from ultraviolet through long infrared, including the visible spectrum. Unlike
color imaging, which uses only three types of sensors sensitive to the red, green, and blue portions of
the visible spectrum, hyperspectral images can include dozens or hundreds of channels. Therefore,
hyperspectral images can enable the differentiation of objects that appear identical in an RGB image.

This example uses a CSCNN that learns to classify 16 types of vegetation and terrain based on the
unique spectral signatures of each material. The example shows how to train a CSCNN and also
provides a pretrained network that you can use to perform classification.

Load Hyperspectral Data Set

This example uses the Indian Pines data set, included with the Image Processing Toolbox™
Hyperspectral Imaging Library. The data set consists of a single hyperspectral image of size 145-
by-145 pixels with 220 spectral channels.
classes, such as Alfalfa, Corn, Grass-pasture, Grass-trees, and Stone-Steel-Towers.

Read the hyperspectral image using the hypercube function.

hcube = hypercube("indian_pines.dat");

Visualize a false-color version of the image using the colorize function.

rgbImg = colorize(hcube,method="rgb");
imshow(rgbImg)


Load the ground truth labels and specify the number of classes.

gtLabel = load("indian_pines_gt.mat");
gtLabel = gtLabel.indian_pines_gt;
numClasses = 16;

Preprocess Training Data

Reduce the number of spectral bands to 30 using the hyperpca function. This function performs
principal component analysis (PCA) and projects the data onto the principal component bands that retain the most spectral information.

dimReduction = 30;
imageData = hyperpca(hcube,dimReduction);

Normalize the image data.

sd = std(imageData,[],3);
imageData = imageData./sd;

Split the hyperspectral image into patches of size 25-by-25 pixels with 30 channels using the
createImagePatchesFromHypercube helper function. This function is attached to the example as
a supporting file. The function also returns a single label for each patch, which is the label of the
central pixel.

windowSize = 25;
inputSize = [windowSize windowSize dimReduction];
[allPatches,allLabels] = createImagePatchesFromHypercube(imageData,gtLabel,windowSize);

indianPineDataTransposed = permute(allPatches,[2 3 4 1]);


dsAllPatches = augmentedImageDatastore(inputSize,indianPineDataTransposed,allLabels);
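
The patch extraction itself is done by the createImagePatchesFromHypercube helper attached to the example. As a conceptual sketch only (not the attached helper), extracting the window around one labeled pixel might look like this; the pixel location and the symmetric padding are assumptions.

padSize = floor(windowSize/2);
padded = padarray(imageData,[padSize padSize 0],"symmetric");   % pad so border pixels get full windows
r = 70; c = 80;                                                 % arbitrary pixel location
patch = padded(r:r+windowSize-1,c:c+windowSize-1,:);            % 25-by-25-by-30 patch centered on (r,c)
patchLabel = gtLabel(r,c);                                      % label of the center pixel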

Not all of the cubes in this data set have labels. However, training the network requires labeled data.
Select only the labeled cubes for training. Count how many labeled patches are available.

patchesLabeled = allPatches(allLabels>0,:,:,:);
patchLabels = allLabels(allLabels>0);
numCubes = size(patchesLabeled,1);

Convert the numeric labels to categorical.

patchLabels = categorical(patchLabels);


Randomly divide the patches into training and test data sets.
[trainingIdx,valIdx,testIdx] = dividerand(numCubes,0.3,0,0.7);
dataInputTrain = patchesLabeled(trainingIdx,:,:,:);
dataLabelTrain = patchLabels(trainingIdx,1);
dataInputTest = patchesLabeled(testIdx,:,:,:);
dataLabelTest = patchLabels(testIdx,1);

Transpose the input data.


dataInputTransposeTrain = permute(dataInputTrain,[2 3 4 1]);
dataInputTransposeTest = permute(dataInputTest,[2 3 4 1]);

Create datastores that read batches of training and test data.


dsTrain = augmentedImageDatastore(inputSize,dataInputTransposeTrain,dataLabelTrain);
dsTest = augmentedImageDatastore(inputSize,dataInputTransposeTest,dataLabelTest);

Create CSCNN Classification Network

Define the CSCNN architecture.


layers = [
image3dInputLayer(inputSize,Name="Input",Normalization="None")
convolution3dLayer([3 3 7],8,Name="conv3d_1")
reluLayer(Name="Relu_1")
convolution3dLayer([3 3 5],16,Name="conv3d_2")
reluLayer(Name="Relu_2")
convolution3dLayer([3 3 3],32,Name="conv3d_3")
reluLayer(Name="Relu_3")
convolution3dLayer([3 3 1],8,Name="conv3d_4")
reluLayer(Name="Relu_4")
fullyConnectedLayer(256,Name="fc1")
reluLayer(Name="Relu_5")
dropoutLayer(0.4,Name="drop_1")
fullyConnectedLayer(128,Name="fc2")
dropoutLayer(0.4,Name="drop_2")
fullyConnectedLayer(numClasses,Name="fc3")
softmaxLayer(Name="softmax")
classificationLayer(Name="output")];
lgraph = layerGraph(layers);

Visualize the network using Deep Network Designer.


deepNetworkDesigner(lgraph)

Specify Training Options

Specify the required network parameters. For this example, train the network for 100 epochs with an
initial learning rate of 0.001, a batch size of 256, and Adam optimization.
numEpochs = 100;
miniBatchSize = 256;
initLearningRate = 0.001;
momentum = 0.9;
learningRateFactor = 0.01;

options = trainingOptions("adam", ...
InitialLearnRate=initLearningRate, ...
LearnRateSchedule="piecewise", ...
LearnRateDropPeriod=30, ...
LearnRateDropFactor=learningRateFactor, ...
MaxEpochs=numEpochs, ...
MiniBatchSize=miniBatchSize, ...
GradientThresholdMethod="l2norm", ...
GradientThreshold=0.01, ...
VerboseFrequency=100, ...
ValidationData=dsTest, ...
ValidationFrequency=100);

Train the Network

By default, the example downloads a pretrained classifier for the Indian Pines data set. The
pretrained network enables you to classify the Indian Pines data set without waiting for training to
complete.

To train the network, set the doTraining variable in the following code to true. If you choose to
train the network, use of a CUDA capable NVIDIA™ GPU is highly recommended. Use of a GPU
requires Parallel Computing Toolbox™. For more information about supported GPU devices, see “GPU
Computing Requirements” (Parallel Computing Toolbox).

doTraining = false;
if doTraining
net = trainNetwork(dsTrain,lgraph,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save("trainedIndianPinesCSCNN-"+modelDateTime+".mat","net");
else
dataDir = pwd;
trainedNetwork_url = "https://ssd.mathworks.com/supportfiles/image/data/trainedIndianPinesCSC
downloadTrainedNetwork(trainedNetwork_url,pwd);
load(fullfile(dataDir,"trainedIndianPinesCSCNN.mat"));
end

Downloading pretrained network.


This can take several minutes to download...
Done.

Classify Hyperspectral Image Using Trained CSCNN

Calculate the accuracy of the classification for the test data set. Here, accuracy is the fraction of the
correct pixel classification over all the classes.

predictionTest = classify(net,dsTest);
accuracy = sum(predictionTest == dataLabelTest)/numel(dataLabelTest);
disp("Accuracy of the test data = "+num2str(accuracy))

Accuracy of the test data = 0.99359

Reconstruct the complete image by classifying all image pixels, including pixels in labeled training
patches, pixels in labeled test patches, and unlabeled pixels.

prediction = classify(net,dsAllPatches);
prediction = double(prediction);

The network is trained on labeled patches only. Therefore, the predicted classification of unlabeled
pixels is meaningless. Find the unlabeled patches and set the label to 0.


patchesUnlabeled = find(allLabels==0);
prediction(patchesUnlabeled) = 0;

Reshape the classified pixels to match the dimensions of the ground truth image.

[m,n,d] = size(imageData);
indianPinesPrediction = reshape(prediction,[n m]);
indianPinesPrediction = indianPinesPrediction';

Display the ground truth and predicted classification.

cmap = parula(numClasses);

figure
tiledlayout(1,2,TileSpacing="Tight")
nexttile
imshow(gtLabel,cmap)
title("Ground Truth Classification")

nexttile
imshow(indianPinesPrediction,cmap)
colorbar
title("Predicted Classification")

To highlight misclassified pixels, display a composite image of the ground truth and predicted labels.
Gray pixels indicate identical labels and colored pixels indicate different labels.


figure
imshowpair(gtLabel,indianPinesPrediction)

See Also
hypercube | colorize | hyperpca | augmentedImageDatastore | imageDatastore |
trainNetwork | trainingOptions | classify

Related Examples
• “Semantic Segmentation of Multispectral Images Using Deep Learning” on page 19-131


Find Regions in Spatially Referenced Multispectral Image

This example shows how to identify water and vegetation regions in a Landsat 8 multispectral image
and spatially reference the image.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Spectral indices characterize the specific features of interest of a target using its biophysical and
chemical properties. These features of interest enable you to identify plant, water, and soil regions, as
well as various forms of built-up regions. This example uses modified normalized difference water
index (MNDWI) and green vegetation index (GVI) spectral indices to identify water and vegetation
regions respectively. For more information on spectral indices, see “Spectral Indices” on page 20-13.

Load and Visualize Multispectral Data

Landsat 8 is an Earth observation satellite that carries the Operational Land Imager (OLI) and
Thermal Infrared Sensor (TIRS) instruments.

Download the Landsat 8 data set. The test data set has 8 spectral bands with wavelengths that range
from 440 nm to 2200 nm. The test data is 7721-by-7651 pixels with a spatial resolution of 30 meters.

Download the data set and unzip the file by using the downloadLandsat8Dataset helper function.
This function is attached to the example as a supporting file.

zipfile = "LC08_L1TP_113082_20211206_20211206_01_RT.zip";
landsat8Data_url = "https://ssd.mathworks.com/supportfiles/image/data/" + zipfile;
downloadLandsat8Dataset(landsat8Data_url,pwd)

Downloading the Landsat 8 OLI dataset.


This can take several minutes to download...
Done.

Read the Landsat 8 multispectral data into the workspace as a hypercube object.

hCube = hypercube("LC08_L1TP_113082_20211206_20211206_01_RT_MTL.txt");

Estimate an RGB image from the data cube by using the colorize function. Apply contrast stretching to
enhance the contrast of the output RGB image.

rgbImg = colorize(hCube,Method="rgb",ContrastStretching=true);

Adjust image intensity values using the imadjustn function for better visualization.

rgbImg = imadjustn(rgbImg);

Display the RGB image of the test data. Notice that without spatial referencing, this figure does not
provide any geographic information.

figure
imshow(rgbImg)
title("RGB Image of Data Cube")


Display Region of Interest on a Map

The Landsat 8 data set contains a GeoTIFF file. Obtain information about the GeoTIFF file by using
the geotiffinfo (Mapping Toolbox) function.
filename = "LC08_L1TP_113082_20211206_20211206_01_RT_B1.TIF";
info = geotiffinfo(filename);

Get the map raster reference object. The reference object contains information such as the x-y world
limits.
R = info.SpatialRef;

Create a polygon in projected map coordinates that represents the geographic extent of the region by
using a mappolyshape (Mapping Toolbox) object.
xlimits = R.XWorldLimits;
ylimits = R.YWorldLimits;


dataRegion = mappolyshape(xlimits([1 1 2 2 1]),ylimits([1 2 2 1 1]));


dataRegion.ProjectedCRS = R.ProjectedCRS;

Plot the region of interest using satellite imagery.

figure
geoplot(dataRegion, ...
LineWidth=2, ...
EdgeColor="yellow", ...
FaceColor="red", ...
FaceAlpha=0.2)
hold on
geobasemap satellite

Import the shapefile worldcities.shp, which contains geographic information about major world
cities as a geospatial table, by using the readgeotable (Mapping Toolbox) function. A geospatial
table is a table or timetable object that contains a Shape variable and attribute variables. For
more information on geospatial tables, see “Create Geospatial Tables” (Mapping Toolbox). You can
also use the worldrivers.shp and worldlakes.shp shapefiles to display major world rivers and
lakes, respectively.

cities = readgeotable("worldcities.shp");

Query the test data to determine which major cities are within the geographic extent of the
rectangular data region. The data region contains a single city from the worldcities.shp
geospatial table.


[citiesX,citiesY] = projfwd(R.ProjectedCRS,cities.Shape.Latitude,cities.Shape.Longitude);
citiesMapShape = mappointshape(citiesX,citiesY);
citiesMapShape.ProjectedCRS = R.ProjectedCRS;
inRegion = isinterior(dataRegion,citiesMapShape);
citiesInRegion = cities(inRegion,:);

Plot and label the major city in the region of interest.

geoplot(citiesInRegion, ...
MarkerSize=14)
text(citiesInRegion.Shape.Latitude+0.07,citiesInRegion.Shape.Longitude+0.03,citiesInRegion.Name, ...
HorizontalAlignment="left", ...
FontWeight="bold", ...
FontSize=14, ...
Color=[1 1 1])
title("Geographic Extent of Landsat 8 Multispectal Image")

Find Water and Vegetation Regions in the Image

Compute the spectral index value for each pixel in the data cube by using the spectralIndices
function. Use the MNDWI and GVI to detect water and green vegetation regions, respectively.

indices = spectralIndices(hCube,["MNDWI","GVI"]);
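For intuition, the MNDWI is a normalized difference of the green and shortwave-infrared (SWIR) reflectances, so you could also compute it directly from two band images. In this sketch the band indices are assumptions chosen for illustration; the spectralIndices function selects the appropriate Landsat 8 bands automatically.

greenBand = im2double(hCube.DataCube(:,:,3)); % assumed index of the green band
swirBand = im2double(hCube.DataCube(:,:,6));  % assumed index of a SWIR band
mndwiManual = (greenBand - swirBand)./(greenBand + swirBand + eps);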

Water regions typically have MNDWI values greater than 0. Vegetation regions typically have GVI
values greater than 1. Specify the threshold values for performing thresholding of the MNDWI and
GVI images to segment the water and green vegetation regions.


threshold = [0 1];

Generate binary images with a value of 1 for pixels with a score greater than the specified thresholds.
All other pixels have a value of 0. The regions in the MNDWI and GVI binary images with a value of 1
correspond to the water and green vegetation regions, respectively.

Overlay the binary images on the RGB image by using the labeloverlay function.

overlayImg = rgbImg;
labelColor = [0 0 1; 0 1 0];
for num = 1:numel(indices)
indexMap = indices(num).IndexImage;
thIndexMap = indexMap > threshold(num);
overlayImg = labeloverlay(overlayImg,thIndexMap,Colormap=labelColor(num,:),Transparency=0.5);
end

Resize the overlaid RGB image by using the mapresize (Mapping Toolbox) function. For this
example, reduce the size of the overlaid RGB image to one fourth of the original size.

scale = 1/4;
[reducedOverlayImg,reducedR] = mapresize(overlayImg,R,scale);

Convert the GeoTIFF information to a map projection structure using the geotiff2mstruct
(Mapping Toolbox) function, to use for displaying the data in an axesm-based map.

mstruct = geotiff2mstruct(info);

Calculate the latitude-longitude limits of the GeoTIFF image.

[latLimits,lonLimits] = projinv(R.ProjectedCRS,xlimits,ylimits);

Display the overlaid image on an axesm-based map. The axes displays the water regions in blue and
the green vegetation regions in green.

figure
ax = axesm(mstruct,Grid="on", ...
GColor=[1 1 1],GLineStyle="-", ...
MapLatlimit=latLimits,MapLonLimit=lonLimits, ...
ParallelLabel="on",PLabelLocation=0.5,PlabelMeridian="west", ...
MeridianLabel="on",MlabelLocation=0.5,MLabelParallel="south", ...
MLabelRound=-1,PLabelRound=-1, ...
PLineVisible="on",PLineLocation=0.5, ...
MLineVisible="on",MlineLocation=0.5);
[X,Y] = worldGrid(reducedR);
mapshow(X,Y,reducedOverlayImg)
axis off
dim = [0.8 0.5 0.3 0.3];
annotation(textbox=dim,String="Water Bodies", ...
Color=[1 1 1], ...
BackgroundColor=[0 0 1], ...
FaceAlpha=0.5, ...
FitBoxToText="on")
dim = [0.8 0.4 0.3 0.3];
annotation(textbox=dim,String="Green Vegetation", ...
BackgroundColor=[0 1 0], ...
FaceAlpha=0.5, ...
FitBoxToText="on")
title("Water and Vegetation Region of Spatially Referenced Image")


See Also
hypercube | spectralIndices | colorize | geotiffinfo | geoplot | mapshow

Related Examples
• “Identify Vegetation Regions Using Interactive NDVI Thresholding” on page 20-61
• “Spatially Reference Imported Rasters” (Mapping Toolbox)
• “Spectral Indices” on page 20-13


Classify Hyperspectral Image Using Support Vector Machine Classifier

This example shows how to preprocess a hyperspectral image and classify it using a support vector
machine (SVM) classifier.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Hyperspectral images are acquired over multiple spectral bands and consist of several hundred band
images, each representing the same scene across different wavelengths. This example uses the Indian
Pines data set, acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor
across a wavelength range of 400 to 2500 nm. This data set contains 16 classes and 220 band images.
Each image is of size 145-by-145 pixels.


In this example, you:

1 Preprocess a hyperspectral image using a 2-D Gaussian filter.
2 Perform classification using an SVM classifier.
3 Display classification results, such as the classification accuracy, classification map, and
confusion matrix.

Load Hyperspectral Data Set

Read the hyperspectral data into the workspace by using the hypercube function.

hcube = hypercube("indian_pines.dat");


Load the ground truth for the data set into the workspace.

gtLabel = load("indian_pines_gt.mat");
gtLabel = gtLabel.indian_pines_gt;
numClasses = 16;

Preprocess Hyperspectral Data

Remove the water absorption bands from the data by using the removeBands function.

band = [104:108 150:163 220]; % Water absorption bands


newhcube = removeBands(hcube,BandNumber=band);

Estimate RGB images of the input hypercube by using the colorize function.

rgbImg = colorize(newhcube,method="rgb");

Apply a Gaussian filter (σ = 2) to each band image of the hyperspectral data using the imgaussfilt
function, and then convert them to grayscale images.

hsData = newhcube.DataCube;
[M,N,C] = size(hsData);
hsDataFiltered = zeros(size(hsData));
for band = 1:C
bandImage = hsData(:,:,band);
bandImageFiltered = imgaussfilt(bandImage,2);
bandImageGray = mat2gray(bandImageFiltered);
hsDataFiltered(:,:,band) = uint8(bandImageGray*255);
end

Prepare Data for Classification

Reshape the filtered hyperspectral data to a set of feature vectors containing filtered spectral
responses for each pixel.

DataVector = reshape(hsDataFiltered,[M*N C]);

Reshape the ground truth image to a vector containing class labels.

gtVector = gtLabel(:);

Find the location indices of the ground truth vector that contain class labels. Discard labels with the
value 0, as they are unlabeled and do not represent a class.

gtLocs = find(gtVector~=0);
classLabel = gtVector(gtLocs);

Create training and testing location indices for the specified training percentage by using the
cvpartition (Statistics and Machine Learning Toolbox) function.

per = 0.1; % Training percentage


cv = cvpartition(classLabel,HoldOut=1-per);

Split the ground truth location indices into training and testing location indices.

locTrain = gtLocs(cv.training);
locTest = gtLocs(~cv.training);


Classify Using SVM

Train the SVM classifier using the fitcecoc (Statistics and Machine Learning Toolbox) function.

svmMdl = fitcecoc(DataVector(locTrain,:),gtVector(locTrain,:));

Test the SVM classifier using the test data.

[svmLabelOut,~] = predict(svmMdl,DataVector(locTest,:));

Display Classification Results

Calculate and display the classification accuracy.

svmAccuracy = sum(svmLabelOut == gtVector(locTest))/numel(locTest);


disp(["Overall Classification Accuracy (%) = ",num2str(svmAccuracy.*100)])

"Overall Classification Accuracy (%) = " "95.9996"

Create an SVM classification map.

svmPredLabel = gtLabel;
svmPredLabel(locTest) = svmLabelOut;

Display the RGB image, ground truth map, and SVM classification map.

cmap = parula(numClasses);
figure
montage({rgbImg,gtLabel,svmPredLabel},cmap,Size=[1 3],BorderSize=10)
title("RGB Image | Ground Truth Map | SVM Classification Map")
% Specify the intensity limits for the colorbar
minR = min(min(gtLabel(:)),min(svmPredLabel(:)));
maxR = max(max(gtLabel(:)),max(svmPredLabel(:)));
clim([minR maxR])
colorbar

Display the confusion matrix.

fig = figure;
confusionchart(gtVector(locTest),svmLabelOut,ColumnSummary="column-normalized")


fig_Position = fig.Position;
fig_Position(3) = fig_Position(3)*1.5;
fig.Position = fig_Position;
title("Confusion Matrix: SVM Classification Results")

References
[1] Patro, Ram Narayan, Subhashree Subudhi, Pradyut Kumar Biswal, and Fabio Dell’acqua. “A
Review of Unsupervised Band Selection Techniques: Land Cover Classification for
Hyperspectral Earth Observation Data.” IEEE Geoscience and Remote Sensing Magazine 9,
no. 3 (September 2021): 72–111. https://doi.org/10.1109/MGRS.2021.3051979.

See Also
hypercube | colorize | removeBands | fitcecoc | confusionchart | imgaussfilt

Related Examples
• “Classify Hyperspectral Images Using Deep Learning” on page 20-66
• “Classify Hyperspectral Image Using Library Signatures and SAM” on page 20-40


Manually Label ROIs in Multispectral Image

This example shows how to manually select regions of interest (ROIs) from a multispectral image and
save them in a shapefile.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Many supervised learning applications require labeled training data. This example shows how to
manually label multispectral or hyperspectral images by selecting ROIs and saving them in a
shapefile. You can use the shapefile to train deep learning networks.

In this example, you perform these steps:

1 Read a multispectral image and select multiple ROIs.


2 Convert the ROIs into geographic coordinates.
3 Save the geographic coordinates of the ROIs in a shapefile.
4 Read the shapefile and visualize the ROIs in a geographic axes.

Load Multispectral Data

Landsat 8 is an Earth observation satellite that carries the Operational Land Imager (OLI) and
Thermal Infrared Sensor (TIRS) instruments.

The Landsat 8 data set has 8 spectral bands with wavelengths that range from 440 nm to 2200 nm.
The data is 7721-by-7651 pixels in dimension with a spatial resolution of 30 meters.

Download the data set and unzip the file by using the downloadLandsat8Dataset helper function.
The helper function is attached to this example as a supporting file.

zipfile = "LC08_L1TP_113082_20211206_20211206_01_RT.zip";
landsat8Data_url = "https://ssd.mathworks.com/supportfiles/image/data/" + zipfile;
downloadLandsat8Dataset(landsat8Data_url,pwd)

Downloading the Landsat 8 OLI dataset.


This can take several minutes to download...
Done.

Read the Landsat 8 multispectral data into the workspace as a hypercube object.

hCube = hypercube("LC08_L1TP_113082_20211206_20211206_01_RT_MTL.txt");

Estimate an RGB image from the data cube by using the colorize function. Apply contrast
stretching to enhance the contrast of the output RGB image.

rgbImg = colorize(hCube,Method="rgb",ContrastStretching=true);

Adjust the intensity values of the image for better visualization using the imadjustn function.

rgbImg = imadjustn(rgbImg);


Read the spatial referencing information for the Landsat 8 data from the corresponding GeoTIFF
image.

info = georasterinfo("LC08_L1TP_113082_20211206_20211206_01_RT_B1.TIF");

Calculate the data region using the corner coordinates of the GeoTIFF image.

R = info.RasterReference;
xlimits = R.XWorldLimits;
ylimits = R.YWorldLimits;
dataRegion = mappolyshape(xlimits([1 1 2 2 1]),ylimits([1 2 2 1 1]));
dataRegion.ProjectedCRS = R.ProjectedCRS;

Select ROIs and Save in Shapefile

Specify the number of ROIs to select. For this example, select three ROIs.

numOfAreas = 3;

Visualize the estimated RGB image. Use the pickPolyshape helper function, defined at the end of
this example, to select rectangular ROIs and store the x- and y-coordinates of the ROIs in the cell
arrays polyX and polyY, respectively.

figure
imshow(rgbImg)
polyX = cell(numOfAreas,1);
polyY = cell(numOfAreas,1);
for ch = 1:numOfAreas
[x,y] = pickPolyshape(R);
polyX{ch} = x;
polyY{ch} = y;
end


Create ROI shapes from the ROI coordinates by using the mappolyshape (Mapping Toolbox)
function.

shape = mappolyshape(polyX,polyY);
shape.ProjectedCRS = R.ProjectedCRS;

Create a geospatial table from the ROI shapes.

gt = table(shape,VariableNames="Shape");

Write the ROI shapes to the shapefile format. You can use this shapefile as labeled data.

shapewrite(gt,"Landsat8ROIs.shp")

Read Shapefile and Visualize ROIs in Geographic Axes

Read the shapefile as a geospatial table.


S = readgeotable("Landsat8ROIs.shp");
S.Shape.ProjectedCRS = R.ProjectedCRS;

Visualize the ROIs in a geographic axes along with the data region of the Landsat 8 multispectral
image.

figure
geoplot(dataRegion)
hold on
geobasemap satellite
geoplot(S)

Supporting Functions

The pickPolyshape helper function performs these tasks:

1 Creates a customizable rectangular ROI.


2 Calculates the x- and y-coordinates of the corners of the ROI.
3 Transforms the intrinsic coordinates of the ROI to world coordinates.

function [xWorld,yWorld] = pickPolyshape(R)


roi = drawrectangle(Color="r");
x1 = roi.Position(1);
y1 = roi.Position(2);
x2 = x1 + roi.Position(3);
y2 = y1 + roi.Position(4);


[xWorld,yWorld] = intrinsicToWorld(R,[x2 x1 x1 x2 x2],[y1 y1 y2 y2 y1]);


end


Change Detection in Hyperspectral Images

This example shows how to detect changes in land cover from hyperspectral images taken at different
times.

Change detection (CD) is the process of determining the changes in land cover based on multiple
observations at different times. Information regarding land cover changes is important for effective
analysis of the ecological environment, natural resources, and social development of a region.

In this example, you perform these steps:

1 Read and visualize hyperspectral images acquired at different times.


2 Estimate the change between the images.
3 Compare the detected change with the ground truth.
4 Display the detected changes.

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Load and Visualize Hyperspectral Images

This example uses the Bay Area data set [1 on page 20-92], collected in the years 2013 and 2015
with the AVIRIS sensor, surrounding the city of Patterson, California. The spatial dimensions of the
data set are 600-by-500 pixels, and it includes 224 spectral bands with a spatial resolution of
approximately 30 meters per pixel.

Download the data set and unzip the file by using the downloadChangeDetectionData helper
function. The helper function is attached to this example as a supporting file.

zipfile = "bayArea_dataset.zip";
data_url = "https://ssd.mathworks.com/supportfiles/image/data/" + zipfile;
downloadChangeDetectionData(data_url,pwd)

Downloading the change detection dataset.


This can take several minutes to download...
Done.

Load the data set into the workspace. The variables T1 and T2 are the datacubes of the two
hyperspectral images. The MAT file also contains a ground truth change map for the same region,
groundTruth.

load bayArea_dataset.mat

Display the RGB bands of both hyperspectral images. The images are AVIRIS images, for which band
numbers 26, 16, and 8 are the R, G, and B wavelengths, respectively.

image1 = imadjustn(rescale(T1(:,:,[26 16 8])));


image2 = imadjustn(rescale(T2(:,:,[26 16 8])));
figure
montage({image1 image2},Size=[1 2],BorderSize=5)


Detect Changes Between Hyperspectral Images

Specify the window size value for change detection. A larger window size enables you to handle
larger misalignment errors between the two images.

windowSize = 7;

Estimate the changes between the two images. Assuming that both hyperspectral images are
perfectly aligned, which means that both images have the spectral signature of the same physical
area at each corresponding pixel location, you can assess the dissimilarity between corresponding
pixels in the two images by measuring the distance between the spectral signatures.

change = changeDetection(T1,T2,windowSize);
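For a single pixel, this dissimilarity reduces to the distance between two spectral signatures. This short sketch, using an arbitrary pixel location, shows the quantity that the changeDetection helper evaluates over local neighborhoods.

r = 100; c = 200;                % arbitrary pixel location
s1 = squeeze(single(T1(r,c,:))); % signature at the first time point
s2 = squeeze(single(T2(r,c,:))); % signature at the second time point
pixelDissimilarity = norm(s1-s2) % Euclidean distance between the signatures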

Compare With Ground Truth

Convert the estimated change map change to a binary image using a threshold.

threshold = 0.07;
changeMap = mat2gray(change);
bw = changeMap > threshold;
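The threshold value of 0.07 was selected for this data set. As an alternative, not used in this example, you could estimate a threshold automatically, for instance with Otsu's method.

autoThreshold = graythresh(changeMap); % Otsu threshold on the normalized change map
bwAuto = changeMap > autoThreshold;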

Convert the ground truth change map groundTruth to a binary image. The threshold for the ground
truth is 0 because groundTruth has a data type of uint8 instead of double.

bwGroundTruth = groundTruth > 0;

Compare the estimated binary change map to the ground truth binary change map.


figure
montage({bwGroundTruth bw},Size=[1 2],BorderSize=5)

Compute the similarity between the estimated and ground truth binary change maps.

similarity = bfscore(bw,bwGroundTruth)

similarity = 0.6039

Display Detected Changes

Visualize the detected changes between the images as an overlay on the second hyperspectral image.

labelOut = labeloverlay(image2,bw,Colormap="autumn",Transparency=0.5);
figure
imshow(labelOut)


Supporting Functions

The changeDetection helper function estimates the changes between two images captured at
different times. Because the time of image acquisition is not known, the helper function uses the
symmetric local coregistration adjustment (SLCRA) algorithm, which evaluates the dissimilarity in
both directions and fuses the dissimilarity measurement.
function changeMap = changeDetection(imageData1,imageData2,windowSize)
% Get the center of window.
centerPixel = ((windowSize-1)/2);

% Get the size of the input data.


[row,col,~] = size(imageData1);

if isinteger(imageData1)
imageData1 = single(imageData1);
imageData2 = single(imageData2);
end

% Apply zero padding to handle the edge pixels.


imageData1 = padarray(imageData1,[centerPixel centerPixel],0,"both");
imageData2 = padarray(imageData2,[centerPixel centerPixel],0,"both");


% Initialize the change map output.


changeMap = zeros(row,col);

for r = (centerPixel + 1):(row + centerPixel)


for c = (centerPixel + 1):(col + centerPixel)
rowNeighborhood = (r - centerPixel):(r + centerPixel);
colNeighborhood = (c - centerPixel):(c + centerPixel);
% Find the Euclidean distance between the reference signature and
% the neighborhood of the target signature.
spectra1 = reshape(imageData1(r,c,:),1,[]);
spectra2 = reshape(imageData2(rowNeighborhood,colNeighborhood,:), ...
windowSize*windowSize,size(imageData1,3));
a = min(pdist2(spectra1,spectra2));
% Find the Euclidean distance between the target signature and
% the neighborhood of the reference signature.
spectra1 = reshape(imageData2(r,c,:),1,[]);
spectra2 = reshape(imageData1(rowNeighborhood,colNeighborhood,:), ...
windowSize*windowSize,size(imageData1,3));
b = min(pdist2(spectra1, spectra2));
% Store the pixel-wise results in the change map.
changeMap(r - centerPixel,c - centerPixel) = max(a,b);
end
end
end

References

[1] Heras, Dora Blanco, Álvaro Ordóñez Iglesias, Jorge Alberto Suárez Garea, Francisco Argüello
Pedreira, Javier López Fandiño, and Pablo Quesada Barriuso. Hyperspectral Change Detection
Dataset. October 5, 2016. Distributed by Centro Singular de Investigación en Tecnoloxías
Intelixentes. https://citius.usc.es/investigacion/datasets/hyperspectral-change-detection-dataset.


Ship Detection from Sentinel-1 C Band SAR Data Using YOLO v2 Object Detection

This example shows how to detect ships from Sentinel-1 C Band SAR Data using YOLO v2 object
detection.

Synthetic aperture radar (SAR) remote sensing is an important method for marine monitoring due to
its all-day, all-weather capability. Ship detection in SAR images plays a critical role in shipwreck
rescue, fishery and traffic management, and other marine applications. SAR imagery provides data
with high spatial and temporal resolution, which is useful for ship detection.

This example shows how to use a pretrained YOLO v2 object detection network to perform these
tasks.

1 Detect ships in sample test image.


2 Detect ships in large-scale test image with block processing.
3 Plot the large-scale test image on the map and show the ships detected in the large-scale test
image on the map.

This example also shows how you can train the YOLO v2 object detector from scratch and evaluate
the detector on test data.

Detect Ships Using Pretrained Network

Load Pretrained Network

Create a folder in which to store the pretrained YOLO v2 object detection network and test images.
You can use the pretrained network and test images to run the example without waiting for training
to complete.

Download the pretrained network and load it into the workspace by using the
helperDownloadObjectDetector helper function. The helper function is attached to this example
as a supporting file.

dataDir = "SARShipDetection";
detector = helperDownloadObjectDetector(dataDir);

Detect Ships in Test Image Using Pretrained Network

Read a sample test image and detect the ships it contains using the pretrained object detection
network. The size of the test image must be the same as the size of the input to the object
detection network. Use a threshold of 0.6 to reduce false positives.

testImg = fullfile(dataDir,"shipDetectionYoloV2","test1.jpg");
Img = imread(testImg);
[bboxes,scores,labels] = detect(detector,Img,Threshold=0.6);

Display the test image and the output with bounding boxes for the detected ships as a montage.

detectedIm = insertObjectAnnotation(Img,"Rectangle",bboxes,scores,LineWidth=2,Color="red");
figure
montage({Img,detectedIm},BorderSize=8)
title("Detected Ships in Test Image")


Detect Ships in Large-Scale Test Image Using Pretrained Network

Create a large-scale test image of the Singapore Strait region. Create a polygon in geographic
coordinates that represents the geographic extent of the region by using the geopolyshape
(Mapping Toolbox) object. Plot the region of interest using satellite imagery.

lat = [2.420848 2.897566 1.389878 0.908674 2.420848];


lon = [103.816467 106.073883 106.386215 104.131699 103.816467];
dataRegion = geopolyshape(lat,lon);
dataRegion.GeographicCRS = geocrs(4326);
figure
geoplot(dataRegion, ...
LineWidth=2, ...
EdgeColor="yellow", ...
FaceColor="red", ...
FaceAlpha=0.2)
hold on
geobasemap satellite


To perform ship detection on the large-scale image, use the blockedImage object. You can use this
object to load a large-scale image and process it on systems with limited resources.

Create the blocked image. Display the image using the bigimageshow function.

largeImg = fullfile(dataDir,"shipDetectionYoloV2","test-ls.jpg");
bim = blockedImage(largeImg);
figure
bigimageshow(bim)
title("Large-scale Test Image (16000-by-24000-by-3)")


Set the block size to the input size of the detector.

blockSize = [800 800 3];

Create a function handle to the detectShipsLargeImage helper function. The helper function,
which contains the ship detection algorithm definition, is attached to this example as a supporting
file.

detectionFcn = @(bstruct) detectShipsLargeImage(bstruct,detector);
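The detectShipsLargeImage helper is attached as a supporting file and is not listed here. As a hedged sketch only (not the shipped helper, and omitting the step that saves the bounding boxes to a MAT file), a block-processing detection function could annotate each block as follows. The block struct that apply passes to the function provides the pixel data in its Data field.

function out = detectShipsInBlockSketch(bstruct,detector)
% Sketch: run the detector on one 800-by-800-by-3 block and annotate it.
blockImg = bstruct.Data;
[bboxes,scores] = detect(detector,blockImg,Threshold=0.6);
if isempty(bboxes)
    out = blockImg; % no ships detected in this block
else
    out = insertObjectAnnotation(blockImg,"rectangle",bboxes,scores, ...
        LineWidth=2,Color="red");
end
end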

Produce a processed blocked image bimProc with annotations for the bounding boxes by using the
apply object function with the detectionFcn function handle. This function call also saves the
bounding boxes in the getBboxes MAT file.

bimProc = apply(bim,detectionFcn,BlockSize=blockSize);

Display the output with bounding boxes containing the detected ships.

figure
bigimageshow(bimProc)
title("Detected Ships in Large-Scale Test Image")
uicontrol(Visible="off")


A bounding box indicates the region of interest (ROI) of the detected ship. Get the world coordinates
of the detected ship ROIs from the bounding boxes using the createShipROIs helper function. The
createShipROIs helper function is attached to this example as a supporting file. Because the image
metadata is not available, the helper function manually creates the spatial referencing object by
setting image attributes such as XWorldLimits and YWorldLimits.
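As a hedged illustration of that last step only, a map raster reference can be built manually when no metadata is available. The world limits below are placeholders, not the values used by the helper.

xWorldLimitsSketch = [0 24000*30]; % placeholder easting limits in meters (30 m pixels)
yWorldLimitsSketch = [0 16000*30]; % placeholder northing limits in meters
rasterSizeSketch = [16000 24000];  % rows-by-columns of the large-scale image
Rsketch = maprefcells(xWorldLimitsSketch,yWorldLimitsSketch,rasterSizeSketch);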

Store the x- and y-coordinates of the ship ROIs in the polyX and polyY cell arrays, respectively.

[polyX,polyY] = createShipROIs();

Create ship ROI shapes from the ROI coordinates by using the geopolyshape (Mapping Toolbox)
object.

shape = geopolyshape(polyX,polyY);
shape.GeographicCRS = geocrs(4326);

Visualize the ship ROIs in a geographic axes with the data region.

figure
geoplot(dataRegion, ...
LineWidth=2, ...
EdgeColor="yellow", ...
FaceColor="red", ...
FaceAlpha=0.2)
hold on
geobasemap satellite
geoplot(shape, ...


LineWidth=2, ...
EdgeColor="blue", ...
FaceColor="cyan", ...
FaceAlpha=0.2)

Train and Evaluate YOLO v2 Object Detection Network for Ship Detection

This part of the example shows how to train and evaluate a YOLO v2 object detection network from
scratch.

Load Data

This example uses the Large-Scale SAR Ship Detection Dataset-v1.0 [1 on page 20-105], which was
built on the Sentinel-1 satellite. The large-scale images are collected from 15 original large-scene
space-borne SAR images. The polarization modes include VV and VH, the imaging mode is
interferometric wide (IW), and the data set has the characteristics of large-scale ocean observation,
small ship detection, abundant pure backgrounds, fully automatic detection flow, and numerous
standardized research baselines. This data set contains 15 large-scale SAR images whose ground
truths are correctly labeled. The size of the large-scale images is unified to 24000-by-16000 pixels.
The format is a three-channel, 24-bit, grayscale, JPG. The annotation file is an XML file that records
the target location information, which comprises Xmin, Xmax, Ymin, and Ymax. To facilitate network
training, the large-scale images are directly cut into 9000 subimages with a size of 800-by-800 pixels
each.

To download the Large-Scale SAR Ship Detection Dataset-v1.0, go to this website of the Large-Scale
SAR Ship Detection Dataset-v1.0 and click the Download links for the JPEGImages_sub_train,


JPEGImages_sub_test, and Annotations_sub RAR files, located at the bottom of the webpage.
You must provide an email address or register on the website to download the data. Extract the
contents of JPEGImages_sub_train, JPEGImages_sub_test, and Annotations_sub to folders
with the names JPEGImages_sub_train, JPEGImages_sub_test, and Annotations_sub,
respectively, in the current working folder.

The data set uses 6000 images (600 subimages each from the first 10 large-scale images) as the
training data and the remaining 3000 images (600 subimages each from the remaining 5 large-scale
images) as the test data. In this example, you use 6600 images (600 subimages each from the first 11
large-scale images) as the training data and the remaining 2400 images (600 subimages each from
the remaining 4 large-scale images) as the test data. Hence, move the 600 image files with names
11_1_1 to 11_20_30 from the JPEGImages_sub_test folder to the JPEGImages_sub_train
folder. Move the annotation files with names 01_1_1 to 11_20_30, corresponding to the training
data, to a folder with the name Annotations_train in the current working folder, and the
annotation files with names 12_1_1 to 15_20_30, corresponding to the test data, to a folder named
Annotations_test in the current working folder.

Prepare Data for Training

To train and test the network only on images that contain ships, use the createDataTable helper
function to perform these steps.
1 Discard images that do not have any ships from the data set. This operation leaves 1350 images
in the training set and 509 images in the test set.
2 Organize the training and test data into two separate tables, each with two columns, where the
first column contains the image file paths and the second column contains the ship bounding
boxes.

The createDataTable helper function is attached to this example as a supporting file.


trainingSet = fullfile(pwd,"JPEGImages_sub_train");
annotationsTrain = fullfile(pwd,"Annotations_train");
trainingDataTbl = createDataTable(trainingSet,annotationsTrain);
testSet = fullfile(pwd,"JPEGImages_sub_test");
annotationsTest = fullfile(pwd,"Annotations_test");
testDataTbl = createDataTable(testSet, annotationsTest);
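The createDataTable helper is attached as a supporting file and is not listed here. As a hedged sketch of the annotation-parsing step only, assuming the annotation files follow a Pascal VOC-style layout, you could read one XML file and convert its corner coordinates into [x y width height] boxes. The file name and element names below are assumptions.

s = readstruct(fullfile(annotationsTrain,"01_10_12.xml")); % hypothetical annotation file
objs = s.object;                                           % one element per labeled ship
bboxes = zeros(numel(objs),4);
for k = 1:numel(objs)
    b = objs(k).bndbox;
    bboxes(k,:) = [b.xmin b.ymin b.xmax-b.xmin+1 b.ymax-b.ymin+1];
end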

Display first few rows of the training data table.


trainingDataTbl(1:5,:)

ans=5×2 table
imageFileName ship
____________________________________________________ ________________

"C:\Users\Example\JPEGImages_sub_train/01_10_12.jpg" {2×4 double }


"C:\Users\Example\JPEGImages_sub_train/01_10_17.jpg" {[95 176 12 11]}
"C:\Users\Example\JPEGImages_sub_train/01_10_20.jpg" {4×4 double }
"C:\Users\Example\JPEGImages_sub_train/01_10_21.jpg" {4×4 double }
"C:\Users\Example\JPEGImages_sub_train/01_10_22.jpg" {2×4 double }

Create datastores for loading the image and label data during training and evaluation by using the
imageDatastore and boxLabelDatastore (Computer Vision Toolbox) objects.
imdsTrain = imageDatastore(trainingDataTbl.imageFileName);
bldsTrain = boxLabelDatastore(trainingDataTbl(:,"ship"));


imdsTest = imageDatastore(testDataTbl.imageFileName);
bldsTest = boxLabelDatastore(testDataTbl(:,"ship"));

Combine the image and box label datastores.

trainingData = combine(imdsTrain,bldsTrain);
testData = combine(imdsTest,bldsTest);

Display one of the training images and box labels.

data = read(trainingData);
Img = data{1};
bbox = data{2};
annotatedImage = insertShape(Img,"Rectangle",bbox);
figure
imshow(annotatedImage)
title("Sample Annotated Training Image")


Augment Training Data

Augment the training data by randomly flipping the image and associated box labels horizontally
using the transform function with the augmentData helper function. The augmentData helper
function is attached to this example as a supporting file. Augmentation increases the variability of the
training data without increasing the number of labeled training samples. To ensure unbiased
evaluation, do not apply data augmentation to the test data.

augmentedTrainingData = transform(trainingData,@augmentData);
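The augmentData helper is attached as a supporting file and is not listed here. A minimal sketch of horizontal-flip augmentation for image and box label pairs might look like this; it is a simplification, not the shipped helper.

function dataOut = augmentDataSketch(dataIn)
% Sketch: randomly flip the image and mirror the [x y w h] boxes horizontally.
img = dataIn{1};
bboxes = dataIn{2};
if rand > 0.5
    img = fliplr(img);
    imgWidth = size(img,2);
    bboxes(:,1) = imgWidth - bboxes(:,1) - bboxes(:,3) + 2;
end
dataOut = [{img} {bboxes} dataIn(3:end)];
end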

Read the same image multiple times and display the augmented training data.

augmentedData = cell(4,1);
for k = 1:4


data = read(augmentedTrainingData);
augmentedData{k} = insertShape(data{1},"rectangle",data{2});
reset(augmentedTrainingData);
end
figure
montage(augmentedData,BorderSize=8)
title("Sample Augmented Training Data")

Define Network Architecture

This example uses a YOLO v2 object detection network, which consists of a feature extraction
network followed by a detection network. Use a pretrained ResNet-50 network for feature extraction.
The detection sub-network is a small convolutional neural network (CNN) compared to the feature
extraction network, comprising a few convolutional layers and layers specific for YOLO v2. Using


ResNet-50 as the base network requires the Deep Learning Toolbox™ Model for ResNet-50 Network
support package. If this support package is not installed, then the function provides a download link.

Specify the network input size as 800-by-800-by-3, which is the original image size.

inputSize = [800 800 3];

Define the number of object classes to detect. This example has only one object class with the name
"ship".

numClasses = 1;

Estimate the anchor boxes based on the size of objects in the training data using the
estimateAnchorBoxes (Computer Vision Toolbox) function.

numAnchors = 7;
[anchorBoxes,~] = estimateAnchorBoxes(trainingData,numAnchors);

Load a pretrained ResNet-50 model by using the resnet50 (Deep Learning Toolbox) function.

featureExtractionNetwork = resnet50;

Specify the feature extraction layer as "activation_40_relu" to replace the layers after the
feature extraction layer with the detection subnetwork. This feature extraction layer outputs feature
maps that are downsampled by a factor of 16. This downsampling provides a good trade-off between
spatial resolution and the strength of the extracted features.

featureExtractionLayer = "activation_40_relu";

Create the YOLO v2 object detection network.

lgraph = yolov2Layers(inputSize,numClasses,anchorBoxes,featureExtractionNetwork,featureExtractionLayer);

Visualize the network using the Deep Network Designer (Deep Learning Toolbox).

deepNetworkDesigner(lgraph);

Specify Training Options

Train the object detection network using the Adam optimization solver. Specify the hyperparameter
settings using the trainingOptions (Deep Learning Toolbox) function. Specify the mini-batch size
as 8 and the learning rate as 0.001 over the span of training. Use the test data as the validation data.
You can experiment with tuning the hyperparameters based on your GPU memory.

options = trainingOptions("adam",...
GradientDecayFactor=0.9,...
SquaredGradientDecayFactor=0.999,...
InitialLearnRate=0.001,...
LearnRateSchedule="none",...
MiniBatchSize=8,...
L2Regularization=0.0005,...
MaxEpochs=150,...
BatchNormalizationStatistics="moving",...
DispatchInBackground=false,...
ResetInputNormalization=false,...
Shuffle="every-epoch",...
VerboseFrequency=20,...


CheckpointPath=tempdir,...
ValidationData=testData);

Train Network

To train the network, set the doTraining variable to true. Train on a GPU, preferably a CUDA®-
enabled NVIDIA® GPU if one is available. Using a GPU requires a Parallel Computing Toolbox™
license. For more information, see “GPU Computing Requirements” (Parallel Computing Toolbox).
Training takes about 19 hours on an NVIDIA™ Titan Xp GPU with 12 GB memory and can take longer
depending on your GPU hardware. If your GPU has less memory, lower the mini-batch size using the
trainingOptions (Deep Learning Toolbox) function to prevent running out of memory.

doTraining = false;
if doTraining
disp("Training YOLO V2 model ...")
[detector,info] = trainYOLOv2ObjectDetector(augmentedTrainingData,lgraph,options);
save("trainedYOLOV2_ship_detector.mat","detector");
end

Evaluate Detector Using Test Data

Evaluate the trained object detector on the test data using the average precision metric. The average
precision provides a single number that incorporates the ability of the detector to make correct
classifications (precision) and the ability of the detector to find all relevant objects (recall).

Use the detector to detect ships in all images in the test data.

detectionResults = detect(detector,testData,MiniBatchSize=8);

Evaluate the object detector using the average precision metric.

metrics = evaluateObjectDetection(detectionResults,testData);
classID = 1;
precision = metrics.ClassMetrics.Precision{classID};
recall = metrics.ClassMetrics.Recall{classID};

The precision/recall (PR) curve highlights how precise a detector is at varying levels of recall. The
ideal precision is 1 at all recall levels. Using more data can improve the average precision but often
requires more training time.

Plot the PR curve.

figure
plot(recall,precision)
xlabel("Recall")
ylabel("Precision")
grid on
title(sprintf("Average Precision = %.2f",metrics.ClassMetrics.mAP(classID)))


References

[1] Tianwen Zhang, Xiaoling Zhang, Xiao Ke, Xu Zhan, Jun Shi, Shunjun Wei, Dece Pan, et al. “LS-
SSDD-v1.0: A Deep Learning Dataset Dedicated to Small Ship Detection from Large-Scale Sentinel-1
SAR Images.” Remote Sensing 12, no. 18 (September 15, 2020): 2997. https://doi.org/10.3390/rs12182997.

See Also
yolov2Layers | resnet50 | evaluateObjectDetection


Automate Pixel Labeling of Hyperspectral Images Using ECOSTRESS Spectral Signatures in Image Labeler

This example shows how to load hyperspectral images into the Image Labeler (Computer Vision
Toolbox) and automatically label pixels.

In this example, you load hyperspectral image data into the Image Labeler and assign pixel labels
using an automation algorithm. The automation algorithm matches the spectral signature of each
pixel with the spectral signatures in the ECOSTRESS library and annotates every pixel automatically
using the spectral angle mapper (SAM) classification algorithm [1 on page 20-114].

This example requires the Image Processing Toolbox™ Hyperspectral Imaging Library. You can install
the Image Processing Toolbox Hyperspectral Imaging Library from Add-On Explorer. For more
information about installing add-ons, see “Get and Manage Add-Ons”. The Image Processing Toolbox
Hyperspectral Imaging Library requires desktop MATLAB®, as MATLAB® Online™ and MATLAB®
Mobile™ do not support the library.

Import Hyperspectral Image into Image Labeler

To make the multi-channel hyperspectral images suitable for importing into the Image Labeler, load
the hyperspectral images as an image datastore using the custom read function
readColorizedHyperspectralImage, defined at the end of this example. The custom read
function returns a 3-channel image for the Image Labeler to display.

file = "jasperRidge2_R198.img";
imds = imageDatastore(file,ReadFcn=@readColorizedHyperspectralImage,FileExtensions=".img");

You can use the Hyperspectral Viewer app to explore colorization methods that help accentuate
regions of interest within the data. For the Jasper Ridge data, the colorize function with Method
specified as "falsecolored" returns a false-colored image using bands 145, 99, and 19.

Open the Image Labeler app. First, select the Apps tab on the MATLAB toolstrip. Then, in the Image
Processing and Computer Vision section, select Image Labeler. Alternatively, you can open the
Image Labeler app programmatically by entering this command at the MATLAB Command Prompt.

imageLabeler

On the Image Labeler Start Page, select New Individual Project.


Perform these steps to import the image datastore imds into the Image Labeler.

1 On the Image Labeler tab of the app toolstrip, select Import, and then select From
Workspace.
2 Use the dialog box to select the image datastore imds.


Define Pixel Labels

The Jasper Ridge hyperspectral image contains spectral signatures that help classify the types of
material in the image. The materials known to be present in the image are sea water, vegetation, soil
(utisol and mollisol), and concrete. Perform these steps for each class to define pixel labels for the
five classes.

1 On the Image Labeler tab, select Add Label and then select Pixel.
2 In the dialog box, enter SeaWater as the name for the pixel label and select OK.
3 Repeat this process for the remaining material types, Tree, Utisol, Mollisol, and Concrete.

Alternatively, you can use the labelDefinitionCreator (Computer Vision Toolbox) function to
programmatically create pixel label definitions for the five classes.

classes = ["SeaWater","Tree","Utisol","Mollisol","Concrete"];
ldc = labelDefinitionCreator;
for i = 1:numel(classes)
addLabel(ldc,classes(i),labelType.PixelLabel)
end
labelDefinitions = create(ldc)

labelDefinitions=5×6 table
Name Type LabelColor PixelLabelID Group Description
____________ __________ __________ ____________ ________ ___________

{'SeaWater'} PixelLabel {0×0 char} {[1]} {'None'} {' '}


{'Tree' } PixelLabel {0×0 char} {[2]} {'None'} {' '}
{'Utisol' } PixelLabel {0×0 char} {[3]} {'None'} {' '}
{'Mollisol'} PixelLabel {0×0 char} {[4]} {'None'} {' '}


{'Concrete'} PixelLabel {0×0 char} {[5]} {'None'} {' '}

Save the label definition table to a MAT file.

save JasperRidgeLabelDefinitions.mat labelDefinitions

Perform these steps to import the label definitions into the Image Labeler app.

1 On the Image Labeler tab, select Import, and then select Label Definitions.
2 In the dialog box, select the file JasperRidgeLabelDefinitions.mat.

Define Automation Algorithm

Automation algorithms implement an API that enables the Image Labeler app to call user-defined
algorithms for labeling. For hyperspectral images, you can define automation algorithms based on
spectral matching, object detection networks, semantic segmentation networks, and so on, for pixel
labeling. This example uses an automation algorithm based on spectral matching. For more
information on writing an automation algorithm, see “Create Automation Algorithm for Labeling”
(Computer Vision Toolbox).

The ECOSTRESS spectral library consists of over 3400 spectral signatures for both natural and
manmade surface materials [2 on page 20-115][3 on page 20-115][4 on page 20-115]. You can
automatically label the pixels by matching the spectra of each pixel to the spectral signatures in the
library. Use the spectral angle mapping (SAM) technique in the automation algorithm class
SpectralAngleMapperAutomationAlgorithm for labeling the pixels. The automation algorithm
class SpectralAngleMapperAutomationAlgorithm is attached to this example as a supporting
file.
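Before running the full automation, you can sanity-check the matching idea at the command line by scoring a single library signature against every pixel of the Jasper Ridge cube. The signature file used below is one of those listed in the constructor later in this example; lower scores indicate closer matches.

hcubeCheck = hypercube("jasperRidge2_R198.img");
seaSig = readEcostressSig("water.seawater.none.liquid.tir.seafoam.jhu.becknic.spectrum.txt");
samScore = spectralMatch(seaSig,hcubeCheck);
figure
imagesc(samScore)
axis image
colorbar
title("SAM Score Against Sea Water Signature")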


Create a +vision/+labeler/ folder within the current working folder. Copy the automation
algorithm class file SpectralAngleMapperAutomationAlgorithm.m to the folder +vision/
+labeler.

exampleFolder = pwd;
automationFolder = fullfile("+vision","+labeler");
mkdir(exampleFolder,automationFolder)
copyfile("SpectralAngleMapperAutomationAlgorithm.m",automationFolder)

These are the key components of the automation algorithm class.

1 The SpectralAngleMapperAutomationAlgorithm constructor


2 The settingsDialog method
3 The run method

The constructor uses the readEcostressSig function to load all the ECOSTRESS spectral
signatures provided within the Image Processing Toolbox Hyperspectral Imaging Library. You can
modify the constructor to add more spectral signatures from the ECOSTRESS library.

function this = SpectralAngleMapperAutomationAlgorithm()

% List the ECOSTRESS files containing material signatures


% present in the data to be labeled. Modify this list as
% needed based on the data to be labeled.
filenames = [
"manmade.concrete.pavingconcrete.solid.all.0092uuu_cnc.jhu.becknic.spectrum.txt"
"manmade.road.tar.solid.all.0099uuutar.jhu.becknic.spectrum.txt"
"manmade.roofingmaterial.metal.solid.all.0384uuualm.jhu.becknic.spectrum.txt"
"manmade.roofingmaterial.metal.solid.all.0692uuucop.jhu.becknic.spectrum.txt"
"soil.mollisol.cryoboroll.none.all.85p4663.jhu.becknic.spectrum.txt"
"soil.utisol.hapludult.none.all.87p707.jhu.becknic.spectrum.txt"
"vegetation.grass.avena.fatua.vswir.vh353.ucsb.asd.spectrum.txt"
"vegetation.tree.abies.concolor.tir.vh063.ucsb.nicolet.spectrum.txt"
"vegetation.tree.bambusa.tuldoides.tir.jpl216.jpl.nicolet.spectrum.txt"
"vegetation.tree.eucalyptus.maculata.vswir.jpl087.jpl.asd.spectrum.txt"
"vegetation.tree.pinus.ponderosa.tir.vh254.ucsb.nicolet.spectrum.txt"
"vegetation.tree.tsuga.canadensis.vswir.tsca-1-47.ucsb.asd.spectrum.txt"
"water.ice.none.solid.all.ice_dat_.jhu.becknic.spectrum.txt"
"water.seawater.none.liquid.tir.seafoam.jhu.becknic.spectrum.txt"
"water.tapwater.none.liquid.all.tapwater.jhu.becknic.spectrum.txt"
];

% Load the ECOSTRESS signatures.


this.EcostressSignatures = readEcostressSig(filenames);

end

The settingsDialog method launches a custom dialog box that enables you to map label definitions
to spectral signatures. The settingsDialog method uses the SAMSettings class, which has been
generated, using the App Designer, to create the custom dialog design and modified to support a
dynamic number of ECOSTRESS signatures. The SAMSettings class is attached to this example as a
supporting file. Ensure this file remains in the current working folder. Use the dialog box to map the
label definitions to spectral signatures before running the automation algorithm.


The run method of the automation algorithm loads the hyperspectral image, runs the spectral
matching algorithm, and returns a categorical array containing the pixel labels. You can use a similar
approach to implement other automation algorithms that require processing all the bands within the
hyperspectral data.

function C = run(this, ~)

% Load the hyperspectral image being processed using the


% CurrentIndex property.
src = this.GroundTruth.DataSource;
filename = src.Source.Files{this.CurrentIndex};
hcube = hypercube(filename);

% Match hypercube spectra with endmember signatures from


% spectral library.
selectedSigs = this.EcostressSignatures(this.SelectedEcostressSig);
scores = spectralMatch(selectedSigs,hcube);

% Classify the spectral signatures by finding the minimum score.


[~,L] = min(scores,[],3);

% Determine the pixel label classes.


pxLabels = this.GroundTruth.LabelDefinitions.Type == labelType.PixelLabel;


def = this.GroundTruth.LabelDefinitions(pxLabels,:);
classes = def.Name;

% Return a categorical image


C = categorical(L,1:numel(selectedSigs),classes);

end

Import Automation Algorithm

Import the automation algorithm into the Image Labeler app by performing these steps.

1 On the Image Labeler tab, select Select Algorithm.


2 Select Add Whole Image Algorithm, and then select Import Algorithm.
3 Navigate to the +vision/+labeler folder in the current working folder and choose
SpectralAngleMapperAutomationAlgorithm.m from the file selection dialog box.

Run Automation Algorithm

Perform these steps to configure the automation algorithm.

1 On the Image Labeler tab, select Automate. This creates the Automate tab.
2 On the Automate tab, select Settings. This opens the settings dialog created using the
settingsDialog method in the automation algorithm. Use the settings dialog to map label
definitions to the corresponding spectral signatures.


Perform these steps to run the configured automation algorithm.

1 On the Automate tab, select Run.
2 Review the automation results. Use the tools in the Label Pixels tab to correct any automation
errors.
3 If the labels are satisfactory, select Accept on the Automate tab.


Export Ground Truth Labels

Perform these steps to export the label data for the hyperspectral image.

1 Save the current labeling project. On the Image Labeler tab, select Save Project, and then
select Save. Specify a name for the labeling project and click Save.
2 On the Image Labeler tab, select Export, and then select To Workspace.
3 Specify the name of the workspace variable to which you want to export the ground truth label
data.

You can use the exported ground truth in downstream tasks, such as training a deep learning
network or verifying the results of hyperspectral image processing algorithms.
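For example, if your label definitions are all pixel labels, as in this example, you can load the exported
labels into a datastore. This is a minimal sketch; it assumes the exported workspace variable is named
gTruth and that you have Computer Vision Toolbox, which provides pixelLabelDatastore.

% Sketch (assumed variable name gTruth; requires Computer Vision Toolbox).
classNames = gTruth.LabelDefinitions.Name   % pixel label class names
pxds = pixelLabelDatastore(gTruth);         % datastore of the pixel label images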

Supporting Functions

function colorizedImg = readColorizedHyperspectralImage(file)
% Colorize the hyperspectral image using false coloring to accentuate
% regions of interest.
hcube = hypercube(file);
colorizedImg = colorize(hcube,Method="falsecolored");
end

References

[1] Kruse, F.A., A.B. Lefkoff, J.W. Boardman, K.B. Heidebrecht, A.T. Shapiro, P.J. Barloon, and A.F.H.
Goetz. “The Spectral Image Processing System (SIPS)—Interactive Visualization and Analysis of
Imaging Spectrometer Data.” Remote Sensing of Environment 44, no. 2–3 (May 1993): 145–63.
https://doi.org/10.1016/0034-4257(93)90013-N.


[2] ECOSTRESS Spectral Library: https://speclib.jpl.nasa.gov

[3] Meerdink, Susan K., Simon J. Hook, Dar A. Roberts, and Elsa A. Abbott. “The ECOSTRESS
Spectral Library Version 1.0.” Remote Sensing of Environment 230 (September 2019): 111196.
https://doi.org/10.1016/j.rse.2019.05.015.

[4] Baldridge, A.M., S.J. Hook, C.I. Grove, and G. Rivera. “The ASTER Spectral Library Version 2.0.”
Remote Sensing of Environment 113, no. 4 (April 2009): 711–15. https://doi.org/10.1016/j.rse.2008.11.007.

See Also
Image Labeler

21

Code Generation for Image Processing Toolbox Functions

• “Code Generation for Image Processing” on page 21-2
• “Generate Code for Object Detection” on page 21-5
• “Generate Code to Resize Images to Fixed Output Size” on page 21-22

Code Generation for Image Processing


Some Image Processing Toolbox functions enable you to generate standalone C code, C code that
depends on a precompiled, platform-specific shared library, or both. Generating code requires
MATLAB Coder.

For a complete list of Image Processing Toolbox functions that support code generation, see
Functions Supporting Code Generation. For an example of using code generation, see “Generate
Code for Object Detection” on page 21-5.

Types of Code Generation Support in Image Processing Toolbox


Image Processing Toolbox offers three types of code generation support.

• Functions that generate standalone C code. You can incorporate this code into applications that
run on many platforms, such as ARM processors. An example of a function that supports only
standalone C code is immse.
• Functions that generate C code that depends on a platform-specific shared library (.dll, .so,
or .dylib). Use of a shared library preserves performance optimizations in these functions, but
this limits the target platforms on which you can run this code to only platforms that can host
MATLAB. To view a list of host platforms, see system requirements. An example of a function that
supports only C code that depends on a shared library is bwpack.
• Functions that generate standalone C code or C code that depends on a shared library, depending
on which target platform you specify in MATLAB Coder configuration settings. If you specify the
generic MATLAB Host Computer target platform, then these functions generate C code that
depends on a shared library. If you specify any other target platform, then these functions
generate standalone C code. An example of a function that supports both standalone C code and C
code that depends on a shared library is regionprops.

The diagram illustrates the difference between generating C code and generating code that uses a
shared library.


Generate Code with Image Processing Functions


In generated code, each supported toolbox function has the same name, arguments, and functionality
as its Image Processing Toolbox counterpart. To use code generation with image processing
functions, follow these steps:

• Write your MATLAB function or application as you would normally, using functions from the Image
Processing Toolbox.
• Add the %#codegen compiler directive at the end of the function signature. This directive
instructs the MATLAB code analyzer to diagnose issues that would prohibit successful code
generation.
• Open the MATLAB Coder app, create a project, and add your file to the project. In the app, you
can check the readiness of your code for code generation. For example, your code may contain
functions that are not enabled for code generation. Make any modifications required for code
generation.
• Generate code by clicking Generate on the Generate Code page of the MATLAB Coder app. You
can choose to generate a MEX file, a shared library, a dynamic library, or an executable.
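For instance, the following sketch applies these steps to a simple entry-point function. The function
name, the use of imadjust, and the input size are illustrative choices only, not taken from any shipping
example.

function J = enhanceContrast(I) %#codegen
% Entry-point function for code generation: adjust the image contrast.
J = imadjust(I);
end

From the command line, you could then generate a MEX file for a 256-by-256 uint8 input:

codegen enhanceContrast -args {ones(256,256,'uint8')} -report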


Even if you addressed all readiness issues identified by MATLAB Coder, you might still encounter
build issues. The readiness check only looks at function dependencies. When you try to generate
code, MATLAB Coder might discover coding patterns that are not supported for code generation.
View the error report and modify your MATLAB code until you get a successful build.

See Also
codegen | MATLAB Coder

Related Examples
• “Generate Code for Object Detection” on page 21-5

More About
• “Code Generation Workflow” (MATLAB Coder)
• “Generate C Code by Using the MATLAB Coder App” (MATLAB Coder)
• Functions Supporting Code Generation
• “MATLAB Coder”


Generate Code for Object Detection

This example shows how to generate C code using MATLAB® Coder™ from MATLAB applications
that use Image Processing Toolbox™ functions. The example describes how to set up your MATLAB
environment and prepare your MATLAB code for code generation.

This example also demonstrates how to solve issues that you might encounter in your MATLAB code
that prevent code generation. To illustrate the process, the code used by this example includes some
readiness issues and build issues that you must overcome before you can generate code.

Set Up Compiler

Specify which C/C++ compiler you want to use with MATLAB Coder to generate code by using the
mex function with the -setup option.

mex -setup

Define Entry-Point Function

The entry-point function is a MATLAB function used as the source code for code generation. First,
prototype the image processing workflow without support for code generation. This example defines
a function called detectCells.m that performs cell detection using segmentation and morphological
techniques. This function is attached to the example as a supporting file.

Test the example code with a sample image, cell.tif.

I = imread('cell.tif');
Iseg = detectCells(I);


Confirm the accuracy of the segmentation by overlaying the segmented image on the original image.

imshow(labeloverlay(I,Iseg))


Because you modify this code for code generation, it is good practice to work with a copy of the code.
This example includes a copy of the helper function detectCells.m named detectCellsCodeGen.m.
The version of the function used for code generation includes the MATLAB Coder compilation
directive %#codegen at the end of the function signature. This directive instructs the MATLAB code
analyzer to diagnose issues that would prohibit successful code generation.

Open the MATLAB Coder app by using the coder function. (Alternatively, in MATLAB, select the Apps
tab, navigate to Code Generation and click the MATLAB Coder app.)

coder

Specify the name of your entry-point function, detectCellsCodeGen, and press Enter.


Determine Code Readiness for Code Generation

Click Next. MATLAB Coder identifies any issues that might prevent code generation. The example
code contains five unsupported function calls.


Review the readiness issues. Click Review Issues. In the report, MATLAB Coder displays your code
in an editing window with the readiness issues listed below, flagging uses of the imshow function,
which does not support code generation.


Correct Readiness Issues

Address the readiness issues. Remove the calls to imshow and related display code from your
example. The display statements are not necessary for the segmentation operation. You can edit the
example code directly in MATLAB Coder. When you have removed the code, click Save to save your
edits and rerun the readiness check. After rerunning the readiness check, MATLAB Coder displays
the No issues found message.


Define Size and Data Type of Function Inputs

Every input to your code must be specified as fixed size, variable size, or constant. There are
several ways to specify the size of your input argument, but the easiest way is to give MATLAB
Coder an example of calling your function. Enter a script that calls your function in the text entry
field. For this example, enter the following code and click Autodefine Input Types.

I = imread('cell.tif');
Iseg = detectCellsCodeGen(I);


For more information about defining inputs, see the MATLAB Coder documentation. After MATLAB
Coder returns with the input type definition, click Next.
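If you prefer to work at the command line, you can define the input type yourself instead of using
Autodefine Input Types. The following is a sketch; the 1000-by-1000 upper bound is an arbitrary
assumption, not a value taken from this example.

% Command-line alternative (sketch): declare a variable-size 2-D uint8
% input with an assumed upper bound, then generate a MEX file.
inType = coder.typeof(uint8(0),[1000 1000],[true true]);
codegen detectCellsCodeGen -args {inType} -report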


Check for and Resolve Run-Time Issues

Even though you performed MATLAB Coder readiness checks, additional issues might arise during
the build process that can prevent code generation. While the readiness checks look at function
dependencies to determine readiness, the build process examines coding patterns. You can use the
same code you entered to define input types (which is preloaded into the dialog box). Click Check for
Issues.


This example contains a build issue: it passes an array of strel objects to imdilate, and arrays of
objects are not supported for code generation.


Address the build issues identified. For this example, modify the call to imdilate to avoid passing an
array of strel objects. Replace the single call to imdilate with two separate calls to imdilate
where you pass one strel object with each call.
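In code, the change looks like the following sketch. The variable names are assumptions about the
contents of the supporting file, not code copied from it.

% Not supported for code generation: an array of strel objects
% BWsdil = imdilate(BWs,[se90 se0]);

% Supported: apply each structuring element in a separate call
BWsdil = imdilate(BWs,se90);
BWsdil = imdilate(BWsdil,se0);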


Rerun the test build to make sure your changes fixed the issue. Click Check for Issues. MATLAB
Coder displays a message declaring that no issues were detected.


Generate Code

You are now ready to generate code. Click Next.

Choose the type of code you want to generate and select the target platform. MATLAB Coder can
generate C or C++ source code, a MEX file, a static library, a shared library, or a standalone
executable. For Production Hardware, you can select from many choices including ARM and Intel
processors.

This example uses the default options. The build type is Source Code and the language is C. For
the device options, specify Generic as the device vendor and MATLAB Host Computer as the
device type. When you choose MATLAB Host Computer, MATLAB Coder generates code that
depends on a precompiled shared library. Image Processing Toolbox functions use a shared library to
preserve performance optimizations.
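If you script the build instead of using the app, a roughly equivalent configuration looks like the
following sketch. The input type inType is an assumption carried over from the earlier command-line
sketch.

% Sketch: generate C source code for the MATLAB Host Computer target.
cfg = coder.config("lib");
cfg.TargetLang = "C";
cfg.GenCodeOnly = true;   % produce source code without building a library
cfg.HardwareImplementation.ProdHWDeviceType = "Generic->MATLAB Host Computer";
codegen -config cfg detectCellsCodeGen -args {inType} -report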

Click Generate.


MATLAB Coder displays the generated code.

Click Next to complete the process. MATLAB Coder displays information about what it generated. By
default, MATLAB Coder creates a codegen subfolder in your work folder that contains the generated
output.

See Also
codegen | MATLAB Coder

More About
• “Code Generation for Image Processing” on page 21-2
• “Code Generation Workflow” (MATLAB Coder)
• “Generate C Code by Using the MATLAB Coder App” (MATLAB Coder)
• “Input Type Specification for Code Generation” (MATLAB Coder)
• Functions Supporting Code Generation


Generate Code to Resize Images to Fixed Output Size

This example shows how to generate a MEX file containing C code that resizes images to a consistent
output size.

The imresize function offers two options for resizing 2-D images: rescaling by a scale factor, and
resizing to a target output size. To convert a MATLAB function to C or C++ code, you can use the
codegen function. This example shows how to generate a MEX file that resizes images in three
scenarios:
1 Resize a specific image file with a filename that does not vary.
2 Resize any image file. When resizing different image files, the filenames can vary and the images
can have different input sizes.
3 Resize a batch of specified input image files, which can vary and have different input sizes.

In all three scenarios, the example uses a constant scale factor or target size to ensure that the
output size is consistent.

Scenario 1: Generate Code to Resize Constant Input Image

In this scenario, you want to generate code that resizes a single image with a fixed filename. The
input image is a constant input to the codegen function because the value of the argument is always
the same. Specify the filename of the image.
imageFileName = "foggyroad.jpg";

Define Entry-Point Function

The MATLAB function that you want to convert is called the entry-point function. In this scenario, the
entry-point function is a custom function called imresizeConstant. The imresizeConstant
function reads a single image from a file and then resizes the image using the imresize function.
The function supports both rescaling by an isotropic scale factor and resizing both image dimensions
to a target size.

• To resize an image by an isotropic scale factor, specify the scaleOrSize input argument as a
scalar.
• To resize both dimensions to a target size, specify the scaleOrSize input argument as a two-
element vector of the form [numrows numcols].
type imresizeConstant.m

function out = imresizeConstant(fileName,scaleOrSize)
% Copyright 2022 The MathWorks, Inc.

% Read the image using imread
I = imread(fileName);

% Resize the image using scaleOrSize
out = imresize(I,scaleOrSize);
end

Option 1: Resize Using Constant Scale Factor

Specify the scale factor.


scaleFactor = 0.5;

Generate the MEX file using the codegen function. You must define the data type of each input
argument of the entry-point function at compile time because the C and C++ coding languages use
static typing. For this scenario, you can further specify both input arguments as constant values using
the coder.Constant class. For more information, see “Specify Properties of Entry-Point Function
Inputs” (MATLAB Coder).

codegen -config:mex imresizeConstant.m -args {coder.Constant(imageFileName),coder.Constant(scaleFactor)} -report

Code generation successful: To view the report, open('codegen\mex\imresizeConstant\html\report.mldatx')

To open the code generation report, click View report. In the Function pane, the orange font
color of both input arguments indicates that they are constant arguments to the entry-point function.
If you pause on the out variable, the Variable Info tooltip indicates that out has a fixed size, as
expected. The size is equal to one half of the input size. Because the input image does not change, the
output image size is consistent each time you run the MEX file.
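As a usage sketch, you can call the generated MEX file in place of the original function. With the
default ConstantInputs setting for MEX builds, you still pass the constant arguments, and the MEX
file checks that their values match the compile-time constants.

% Sketch: call the generated MEX file with the same constant inputs.
outHalf = imresizeConstant_mex(imageFileName,scaleFactor);
size(outHalf)   % rows and columns are half those of the input image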

Option 2: Resize Using Constant Target Size

Specify a constant target size.

targetSize = [256 256];

Generate the MEX file using the codegen function. Define the input argument types, and use the
coder.Constant class to specify both inputs as constant values. The compiled function resizes the
input image to a target size because targetSize is a vector.

codegen -config:mex imresizeConstant.m -args {coder.Constant(imageFileName),coder.Constant(targetSize)} -report

Code generation successful: To view the report, open('codegen\mex\imresizeConstant\html\report.mldatx')

Open the code generation report. In the Function pane, the orange font color indicates that both
inputs are constant. If you pause on the out variable, the Variable Info tooltip indicates that out has
a fixed size, equal to the target size specified by targetSize.


Scenario 2: Generate Code to Resize Variable Input Image

In this scenario, you want to generate code that accepts any input image filename for resizing. The
input image is a variable input to the codegen function because the value of the argument can
change.

To ensure that different input images always have the same output size, you must resize using a
constant target size. Rescaling by a scale factor does not guarantee that the output images will have
a fixed size, because input images with different sizes will yield different output sizes.

Define Entry-Point Function

For this scenario, define a new entry-point function imresizeVariable.m. This function ensures
that the resized image has three channels. If the input image is a grayscale image, the function
concatenates three copies of the image along the third dimension. If the input image is a
multispectral image or volume, the function preserves only the first three channels.
type imresizeVariable.m

function out = imresizeVariable(fileName,targetSize)
% Copyright 2022 The MathWorks, Inc.

% Read the image using imread
I = imread(fileName);

% Ensure the image has three channels by concatenating the channels of a
% grayscale image and taking the first three channels of a multispectral
% image
if (size(I,3)<3)
    I = cat(3,I,I,I);
end
I = I(:,:,1:3);

% Resize the image to target size
out = imresize(I,targetSize);
end


Resize Using Constant Target Size

Specify a constant target size to use as input to the entry-point function.

targetSize = [256 256];

Generate the MEX file using the codegen function. Specify a variable filename of data type string,
and a coder.Constant class for the target output size.

codegen -config:mex imresizeVariable.m -args {imageFileName,coder.Constant(targetSize)} -report

Code generation successful: To view the report, open('codegen\mex\imresizeVariable\html\report.mldatx')

Open the code generation report. In the Function pane, the orange font color indicates that only
targetSize is constant, while fileName can vary. If you pause on the out variable, the Variable
Info tooltip indicates that all three dimensions of the output image are fixed, as desired.

Note that if you accidentally specify a scale factor instead of a target size, the Variable Info tooltip
displays a question mark for one or more of the dimensions. The question mark indicates that the size
of the output image is not fixed.
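As a usage sketch, you can now call the generated MEX file with a different image file. The file
peppers.png is an image shipped with MATLAB, used here only as an illustration; with the default
ConstantInputs setting, you still pass the constant targetSize argument and its value is checked.

% Sketch: resize a different image with the same generated MEX file.
outImg = imresizeVariable_mex("peppers.png",targetSize);
size(outImg)   % expected: 256 256 3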


Scenario 3: Generate Code to Resize Multiple Input Images

In this scenario, you want to generate code that accepts multiple input image filenames for resizing.
The input image is a variable input to the codegen function because the value of the argument can
change.

To ensure that different input images always have the same output size, you must resize using a
constant target size. Specify the filenames of the images.
imageFileName1 = "foggyroad.jpg";
imageFileName2 = "foosball.jpg";
imFileNames = {imageFileName1,imageFileName2};

Define Entry-Point Function

Define an entry-point function named imresizeVariableBatch.m that reads a batch of input
images specified as a cell array, and resizes each image to a target size. The function returns the
resized images as a 4-D array, where the fourth dimension corresponds to the number of input
images.
type imresizeVariableBatch.m

function out = imresizeVariableBatch(imageFileNameCellArray,targetSize)
% Copyright 2022 The MathWorks, Inc.

numImages = coder.const(numel(imageFileNameCellArray));

% Allocate memory for output image
out = coder.nullcopy(zeros([targetSize,3,numImages],"uint8"));

% For each image in the input cell array, read and resize the image.
for i = coder.unroll(1:numImages)
    I = imread(imageFileNameCellArray{i});

    if (size(I,3)<3)
        I = cat(3,I,I,I);
    end
    I = I(:,:,1:3);

    out(:,:,:,i) = imresize(I,targetSize);
end
end

Resize Using Constant Target Size

Specify a constant target size to use as input to the entry-point function.

targetSize = [256 256];

Generate the MEX file using the codegen function. Specify the first input argument as a cell array
containing the filenames. The codegen function configures the MEX file to accept a cell array of two
strings, which can vary. Specify a coder.Constant class for the target output size.
codegen -config:mex imresizeVariableBatch.m -args {imFileNames,coder.Constant(targetSize)} -report

Code generation successful: To view the report, open('codegen\mex\imresizeVariableBatch\html\report.mldatx')

Open the code generation report. In the Function pane, the orange font color indicates that only
targetSize is constant, while imageFileNameCellArray can vary. If you pause on the out
variable, the Variable Info tooltip indicates that all dimensions of the 4-D output image array are
fixed.
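As a usage sketch, call the generated MEX file with the cell array of filenames. The string contents
can vary, but with the input types defined above the cell array must contain two string scalars.

% Sketch: resize the batch of images with the generated MEX file.
outBatch = imresizeVariableBatch_mex(imFileNames,targetSize);
size(outBatch)   % expected: 256 256 3 2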

See Also
imresize | codegen | coder.Constant

More About
• “Specify Properties of Entry-Point Function Inputs” (MATLAB Coder)

22

GPU Computing with Image Processing Toolbox Functions

• “Image Processing on a GPU” on page 22-2
• “Perform Thresholding and Morphological Operations on GPU” on page 22-3
• “Perform Pixel-Based Operations on GPU” on page 22-8

Image Processing on a GPU


To take advantage of the performance benefits offered by a modern graphics processing unit (GPU),
certain Image Processing Toolbox functions have been enabled to perform image processing
operations on a GPU. This can provide GPU acceleration for complicated image processing workflows.
You can use these GPU-enabled functions exclusively or in combination with CPU-based processing to
satisfy your design requirements and performance goals.

To run image processing code on a graphics processing unit (GPU), you must have the Parallel
Computing Toolbox software. To perform a supported image processing operation on a GPU, follow
these steps:

• Move the data from the CPU to the GPU. Use the gpuArray function to transfer an array from
MATLAB to the GPU. For more information, see “Create GPU Arrays from Existing Data” (Parallel
Computing Toolbox).
• Perform the image processing operation on the GPU. For a list of all the toolbox functions that
have been GPU-enabled, see Functions Supporting GPU Computing.
• Move the data back onto the CPU from the GPU. Use the gather function to retrieve an array
from the GPU and transfer the array to the MATLAB workspace as a regular MATLAB array.

If you call a function with GPU support using at least one gpuArray input argument, then the
function runs automatically on a GPU and generates a gpuArray as the result. You can mix inputs
using both gpuArray and MATLAB arrays in the same function call. In this case, the function
automatically transfers the MATLAB arrays to the GPU for execution.
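The following minimal sketch shows the complete pattern. The image file and the choice of
imgaussfilt are illustrative; any GPU-enabled toolbox function works the same way.

% Sketch of the GPU workflow: transfer, process, gather.
I = imread("cameraman.tif");    % image on the CPU
Igpu = gpuArray(I);             % move the data to the GPU
Jgpu = imgaussfilt(Igpu,2);     % GPU-enabled function runs on the GPU
J = gather(Jgpu);               % bring the result back as a regular array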

When working with a GPU, note the following:

• Performance improvements can depend on the GPU device.


• There can be small differences in the results returned on a GPU from those returned on a CPU.

To learn about integrating custom CUDA kernels directly into MATLAB to accelerate complex
algorithms, see “Run CUDA or PTX Code on GPU” (Parallel Computing Toolbox).

See Also
gpuArray | gather

Related Examples
• “Perform Thresholding and Morphological Operations on GPU” on page 22-3
• “Perform Pixel-Based Operations on GPU” on page 22-8

More About
• “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox)
• Functions Supporting GPU Computing


Perform Thresholding and Morphological Operations on GPU

This example shows how to perform image processing operations on a GPU. The example uses
thresholding and morphological operations to highlight the watery areas in an aerial photograph.

Read and display an image.


imOriginal = imread("concordaerial.png");
imshow(imOriginal)

Move the image to the GPU by creating a gpuArray (Parallel Computing Toolbox) object.
imGPUoriginal = gpuArray(imOriginal);

As a preprocessing step, change the RGB image to a grayscale image. im2gray performs the
conversion operation on a GPU because the input argument is a gpuArray.
imGPUgray = im2gray(imGPUoriginal);

View the image in the Image Viewer app and inspect the pixel values to find the value of watery
areas. To use Image Viewer, you must bring the image data back onto the CPU by using the gather
(Parallel Computing Toolbox) function. As you move the mouse over the image, you can view the value
of the pixel under the cursor at the bottom of the Image Viewer. In the image, areas of water are dark
and have pixel values less than 70.


imageViewer(gather(imGPUgray));

To get a new image that contains only the pixels with values less than 70, threshold the image on the
GPU.

imWaterGPU = imGPUgray<70;

Display the thresholded image. Unlike Image Viewer, the imshow function supports gpuArray input.

figure
imshow(imWaterGPU)


Remove small objects from the image while preserving the shape and size of larger objects by using
morphological opening. The imopen function performs morphological opening and supports
gpuArray input.

imWaterMask = imopen(imWaterGPU,strel("disk",5));
imshow(imWaterMask)


Create a copy of the original image that will contain the enhanced data. Convert the data type to
single.

imGPUenhanced = im2single(imGPUoriginal);

Get the blue channel from the original image.

blueChannelOriginal = imGPUenhanced(:,:,3);

Enhance the saturation of the blue channel by increasing the strength of the blue channel for pixels
where the mask is 1 (true).

blueChannelEnhanced = blueChannelOriginal + 0.2*single(imWaterMask);

The maximum value of the enhanced blue channel exceeds the maximum value expected of images of
data type single. Rescale the data to the expected range [0, 1] by using the rescale function.

blueChannelEnhanced = rescale(blueChannelEnhanced);

Replace the blue channel with the enhanced blue channel.

imGPUenhanced(:,:,3) = blueChannelEnhanced;

Display the enhanced image. Pixels corresponding to water have a more saturated blue color in the
enhanced image than in the original image.


imshow(imGPUenhanced)
title("Enhanced Image")

After filtering the image on the GPU, move the data back to the CPU by using the gather function.
Write the modified image to a file.

outCPU = gather(imGPUenhanced);
imwrite(outCPU,"concordwater.png")

See Also
gpuArray | gather

Related Examples
• “Perform Pixel-Based Operations on GPU” on page 22-8

More About
• “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox)
• Functions Supporting GPU Computing


Perform Pixel-Based Operations on GPU

This example shows how to perform pixel-based operations on a GPU by using functions that send
both the data and the operations to the GPU for processing. This method is most effective for
element-wise operations that require two or more data sets.

Read and display an image.

I = imread("strawberries.jpg");
imshow(I)

Move the data from the CPU to the GPU by creating a gpuArray (Parallel Computing Toolbox) object.

Igpu = gpuArray(I);

Perform an operation on the GPU. This example defines a custom function called rgb2gray_custom
that converts an RGB image to grayscale by using a custom weighting of the red, green, and blue
color channels. This function is defined at the end of the example. Pass the handle to the custom
function and data to the GPU for evaluation by the arrayfun (Parallel Computing Toolbox) function.

Igray_gpu = arrayfun(@rgb2gray_custom, ...
    Igpu(:,:,1),Igpu(:,:,2),Igpu(:,:,3));


Move the data back to the CPU from the GPU by using the gather (Parallel Computing Toolbox)
function.

Igray = gather(Igray_gpu);

Display the result.

imshow(Igray)

Supporting Function

The rgb2gray_custom helper function takes a linear combination of the three color channels and
returns a single-channel output image.

function gray = rgb2gray_custom(r,g,b)
gray = 0.5*r + 0.25*g + 0.25*b;
end

See Also
gpuArray | gather | arrayfun

Related Examples
• “Perform Thresholding and Morphological Operations on GPU” on page 22-3


More About
• “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox)
• Functions Supporting GPU Computing
