Fundamentals of Image Processing
1. Image as a matrix
The simplest way to represent an image is as a matrix. Each pixel is commonly stored in a single byte, so values between 0 and 255 represent the intensity of each pixel, where 0 is black and 255 is white. One such matrix is generated for every color channel in the image. In practice, it is also common to normalize the values to the range between 0 and 1.
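For concreteness, here is a minimal sketch of an 8-bit grayscale image stored as a matrix and then normalized to [0, 1]. The use of Python and NumPy is an illustrative choice, not something specified in the text.

import numpy as np

# A small 4x4 grayscale image stored as an 8-bit matrix,
# where 0 is black and 255 is white.
img = np.array([[  0,  64, 128, 255],
                [ 32,  96, 160, 224],
                [ 16,  80, 144, 208],
                [  8,  72, 136, 200]], dtype=np.uint8)

# Normalizing the values to the range [0, 1], as is common in practice.
img_normalized = img.astype(np.float32) / 255.0
print(img_normalized)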
2. Image as a function
An image can also be written as a function f: ℝ² → ℝ that outputs the intensity at any input point (x, y). The intensity value lies between 0 and 255, or between 0 and 1 if the values are normalized.
Viewing images as functions lets us transform them: a change in the function results in a change in the pixel values of the image. There are other ways, too, by which we can perform image transformations.
2.1.1 Image Processing Operations
Essentially, there are three main operations that can be performed on an image.
Point Operations
Local Operations
Global Operations
Point Operation
In a point operation, the output value depends only on the input value at that particular coordinate. A very common point operation, used a lot while editing images, is reversing the contrast. In the simplest terms, it flips dark pixels into light pixels and vice versa. The point operation that achieves this is shown below.
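A common form of this operation is the image negative, s = 255 − r for 8-bit images. Below is a minimal sketch of it; Python and NumPy are an illustrative choice, not part of the original notes.

import numpy as np

def invert_contrast(img: np.ndarray) -> np.ndarray:
    """Point operation: each output pixel depends only on the input pixel
    at the same coordinate. Dark pixels become light and vice versa."""
    return 255 - img  # assumes an 8-bit image with values in [0, 255]

# A dark pixel (10) becomes light (245); a light pixel (240) becomes dark (15).
sample = np.array([[10, 240], [128, 0]], dtype=np.uint8)
print(invert_contrast(sample))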
Let’s say you captured a still scene using a camera. There can be noise in the image for many reasons, such as dust particles on the lens, damage to a sensor, and more. Noise reduction using point operations can be very tedious. One way is to take multiple shots of the still scene, average the value at every pixel, and hope that the noise cancels out. But at times it is not possible to get multiple images of a scene, and the stillness of a scene cannot be guaranteed every time. To handle this, we need to move from point operations to local operations.
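Before moving on, here is a minimal sketch of the frame-averaging idea just described, assuming a list of aligned 8-bit captures of the same still scene (Python/NumPy, an illustrative choice).

import numpy as np

def average_frames(frames):
    """Average a list of aligned 8-bit captures of the same still scene.
    Zero-mean noise tends to cancel out, while the static scene is preserved."""
    stack = np.stack([f.astype(np.float32) for f in frames], axis=0)
    return stack.mean(axis=0).round().astype(np.uint8)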
Local Operation
In a local operation, the output value depends on the input value and its neighbours. A simple example of a local operation is the moving average: the output at each pixel is the average of the input pixel and its neighbours. Because of this operation, noise pixels in the image are smoothed out in the output.
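A minimal sketch of a 3×3 moving-average (box) filter, one possible form of the local operation described above (Python/NumPy, illustrative):

import numpy as np

def box_filter_3x3(img: np.ndarray) -> np.ndarray:
    """Local operation: each output pixel is the average of the input pixel
    and its 3x3 neighbourhood, which smooths out isolated noise pixels."""
    img = img.astype(np.float32)
    padded = np.pad(img, 1, mode="edge")  # replicate the border pixels
    out = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return (out / 9.0).round().astype(np.uint8)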
Global Operation
As the name suggests, in a global operation the value at an output pixel depends on the entire input image. An example of a global operation is the Fourier transform.
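A minimal sketch of this global operation using NumPy's 2-D FFT (the specific array and variable names are ours, used only for illustration):

import numpy as np

# Global operation: every Fourier coefficient depends on the entire input image.
img = np.random.rand(64, 64)            # placeholder image
spectrum = np.fft.fftshift(np.fft.fft2(img))
magnitude = np.log1p(np.abs(spectrum))  # log scale is typically used for display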
STEPS IN DIP:
The fundamental steps in any typical Digital Image Processing pipeline are as follows:
1. Image Acquisition
The image is captured by a camera and digitized (if the camera output is not digitized
automatically) using an analogue-to-digital converter for further processing in a computer.
2. Image Enhancement
In this step, the acquired image is manipulated to meet the requirements of the specific
task for which the image will be used. Such techniques are primarily aimed at highlighting
the hidden or important details in an image, like contrast and brightness adjustment, etc.
Image enhancement is highly subjective in nature.
3. Image Restoration
This step deals with improving the appearance of an image and is an objective operation since
the degradation of an image can be attributed to a mathematical or probabilistic model. For
example, removing noise or blur from images.
4. Color Image Processing
This step handles the processing of colored images (e.g., 16-bit RGB or RGBA images), for example, performing color correction or color modeling on images.
6. Image Compression
7. Morphological Processing
Image components that are useful in the representation and description of shape need to be extracted for further processing or downstream tasks. Morphological processing provides the tools (which are essentially mathematical operations) to accomplish this. For example, the erosion operation shrinks (erodes) the boundaries of objects in an image, while dilation expands (thickens) them.
8. Image Segmentation
This step involves partitioning an image into different key parts to simplify and/or change
the representation of an image into something that is more meaningful and easier to analyze.
Image segmentation allows computers to focus on the more important parts of the image and discard the rest, which enables automated systems to achieve improved performance.
9. Representation and Description
Image segmentation procedures are generally followed by this step, where the task for representation is to decide whether the segmented region should be depicted as a boundary or as a complete region. Description deals with extracting attributes that yield some quantitative information of interest or that are basic for differentiating one class of objects from another.
10. Object Recognition
After the objects are segmented from an image and the representation and description phases are complete, the automated system needs to assign a label to each object, to let human users know what object has been detected, for example, “vehicle” or “person”.
COMPONENTS:
The field of digital image processing is built on a foundation of mathematical and probabilistic formulation, but human intuition and analysis play the main role in choosing between the various techniques, and the choice is basically made on subjective, visual judgements.
In human visual perception, the eyes act as the sensor or camera, neurons act as the
connecting cable and the brain acts as the processor.
1. Structure of Eye
2. Image Formation in the Eye
3. Brightness Adaptation and Discrimination
The human eye is a slightly asymmetrical sphere with an average diameter of about 20mm to 25mm and a volume of about 6.5cc. The eye works much like a camera: an external object is seen in the way a camera takes a picture of an object. Light enters the eye through a small hole called the pupil, a black-looking aperture that contracts when the eye is exposed to bright light, and is focused on the retina, which acts like camera film.
The lens, iris, and cornea are nourished by a clear fluid, the aqueous humour, contained in the anterior chamber. The fluid flows from the ciliary body to the pupil and is absorbed through the channels in the angle of the anterior chamber. The delicate balance of aqueous production and absorption controls the pressure within the eye.
The cones in the eye number between 6 and 7 million and are highly sensitive to color. Humans see in color in daylight because of these cones. Cone vision is also called photopic or bright-light vision.
The rods in the eye are far more numerous, between 75 and 150 million, and are distributed over the retinal surface. Rods are not involved in color vision and are sensitive to low levels of illumination.
The distance between the lens and the retina is about 17mm and the focal length is
approximately 14mm to 17mm.
Neighbours of a pixel
A pixel p at (x,y) has 4 horizontal/vertical neighbours at (x+1,y), (x-1,y), (x,y+1) and (x,y-1). These are called the 4-neighbours of p: N4(p).
A pixel p at (x,y) has 4 diagonal neighbours at (x+1,y+1), (x+1,y-1), (x-1,y+1) and (x-1,y-1). These are called the diagonal neighbours of p: ND(p).
Together, the 4-neighbours and the diagonal neighbours make up the 8-neighbours of p: N8(p).
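These neighbourhoods are easy to express in code; the following helper functions (a small illustrative sketch in Python) return the coordinate sets N4(p), ND(p) and N8(p):

def n4(x, y):
    """4-neighbours of pixel p at (x, y)."""
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}

def nd(x, y):
    """Diagonal neighbours of pixel p at (x, y)."""
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}

def n8(x, y):
    """8-neighbours: the union of the 4-neighbours and the diagonal neighbours."""
    return n4(x, y) | nd(x, y)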
Adjacency is defined with respect to a set V of intensity values. For example, for pixels with possible intensity values from 0 to 255, V could be any subset of these 256 values.
a) 4-adjacency: Two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
b) 8-adjacency: Two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
c) m-adjacency (mixed adjacency): Two pixels p and q with values from V are m-adjacent if:
1. q is in N4(p), or
2. q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.
There are three types of connectivity on the basis of adjacency: 4-connectivity, 8-connectivity and m-connectivity, corresponding to the three types of adjacency above.
COLOUR MODELS:
In the RGB model, the color values R, G, and B are often normalized to the range [0, 1]; alternatively, each of R, G, and B can be represented by values from 0 to 255. Each RGB color image consists of three component images, one for each primary color, as shown in the figure below. These three images are combined on the screen to produce a color image.
The total number of bits used to represent each pixel in an RGB image is called the pixel depth. For example, in an RGB image in which each of the red, green, and blue images is an 8-bit image, the pixel depth of the RGB image is 24 bits. The figure below shows the component images of an RGB image.
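As a small illustration (Python/NumPy, our choice of tooling), the three component images can be separated from an RGB array, and the pixel depth computed from its shape and data type:

import numpy as np

# A 2x2 RGB image with 8 bits per channel (pixel depth = 3 * 8 = 24 bits).
rgb = np.random.randint(0, 256, size=(2, 2, 3), dtype=np.uint8)

red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]   # component images
pixel_depth = rgb.shape[-1] * rgb.dtype.itemsize * 8
print(pixel_depth)  # 24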
It’s true that the RGB model draws upon our familiarity with mixing primary colors to
create other colors, but in terms of actual perception, RGB is very unnatural. People
don’t look at a grapefruit and think about the proportions of red, green, and blue that
are hidden inside the somewhat dull, yellowish-orangish color of the rind or the shinier,
reddish flesh. Though you probably never realized it, you think about color more in
terms of hue, saturation, and intensity.
Hue is the color itself. When you look at something and try to assign a word
to the color that you see, you are identifying the hue. The concept of hue is
consistent with the way in which a particular wavelength of light
corresponds to a particular perceived color.
Saturation refers to the “density” of the hue within the light that is
reaching your eye. If you look at a wall that is more or less white but with a
vague hint of peach, the hue is still peach, but the saturation is very low. In
other words, the peach-colored light reaching your eye is thoroughly
diluted by white light. The color of an actual peach, on the other hand,
would have a high saturation value.
Intensity is essentially brightness. In a grayscale photograph, brighter
areas appear less gray (i.e., closer to white) and darker areas appear more
gray. A grayscale imaging system faithfully records the intensity of the
light, despite the fact that it ignores the colors. The HSI color model does
something similar in that it separates intensity from color (both hue and
saturation contribute to what we call color).
HSI is closely related to two other color models: HSL (hue, saturation, lightness) and
HSV (hue, saturation, value). The differences between these models are rather subtle;
the important thing at this point is to be aware that all three models are used and that
they all adopt the same general approach to quantifying color.
The following three images are screen captures from a graphics program called
Inkscape; they indicate the H, S, and L components of the three colors shown above (in
the RGB section).
Converting RGB to HSI color model
Converting between the color models requires computing values one pixel at a time, so it can be computationally intensive if the conversion is performed many times in a short span, especially for images with larger dimensions.
To avoid dividing by zero, it is good practice to add a very small number to the denominators of the conversion formulas.
The R, G and B variables in the formulas are the pixel color channel components – red, green and blue.
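The conversion formulas themselves are not reproduced here; the sketch below implements the standard RGB-to-HSI conversion for a single pixel with R, G and B normalized to [0, 1], including the small epsilon mentioned above to guard against division by zero (Python/NumPy, illustrative):

import numpy as np

def rgb_to_hsi(r, g, b, eps=1e-8):
    """Convert one pixel from RGB (each channel in [0, 1]) to HSI using the
    standard formulas. `eps` avoids division by zero for black or gray pixels."""
    intensity = (r + g + b) / 3.0
    saturation = 1.0 - 3.0 * min(r, g, b) / (r + g + b + eps)

    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    hue = theta if b <= g else 360.0 - theta  # hue is undefined for pure grays

    return hue, saturation, intensity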
Data products
The data from various sensors are presented in a form and format with specified
radiometric and geometric accuracy which can be readily used by various application
scientists for specific themes of their interest. Remote sensing data can be procured
by a number of users for various applications and information extraction, in the form of
a ‘data product’. This may be in the form of photographic output for visual processing
or in a digital format amenable for further computer processing.
There is a variety of remote sensing data acquired by different sensors and satellites. Before reaching users, the data undergo some processing steps. Requirements of users may vary depending upon their interests and project objectives; hence, various remote sensing data providers/suppliers prepare a variety of data products in different formats.
Remote sensing data products are generated in certain ‘data formats’ of which users must be aware, for various practical reasons. Pre-processed remote sensing data are generated into a number of products, such as hardcopy prints on various types of paper and digital data on various types of computer-compatible media, like tapes, compact discs (CDs), DVDs, and various other storage devices. If the data product is a hard-copy print, then it is impossible to carry out any further processing or conversion before use. But if the product is in digital form, it may be possible to convert the data into a processed digital image. It may be further required to carry out certain processing before any image analysis operation is performed. Types of data products may vary from country to country and/or from one data provider to another.
All remote sensing data products carry a specific index number. This index number is generated using the satellite path, which runs from the North Pole to the South Pole of the Earth. This pole-to-pole coverage for each pass of the satellite is given a specific number, called the path number or track number.
These representations of numbers form the B & W or color images when they are displayed on a screen or output as hard copy. Thus, the image has to be retained in its digital form in order to carry out computer processing/classification. The digital output is supplied on suitable computer-compatible storage media, such as DVDs, CD-ROMs, DAT, etc., depending on user requests. This is where the concept of image data format comes in, with the question of how to arrange the pixels to achieve the optimum level of desired processing and display. The data may be arranged in band sequential (BSQ), band interleaved by line (BIL) or band interleaved by pixel (BIP) formats.
Band Interleaved by Pixel (BIP)
Data storage sequence in the BIP format is shown in Fig. 6.4 for an image of size 3×3 (i.e. 3 rows and 3 columns) having three bands. Band, row and column (pixel) are generally represented as B, R and P, respectively; B1, R1 and P1 represent band 1, row 1 and column (pixel) 1, respectively. In this format, the first pixel of row 1 of band 1 is stored first, then the first pixel of row 1 of band 2, and then the first pixel of row 1 of band 3. These are followed by the second pixel of row 1 of band 1, then the second pixel of row 1 of band 2, then the second pixel of row 1 of band 3, and so on.
Band Interleaved by Line (BIL)
Data storage sequence in the BIL format is shown here in Fig. 6.5 for a three-band image of size 3×3 (i.e. 3 rows and 3 columns). B and R represent band and row; B1 and R1 represent band 1 and row 1. In this format, all the pixels of row 1 of band 1 are stored in sequence first, then all the pixels of row 1 of band 2, and then all the pixels of row 1 of band 3. These are followed by all the pixels of row 2 of band 1, then all the pixels of row 2 of band 2, then all the pixels of row 2 of band 3, and so on. You should note that both the BIP and BIL formats store data/pixels one line (row) at a time.
Band Sequential (BSQ)
BSQ format stores each band of data as a separate file. The arrangement sequence of data in each file is shown in Fig. 6.6 for a three-band image of size 3×3 (i.e. 3 rows and 3 columns). B and R represent band and row, respectively; B1 and R1 represent band 1 and row 1. In this format, all the pixels of band 1 are stored in sequence first, followed by all the pixels of band 2 and then all the pixels of band 3.
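The three layouts can be illustrated with a small NumPy sketch (the array and variable names are ours) that flattens the same 3×3, three-band image in BSQ, BIL and BIP order:

import numpy as np

bands, rows, cols = 3, 3, 3
# Synthetic image indexed as image[band, row, column].
image = np.arange(bands * rows * cols).reshape(bands, rows, cols)

bsq = image.reshape(-1)                     # band 1 fully, then band 2, then band 3
bil = image.transpose(1, 0, 2).reshape(-1)  # row 1 of band 1, row 1 of band 2, ...
bip = image.transpose(1, 2, 0).reshape(-1)  # pixel 1 of bands 1-3, pixel 2 of bands 1-3, ...

All three one-dimensional arrays contain the same 27 values, only in a different storage order.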
Building on these three basic data formats, a number of other formats, like TIFF, GeoTIFF, PNG, ADRG, super structured, JFIF, JPEG, etc., have been developed by different organisations. Most image processing software for remote sensing data processing supports these file formats. You can see the list of data file formats supported by an image processing software package in its documentation manual. If a software package does not list a certain data format, then files in that format cannot be opened and used in that particular software.
The central processing unit (CPU) is the computing part of the computer. It consists of a control unit and an arithmetic logic unit. The CPU performs numerical integer and/or floating-point calculations, and directs input and output from and to mass storage devices, color monitors, digitizers, plotters, etc. The CPU's efficiency is often measured in terms of how many millions of instructions per second (MIPS) it can process, e.g., 500 MIPS. It is also customary to describe a CPU in terms of the number of cycles it can process in one second, measured in megahertz, e.g., 1000 MHz (1 GHz). Manufacturers market computers with CPUs faster than 4 GHz, and this speed will continue to increase. The system bus connects the CPU with the main memory, managing data transfer and instructions between the two. Therefore, another important consideration when purchasing a computer is bus speed.
Personal computers (with 16- to 64-bit CPUs) are the workhorses of digital image processing and GIS analysis. Personal computers are based on microprocessor technology, where the entire CPU is placed on a single chip. The most common operating systems for personal computers are the various Microsoft Windows operating systems and the Macintosh operating system. Personal computers useful for digital image processing seem to always cost approximately $2,500 with 2 GB of random access memory (RAM), a high-resolution color monitor (e.g., capable of displaying 1024 x 768 pixels), a reasonably sized hard disk (e.g., >300 GB), and a rewriteable disk (e.g., CD-RW or DVD-RW).
Mainframe computers (with ≥64-bit CPUs) perform calculations more rapidly than PCs or workstations and are able to support hundreds of users simultaneously, especially parallel mainframe computers such as a CRAY. This makes mainframes ideal for intensive, CPU-dependent tasks such as image registration/rectification, mosaicking multiple scenes, spatial frequency filtering, terrain rendering, classification, hyperspectral image analysis, and complex spatial GIS modelling. If desired, the output from intensive mainframe processing can be passed to a workstation or personal computer for subsequent, less intensive or inexpensive processing.
Read-Only Memory, Random Access Memory, Serial and Parallel Processing, and
Arithmetic Coprocessor
Computers have banks of memory that contain instructions that are indispensable to
the successful functioning of the computer. A computer may contain a single CPU or
multiple CPUs and process data serially (sequentially) or in parallel. Most CPUs now have
special-purpose math coprocessors.
Read-only memory (ROM) retains information even after the computer is shut down
because power is supplied from a battery that must be replaced occasionally. For
example, the date and time are stored in ROM after the computer is turned off. When
restarted, the computer looks in the date and time ROM registers and displays the
correct information. Most computers have sufficient ROM for digital image processing
applications; therefore, it is not a serious consideration.
Computers should have sufficient RAM for the operating system, image processing
applications software, and any remote sensor data that must be held in temporary
memory while calculations are performed. Computers with 64-bit CPUs can address
more RAM than 32-bit machines (see Table 3-1). RAM is broken down into two types:
dynamic RAM (DRAM) and static RAM (SRAM). The data stored in DRAM is updated
thousands of times per second; SRAM does not need to be refreshed. SRAM is faster
but is also more expensive. It seems that one can never have too much RAM for image
processing applications. RAM prices continue to decline while RAM speed continues to
increase.
Photoshop is very useful for processing photographs and images that have three or
fewer bands of data.
The computer operating system and compiler(s) must be easy to use yet powerful
enough so that analysts can program their own relatively sophisticated algorithms and
experiment with them on the system. It is not wise to configure an image processing
system around an unusual operating system or compiler because it becomes difficult to
communicate with the peripheral devices and share applications with other scientists.
Operating System:
The operating system is the first program loaded into memory (RAM) when the
computer is turned on. It controls all of the computer's higher-order functions. The
operating system kernel resides in memory at all times. The operating system provides
the user interface and controls multitasking. It handles the input and output to the
hard disk and all peripheral devices such as compact disks, scanners, printers, plotters,
and color displays. All digital image processing application programs must communicate
with the operating system. The operating system sets the protocols for the application
programs that are executed by it. The difference between a single-user operating
system and a network operating system is the latter's multi-user capability. For
example, Microsoft Windows XP (home edition) and the Macintosh OS are single-user
operating systems designed for one person at a desktop computer working
independently. Various Microsoft Windows, UNIX, and Linux network operating
systems are designed to manage multiple user requests at the same time and complex
network security.
Compiler:
It is often useful for remote sensing analysts to program in one of the high-level languages just listed. Very seldom will a single digital image processing software system perform all of the functions needed for a given project. Therefore, the ability to modify existing software or integrate newly developed algorithms with the existing software is important.
Digital remote sensor data (and other ancillary raster GIS data) are often stored in a matrix band sequential (BSQ) format in which each spectral band of imagery (or GIS data) is stored as an individual file. Each picture element of each band is typically represented in the computer by a single 8-bit byte with values from 0 to 255. The best way to make brightness values rapidly available to the computer is to place the data on a hard disk, CD-ROM, DVD, or DVD-RAM, where each pixel of the data matrix may be accessed at random (not serially) and at great speed (e.g., within microseconds). The cost of hard disk, CD-ROM, or DVD storage per gigabyte continues to decline.
Companies are now developing new mass storage technologies based on atomic
resolution storage (ARS), which holds the promise of storage densities of close to 1
terabit per square inch - the equivalent of nearly 50 DVDs on something the size of a
credit card. The technology uses microscopic probes less than one-thousandth the width of a human hair. When the probes are brought near a conducting material, electrons write data on the surface. The same probes can detect and retrieve data and can be used to write over old data.
Storing remote sensor data is no trivial matter. Significant sums of money are spent
purchasing remote sensor data by commercial companies, natural resource agencies, and
universities. Unfortunately, most of the time not enough attention is given to how the
expensive data are stored or archived to protect the long-term investment. Figure 3-5 depicts several types of analog and digital remote sensor data mass storage devices and the average time to physical obsolescence, that is, when the media begin to deteriorate and information is lost. Interestingly, properly exposed, washed, and fixed analog black-and-white aerial photograph negatives have considerable longevity, often more than 100 years. Color negatives with their respective dye layers have longevity, but not as much as the black-and-white negatives. Similarly, black-and-white paper prints have greater longevity than color prints (Kodak, 1995). Hard and floppy magnetic disks have relatively short longevity, often less than 20 years. Magnetic tape media (e.g., 3/4-in. tape, 8-mm tape, and 1/2-in. tape, shown in Figure 3-6) can become unreadable within 10 to 15 years if not rewound and properly stored in a cool, dry environment.
Optical disks can now be written to, read, and written over again at relatively high speeds and can store much more data than other portable media such as floppy disks. The technology used in rewriteable optical systems is magneto-optics, where data are recorded magnetically, as on disks and tapes, but the bits are much smaller because a laser is used to etch each bit. The laser heats the bit to 150 °C, at which temperature the bit is realigned when subjected to a magnetic field. To record new data, existing bits must first be set to zero.
Only the optical disk provides relatively long-term storage potential (>100 years). In addition, optical disks store large volumes of data on relatively small media. Advances in optical compact disc (CD) technology promise to increase the storage capacity to >17 GB using new rewriteable digital video disc (DVD) technology. In most remote sensing laboratories, rewritable CD-RWs or DVD-RWs have supplanted tapes as the backup system of choice. DVD drives are backwards compatible and can read data from CDs.
The display of remote sensor data on a computer screen is one of the most
fundamental elements of digital image analysis (Brown and Feringa, 2003). Careful
selection of the computer display characteristics will provide the optimum visual image
analysis environment for the human interpreter. The two most important
characteristics are computer display spatial and color resolution.
The image processing system should be able to display at least 1024 rows by 1024 columns on the computer screen at one time. This allows larger geographic areas to be examined and places the terrain of interest in its regional context. Most Earth scientists prefer this regional perspective when performing terrain analysis using remote sensor data. Furthermore, it is disconcerting to have to analyze four 512 × 512 images when a single 1024 × 1024 display provides the information at a glance. An ideal screen display resolution is 1600 × 1200 pixels.
The computer screen color resolution is the number of grayscale tones or colors (e.g., 256) that can be displayed on a CRT monitor at one time out of a palette of available colors (e.g., 16.7 million). For many applications, such as high-contrast black-and-white linework cartography, only 1 bit of color is required [i.e., the line is either black or white (0 or 1)]. For more sophisticated computer graphics for which many shades of gray or color combinations are required, up to 8 bits (or 256 colors) may be required. Most thematic mapping and GIS applications may be performed quite well by systems that display just 64 user-selectable colors out of a palette of 256 colors.
Conversely, the analysis and display of remote sensor image data may require much higher CRT screen color resolution than cartographic and GIS applications (Slocum, 1999). For example, most relatively sophisticated digital image processing systems can display a tremendous number of unique colors (e.g., 16.7 million) from a large color palette (e.g., 16.7 million). The primary reason for these color requirements is that image analysts must often display a composite of several images at one time on a CRT. This process is called color compositing.
Generally, 4096 carefully selected colors out of a very large palette (e.g., 16.7 million) appear to be the minimum acceptable for the creation of remote sensing color composites. This provides 12 bits of color, with 4 bits available for each of the blue, green, and red image planes (Table 3-3). For image processing applications other than compositing (e.g., black-and-white image display, color density slicing, pattern recognition classification), the 4,096 available colors and large color palette are more than adequate. However, the larger the palette and the greater the number of displayable colors at one time, the better the representation of the remote sensor data on the CRT screen for visual analysis. More information about how images are displayed using an image processor is given in Chapter 5. The network configured in Figure 3-2 has six 24-bit color workstations.
Several remote sensing systems now collect data with 10-, 11-, and even 12-bit
radiometric resolution with brightness values ranging from 0 to 1023, 0 to 2047, and 0
to 4095, respectively. Unfortunately, despite advances in video technology, at the
present time it is necessary to generalize (i.e., dumb down) the radiometric precision of
the remote sensor data to 8 bits per pixel simply because current video display
technology cannot handle the demands of the increased precision.
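For example, 12-bit brightness values (0 to 4095) can be linearly rescaled to 8 bits (0 to 255) before display; a minimal sketch of this generalization step (Python/NumPy, illustrative, not a specific vendor's method) is:

import numpy as np

def rescale_to_8bit(data: np.ndarray, max_value: int = 4095) -> np.ndarray:
    """Linearly rescale sensor data (e.g., 12-bit values from 0 to 4095)
    to the 0 to 255 range that an 8-bit display pipeline expects."""
    scaled = data.astype(np.float32) * 255.0 / max_value
    return scaled.round().astype(np.uint8)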