MIP Unit 5 – MEDICAL IMAGE PROCESSING (REGULATION 2021)

UNIT-5

APPLICATIONS OF MEDICAL IMAGE ANALYSIS


Medical Image Compression

Image compression addresses the problem of reducing the amount of data required to represent a digital image with no significant loss of information. Interest in image compression dates back more than 25 years. The field is now poised for significant growth through the practical application of the theoretical work that began in the 1940s, when C. E. Shannon and others first formulated the probabilistic view of information and its representation, transmission and compression.

Images take a lot of storage space:

- A 1024 x 1024 x 32-bit image requires 4 MB.
- Suppose you have some video that is 640 x 480 x 24 bits x 30 frames per second: 1 minute of video would require about 1.54 GB.

Many bytes take a long time to transfer over slow connections. Suppose we have a 56,000 bps link:
- 4 MB will take almost 10 minutes; 1.54 GB will take almost 66 hours.

Storage problems, plus the desire to exchange images over the Internet, have led to a large interest in image compression algorithms.
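
These figures can be checked with a few lines of Python (plain arithmetic, no libraries):

# Storage and transfer-time estimates for uncompressed images and video.

def image_size_bytes(width, height, bits_per_pixel):
    """Raw storage for one frame, in bytes."""
    return width * height * bits_per_pixel / 8

# 1024 x 1024 image at 32 bits/pixel -> 4 MB
img_bytes = image_size_bytes(1024, 1024, 32)
print(img_bytes / 2**20, "MB")               # 4.0 MB

# 640 x 480 x 24-bit video at 30 frames/s for 60 s -> about 1.54 GB
video_bytes = image_size_bytes(640, 480, 24) * 30 * 60
print(video_bytes / 2**30, "GB")             # ~1.54 GB

# Transfer time over a 56,000 bps link
link_bps = 56_000
print(img_bytes * 8 / link_bps / 60, "minutes")    # ~10 minutes
print(video_bytes * 8 / link_bps / 3600, "hours")  # ~66 hours
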

Definition: Image compression refers to the process of reducing the amount of data required to represent a given quantity of information in a digital image. The basis of the reduction process is the removal of redundant data.

5.1 Data compression requires the identification and extraction of source redundancy. In other words, data compression seeks to reduce the number of bits used to store or transmit information.

Need for Compression:
In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to a storage device and decompress it when it is retrieved.
In terms of communications, the bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing data at the receiving end.
At any given time, the ability of the Internet to transfer data is fixed. Thus, if data can effectively be compressed wherever possible, significant improvements of data throughput can be achieved. Many files can be combined into one compressed document, making sending easier.
5.2 DATA REDUNDANCY: Data are the means by which information is conveyed. Various amounts of data can be used to convey the same amount of information. Example: four different representations of the same information (the number five):
1) A picture (1001, 632 bits);
2) The word "five" spelled in English using the ASCII character set (32 bits);
3) A single ASCII digit (8 bits);
4) A binary integer (3 bits).

Compression algorithms remove redundancy.

If more data are used than is strictly necessary, then we say that there is redundancy in the dataset.

Data compression is defined as the process of encoding data using a representation that reduces the overall size of the data. This reduction is possible when the original dataset contains some type of redundancy. Digital image compression is a field that studies methods for reducing the total number of bits required to represent an image. This can be achieved by eliminating the various types of redundancy that exist in the pixel values. In general, three basic redundancies exist in digital images, as follows.

REDUNDANCY IN DIGITAL IMAGES
– Coding redundancy – usually appears as a result of the uniform representation of each pixel.
– Spatial/temporal redundancy – because adjacent pixels tend to be similar in practice.
– Irrelevant information – images contain information that is ignored by the human visual system.
5.2.1 Coding Redundancy:

Our quantized data are represented using code words. The code words are ordered in the same way as the intensities that they represent; thus the bit pattern 00000000, corresponding to the value 0, represents the darkest points in an image, and the bit pattern 11111111, corresponding to the value 255, represents the brightest points.

An 8-bit coding scheme has the capacity to represent 256 distinct levels of intensity in an image. But if there are only 16 different grey levels in an image, the image exhibits coding redundancy, because it could be represented using a 4-bit coding scheme. Coding redundancy can also arise due to the use of fixed-length code words.

The grey-level histogram of an image can also provide a great deal of insight into the construction of codes to reduce the amount of data used to represent it.

Let us assume that a discrete random variable rk in the interval (0, 1) represents the grey levels of an image and that each rk occurs with probability Pr(rk). The probability can be estimated from the histogram of an image using

Pr(rk) = hk / n,  for k = 0, 1, ..., L-1


where L is the number of grey levels, hk is the frequency of occurrence of grey level k (the number of times that the kth grey level appears in the image) and n is the total number of pixels in the image. If the number of bits used to represent each value of rk is l(rk), the average number of bits required to represent each pixel is

Lavg = sum over k = 0, 1, ..., L-1 of l(rk) Pr(rk).

Example:
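
A small numerical illustration of this formula (NumPy assumed; the probabilities and code lengths below are illustrative, not the values from the original worked example):

import numpy as np

# Illustrative example: 4 grey levels with probabilities Pr(rk)
# and two candidate codes.
p = np.array([0.6, 0.25, 0.1, 0.05])      # Pr(rk), must sum to 1

l_fixed    = np.array([2, 2, 2, 2])        # fixed-length code: 2 bits per level
l_variable = np.array([1, 2, 3, 3])        # variable-length code (shorter codes
                                           # for the more probable levels)

# Average number of bits per pixel: Lavg = sum_k l(rk) * Pr(rk)
print("Lavg, fixed-length   :", np.sum(l_fixed * p))     # 2.0 bits/pixel
print("Lavg, variable-length:", np.sum(l_variable * p))  # 1.55 bits/pixel
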

5.2.2 Interpixel Redundancy:

Consider the images shown in Figs. 1.1(a) and (b). As Figs. 1.1(c) and (d) show, these images have virtually identical histograms. Note also that both histograms are trimodal, indicating the presence of three dominant ranges of gray-level values. Because the gray levels in these images are not equally probable, variable-length coding can be used to reduce the coding redundancy that would result from a straight or natural binary encoding of their pixels. The coding process, however, would not alter the level of correlation between the pixels within the images. In other words, the codes used to represent the gray levels of each image have nothing to do with the correlation between pixels. These correlations result from the structural or geometric relationships between the objects in the image.
Fig. 1.1 Two images and their gray-level histograms and normalized autocorrelation coefficients along one line.

Figures 1.1(e) and (f) show the respective autocorrelation coefficients computed along one line of each image; the normalized autocorrelation along a line is γ(n) = A(n)/A(0), where A(n) = (1/(N − n)) · Σ f(x, y) f(x, y + n), summed over y = 0 to N − 1 − n. The scaling factor 1/(N − n) accounts for the varying number of sum terms that arise for each integer value of n. Of course, n must be strictly less than N, the number of pixels on a line. The variable x is the coordinate of the line used in the computation. Note the dramatic difference between the shape of the functions shown in Figs. 1.1(e) and (f). Their shapes can be qualitatively related to the structure in the images in Figs. 1.1(a) and (b). This relationship is particularly noticeable in Fig. 1.1(f), where the high correlation between pixels separated by 45 and 90 samples can be directly related to the spacing between the vertically oriented matches of Fig. 1.1(b). In addition, the adjacent pixels of both images are highly correlated. When n is 1, γ is 0.9922 and 0.9928 for the images of Figs. 1.1(a) and (b), respectively. These values are typical of most properly sampled television images.
These illustrations reflect another important form of data redundancy, one directly related to the interpixel correlations within an image. Because the value of any given pixel can be reasonably predicted from the values of its neighbors, the information carried by individual pixels is relatively small. Much of the visual contribution of a single pixel to an image is redundant; it could have been guessed on the basis of the values of its neighbors. A variety of names, including spatial redundancy, geometric redundancy, and interframe redundancy, have been coined to refer to these interpixel dependencies. We use the term interpixel redundancy to encompass them all.
In order to reduce the interpixel redundancies in an image, the 2-D pixel array normally used for human viewing and interpretation must be transformed into a more efficient (but usually "nonvisual") format. For example, the differences between adjacent pixels can be used to represent an image. Transformations of this type (that is, those that remove interpixel redundancy) are referred to as mappings. They are called reversible mappings if the original image elements can be reconstructed from the transformed data set.
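
A concrete illustration of such a reversible mapping, assuming NumPy: each row is replaced by its first pixel followed by the differences between adjacent pixels, and a cumulative sum recovers the row exactly.

import numpy as np

row = np.array([100, 101, 103, 103, 104, 110, 111], dtype=np.int16)

# Forward mapping: keep the first pixel, then adjacent differences.
mapped = np.concatenate(([row[0]], np.diff(row)))
print(mapped)                               # [100   1   2   0   1   6   1]

# Reverse mapping: the cumulative sum reconstructs the original row exactly.
reconstructed = np.cumsum(mapped)
print(np.array_equal(reconstructed, row))   # True

The differences cluster tightly around zero, so they can be coded with far fewer bits than the raw intensities.
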
5.2.3 Psychovisual Redundancy:

The brightness of a region, as perceived by the eye, depends on factors other than simply the light reflected by the region. For example, intensity variations (Mach bands) can be perceived in an area of constant intensity. Such phenomena result from the fact that the eye does not respond with equal sensitivity to all visual information. Certain information simply has less relative importance than other information in normal visual processing. This information is said to be psychovisually redundant. It can be eliminated without significantly impairing the quality of image perception.

That psychovisual redundancies exist should not come as a surprise, because human perception of the information in an image normally does not involve quantitative analysis of every pixel value in the image. In general, an observer searches for distinguishing features such as edges or textural regions and mentally combines them into recognizable groupings. The brain then correlates these groupings with prior knowledge in order to complete the image interpretation process. Psychovisual redundancy is fundamentally different from the redundancies discussed earlier. Unlike coding and interpixel redundancy, psychovisual redundancy is associated with real or quantifiable visual information. Its elimination is possible only because the information itself is not essential for normal visual processing. Since the elimination of psychovisually redundant data results in a loss of quantitative information, it is commonly referred to as quantization.

This terminology is consistent with normal usage of the word, which generally means the mapping of a broad range of input values to a limited number of output values. As it is an irreversible operation (visual information is lost), quantization results in lossy data compression.
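
A minimal sketch of quantization in this sense, assuming NumPy: 8-bit grey levels are mapped to 16 representative levels, a many-to-one (and therefore irreversible) mapping.

import numpy as np

levels = 16                               # target number of output levels
step = 256 // levels                      # quantization step for 8-bit data

pixels = np.array([3, 17, 18, 130, 200, 255], dtype=np.uint8)

# Quantize: map each 8-bit value to one of 16 representative levels.
quantized = (pixels // step) * step + step // 2
print(quantized)      # [  8  24  24 136 200 248]

# The mapping is many-to-one, so the original values cannot be recovered:
# this is the irreversible (lossy) part of the compression pipeline.
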

5.3 IMAGE COMPRESSION MODELS

As the figure shows, a compression system consists of two distinct structural blocks: an encoder and a decoder. An input image f(x, y) is fed into the encoder, which creates a set of symbols from the input data. After transmission over the channel, the encoded representation is fed to the decoder, where a reconstructed output image f^(x, y) is generated. In general, f^(x, y) may or may not be an exact replica of f(x, y). If it is, the system is error free or information preserving; if not, some level of distortion is present in the reconstructed image. Both the encoder and decoder shown in Fig. 3.1 consist of two relatively independent functions or sub-blocks. The encoder is made up of a source encoder, which removes input redundancies, and a channel encoder, which increases the noise immunity of the source encoder's output. As would be expected, the decoder includes a channel decoder followed by a source decoder. If the channel between the encoder and decoder is noise free (not prone to error), the channel encoder and decoder are omitted, and the general encoder and decoder become the source encoder and decoder, respectively.

 Source Encoder
Reduces/eliminates any coding, interpixel or psychovisual redundancies. The source encoder contains 3 processes:
• Mapper: transforms the image into an array of coefficients, reducing interpixel redundancies. This is a reversible process which is not lossy. It may or may not directly reduce the amount of data required to represent the image.
• Quantizer: this process reduces the accuracy, and hence the psychovisual redundancies, of a given image. This process is irreversible and therefore lossy. It must be omitted when error-free compression is desired.
• Symbol Encoder: this is the source encoding process where a fixed- or variable-length code is used to represent the mapped and quantized data sets. This is a reversible process (not lossy). It removes coding redundancy by assigning the shortest codes to the most frequently occurring output values.
 Source Decoder contains two components.
• Symbol Decoder: this is the inverse of the symbol encoder; the reverse of the variable-length coding is applied.
• Inverse Mapper: the inverse of the removal of the interpixel redundancy.
• The only lossy element is the quantizer, which removes the psychovisual redundancies, causing irreversible loss. Every lossy compression method contains the quantizer module.
• If error-free compression is desired, the quantizer module is removed.
The Channel Encoder and Decoder:
The channel encoder and decoder play an important role in the overall encoding-decoding process when the channel is noisy or prone to error. They are designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source encoded data. As the output of the source encoder contains little redundancy, it would be highly sensitive to transmission noise without the addition of this "controlled redundancy." One of the most useful channel encoding techniques was devised by R. W. Hamming (Hamming [1950]). It is based on appending enough bits to the data being encoded to ensure that some minimum number of bits must change between valid code words. Hamming showed, for example, that if 3 bits of redundancy are added to a 4-bit word, so that the distance between any two valid code words is 3, all single-bit errors can be detected and corrected. (By appending additional bits of redundancy, multiple-bit errors can be detected and corrected.) The 7-bit Hamming (7, 4) code word h1, h2, ..., h6, h7 associated with a 4-bit binary number b3b2b1b0 is

h1 = b3 ⊕ b2 ⊕ b0,  h2 = b3 ⊕ b1 ⊕ b0,  h3 = b3,  h4 = b2 ⊕ b1 ⊕ b0,  h5 = b2,  h6 = b1,  h7 = b0

where ⊕ denotes the exclusive OR operation. Note that bits h1, h2, and h4 are even-parity bits for the bit fields b3b2b0, b3b1b0, and b2b1b0, respectively. (Recall that a string of binary bits has even parity if the number of bits with a value of 1 is even.) To decode a Hamming-encoded result, the channel decoder must check the encoded value for odd parity over the bit fields in which even parity was previously established. A single-bit error is indicated by a nonzero parity word c4c2c1, where

c1 = h1 ⊕ h3 ⊕ h5 ⊕ h7,  c2 = h2 ⊕ h3 ⊕ h6 ⊕ h7,  c4 = h4 ⊕ h5 ⊕ h6 ⊕ h7.

If a nonzero value is found, the decoder simply complements the code word bit position indicated by the parity word. The decoded binary value is then extracted from the corrected code word as h3h5h6h7.
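
A small sketch of this (7, 4) scheme in Python, written directly from the parity relations above (not a library implementation):

def hamming74_encode(b3, b2, b1, b0):
    """Encode 4 data bits as the 7-bit word (h1..h7) described above."""
    h3, h5, h6, h7 = b3, b2, b1, b0
    h1 = b3 ^ b2 ^ b0          # even parity over b3 b2 b0
    h2 = b3 ^ b1 ^ b0          # even parity over b3 b1 b0
    h4 = b2 ^ b1 ^ b0          # even parity over b2 b1 b0
    return [h1, h2, h3, h4, h5, h6, h7]

def hamming74_decode(h):
    """Correct a single-bit error and return the 4 data bits (b3, b2, b1, b0)."""
    h1, h2, h3, h4, h5, h6, h7 = h
    c1 = h1 ^ h3 ^ h5 ^ h7
    c2 = h2 ^ h3 ^ h6 ^ h7
    c4 = h4 ^ h5 ^ h6 ^ h7
    pos = c4 * 4 + c2 * 2 + c1          # nonzero parity word -> position of the bit in error
    if pos:
        h[pos - 1] ^= 1                 # complement the offending bit
    return h[2], h[4], h[5], h[6]       # b3, b2, b1, b0 = h3, h5, h6, h7

word = hamming74_encode(1, 0, 1, 1)
word[3] ^= 1                            # flip one bit to simulate channel noise
print(hamming74_decode(word))           # (1, 0, 1, 1) -- the error is corrected
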
DCT-Based Compression

The Discrete Cosine Transform (DCT) is a popular transform used by the JPEG (Joint Photographic Experts Group) image compression standard for lossy compression of images. Since it is used so frequently, the DCT is often referred to in the literature as JPEG-DCT. The DCT as used in the JPEG coding method comprises four steps. The source image is first partitioned into sub-blocks of 8x8 pixels. Each block is then transformed from the spatial domain to the frequency domain using a 2-D DCT basis function. The resulting frequency coefficients are quantized and finally output to a lossless entropy coder. The DCT is an efficient image compression method since it can decorrelate pixels in the image (because the cosine basis is orthogonal) and compact most of the image energy into a few transform coefficients. Moreover, DCT coefficients can be lossily quantized according to human visual characteristics. Therefore, the JPEG image file format is very efficient. This makes it very popular, especially on the World Wide Web. However, JPEG may be replaced by wavelet-based image compression algorithms, which have better compression performance.
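
A minimal sketch of the block-DCT and quantization steps, assuming NumPy and SciPy; the uniform quantization table used here is a toy choice for illustration, not the JPEG standard table.

import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """2-D DCT of an 8x8 block (separable: DCT along rows, then columns)."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(block):
    return idct(idct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

# A smooth 8x8 block (a ramp), level-shifted to be centred on zero as in JPEG.
x = np.arange(8, dtype=float)
block = np.add.outer(8 * x, 4 * x) - 128

F = dct2(block)                  # DCT coefficients F(u, v)

Q = np.full((8, 8), 16.0)        # toy uniform quantization table (not the JPEG one)
F_hat = np.round(F / Q)          # quantized coefficients F^(u, v)

# For a smooth block most coefficients quantize to zero, which is what the
# later run-length and entropy coding stages exploit.
print(int(np.count_nonzero(F_hat)), "of 64 coefficients remain nonzero")

# Lossy reconstruction (dequantize, then inverse DCT).
block_rec = idct2(F_hat * Q)
print(float(np.abs(block - block_rec).max()))    # small reconstruction error
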
JPEG
JPEG is an image compression standard that was developed by the Joint Photographic Experts Group. JPEG was formally accepted as an international standard in 1992. It is the international standard for compression of still pictures. Quality is set on a sliding scale: as compression increases, quality is reduced. In the human eye the cones are colour sensitive and the rods are grey-level sensitive; we have about 100 million rods and 6 million cones. The RGB values are therefore converted to a luminance/chrominance representation:

Y (luminance)          = 0.299R + 0.587G + 0.114B
Cb (blue chrominance)  = -0.168R - 0.331G + 0.500B + 128
Cr (red chrominance)   = 0.500R - 0.418G - 0.081B + 128
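
The colour conversion above is easy to express in code; a sketch with NumPy for 8-bit channels:

import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) array of 8-bit RGB values to Y, Cb, Cr."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168 * r - 0.331 * g + 0.500 * b + 128
    cr =  0.500 * r - 0.418 * g - 0.081 * b + 128
    return np.stack([y, cb, cr], axis=-1)

print(rgb_to_ycbcr(np.array([255, 255, 255])))   # white: Y ~= 255, Cb ~= Cr ~= 128
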

• JPEG is a lossy image compression method. It employs a transform coding method using the DCT (Discrete Cosine Transform).
• An image is a function of i and j (or conventionally x and y) in the spatial domain. The 2-D DCT is used as one step in JPEG in order to yield a frequency response F(u, v) in the spatial frequency domain, indexed by two integers u and v.

Main Steps in JPEG Image Compression
• Transform RGB to YIQ or YUV and subsample colour.
• DCT on image blocks.
• Quantization.
• Zig-zag ordering and run-length encoding.
• Entropy coding.
Each image is divided into 8 × 8 blocks. The 2-D DCT is applied to each block image f(i, j), with the output being the DCT coefficients F(u, v) for each block. F(u, v) represents a DCT coefficient, Q(u, v) is a quantization-matrix entry, and F^(u, v) = round(F(u, v) / Q(u, v)) represents the quantized DCT coefficient that JPEG will use in the succeeding entropy coding.
Run-length Coding (RLC) on AC coefficients
RLC aims to turn the F^(u, v) values into sets {#-zeros-to-skip, next non-zero value}.
• To make it most likely to hit a long run of zeros, a zig-zag scan is used to turn the 8×8 matrix F^(u, v) into a 64-vector.
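
One way to generate the zig-zag scan order is to walk the anti-diagonals of the 8×8 block, reversing every other one; a sketch (NumPy assumed):

import numpy as np

def zigzag_indices(n=8):
    """Return the (row, col) visiting order of an n x n zig-zag scan."""
    order = []
    for s in range(2 * n - 1):                       # anti-diagonals, s = i + j
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()                           # even diagonals run upwards
        order.extend(diag)
    return order

# Flatten a quantized 8x8 block into the 64-vector used by run-length coding.
F_hat = np.arange(64).reshape(8, 8)                  # stand-in for a quantized DCT block
zz = np.array([F_hat[i, j] for i, j in zigzag_indices()])
print(zz[:10])        # [ 0  1  8 16  9  2  3 10 17 24]
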

The DC coefficients are coded separately from the AC ones. Differential Pulse Code Modulation (DPCM) is the coding method. The DC and AC coefficients finally undergo an entropy coding step to gain possible further compression.
• Using DC as an example: each DPCM-coded DC coefficient is represented by (SIZE, AMPLITUDE), where SIZE indicates how many bits are needed to represent the coefficient, and AMPLITUDE contains the actual bits.
Four Commonly Used JPEG Modes
• Sequential Mode – the default JPEG mode, implicitly assumed in the discussions so far. Each grayscale image or colour image component is encoded in a single left-to-right, top-to-bottom scan.
• Progressive Mode – the image is delivered in successive scans; higher AC components provide detail information in the later scans.
• Hierarchical Mode – hierarchical JPEG images can be transmitted in multiple passes with progressively improving quality.
• Lossless Mode – a predictive mode with no quantization, so the image can be reconstructed exactly.

JPEG2000 Standard
– To provide a better rate-distortion tradeoff and improved subjective image quality.
– To provide additional functionalities lacking in the current JPEG standard.
In addition, JPEG2000 is able to handle up to 256 channels of information, whereas the current JPEG standard is only able to handle three colour channels.

MPEG (Motion Picture Experts Group)
MPEG Video Standard. MPEG (Motion Picture Experts Group) was set up in 1988 to develop a set of standard algorithms for applications that require storage of video and audio on digital storage media. The basic structure of the compression algorithm proposed by MPEG is simple. An input image is divided into blocks of 8 x 8 pixels. For a given 8 x 8 block, we subtract the prediction generated using the previous frame. The difference between the block being encoded and the prediction is transformed using a DCT. The transform coefficients are quantized and transmitted to the receiver.
MPEG is a family of standards for audio and video compression and transmission. It is developed and maintained by the Motion Picture Experts Group, a working group of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).
There are several different types of MPEG standards, including −
MPEG-1 − This standard is primarily used for audio and video compression for CD-ROMs and low-quality video on the internet.
MPEG-2 − This standard is used for digital television and DVD video, as well as high-definition television (HDTV).
MPEG-4 − This standard is used for a wide range of applications, including video on the internet, mobile devices, and interactive media.
MPEG-7 − This standard is used for the description and indexing of audio and video content.
MPEG-21 − This standard is used for the delivery and distribution of multimedia content over the internet.
MPEG uses a lossy form of compression, which means that some data is lost
when the audio or video is compressed. The degree of compression can be
adjusted, with higher levels of compression resulting in smaller file sizes but
lower quality, and lower levels of compression resulting in larger file sizes but
higher quality.
Advantages of MPEG
There are several advantages to using MPEG −
High compression efficiency − MPEG is a highly efficient compression standard and can significantly reduce the file size of audio and video files while maintaining good quality.
Widely supported − MPEG is a widely used and well-established audio and video format, and it is supported by a wide range of media players, video editors, and other software.
Good quality − While MPEG uses lossy compression, it can still produce
good quality audio and video at moderate to high compression levels.
Flexible − The degree of compression used in an MPEG file can be adjusted,
allowing you to choose the balance between file size and quality.
Versatile − MPEG can be used with a wide range of audio and video types,
including music, movies, television shows, and other types of multimedia
content.
Streamable − MPEG files can be streamed over the internet, making it easy to
deliver audio and video content to a wide audience.
Scalable − MPEG supports scalable coding, which allows a single encoded
video to be adapted to different resolutions and bitrates. This makes it well-
suited for use in applications such as video-on-demand and live streaming.
Disadvantages of MPEG
There are also some disadvantages to using MPEG −
Lossy compression − Because MPEG uses lossy compression, some data is
lost when the audio or video is compressed. This can result in some loss of
quality, particularly at higher levels of compression.
Limited color depth − Some versions of MPEG have a limited color depth
and can only support 8 bits per channel. This can result in visible banding or
other artifacts in videos with high color gradations or smooth color transitions.
Non-ideal for text and graphics − MPEG is not well suited for video with
sharp transitions, high-contrast text, or graphics with hard edges. These types
of video can appear pixelated or jagged when saved as MPEG.
Complexity − The MPEG standards are complex and require specialized
software and hardware to encode and decode audio and video.
Patent fees − Some MPEG standards are covered by patents, which may
require the payment of licensing fees to use the technology.
Compatibility issues − Some older devices and software may not support
newer versions of the MPEG standard.
Spatial Compression: The spatial compression of each frame is done with JPEG (or a modification of it). Each frame is a picture that can be independently compressed.

Temporal Compression: In temporal compression, redundant frames are removed.

Frame Sequence

To temporally compress data, the MPEG method first divides frames into three categories: I-frames, P-frames, and B-frames. Figure 1 shows a sample sequence of frames.

I-frames: An intracoded frame (I-frame) is an independent frame that is not related to any other frame.
I-frames are present at regular intervals. An I-frame must appear periodically to handle sudden changes in the frame that the previous and following frames cannot show. Also, when a video is broadcast, a viewer may tune in at any time. If there were only one I-frame at the beginning of the broadcast, a viewer who tuned in late would not receive a complete picture. I-frames are independent of other frames and cannot be constructed from other frames.

P-frames: A predicted frame (P-frame) is related to the preceding I-frame or P-frame. In


other words, each P-frame contains only the changes from the preceding frame. The
changes, however, cannot cover a big segment. For example, for a fast-moving object, the
new changes may not be recorded in a P-frame. P-frames can be constructed only from
previous I- or P-frames. P-frames carry much less information than other frame types and
carry even fewer bits after compression.
B-frames: A bidirectional frame (B-frame) is related to the preceding and following I-
frame or P-frame. In other words, each B-frame is relative to the past and the future. Note
that a B-frame is never related to another B-frame.

 According to the MPEG standard, the entire movie is considered as a video sequence, which consists of pictures, each having three components: one luminance component and two chrominance components (Y, U and V).
 The luminance component contains the grey-scale picture and the chrominance components provide the colour, hue and saturation.
 Each component is a rectangular array of samples, and each row of the array is called a raster line.
 The eye is more sensitive to spatial variations of luminance but less sensitive to similar variations in chrominance. Hence the MPEG-1 standard samples the chrominance components at half the resolution of the luminance components.
 The input to the MPEG encoder is called the source data, and the output of the MPEG decoder is called the reconstructed data.
 The MPEG decoder has three parts: audio layer, video layer and system layer.
 The system layer reads and interprets the various headers in the source data and transmits this data to either the audio or the video layer.
 The basic building block of an MPEG picture is the macroblock, as shown:

 The macroblock consists of a 16×16 block of luminance (grey-scale) samples, divided into four 8×8 blocks, together with two 8×8 blocks of chrominance samples.
 The MPEG compression of a macroblock consists of passing each of the 6 blocks through DCT, quantization and entropy encoding, similar to JPEG.
 A picture in MPEG is made up of slices, where each slice is a continuous set of macroblocks having a similar grey-scale component.
 The concept of a slice is important when a picture contains uniform areas.
 The MPEG standard defines a quantization stage having values (1, 31). Quantization for intra coding is:

where
DCT = discrete cosine transform of the coefficient being encoded
Q = quantization coefficient from the quantization table

Quantization rule for encoding:

 The quantized numbers QDCT are encoded using a non-adaptive Huffman method, and the standard defines specific Huffman code tables, which were calculated by collecting statistics.

Wavelet based compression


A major disadvantage of the Fourier Transform is it captures global frequency
information, meaning frequencies that persist over an entire signal. This kind of
signal decomposition may not serve all applications well (e.g.
Electrocardiography (ECG) where signals have short intervals of characteristic
oscillation). An alternative approach is the Wavelet Transform, which
decomposes a function into a set of wavelets.
Over the past ten years, the wavelet transform has been widely used in signal
processing research, particularly, in image compression. In many applications,
wavelet-based schemes achieve better performance than other coding schemes
like the one based on DCT. Since there is no need to block the input image and
its basis functions have variable length, wavelet based coding schemes can
avoid blocking artifacts. Wavelet based coding also facilitates progressive
transmission of images

A Wavelet is a wave-like oscillation that is localized in time, an example is given below.


Wavelets have two basic properties: scale and location. Scale (or dilation) defines how
“stretched” or “squished” a wavelet is. This property is related to frequency as defined for
waves. Location defines where the wavelet is positioned in time (or space).

The mother wavelet ψ(t) is a frequency-domain function and the father wavelet φ(t) is a time-domain function.
There are two types of wavelet transform: continuous and discrete. Definitions of each type are given in the above figure. The key difference between the two is that the Continuous Wavelet Transform (CWT) uses every possible wavelet over a range of scales and locations, i.e. an infinite number of scales and locations, while the Discrete Wavelet Transform (DWT) uses a finite set of wavelets, i.e. wavelets defined at a particular set of scales and locations.

There are a wide variety of wavelets to choose from to best match the shape of the signal. A handful of options are given in the figure below.

From top to bottom, left to right: Daubechies 4, Daubechies 16, Haar, Coiflet 1, Symlet 4, Symlet 8, Biorthogonal 1.3, and Biorthogonal 3.1.
Wavelet Transform: Wavelets are functions defined over a finite interval. The basic idea of the wavelet transform is to represent an arbitrary function f(x) as a linear combination of a set of such wavelets or basis functions. These basis functions are obtained from a single prototype wavelet, called the mother wavelet, by dilations (scaling) and translations (shifts). The purpose of the wavelet transform is to change the data from the time-space domain to the time-frequency domain, which gives better compression results. The simplest form of wavelet is the Haar wavelet function:

ψ(t) = 1 for 0 ≤ t < 1/2,  ψ(t) = −1 for 1/2 ≤ t < 1,  and ψ(t) = 0 otherwise.
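
A short sketch of a one-level discrete wavelet decomposition with the Haar wavelet, assuming the PyWavelets package (pywt) is installed; thresholding the detail coefficients gives a very simple lossy compression.

import numpy as np
import pywt

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

# One level of the discrete wavelet transform with the Haar wavelet:
# cA holds the smooth (approximation) coefficients, cD the detail coefficients.
cA, cD = pywt.dwt(signal, 'haar')
print(cA)     # proportional to pairwise sums (smooth part)
print(cD)     # proportional to pairwise differences (detail part)

# Setting small detail coefficients to zero and inverting gives a simple
# lossy compression of the signal.
cD[np.abs(cD) < 1.0] = 0.0
approx = pywt.idwt(cA, cD, 'haar')
print(approx)
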

Morlet wavelet
It has a Fourier basis with a Gaussian function:

ψ(t) = exp(jω0 t) exp(−t²/2)

Mexican hat wavelet
It is a 2nd-order derivative of a Gaussian:

ψ(t) = (1 − t²) exp(−t²/2)
Example: Detecting R-peaks in an ECG Signal
In this example, I use a type of discrete wavelet transform to help detect R-peaks in an electrocardiogram (ECG), which measures heart activity. R-peaks are typically the highest peaks in an ECG signal. They are part of the QRS complex, a characteristic oscillation that corresponds to the contraction of the ventricles and expansion of the atria. Detecting R-peaks is helpful in computing heart rate and heart rate variability (HRV).

In the real world, we rarely have ECG signals that look as clean as the above graphic. As seen in this example, ECG data is typically noisy. For R-peak detection, simple peak-finding algorithms will fail to generalize when applied to raw data. The wavelet transform can help convert the signal into a form that makes it much easier for our peak-finder function.
Here I use the maximal overlap discrete wavelet transform (MODWT) to extract R-peaks from the ECG waveform. The Symlet wavelet with 4 vanishing moments (sym4) at 7 different scales is used.
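
A rough Python sketch of this idea is given below. PyWavelets does not provide MODWT directly, so the stationary wavelet transform (pywt.swt) is used here as a stand-in, and the sampling rate, decomposition level, band selection and thresholds are placeholder assumptions; this illustrates the approach rather than reproducing the original MATLAB example.

import numpy as np
import pywt
from scipy.signal import find_peaks

def detect_r_peaks(ecg, fs):
    """Rough R-peak detector: wavelet-filter the ECG, then find prominent peaks."""
    level = 4
    pad = (-len(ecg)) % (2 ** level)          # swt needs a length divisible by 2**level
    x = np.pad(ecg, (0, pad))

    # Stationary wavelet transform with sym4 (a stand-in for MODWT).
    coeffs = pywt.swt(x, 'sym4', level=level)

    # Keep only the coarser detail bands, which contain most of the QRS energy
    # (a heuristic choice), then square to emphasise the peaks.
    detail = sum(cD for _, cD in coeffs[:2])
    feature = detail[:len(ecg)] ** 2

    # Peaks must be at least 0.25 s apart (heart rate below 240 bpm).
    peaks, _ = find_peaks(feature, distance=int(0.25 * fs),
                          height=0.3 * feature.max())
    return peaks

# Usage with a placeholder signal (replace with a real ECG record):
fs = 250                                   # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 63    # crude spiky stand-in for an ECG
print(detect_r_peaks(ecg, fs)[:5])
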
Block Diagram

Image Preprocessing
Before being used for model training and inference, pictures must first undergo image
preprocessing. This includes, but is not limited to, adjustments to the size, orientation,
and color. The purpose of pre-processing is to raise the image's quality so that we can
analyze it more effectively. Preprocessing allows us to eliminate unwanted distortions
and improve specific qualities that are essential for the application we are working on.
Those characteristics could change depending on the application. An image must be
preprocessed in order for software to function correctly and produce the desired results.

1. Orientation:
When a picture is taken, its metadata informs our computers how to show the input image
in relation to how it is stored on disk. Its EXIF orientation is the name given to that
metadata, and incorrect EXIF data handling has long been a source of frustration for
developers everywhere. This also holds true for models: if we've established annotated
bound boxes on how we viewed an image to be orientated but our model is "seeing" the
picture in a different orientation.

2. Resize:
Although altering an image's size may seem simple, there are things to keep in mind. Few
devices capture exactly square images, despite the fact that many model topologies require
square input images. Stretching an image's dimensions to make it square is one option,
as is maintaining the image's aspect ratio while adding additional pixels to fill in the
newly formed "dead space."

3. Random Flips:
Our model must acknowledge that an object need not always be read from left to right or
up to down by randomly reflecting it about its x- or y-axis. In order-dependent
circumstances, such as when deciphering text, flipping may be irrational.

4. Grayscale:
One type of image transformation that can be applied to all images (train and test) is a change in colour. Random changes can also be made to images only during training, as augmentations. Grayscaling, which alters the colour, is often applied to every image. While we may believe that "more signal is always better", we may actually observe more timely model performance when images are rendered in grayscale.

5. Different Exposure:
If a model might be expected to operate in a range of lighting conditions, changing
image brightness to be randomly brighter and darker is most appropriate. The
maximum and minimum levels of brightness in the space must be taken into account.
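
A minimal sketch of these preprocessing steps using the Pillow library; the file name, target size and brightness range are placeholder assumptions.

import random
from PIL import Image, ImageOps, ImageEnhance

def preprocess(path, size=(224, 224), train=True):
    """Apply the preprocessing steps described above to one image."""
    img = Image.open(path)
    img = ImageOps.exif_transpose(img)          # 1. respect the EXIF orientation flag
    img = img.resize(size)                      # 2. resize (stretching to a square)
    if train and random.random() < 0.5:
        img = ImageOps.mirror(img)              # 3. random horizontal flip (train only)
    img = ImageOps.grayscale(img)               # 4. grayscale
    factor = random.uniform(0.7, 1.3) if train else 1.0
    img = ImageEnhance.Brightness(img).enhance(factor)   # 5. random exposure change
    return img

# Usage (placeholder path):
# preprocess("retina_scan.png").save("retina_scan_preprocessed.png")
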

Medical imaging: Image processing can provide sharp, high-quality images for
scientific and medical studies, ultimately assisting doctors in making diagnoses.

Convolutional Neural Networks (CNN) learn to do tasks like object detection, image
segmentation, and classification by taking in an input image and applying filters to it.

Retinal Based Images

Features of a Retinal Image:

The retina is the light-sensitive layer at the back of the eye that can be visualised with specialist equipment when imaging through the pupil. The features of a typical view of the retina include the optic disc, where the blood vessels and nerves enter the retina from the back of the eye. The blood vessels emerge from the optic disc and branch out to cover most of the retina. The macula is the central region of the retina, about which the blood vessels circle and which they partially penetrate (a typical view has the optic disc on the left and the macula towards the centre-right); it is the most important region for vision.

There are a number of diseases of the retina, of which diabetic retinopathy (pathology of the retina due to diabetes) has generated the most interest for automated computer detection. Diabetic retinopathy (DR) is a progressive disease that results in eyesight loss or even blindness if not treated.

Pre-proliferative diabetic retinopathy (loosely, DR that is not immediately threatening eyesight loss) is characterized by a number of clinical symptoms, including microaneurysms (small round outgrowths from capillaries that appear as small round red dots less than 125 μm in diameter in colour retinal images), dot haemorrhages (which are often indistinguishable from microaneurysms), exudate (fatty lipid deposits that appear as yellow irregular patches with sharp edges, often organized in clusters) and haemorrhage (clotting of leaked blood into the retinal tissue). These symptoms are more serious if located near the centre of the macula.

As the photographer does not have complete control over the patient's eye, which forms part of the imaging optical system, retinal images often contain artifacts and/or are of poorer quality than desirable. Patients often have tears covering the eye and, particularly the elderly, may have cataract that obscures and blurs the view of the retina. In addition, patients often do not or cannot hold their eye still during the imaging process; hence retinal images are often unevenly illuminated, with parts of the retinal image brighter or darker than the rest of the image or, in the worst cases, washed out with a substantial or complete loss of contrast.

Ultrasound of liver

A vascular ultrasound of the liver is performed to help evaluate the liver and its network of blood vessels (within the liver and entering and exiting the liver). Using vascular ultrasound can help physicians diagnose and review the outcome of treatments for various liver-related problems and diseases. A liver ultrasound is a noninvasive test that produces images of a person's liver and its blood vessels. It can help diagnose various liver conditions, such as fatty liver, liver cancer, and gallstones. A liver ultrasound is a type of transabdominal ultrasound. This means a technician scans the abdomen using a device that resembles a microphone. The process uses sound waves to create digital images. Liver ultrasounds are safe and usually do not take long.

Some common types of liver ultrasound scans include:

 Contrast imaging: This involves injecting dye into the blood vessels to make it easier to
see the liver and its vessels. It can be especially helpful for diagnosing growths and lesions
on the liver and detecting liver cancer.
 Elastography: This is a technique to see how stiff the liver tissue is, which could
signal cirrhosis or another problem. It involves delivering a series of pulses to the liver to
see the liver tissue. A doctor may compare elastography scores over time to detect changes
in liver health.
 Combined techniques: A doctor may combine techniques, such as by doing an ultrasound
and an MRI scan.

A liver ultrasound may indicate structural changes consistent with the presence of
certain conditions, including:

1. Different types of hepatitis, such as viral hepatitis

2. Liver fibrosis or cirrhosis

3. Nonalcoholic fatty liver disease

4. Alcohol-related liver injuries

5. Pregnancy-related liver injuries, such as cholestasis or cholangitis

6. Gallstones

7. Liver cancer

8. Infections

9. Blockages

10. Hemangioma, or a clump of blood vessels

Often, doctors will need to use a range of diagnostic tools to definitively identify
the reason for these changes. This process may include a medical history, physical exam,
blood tests, and potentially a biopsy.
Ultrasound scans use high frequency sound waves to create a picture of a part of
the body.

The ultrasound scanner has a microphone that gives off sound waves. The sound waves
bounce off the organs inside your body and a microphone picks them up. The microphone
links to a computer that turns the sound waves into a picture.

You usually have them in the hospital x-ray department

You might need to stop eating for 6 hours beforehand. Let the scan team know if this will
be a problem for any reason, for example if you are diabetic.

They might ask you to drink plenty before your scan so that you have a comfortably full
bladder.

Kidney Ultrasound
A kidney ultrasound may be performed to assist in placement of needles used to biopsy
(obtain a tissue sample) the kidneys , to drain fluid from a cyst or abscess, or to place a
drainage tube. This procedure may also be used to determine blood flow to the kidneys
through the renal arteries and veins.

A kidney ultrasound can show:

 Something abnormal in the size or shape of your kidneys


 Blood flow to your kidneys
 Signs of injury or damage to your kidneys
 Kidney stones, cysts (fluid-filled sacs) or tumors
 Your bladder (the organ that stores urine before it leaves your body)

A kidney ultrasound may also be used to help detect physical signs of chronic kidney disease (CKD), which can lead to kidney failure. For example, the kidneys of someone with CKD may be smaller, have thinning of certain kidney tissues, or show the presence of cysts.

Other reasons you might need a kidney ultrasound include:

 guiding your doctor to insert a needle for a tissue biopsy of your kidney
 helping your doctor to locate a kidney abscess or cyst
 helping your doctor place a drainage tube into your kidney
 allowing your doctor to check on a transplanted kidney

Mammogram

A mammogram is an X-ray picture of the breast. Doctors use a mammogram to look for early signs of breast cancer. Regular mammograms can find breast cancer early. Mammograms are done with a machine designed to look only at breast tissue. The mammogram machine has 2 plates that compress or flatten the breast to spread the tissue apart. Digital mammograms are much more common; digital images are recorded and saved as files in a computer.

In general, there are two main types of mammograms:

1. Digital mammography in 2D.

2. Digital mammography in 3D (digital breast tomosynthesis).

Three-dimensional (3D) mammography is also known as breast tomosynthesis or digital


breast tomosynthesis (DBT). As with a standard (2D) mammogram, each breast is
compressed from two different angles (once from top to bottom and once from side to
side) while x-rays are taken. But for a 3D mammogram, the machine takes many low-dose
x-rays as it moves in a small arc around the breast. A computer then puts the images
together into a series of thin slices. This allows doctors to see the breast tissues more
clearly in three dimensions. (A standard two-dimensional [2D] mammogram can be taken
at the same time, or it can be reconstructed from the 3D mammogram images.)

Many studies have found that 3D mammography appears to lower the chance of
being called back for follow-up testing after screening. It also appears to find more breast
cancers, and several studies have shown it can be helpful in women with dense breasts. A
large study is now in progress to better compare outcomes between 3D mammograms and
standard (2D) mammograms.

Mammograms expose the breasts to small amounts of radiation. But the benefits of mammography outweigh any possible harm from the radiation exposure. Modern machines use low radiation doses to get breast x-rays that are high in image quality. On average, the total dose for a typical mammogram with 2 views of each breast is about 0.4 millisieverts (mSv; a measure of radiation dose). The radiation dose from 3D mammograms can range from slightly lower to slightly higher than that from standard 2D mammograms.

Breast tomosynthesis may also result in:

 earlier detection of small breast cancers that may be hidden on a conventional


mammogram
 fewer unnecessary biopsies or additional tests
 greater likelihood of detecting multiple breast tumors
 clearer images of abnormalities within dense breast tissue
 greater accuracy in pinpointing the size, shape and location of breast abnormalities

What’s the difference between a screening mammogram and a diagnostic


mammogram?

A screening mammogram is a routine (usually annual) mammogram that


healthcare providers recommend to look for signs of cancer or abnormal breast tissue
before you have symptoms. Screening mammography helps with the early detection of
breast cancer. Early detection allows for early treatment, which may be more effective than
if the cancer is found at a later stage.

A routine screening mammogram usually includes at least two pictures of each


breast taken at different angles, typically from top to bottom and from side to side. If you
have breast implants, you’ll need additional images.

Healthcare providers order a diagnostic mammogram if a screening mammogram


shows abnormal tissue or there’s a new breast issue.

While both types of mammograms use the same machines, diagnostic


mammography uses additional imaging techniques, such as spot compression,
supplementary angles or magnification views and is supervised by the radiologist at the
time of the study.

Blood vessels circulate blood throughout your body. They help deliver oxygen to
vital organs and tissues, and also remove waste products. Blood vessels include veins,
arteries and capillaries.

Segmentation of ROI

There are two classes of segmentation techniques.

 Computer vision approaches


 AI based techniques

Semantic segmentation is an approach that determines, for every pixel, the class it belongs to. For example, in a figure with many people, all the pixels belonging to persons will have the same class id and the pixels in the background will be classified as background.

Instance segmentation is an approach that identifies, for every pixel, the specific instance of the object it belongs to. It detects each distinct object of interest in the image, for example, when each person in a figure is segmented as an individual object.

Panoptic segmentation combines both semantic and instance segmentation. Like semantic segmentation, panoptic segmentation identifies, for every pixel, the class it belongs to. Moreover, like instance segmentation, panoptic segmentation distinguishes different instances of the same class.

1. ROI of Blood vessels


Blood vessels are channels that carry blood throughout your body. They form a
closed loop, like a circuit, that begins and ends at your heart. Together, the heart vessels
and blood vessels form your circulatory system. Your body contains about 60,000 miles of
blood vessels.

There are five types of blood vessels:

1. Arteries, which carry the blood away from the heart; the arterioles; and the capillaries, where the exchange of water and chemicals between the blood and the tissues occurs.

2. Venules and veins, which carry blood from the capillaries back towards the heart.

The wall of a blood vessel has three layers:

3. Tunica intima: the inner layer surrounds the blood as it flows through your body. It regulates blood pressure, prevents blood clots and keeps toxins out of your blood. It keeps your blood flowing smoothly.

4. Media: the middle layer contains elastic fibers that keep your blood flowing in one direction. The media also helps vessels expand and contract.

5. Adventitia: the outer layer contains nerves and tiny vessels. It delivers oxygen and nutrients from your blood to your cells and helps remove waste. It also gives blood vessels their structure and support.

Vessel segmentation can be done with two main approaches:

1. Locating all the blood vessels with an appropriate method and visualizing
them. This can, for instance, be done by ray casting the original vessel data or by creating
a mesh around the edges of the vessels. The problem with this approach is that the
number of vessels is huge. Their structure varies a lot, going from the neck area up to the
brain. The placement and structure of the vessels also differs a lot from person to person,
which can make the results more uncertain.

2. Subtracting all, or as much as possible, of the obscuring bone tissue. Finding the bone allows for a segmentation of the vessels by masking away the bone. When this is done, the contrast-filled vessels can easily be separated from the surrounding lower-intensity tissue by using thresholding. Large parts of the skull can be located quickly with several methods, since they have a defined structure and are not adjacent to any important vessels. However, as mentioned, bone varies a lot in density and can resemble blood vessels in some areas in the lower skull region and parts of the vertebrae.

1. Matched Filtering Method

Matched filtering for blood vessel segmentation: one of the earliest and reasonably effective proposals for the segmentation of blood vessels in retinal images is the use of oriented matched filters for the detection of long linear structures. Blood vessels often have a Gaussian-like cross-section that is fairly consistent along the length of vessel segments. Provided the vessels are not too tortuous, they can be approximated as elongated cylinders of Gaussian cross-section between the vessel branch points. Thus, a two-dimensional model consisting of an elongated cylinder of Gaussian cross-section should correlate well with a vessel segment, provided they have the same orientation. The model is moved to each possible position in the image and the correlation of the local patch of image with the model is calculated to form a correlation image. Peaks in the correlation image occur at the locations of the blood vessels.
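
A rough sketch of oriented matched filtering, assuming NumPy and SciPy; the kernel length, Gaussian width and number of orientations are illustrative choices rather than values from any particular published matched-filter design.

import numpy as np
from scipy.ndimage import correlate, rotate

def matched_filter_kernel(length=15, sigma=2.0):
    """Elongated kernel with a Gaussian cross-section (the vessel model)."""
    y, x = np.mgrid[-(length // 2):length // 2 + 1, -(length // 2):length // 2 + 1]
    kernel = np.exp(-x**2 / (2 * sigma**2))       # Gaussian profile across the vessel
    kernel -= kernel.mean()                        # zero mean, so flat regions give no response
    return kernel

def vessel_response(image, n_orientations=12):
    """Maximum correlation response over a set of kernel orientations."""
    base = matched_filter_kernel()
    response = np.full(image.shape, -np.inf)
    for angle in np.linspace(0, 180, n_orientations, endpoint=False):
        k = rotate(base, angle, reshape=False)     # oriented vessel model
        response = np.maximum(response, correlate(image.astype(float), k))
    return response   # peaks of this map indicate likely vessel locations

# Usage: response = vessel_response(green_channel_of_fundus_image)
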

One of the most intuitive ways to find all vessels would be to locate one or several places in the image where most of the blood must pass, and search outwards from there. If the scan includes some of the chest area, the aorta is a good point to start. Otherwise, one point in each common carotid vessel should be enough. If region growing could be executed here, the problem would be more or less solved. However, as previously discussed, this is not entirely possible. Due to the intensity overlap and close proximity of bone, the region growing will leak out and include bone parts such as the vertebrae and parts of the skull. In a few cases, where the vessels are better separated from bone structure, it might be possible to minimize leakage to a point where it is no longer critical. But by doing so, it is likely that too little of the vessels themselves will be included. The region growing method works very well, though, if the only goal is to strip the skull bone. It is possible to place a seed point at an arbitrary location inside the brain, and then region-grow the brain with an upper threshold set just so that none of the skull bone is included. If any vessels in the head were excluded as well, they can be recovered by some closing operations.

In order to actually find the vessels, it is essential to know the diameter of the vessels. Generally, the blood vessels in the neck are much thicker than those in the head. Even if the two are split up, it is still impossible to set an exact diameter that fits every vessel.
2. Lesion Based Segmentation

The process of delineating the boundary of a lesion from an image or image series either
by use of interactive computer tools (manual) or by automated image segmentation
algorithms.

Retinal Lesions

We now turn attention to the segmentation of features of interest and lesions in retinal images. In the following pages three general techniques in image processing, namely linear filtering in image space, morphological processing, and wavelets, are illustrated by way of application to the detection of retinal blood vessels and microaneurysms. Analysis of blood vessels is of particular interest as vascular diseases such as diabetes cause visible and measurable changes to the blood vessel network. Detecting (i.e. segmenting) the blood vessels and measuring blood vessel parameters provides information on the severity and likely progression of a variety of diseases. Microaneurysms are a particular vascular disorder in which small pouches grow out of the side of capillaries. They appear in colour fundus images as small round red dots and in fluorescein angiographic images as small hyperfluorescent round dots. The detected number of microaneurysms is known to correlate with the severity and likely progression of diabetic retinopathy.

Skin Lesions

Skin lesion segmentation, which is one of the medical image segmentation areas, is important for the detection of melanoma. Melanoma, the most life-threatening skin cancer, can suddenly occur on normal skin without warning and can develop on a preexisting lesion. Therefore, lesions must be carefully monitored.

Public dermoscopy datasets used for this task contain images of the classes melanoma (MEL), melanocytic nevus (NV), basal cell carcinoma (BCC), actinic keratosis (AK), benign keratosis (BKL), dermatofibroma (DF), vascular lesion (VASC) and squamous cell carcinoma (SCC).

There are two main categories of skin lesions: primary and secondary lesions.

Primary skin lesions are abnormal skin conditions that may be present at birth or acquired later.

Secondary skin lesions are a result of irritated or manipulated primary skin lesions.

Primary lesions may be present at birth or acquired later in a person's life. The most common primary skin lesions include:

Birthmarks: These are the most common primary skin lesions. They include moles,
port-wine stains, nevi, etc.

Blisters: Blisters are skin lesions that are less than half a centimeter in diameter and filled with clear fluid. Small blisters are called vesicles and larger ones are called bullae. Blisters may be caused by burns (including sunburn), viral infections (herpes zoster), friction due to shoes or clothes, insect bites, drug reactions, etc.

Macules: Macules are flat skin lesions. They are small (less than one centimeter in
diameter) and may be brownish or reddish. Freckles and flat moles are examples of
macules. A macular rash is commonly seen in measles.

Nodules: Nodules are soft or firm, raised skin lesions that are less than two centimeters
in diameter. The nodules are seen in certain diseases such as neurofibromatosis and
leprosy.

Papule: Papules are raised lesions and usually develop with other papules. A patch of
papules or nodules is called a plaque. Plaques are commonly seen in psoriasis. Papules
may be seen in viral infections, such as measles, or may occur due to mosquito bites.

Pustule: Pustules are pus-filled lesions. Boils and abscesses are examples of pustules.

Wheals: Wheals are swollen, raised bumps or plaques that appear suddenly on the skin.
They are mostly caused by an allergic reaction. For example, hives (also called
urticaria), insect bites, etc.

Secondary skin lesions, which get inflamed and irritated, develop after primary skin
lesions or due to an injury. The most common secondary skin lesions include

Crust: A crust or a scab is a type of skin lesion that forms over a scratched, injured or
irritated primary skin lesion. It is formed from the dried secretions over the skin.

Ulcer: Ulcers are a break in the continuity of the skin or mucosa. Skin ulcers are caused
by an infection or trauma. Poor blood circulation, diabetes, smoking and/or bedridden
status increase the risk of ulcers.

Scales: Scales are patches of skin cells that build up and flake off the skin. Patches are
often seen in psoriasis and cause bleeding when they are removed.

Scar: Injuries, such as scratches, cuts and scrapes, can leave scars. Some scars may be
thick and raised. These may cause itching or oozing and appear reddish or brownish.
These are called keloids.

Skin atrophy: Skin atrophy occurs when areas of the skin become thin and wrinkled.
This could occur due to the frequent use of steroid creams, radiation therapy or poor
blood circulation.

2. Morphological Operations

Morphology means the study of the shape and structure of living things from a biological
perspective. Morphology is a discipline of biology related to the study of the shape and
structure of the organism and its unique structural characteristics.

There are different types of morphology:

 Cellular Morphology
 Tissue Morphology
 Organ Morphology
 The Whole Organism

3 types of lesions

A flat mark on your skin of a different color than your skin tone (macule or patch).

An elevated, pimple-like bump (papule or plaque).

An elevated, solid bump (nodule).

Image segmentation involves the process of partitioning image data into multiple sets of pixels/voxels. In other words, every pixel/voxel is assigned a label/value, where those with the same label/value belong to the same segment. There are a vast number of methods for doing image segmentation, thresholding, clustering, region growing and edge detection, just to mention a few, and they can be applied to varying problems; some of them can be used for doing blood vessel segmentation. The reason for doing segmentation is in most cases to get a different view on the image, mostly creating a more comprehensible representation of the original, making it easier to analyze.

1. Thresholding

The simplest method of image segmentation is called the thresholding method. This method is based on a clip level (or a threshold value) to turn a gray-scale image into a binary image.

The key of this method is to select the threshold value (or values when multiple levels are selected). Several popular methods are used in industry, including the maximum entropy method and balanced histogram thresholding.
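
A minimal sketch of thresholding with NumPy; the threshold value here is fixed for illustration, whereas in practice it would be chosen by one of the methods mentioned above.

import numpy as np

def threshold(image, t):
    """Turn a grey-scale image into a binary mask: 1 where intensity > t."""
    return (image > t).astype(np.uint8)

gray = np.array([[ 12,  40, 200],
                 [180,  35, 220],
                 [ 25, 210, 205]])

print(threshold(gray, 128))
# [[0 0 1]
#  [1 0 1]
#  [0 1 1]]
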

2. Clustering methods

The K-means algorithm is an iterative technique that is used to partition an image into K clusters. In this case, the distance is the squared or absolute difference between a pixel and a cluster center. The difference is typically based on pixel colour, intensity, texture and location, or a weighted combination of these factors.
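
A short sketch of K-means segmentation on pixel intensities, assuming scikit-learn is available; K = 3 and the toy image are illustrative choices.

import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image, k=3):
    """Cluster pixel intensities into k groups and return a label image."""
    pixels = image.reshape(-1, 1).astype(float)       # one feature: intensity
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    return labels.reshape(image.shape)

# Usage with a toy image containing three intensity populations:
image = np.concatenate([np.full((10, 10), 20),
                        np.full((10, 10), 120),
                        np.full((10, 10), 240)], axis=1)
print(np.unique(kmeans_segment(image, k=3)))          # [0 1 2]
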

3. Histogram-based methods

Histogram-based methods are very efficient compared to other image


segmentation methods because they typically require only one pass through the pixels. In
this technique, a histogram is computed from all of the pixels in the image, and the peaks
and valleys in the histogram are used to locate the clusters in the image.[1] Color or
intensity can be used as the measure.

4. Edge detection

Edge detection is a well-developed field on its own within image processing.


Region boundaries and edges are closely related, since there is often a sharp adjustment
in intensity at the region boundaries. Edge detection techniques have therefore been used
as the base of another segmentation technique.
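
A minimal sketch of gradient-based edge detection with SciPy's Sobel filters (the choice of a final threshold on the edge map is left to the caller):

import numpy as np
from scipy import ndimage

def edge_magnitude(image):
    """Approximate gradient magnitude using Sobel filters along each axis."""
    gx = ndimage.sobel(image.astype(float), axis=1)   # horizontal gradient
    gy = ndimage.sobel(image.astype(float), axis=0)   # vertical gradient
    return np.hypot(gx, gy)

# Region boundaries show up as ridges of high gradient magnitude; thresholding
# this map gives a crude edge-based segmentation.
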

5. Region-growing methods

Region-growing methods rely mainly on the assumption that the neighboring


pixels within one region have similar values. The common procedure is to compare one
pixel with its neighbors. If a similarity criterion
is satisfied, the pixel can be set to belong
to the same cluster as one or more of its neighbors. The selection of the similarity
criterion is significant and the results are influenced by noise in all instances.
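A minimal sketch of this procedure is given below, assuming a 2-D grayscale array image, a user-chosen seed pixel, and a simple similarity criterion (absolute intensity difference to the seed value within a tolerance); practical pipelines often use more robust criteria such as comparison to the running region mean.

from collections import deque
import numpy as np

def region_grow(image: np.ndarray, seed: tuple, tol: float = 10.0) -> np.ndarray:
    h, w = image.shape
    grown = np.zeros((h, w), dtype=bool)
    seed_val = float(image[seed])
    queue = deque([seed])
    grown[seed] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):        # 4-connected neighbours
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not grown[ny, nx]:
                if abs(float(image[ny, nx]) - seed_val) <= tol:  # similarity criterion
                    grown[ny, nx] = True
                    queue.append((ny, nx))
    return grown

# Example: mask = region_grow(image, seed=(120, 96), tol=12)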

6. Partial differential equation-based methods

Using a partial differential equation (PDE)-based method and solving the PDE
by a numerical scheme, one can segment the image.[40] Curve propagation is a
popular technique in this category, with numerous applications to object extraction,
object tracking, stereo reconstruction, etc.

7. Graph partitioning methods

Graph partitioning methods are effective tools for image segmentation, since
they model the impact of pixel neighborhoods on a given pixel or cluster of pixels
under the assumption of homogeneity in images.

8. Watershed transformation

The watershed transformation considers the gradient magnitude of an image as a


topographic surface. Pixels having the highest gradient magnitude intensities (GMIs)
correspond to watershed lines, which represent the region boundaries. Water placed
on any pixel enclosed by a common watershed line flows downhill to a common local
intensity minimum (LIM). Pixels draining to a common minimum form a catch basin,
which represents a segment.
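The sketch below follows a common marker-based recipe with scikit-image, assuming a binary foreground mask binary (for example, the output of the thresholding step): the distance transform acts as the topographic surface, its local maxima become markers, and the watershed of the negated distance produces one label per catch basin.

import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

distance = ndi.distance_transform_edt(binary)                   # distance to the background
coords = peak_local_max(distance, footprint=np.ones((3, 3)), labels=binary)
markers = np.zeros(distance.shape, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)        # one marker per local maximum
labels = watershed(-distance, markers, mask=binary)             # catch basins become segments

print(f"Number of segments found: {labels.max()}")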

3. ROI of Tumours

Tumours are groups of abnormal cells that form lumps or growths. They can start
in any one of the trillions of cells in our bodies. Tumours grow and behave differently,
depending on whether they are cancerous (malignant), non-cancerous (benign) or
precancerous.

A tumor is a solid mass of tissue that forms when abnormal cells group together.
Tumors can affect bones, skin, tissue, organs and glands. Many tumors are not cancer
(they’re benign). But they still may need treatment. Cancerous, or malignant, tumors can
be life-threatening and require cancer treatment.

A tumor is a solid mass of tissue. It may or may not be cancerous.

A cyst is a small sac that may contain fluid, air or solid material. The majority of
cysts are not cancerous.

Cancerous: Malignant or cancerous tumors can spread into nearby tissue, glands
and other parts of the body.

Noncancerous: Benign tumors are not cancerous and are rarely life-threatening.

Precancerous: These noncancerous tumors can become cancerous if not treated.

Types of cancerous tumors include:

Bone tumors (osteosarcoma and chordomas).

Brain tumors such as glioblastoma and astrocytoma.

Malignant soft tissue tumors and sarcomas.

Organ tumors such as lung cancer and pancreatic cancer.

Ovarian germ cell tumors.

Skin tumors (such as squamous cell carcinoma).

Types of benign tumors

Common noncancerous tumors include:

Benign bone tumors (osteomas).

Brain tumors such as meningiomas and schwannomas.

Gland tumors such as pituitary adenomas.

Lymphatic tumors such as angiomas.

Benign soft tissue tumors such as lipomas.

Uterine fibroids.

Types of precancerous tumors

Precancerous tumors include:

Actinic keratosis, a skin condition.

Cervical dysplasia.
Colon polyps.

Ductal carcinoma in situ, a type of breast tumor.

4. ROI of Lung Nodule

Lung nodules are small masses of tissue in the lung that appear as round, white
spots on a chest X-ray or computed tomography (CT) scan. Because they rarely have
symptoms, they are usually found incidentally in 1 of every 500 chest X-rays taken for
other, unrelated ailments, like a respiratory illness.

Lung nodules are small clumps of cells in the lungs. They're very common. Most
lung nodules are scar tissue from past lung infections. Lung nodules usually don't cause
symptoms. They're often found by accident on a chest X-ray or CT scan done for some
other reason.

Pulmonary nodules, or lung nodules, are common, and are usually benign or
non-cancerous. Here’s what you need to know about these spots.

Most nodules are smaller than 10 mm, or the size of a cherry. Larger lung
nodules, or nodules located near an airway, may have symptoms such as a chronic
cough, blood-tinged mucus and saliva, shortness of breath, fever or wheezing.

Very small (less than 6 mm) nodules are commonly identified incidentally on chest CT
scans performed for other reasons, such as chest pain or shortness of breath, or to evaluate
for pulmonary embolism. The significant majority are benign, although in certain instances
they may require follow-up to prove that.

The most common causes of lung nodules are tissue that has become inflamed
from infection or benign lung tumors. Causes of lung nodules can include:

Bacterial infections, such as tuberculosis and pneumonia

Fungal infections

Rheumatoid arthritis

Sarcoidosis

Imaging, like an X-ray or a CT scan, can determine the size, shape and location of
your lung nodules. This can help your physician determine the cause and, as a result, the
treatment needed.

Though most lung nodules are not cancerous, it’s important to detect them early.
Northwestern Medicine offers a low-dose CT lung cancer screening program specifically
for individuals at high risk of lung cancer. To determine your eligibility for the program,
your physician will discuss your history, including your smoking history and age.

A positron emission tomography (PET) scan provides detailed metabolic


images that can help fine-tune a diagnosis of cancer. For probably benign lung nodules,
your provider may suggest periodic CT scans to monitor growth over time as a precaution.
However, a biopsy may be needed to determine what is causing the lung nodule.

Feature Extraction

Feature extraction refers to the process of transforming raw data into


numerical features that can be processed while preserving the information in the
original data set. It yields better results than applying machine learning directly to the raw
data.

Feature extraction aims to reduce the number of features in a dataset by creating
new features from the existing ones (and then discarding the original features).

1. The need for Dimensionality Reduction

2. PCA(Principal Component Analysis)

3. Kernel PCA

4. LDA (Linear Discriminant Analysis)

The feature extraction technique gives us new features which are a linear
combination of the existing features. The new set of features will have different values
compared to the original feature values. The main aim is that fewer features will be
required to capture the same information.

PCA is a method of obtaining important variables (in the form of components)
from a large set of variables available in a data set. It tends to find the direction of
maximum variation (spread) in the data. PCA is most useful when dealing with data of
three or more dimensions.

For example, a cumulative explained-variance plot may show that the first 6 principal
components capture about 80% of the variance in the data. This shows the power of PCA:
with only 6 derived features we are able to capture most of the information.
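A minimal sketch of this analysis with scikit-learn is shown below; X is assumed to be a feature matrix (rows are lesions, columns are extracted features), and the number of components needed for 80% of the variance depends on the data set.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X)        # zero mean, unit variance per feature
pca = PCA().fit(X_std)

cumulative = np.cumsum(pca.explained_variance_ratio_)
n_80 = int(np.searchsorted(cumulative, 0.80) + 1)
print(f"Components needed to explain 80% of the variance: {n_80}")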

1. Mean

2. Variance

3. SD

4. Entropy

5. Skew

6. Kurtosis
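The statistics listed above can be computed directly from the intensities of a region of interest; a minimal sketch with NumPy and SciPy follows, where roi is an assumed array of pixel values and the entropy is estimated from a 64-bin histogram.

import numpy as np
from scipy import stats

values = np.asarray(roi, dtype=float).ravel()
hist, _ = np.histogram(values, bins=64, density=True)     # intensity distribution for the entropy estimate

features = {
    "mean": values.mean(),
    "variance": values.var(),
    "sd": values.std(),
    "entropy": stats.entropy(hist + 1e-12),               # Shannon entropy of the histogram
    "skew": stats.skew(values),
    "kurtosis": stats.kurtosis(values),
}
print(features)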

Note that all the principal components are perpendicular (orthogonal) to each other.
The intention behind this is that no information present in PC1 is repeated in PC2.

Though PCA is a very useful technique for extracting the important features, it should
be used with caution for supervised problems because it ignores the class labels. If a feature
extraction technique is still desired in that setting, LDA is the better choice.

The main differences between LDA and PCA are:

1. LDA is supervised; PCA is unsupervised.

2. LDA describes the direction of maximum separability in the data, while PCA describes
the direction of maximum variance in the data.

3. LDA requires class label information to perform the fit, unlike PCA.

In other respects, LDA works in a similar manner to PCA.

Implementation of LDA using Python

The data are first standardized and LDA is applied. A linear model such as Logistic
Regression is then fitted to the projected data, and the decision boundary can be plotted for a
better understanding of class separability, as sketched below.
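A minimal sketch of these steps with scikit-learn is given below (the decision-boundary plot is omitted); X and y are assumed to hold the feature matrix and the 0/1 class labels.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(
    StandardScaler(),                                # standardize the data first
    LinearDiscriminantAnalysis(n_components=1),      # supervised projection (uses the class labels)
    LogisticRegression(),                            # linear classifier on the LDA axis
)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")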

Feature extraction with Morphological features

Eight morphological features including Volume, Surface Area,


Compactness, NRL (Normalized Radial Length) Mean, Sphericity, NRL
entropy, NRL ratio, Roughness were calculated to describe the
morphological properties. The first three features showed the 3D
properties of the lesion.
Compactness = (surface area)² / (total volume of the lesion)

A sphere will have the lowest compactness index. A highly non-convex


lesion, such as a spiculated mass, will have a high compactness index. The
latter five features were based on the normalized radial length (NRL), defined
as the Euclidean distance from the object’s center (Center of Mass) to each of
its contour pixels and normalized relative to the maximum radial length of the
lesion.

Feature extraction with Texture features

Texture is a repeating pattern of local variations in image intensity, and is


characterized by the spatial distribution of intensity levels in a
neighborhood.

There are 10 GLCM texture features (energy, maximum probability,


contrast, homogeneity, entropy, correlation, sum average, sum variance,
difference average, and difference variance)
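A minimal sketch of the GLCM feature computation with scikit-image follows; roi is assumed to be an 8-bit grayscale region, entropy and maximum probability are computed by hand because graycoprops does not provide them, and older scikit-image releases spell the functions greycomatrix/greycoprops.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

glcm = graycomatrix(roi, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)

features = {prop: graycoprops(glcm, prop)[0, 0]
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
p = glcm[:, :, 0, 0]                                            # normalized co-occurrence matrix
features["entropy"] = -np.sum(p[p > 0] * np.log2(p[p > 0]))     # randomness of the distribution
features["maximum_probability"] = p.max()                       # strongest response
print(features)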

Feature Selection

Diagnostic Feature Selection

After feature extraction, a total of eight morphological features and ten


GLCM texture features were obtained for each lesion. The artificial neural
network (ANN) was utilized to obtain an optimal classifier to differentiate
between benign and malignant lesions.

The features were selected using the LNKnet package in order to identify the
ones that yield maximum discrimination capability, thus achieving the
optimal diagnostic performance. Each parameter set was normalized to

have zero mean and unit variance before training. Forward search
strategy was applied to find the optimal feature subset, which was
obtained when the trained classifier produced the least error rate.

Multilayer Perceptron (MLP) is one of the most common ANN topologies,


where units are structured in input, hidden, and output layers. The specific
structure of MLP was determined by selecting the one leading to the best
performance.

The selection based on morphology or texture features alone used 1
input layer with 3 nodes, 1 hidden layer with 2 nodes, and 1 output node
ranging from 0 to 1 and indicating the level of malignancy, where 0 means
definitely benign and 1 means definitely malignant.

The weights and biases of the neural network were determined by a two-phase


training procedure. The first phase had 30 iterations of back
propagation, and the second phase had a longer run of conjugate
gradient descent to ensure full convergence. The logistic sigmoid
function was used to interpret the output variation in terms of
probability of class membership within the range (0,1). The performance
of the selected classifier was evaluated using ROC analysis.
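The LNKnet-specific two-phase training is not reproduced here; the sketch below only approximates the described topology (one hidden layer with 2 logistic-sigmoid nodes) using scikit-learn's MLPClassifier, with X_train and y_train assumed to hold the selected features and the benign/malignant labels.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

clf = make_pipeline(
    StandardScaler(),                                   # zero mean, unit variance
    MLPClassifier(hidden_layer_sizes=(2,),              # one hidden layer with 2 nodes
                  activation="logistic",                # logistic sigmoid units
                  max_iter=2000, random_state=0),
)
clf.fit(X_train, y_train)
malignancy_prob = clf.predict_proba(X_test)[:, 1]       # output in (0, 1), read as level of malignancy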

If we compare a neural network to the brain, a node is analogous to a neuron that
receives a set of input signals (external stimuli).

The role of the Activation Function is to derive output from a set of input values fed to a node (or a layer).

K-nearest neighbors (KNN) is a type of supervised learning algorithm used for both regression and
classification. KNN tries to predict the correct class for the test data by calculating the distance between
the test data and all the training points, and then selecting the K points that are closest to the test
data.

Suppose there are two categories, Category A and Category B, and we have a new data point x1;
in which of these categories will this data point lie? To solve this type of problem, we need the K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a particular data
point.

The K-NN working can be explained on the basis of the below algorithm:

Step-1: Select the number K of the neighbors

Step-2: Calculate the Euclidean distance of K number of neighbors

Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

Step-4: Among these k neighbors, count the number of the data points in each category.

Step-5: Assign the new data points to that category for which the number of the neighbor is maximum.

Step-6: Our model is ready.

Firstly, we will choose the number of neighbors, so we will choose the k=5.

Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is
the distance between two points, which we have already studied in geometry. For two points (x1, y1)
and (x2, y2) it can be calculated as d = √((x2 − x1)² + (y2 − y1)²).
By calculating the Euclidean distance we got the nearest neighbors, as three nearest neighbors in
category A and two nearest neighbors in category B.
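The steps above can be written directly in NumPy; the sketch below is a minimal implementation for a single query point, with X_train, y_train and x_query assumed.

import numpy as np
from collections import Counter

def knn_predict(X_train: np.ndarray, y_train: np.ndarray, x_query: np.ndarray, k: int = 5):
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # Step 2: Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]                          # Step 3: indices of the k nearest neighbours
    votes = Counter(y_train[nearest])                            # Step 4: count neighbours per category
    return votes.most_common(1)[0][0]                            # Step 5: assign the majority category

# Example: category = knn_predict(X_train, y_train, x_query, k=5)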

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a
classification model at all classification thresholds. This curve plots two parameters: the True Positive
Rate and the False Positive Rate.

AUC - Area Under the Curve

AUC stands for "Area under the ROC Curve." That is, AUC measures the
entire two-dimensional area underneath the entire ROC curve (think integral
calculus) from (0,0) to (1,1).

AUC provides an aggregate measure of performance across all possible classification thresholds. One
way of interpreting AUC is as the probability that the model ranks a random positive example more
highly than a random negative example. If the examples are arranged from left to right in ascending
order of the classifier's predicted scores, AUC represents the probability that a random positive example
is positioned to the right of a random negative example.

AUC ranges in value from 0 to 1. A model whose predictions are 100%


wrong has an AUC of 0.0; one whose predictions are 100% correct has an
AUC of 1.0.

AUC is desirable for the following two reasons:

AUC is scale-invariant. It measures how well predictions are ranked, rather


than their absolute values.

AUC is classification-threshold-invariant. It measures the quality of the


model's predictions irrespective of what classification threshold is chosen.
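A minimal sketch of computing the ROC curve and AUC with scikit-learn follows, assuming true labels y_test and continuous prediction scores scores from any classifier (for example, the malignancy probabilities above).

from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y_test, scores)    # one (FPR, TPR) pair per threshold
auc = roc_auc_score(y_test, scores)                 # area under the ROC curve
print(f"AUC = {auc:.3f}")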

Shape and Texture

BOUNDARY DESCRIPTORS

Simple Descriptors

Length of a Contour

The length of a contour is approximated by counting the number of pixels along the contour. For a
chain-coded curve with unit spacing in both directions, the number of vertical and horizontal
components plus √2 times the number of diagonal components gives the exact length of the curve.

● Boundary Diameter

It is defined as

Diam(B) = max over i, j of [ D(pi, pj) ]

where D is a distance measure, which can be either the Euclidean distance or the D4 distance.

The value of the diameter and the orientation of the major axis of the boundary are two
useful Descriptors.

● Curvature

It is the rate of change of slope.

Curvature can be determined by using the difference between the slopes of adjacent
boundary segments at the point of intersection of the segments.

 Shape Numbers

Shape number is the smallest magnitude of the first difference of a chain code
representation.

The order of a shape number is defined as the number of digits in its representation.
Shape order is even for a closed boundary.

Chain codes are used to represent a boundary by a connected sequence of straight-line
segments. This representation is based on 4-connectivity or 8-connectivity of the
segments. The chain code works best with binary images and is a concise way of
representing a shape contour. The direction convention assigns a code number to each of
the 4 or 8 possible directions between neighbouring pixels, as used in the sketch below.
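As a small illustration of the 8-direction convention and the contour-length rule given earlier (axis-aligned moves count 1, diagonal moves count √2), consider the sketch below; the code sequence is invented for illustration, with even codes taken as horizontal/vertical moves and odd codes as diagonal moves.

import math

def chain_code_length(codes):
    straight = sum(1 for c in codes if c % 2 == 0)     # directions 0, 2, 4, 6 (unit steps)
    diagonal = sum(1 for c in codes if c % 2 == 1)     # directions 1, 3, 5, 7 (sqrt(2) steps)
    return straight + math.sqrt(2) * diagonal

print(chain_code_length([0, 0, 1, 2, 3, 4, 6, 7]))     # illustrative chain code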

REGIONAL DESCRIPTORS

Area, perimeter and compactness are simple region descriptors.

Compactness = (perimeter)² / area

Topological Descriptors

● Rubber-sheet Distortions

Topology is the study of properties of a figure that are unaffected by any deformation, as
long as there is no tearing or joining of the figure.

● Euler Number

Euler number (E) of region depends on the

number of connected components (C) and holes (H).

E=C−H

A connected component of a set is a subset of maximal size such that any two of its
points can be joined by a connected curve lying entirely within the subset.

 Texture

In image processing, texture can be defined as a function of the spatial variation of
the brightness intensity of the pixels. Texture is a key term used to describe objects or
concepts in a given image.

Texture is a feature used to partition images into regions of interest and to classify
those regions.

• Texture provides information on the spatial arrangement of colours or intensities in
an image.

• Texture is characterized by the spatial distribution of intensity levels in a neighborhood.

Texture is a repeating pattern of local variations in image intensity:

– Texture cannot be defined for a point.

– Texture consists of texture primitives or texture elements, sometimes called texels.

– Texture can be described as fine, coarse, grained, smooth, etc.

– Such features are found in the tone and structure of a texture.

– Tone is based on pixel intensity properties in the texel, whilst structure represents the
spatial relationship between texels. If texels are small and tonal differences between
texels are large, a fine texture results. If texels are large and consist of several pixels, a
coarse texture results.

There are two primary issues in texture analysis:

1. Texture classification

2. Texture segmentation

• Texture segmentation is concerned with automatically determining the boundaries
between various texture regions in an image.

Texture classification is concerned with identifying a given textured region from a given
set of texture classes.

– Each of these regions has unique texture characteristics.

– Statistical methods are extensively used.

e.g. GLCM (Gray Level Co-occurrence Matrix), contrast, entropy, homogeneity

There are three approaches to defining exactly what texture is:

Structural: texture is a set of primitive texels in some regular or repeated relationship.

Statistical: texture is a quantitative measure of the arrangement of intensities in a region.


This set of measurements is called a feature vector.

Modelling: texture modelling techniques involve constructing models to specify textures.

Statistical methods are particularly useful when the texture primitives are small, resulting
in microtextures.

• When the size of the texture primitive is large, first determine the shape and properties
of the basic primitive and the rules which govern the placement of these primitives,
forming macrotextures.

One of the simplest of the texture operators is the range or difference between maximum
and minimum intensity values in a neighborhood.

– The range operator converts the original image to one in which brightness represents
texture.

Another estimator of texture is the variance in neighborhood regions.


– This is the sum of the squares of the differences between the intensity of the central
pixel and its neighbours.

The statistical measures described so far are easy to calculate, but do not provide any
information about the repeating nature of texture.

• A Gray Level Co-occurrence Matrix (GLCM)

contains information about the positions of pixels having similar gray level values.


A co-occurrence matrix is a two-dimensional array, P, in which both the rows and the

columns represent a set of possible image values.

– A GLCM Pd[i, j] is defined by first specifying a displacement vector d = (dx, dy) and
counting all pairs of pixels separated by d having gray levels i and j:

Pd[i, j] = nij

where nij is the number of occurrences of the pixel-value pair (i, j) lying at distance d in the
image.

– The co-occurrence matrix Pd has dimension n×n, where n is the number of gray levels
in the image.

The elements of Pd[i,j] can be normalized by dividing each entry by the total number of
pixel pairs.

Maximum Probability

This is simply the largest entry in the matrix, and corresponds to the strongest response.

Moments

The order-k element difference moment can be defined as Mk = Σi Σj (i − j)^k Pd[i, j]; for k = 2 this gives the contrast.

Contrast

Contrast is a measure of the local variations present in an image.

Homogeneity

A homogeneous image will result in a co-occurrence matrix with a combination of high
and low values of P[i, j].

Entropy

Entropy is a measure of information content.

It measures the randomness of intensity distribution.

Correlation

Correlation is a measure of image linearity.

Performance Measure

Digital image processing is the use of computer algorithms to enhance the properties of digital images.
Digital image processing techniques include preprocessing (filtering), segmentation and classification
techniques. The effectiveness of these techniques can be estimated using performance metrics.
Performance metrics are used to determine the effectiveness of an image processing technique in
achieving the expected results. They are the quantities that are used to compare the performances of
different systems. In image processing, there are pre-processing performance metrics, segmentation
performance metrics and classification performance metrics, depending on the stage at which the
metrics are applied.

Peak Signal-to-Noise Ratio

The Peak Signal-to-Noise Ratio (PSNR) of an image is the ratio of the maximum power
of the signal to the maximum power of the noise distorting the image [39]. The PSNR is
measured in decibels.

Mean Square Error

The mean square error (MSE) is the average of the squared intensity differences between
the filtered image pixels and the reference (noiseless) image pixels. The metric assumes that
the reduction in perceptual quality of an image is directly related to the visibility of the
error signal.
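A minimal sketch of both metrics for 8-bit images follows, assuming reference and filtered are NumPy arrays of the same shape; PSNR is reported in decibels.

import numpy as np

def mse(reference: np.ndarray, filtered: np.ndarray) -> float:
    return float(np.mean((reference.astype(float) - filtered.astype(float)) ** 2))

def psnr(reference: np.ndarray, filtered: np.ndarray, max_value: float = 255.0) -> float:
    error = mse(reference, filtered)
    return float("inf") if error == 0 else 10 * np.log10(max_value ** 2 / error)

# Example: print(f"PSNR = {psnr(reference, filtered):.2f} dB")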

PSNR Gain

The PSNR gain of a new filter is the value in which the PSNR of the new filter is more
than the PSNR of an existing filter.

When the value of the gain is positive, the new filter is better than the existing
filter; when the gain is negative, the existing filter is better. The gain in
performance is measured in decibels.

True Acceptance Rate

True Acceptance Rate (TAR) is defined as the percentage of times a system correctly
verifies a true claim of identity [42]. A filter whose output has the highest value of TAR
when classified has the best performance; the higher the value, the better the technique.

False Acceptance Rate

FAR is defined as the percentage of times a system incorrectly verifies a false claim of
identity. A filter whose output has the lowest value of FAR when classified has the best
performance; the lower the value, the better the technique.

Pixel Error Rate

Pixel Error Rate (PERR) is defined as the percentage of pixel errors in the filtered image
with respect to the total number of pixels in the noiseless image. The pixel error is the
difference between the number of black pixels in the noiseless image and the number of
black pixels in the filtered image after both are converted to binary images; it can also be
seen as the total number of pixels in the output image that have the wrong colour.
Expressed as a percentage, PERR = (pixel error / (M × N)) × 100, where M and N are the
row size and column size of the image, respectively. A classification technique with the
lowest value of PERR has the best performance, and the lower the value, the better the
technique.

Recognition Accuracy

Recognition accuracy (RA) is the accuracy with which all the features in an image are
recognized. A filter whose output has the highest value of RA when classified has the
best performance; the higher the value, the better the technique.

Confusion Matrix

A confusion matrix is a performance evaluation tool in machine learning, representing


the accuracy of a classification model. It displays the number of true positives, true
negatives, false positives, and false negatives.

True Positive: Interpretation: You predicted positive and it’s true.

True Negative: Interpretation: You predicted negative and it’s true.

False Positive: (Type 1 Error) Interpretation: You predicted positive and it’s false.

False Negative: (Type 2 Error) Interpretation: You predicted negative and it’s false.
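A minimal sketch with scikit-learn follows, assuming true labels y_test and predicted labels y_pred coded as 0 (negative) and 1 (positive); for this ordering confusion_matrix returns [[TN, FP], [FN, TP]].

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")
print(f"Accuracy = {(tp + tn) / (tp + tn + fp + fn):.3f}")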

References

1. Rafael C. Gonzalez, Richard E. Woods, Steven L. Eddins, “Digital Image Processing
Using MATLAB”, 3rd Edition, Tata McGraw Hill Pvt. Ltd., 2011.

2. Anil K. Jain, “Fundamentals of Digital Image Processing”, PHI Learning Pvt. Ltd.,
2011.

3. William K. Pratt, “Introduction to Digital Image Processing”, CRC Press, 2013.

Question Bank

S.No PART-A

1. What is image compression?

2. Investigate the performance metrics for evaluating image compression.

3. List the need for Compression?

4. Define compression ratio.

5. What is Redundancy?

6. Validate the types of data redundancy.

7. What is the operation of source encoder?

8. What is the function of channel encoder?

9. Categorize video compression standards.

10. Specify the fundamental coding theorem.

11. What is meant by inverse mapping?

PART-B

1. What is data redundancy? Illustrate various types of data redundancy in detail.

2. Demonstrate in detail about Image compression model?

3. Discuss in detail the source encoder and decoder.

4. Analyze Shannon’s first theorem for noiseless coding theorem.

5. Apply and analyze Shannon’s second theorem for noisy coding theorem.

6. Evaluate fundamental coding theorem.

7. Summarize the different types of redundancy.

8. Compare and contrast noiseless and noisy coding theorem.

