MIP Unit 5
Need for Compression:
In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to a storage device and decompress it when it is retrieved.
In terms of communications, the bandwidth of a digital communication link
can be effectively increased by compressing data at the sending end and
decompressing data at the receiving end.
At any given time, the ability of the Internet to transfer data is fixed. Thus, if data
can effectively be compressed wherever possible, significant improvements of data
throughput can be achieved. Many files can be combined into one compressed document, making sending easier.
5.2 DATA REDUNDANCY: Data are the means by which information is conveyed. Various amounts of data can be used to convey the same amount of information. Example: four different representations of the same information (the number five):
1) A picture (1001, 632 bits);
2) The word "five" spelled in English using the ASCII character set (32 bits);
3) A single ASCII digit (8 bits);
4) A binary integer (3 bits).
Data compression is defined as the process of encoding data using a representation that reduces the overall size of the data. This reduction is possible when the original data set contains some type of redundancy. Digital image compression is a field that studies methods for reducing the total number of bits required to represent an image. This can be achieved by eliminating various types of redundancy that exist in the pixel values. In general, three basic redundancies exist in digital images, as described below.
REDUNDANCY IN DIGITAL IMAGES
– Coding redundancy: usually appears as a result of the uniform representation of each pixel.
– Spatial/temporal redundancy: arises because adjacent pixels tend to be similar in practice.
– Irrelevant information: images contain information that is ignored by the human visual system.
5.2.1 Coding Redundancy:
Our quantized data are represented using code words. The code words are ordered in the same way as the intensities that they represent; thus the bit pattern 00000000, corresponding to the value 0, represents the darkest points in an image, and the bit pattern 11111111, corresponding to the value 255, represents the brightest points. An 8-bit coding scheme has the capacity to represent 256 distinct levels of intensity in an image. But if there are only 16 different grey levels in an image, the image exhibits coding redundancy, because it could be represented using a 4-bit coding scheme. Coding redundancy can also arise from the use of fixed-length code words.
The grey-level histogram of an image can also provide a great deal of insight into the construction of codes to reduce the amount of data used to represent it.
Let us assume that a discrete random variable r_k in the interval (0, 1) represents the grey levels of an image and that each r_k occurs with probability p_r(r_k). This probability can be estimated from the histogram of an image using

p_r(r_k) = n_k / n,  k = 0, 1, …, L − 1

where L is the number of grey levels, n_k is the number of times the k-th grey level appears in the image, and n is the total number of pixels.
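As an illustration, here is a minimal Python sketch (assuming an 8-bit grayscale image stored in a NumPy array) that estimates p_r(r_k) from the histogram and computes the first-order entropy, a lower bound on the average number of bits per pixel achievable when coding pixels independently:

```python
import numpy as np

def gray_level_stats(image, levels=256):
    """Estimate p_r(r_k) = n_k / n from the image histogram."""
    n_k, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = n_k / image.size                           # probability of each gray level
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))  # bits/pixel lower bound
    return p, entropy

# Example: an image with only 16 distinct levels needs at most ~4 bits/pixel
img = np.random.randint(0, 16, size=(64, 64))
_, H = gray_level_stats(img)
print(f"entropy = {H:.2f} bits/pixel")             # close to 4 for uniform levels
```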
Example:
Consider the images shown in Figs. 1.1(a) and (b). As Figs. 1.1(c) and (d) show,
these images have virtually identical histograms. Note also that both histograms are
trimodal, indicating the presence of three dominant ranges of gray-level values.
Because the gray levels in these images are not equally probable, variable-length coding can be used to reduce the coding redundancy that would result from a straight or natural binary encoding of their pixels. The coding process, however, would not alter the level of correlation between the pixels within the images. In other words, the codes used to represent the gray levels of each image have nothing to do with the correlation between pixels. These correlations result from the structural or geometric relationships between the objects in the image.
Fig. 1.1 Two images and their gray-level histograms and normalized autocorrelation coefficients along one line.
Figures 1.1(e) and (f) show the respective autocorrelation coefficients computed along one line of each image:

γ(n) = A(n) / A(0)

where

A(n) = (1 / (N − n)) Σ_{y=0}^{N−1−n} f(x, y) f(x, y + n)

The scaling factor in the equation above accounts for the varying number of sum terms that arise for each integer value of n. Of course, n must be strictly less than N, the number of pixels on a line. The variable x is the coordinate of the line used in the computation. Note the dramatic difference between the shape of the functions shown in Figs. 1.1(e) and (f). Their shapes can be qualitatively related to the structure in the images in Figs. 1.1(a) and (b). This relationship is particularly noticeable in Fig. 1.1(f), where the high correlation between pixels separated by 45 and 90 samples can be directly related to the spacing between the vertically oriented matches of Fig. 1.1(b). In addition, the adjacent pixels of both images are highly correlated. When n is 1, γ is 0.9922 and 0.9928 for the images of Figs. 1.1(a) and (b), respectively. These values are typical of most properly sampled television images.
5.2.2 Interpixel Redundancy:
These illustrations reflect another important form of data redundancy, one directly related to the interpixel correlations within an image. Because the value of any given pixel can be reasonably predicted from the values of its neighbors, the information carried by individual pixels is relatively small. Much of the visual contribution of a single pixel to an image is redundant; it could have been guessed on the basis of the values of its neighbors. A variety of names, including spatial redundancy, geometric redundancy, and interframe redundancy, have been coined to refer to these interpixel dependencies. We use the term interpixel redundancy to encompass them all.
In order to reduce the interpixel redundancies in an image, the 2-D pixel array
normally used for human viewing and interpretation must be transformed into a
more efficient (but usually "nonvisual") format. For example, the differences
between adjacent pixels can be used to represent an image. Transformations of this
type (that is, those that remove interpixel redundancy) are referred to as mappings.
They are called reversible mappings if the original image elements can be
reconstructed from the transformed data set.
5.2.3 Psychovisual Redundancy:
Certain information simply has less relative importance than other information in normal visual processing; the eye does not respond with equal sensitivity to all visual information. Such information is said to be psychovisually redundant, and it can be eliminated without significantly impairing the quality of image perception. Since its elimination entails a loss of quantitative information, the process is commonly referred to as quantization. This terminology is consistent with normal usage of the word, which generally means the mapping of a broad range of input values to a limited number of output values. As it is an irreversible operation (visual information is lost), quantization results in lossy data compression.
Source Encoder
Reduces or eliminates any coding, interpixel or psychovisual redundancies. The source encoder contains three processes:
• Mapper: Transforms the image into an array of coefficients, reducing interpixel redundancies. This is a reversible process, which is not lossy. It may or may not directly reduce the amount of data required to represent the image.
• Quantizer: This process reduces the accuracy, and hence the psychovisual redundancies, of a given image. This process is irreversible and therefore lossy. It must be omitted when error-free compression is desired.
• Symbol Encoder: This is the source encoding process where a fixed- or variable-length code is used to represent the mapped and quantized data sets. This is a reversible process (not lossy). It removes coding redundancy by assigning the shortest codes to the most frequently occurring output values.
Source Decoder contains two components:
• Symbol Decoder: This is the inverse of the symbol encoder; the variable-length coding is reversed.
• Inverse Mapper: This is the inverse of the mapping that removed the interpixel redundancy.
The only lossy element is the quantizer, which removes the psychovisual redundancies, causing irreversible loss. Every lossy compression method contains the quantizer module. If error-free compression is desired, the quantizer module is removed. (A small sketch of this pipeline follows below.)
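As a rough illustration of the three-stage source encoder described above, here is a minimal Python sketch (an assumption-laden toy, not any standard's actual encoder): the mapper is a simple left-neighbour difference, the quantizer a uniform step, and an entropy estimate stands in for the Huffman-style symbol encoder:

```python
import numpy as np

def mapper(image):
    """Reversible mapping: replace each pixel by its difference from the
    left neighbour, reducing interpixel redundancy."""
    diff = image.astype(np.int16).copy()
    diff[:, 1:] -= image[:, :-1].astype(np.int16)
    return diff

def quantizer(diff, step=4):
    """Irreversible (lossy) stage: omit this step for error-free compression."""
    return np.round(diff / step).astype(np.int16)

def symbol_encoder_cost(symbols):
    """Stand-in for a variable-length symbol encoder: the entropy of the
    symbol stream is the best average code length a Huffman-like code
    could approach."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))                 # bits per symbol

img = np.tile(np.arange(64, dtype=np.uint8), (64, 1))   # smooth test image
print(symbol_encoder_cost(img))                          # raw pixels
print(symbol_encoder_cost(quantizer(mapper(img))))       # after mapper + quantizer
```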
The Channel Encoder and Decoder:
The channel encoder and decoder play an important role in the overall encoding-decoding process when the channel is noisy or prone to error. They are designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source-encoded data. As the output of the source encoder contains little redundancy, it would be highly sensitive to transmission noise without the addition of this "controlled redundancy." One of the most useful channel encoding techniques was devised by R. W. Hamming (Hamming [1950]). It is based on appending enough bits to the data being encoded to ensure that some minimum number of bits must change between valid code words. Hamming showed, for example, that if 3 bits of redundancy are added to a 4-bit word, so that the distance between any two valid code words is 3, all single-bit errors can be detected and corrected. (By appending additional bits of redundancy, multiple-bit errors can be detected and corrected.) The 7-bit Hamming (7, 4) code word h1h2h3h4h5h6h7 associated with a 4-bit binary number b3b2b1b0 is

h1 = b3 ⊕ b2 ⊕ b0    h3 = b3
h2 = b3 ⊕ b1 ⊕ b0    h5 = b2
h4 = b2 ⊕ b1 ⊕ b0    h6 = b1
                     h7 = b0

where ⊕ denotes the exclusive OR operation. Note that bits h1, h2, and h4 are even-parity bits for the bit fields b3b2b0, b3b1b0, and b2b1b0, respectively. (Recall that a string of binary bits has even parity if the number of bits with a value of 1 is even.) To decode a Hamming encoded result, the channel decoder must check the encoded value for odd parity over the bit fields in which even parity was previously established. A single-bit error is indicated by a nonzero parity word c4c2c1, where

c1 = h1 ⊕ h3 ⊕ h5 ⊕ h7
c2 = h2 ⊕ h3 ⊕ h6 ⊕ h7
c4 = h4 ⊕ h5 ⊕ h6 ⊕ h7

If a nonzero value is found, the decoder simply complements the code word bit position indicated by the parity word. The decoded binary value is then extracted from the corrected code word as h3h5h6h7.
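The parity relations above translate directly into a few lines of Python; this sketch works on lists of bits for clarity (a real channel encoder would pack bits into bytes):

```python
def hamming74_encode(b3, b2, b1, b0):
    """Build h1..h7: h1, h2, h4 are even-parity bits over the bit fields
    b3b2b0, b3b1b0 and b2b1b0; h3, h5, h6, h7 carry the data bits."""
    h1 = b3 ^ b2 ^ b0
    h2 = b3 ^ b1 ^ b0
    h4 = b2 ^ b1 ^ b0
    return [h1, h2, b3, h4, b2, b1, b0]       # h1 h2 h3 h4 h5 h6 h7

def hamming74_decode(h):
    """Re-check parity; a nonzero word c4c2c1 gives the flipped position."""
    h1, h2, h3, h4, h5, h6, h7 = h
    c1 = h1 ^ h3 ^ h5 ^ h7
    c2 = h2 ^ h3 ^ h6 ^ h7
    c4 = h4 ^ h5 ^ h6 ^ h7
    pos = 4 * c4 + 2 * c2 + c1                # 1-based error position
    if pos:
        h[pos - 1] ^= 1                       # complement the indicated bit
    return h[2], h[4], h[5], h[6]             # b3 b2 b1 b0 = h3 h5 h6 h7

code = hamming74_encode(1, 0, 1, 1)
code[4] ^= 1                                  # inject a single-bit error
print(hamming74_decode(code))                 # -> (1, 0, 1, 1), corrected
```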
DCT-Based Compression
• JPEG is a lossy image compression method. It employs a transform coding method using the DCT (Discrete Cosine Transform).
• An image is a function of i and j (or conventionally x and y) in the spatial domain. The 2D DCT is used as one step in JPEG in order to yield a frequency response, a function F(u, v) in the spatial frequency domain, indexed by two integers u and v.
Main Steps in JPEG Image Compression
• Transform RGB to YIQ or YUV and subsample color.
• DCT on image blocks.
• Quantization.
• Zig-zag ordering and run-length encoding.
• Entropy coding
Each image is divided into 8 × 8 blocks. The 2D DCT is applied to each block image f(i, j), with the output being the DCT coefficients F(u, v) for each block. F(u, v) represents a DCT coefficient, Q(u, v) is a "quantization matrix" entry, and F̂(u, v) = round(F(u, v) / Q(u, v)) represents the quantized DCT coefficients, which JPEG will use in the succeeding entropy coding.
Run-length Coding (RLC) on AC Coefficients
RLC aims to turn the F̂(u, v) values into sets {#-zeros-to-skip, next non-zero value}.
• To make it most likely to hit a long run of zeros, a zig-zag scan is used to turn the 8×8 matrix F̂(u, v) into a 64-vector.
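A hedged Python sketch of the per-block forward path (level shift, 2-D DCT, quantization, zig-zag ordering, run-length pairs), assuming SciPy for the DCT and using the standard JPEG luminance quantization table for Q(u, v):

```python
import numpy as np
from scipy.fft import dctn

# Standard JPEG luminance quantization table (quality ~50)
Q = np.array([[16, 11, 10, 16, 24, 40, 51, 61],
              [12, 12, 14, 19, 26, 58, 60, 55],
              [14, 13, 16, 24, 40, 57, 69, 56],
              [14, 17, 22, 29, 51, 87, 80, 62],
              [18, 22, 37, 56, 68, 109, 103, 77],
              [24, 35, 55, 64, 81, 104, 113, 92],
              [49, 64, 78, 87, 103, 121, 120, 101],
              [72, 92, 95, 98, 112, 100, 103, 99]])

def quantize_block(block):
    """Fhat(u,v) = round(F(u,v) / Q(u,v)) after a level shift and 2-D DCT."""
    F = dctn(block.astype(float) - 128, norm='ortho')
    return np.round(F / Q).astype(int)

# Zig-zag order: walk the anti-diagonals, alternating direction
ZIGZAG = sorted(((i, j) for i in range(8) for j in range(8)),
                key=lambda t: (t[0] + t[1],
                               t[0] if (t[0] + t[1]) % 2 else t[1]))

def rlc_pairs(fhat):
    """Turn the 64-vector into {#zeros-to-skip, next nonzero value} pairs."""
    zz = [fhat[i, j] for i, j in ZIGZAG]
    pairs, run = [], 0
    for v in zz[1:]:                          # zz[0] is the DC coefficient
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    return pairs

block = np.full((8, 8), 130, dtype=np.uint8)  # a nearly flat block
print(rlc_pairs(quantize_block(block)))       # -> []: all AC terms quantize to zero
```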
JPEG2000 Standard
The main goals of JPEG2000 are:
– To provide a better rate-distortion tradeoff and improved subjective image quality.
– To provide additional functionalities lacking in the current JPEG standard.
In addition, JPEG2000 is able to handle up to 256 channels of information, whereas the current JPEG standard is only able to handle three color channels.
MPEG (Motion Picture Experts Group)
MPEG Video Standard. MPEG (Motion Picture Experts Group) was set up in
1988 to develop a set of standard algorithms for applications that require
storage of video and audio on digital storage media. The basic structure of
compression algorithm proposed by MPEG is simple. An input image is
divided into blocks of 8 X 8 pixels. For a given 8 X 8 block, we subtract the
prediction generated using the previous frame. The difference between the
block being encoded and the prediction is transformed using a DCT. The
transform coefficients are quantized and transmitted to the receiver.
MPEG (Motion Pictures Experts Group) is a family of standards for audio and
video compression and transmission. It is developed and maintained by the
Motion Pictures Experts Group, a working group of the International
Organization for Standardization (ISO) and the International Electrotechnical
Commission (IEC).
There are several different types of MPEG standards, including −
MPEG-1 − This standard is primarily used for audio and video compression for CD-ROMs and low-quality video on the internet.
MPEG-2 − This standard is used for digital television and DVD video, as well as high-definition television (HDTV).
MPEG-4 − This standard is used for a wide range of applications, including video on the internet, mobile devices, and interactive media.
MPEG-7 − This standard is used for the description and indexing of audio and video content.
MPEG-21 − This standard is used for the delivery and distribution of multimedia content over the internet.
MPEG uses a lossy form of compression, which means that some data is lost
when the audio or video is compressed. The degree of compression can be
adjusted, with higher levels of compression resulting in smaller file sizes but
lower quality, and lower levels of compression resulting in larger file sizes but
higher quality.
Advantages of MPEG
There are several advantages to using MPEG −
High compression efficiency − MPEG is a highly efficient compression standard and can significantly reduce the file size of audio and video files while maintaining good quality.
Widely supported − MPEG is a widely used and well-established audio and video format, and it is supported by a wide range of media players, video editors, and other software.
Good quality − While MPEG uses lossy compression, it can still produce
good quality audio and video at moderate to high compression levels.
Flexible − The degree of compression used in an MPEG file can be adjusted,
allowing you to choose the balance between file size and quality.
Versatile − MPEG can be used with a wide range of audio and video types,
including music, movies, television shows, and other types of multimedia
content.
Streamable − MPEG files can be streamed over the internet, making it easy to
deliver audio and video content to a wide audience.
Scalable − MPEG supports scalable coding, which allows a single encoded
video to be adapted to different resolutions and bitrates. This makes it well-
suited for use in applications such as video-on-demand and live streaming.
Disadvantages of MPEG
There are also some disadvantages to using MPEG −
Lossy compression − Because MPEG uses lossy compression, some data is
lost when the audio or video is compressed. This can result in some loss of
quality, particularly at higher levels of compression.
Limited color depth − Some versions of MPEG have a limited color depth
and can only support 8 bits per channel. This can result in visible banding or
other artifacts in videos with high color gradations or smooth color transitions.
Non-ideal for text and graphics − MPEG is not well suited for video with
sharp transitions, high-contrast text, or graphics with hard edges. These types
of video can appear pixelated or jagged when saved as MPEG.
Complexity − The MPEG standards are complex and require specialized
software and hardware to encode and decode audio and video.
Patent fees − Some MPEG standards are covered by patents, which may
require the payment of licensing fees to use the technology.
Compatibility issues − Some older devices and software may not support
newer versions of the MPEG standard.
Spatial Compression: The spatial compression of each frame is done with JPEG (or a modification of it). Each frame is a picture that can be independently compressed.
Temporal Compression: In temporal compression, redundant frames are removed. To temporally compress data, the MPEG method first divides frames into three categories: I-frames, P-frames, and B-frames. Figure 1 shows a sample sequence of frames (Frame Sequence).
According to the MPEG standard, the entire movie is considered as a video sequence consisting of pictures, each having three components: one luminance component and two chrominance components (Y, U and V).
The luminance component contains the gray scale picture & the chrominance components
provide the color, hue & saturation.
Each component is a rectangular array of samples & each row of the array is called the
raster line.
The eye is more sensitive to spatial variations of luminance but less sensitive to similar variations in chrominance. Hence the MPEG-1 standard samples the chrominance components at half the resolution of the luminance components.
The input to the MPEG encoder is called the source data, and the output of the MPEG decoder is called the reconstructed data.
The MPEG decoder has three parts: an audio layer, a video layer, and a system layer.
The system layer reads and interprets the various headers in the source data and transmits this data to either the audio or the video layer.
The basic building block of an MPEG picture is the macroblock, as shown:
The macroblock consists of a 16×16 block of luminance (gray scale) samples, divided into four 8×8 blocks, together with two 8×8 blocks of chrominance samples.
The MPEG compression of a macroblock consists of passing each of the six blocks through DCT, quantization and entropy encoding, similar to JPEG.
A picture in MPEG is made up of slices, where each slice is a continuous set of macroblocks having a similar gray-scale component.
The concept of a slice is important when a picture contains uniform areas.
The MPEG standard defines a quantization stage with values in the range (1, 31). Quantization for intra coding divides each coefficient by the corresponding quantization table entry, scaled by the quantization stage, in the commonly quoted form

Q_DCT = round(16 × DCT / (quantizer_scale × Q))

where
DCT = discrete cosine transform of the coefficient being encoded,
Q = quantization coefficient from the quantization table,
quantizer_scale = the quantization stage value (1 to 31).
The quantized numbers Q_DCT are encoded using a non-adaptive Huffman method, and the standard defines specific Huffman code tables, which were calculated by collecting statistics.
Mother wavelet ψ(t) is a frequency domain function and father wavelet φ(t) is a time domain function.
There are two types of wavelet transforms: continuous and discrete. Definitions of each type are given in the figure above. The key difference between these two types is that the Continuous Wavelet Transform (CWT) uses every possible wavelet over a range of scales and locations, i.e. an infinite number of scales and locations, while the Discrete Wavelet Transform (DWT) uses a finite set of wavelets, i.e. wavelets defined at a particular set of scales and locations.
There are a wide variety of wavelets to choose from to best match the shape of the features being analyzed. A handful of options are given in the figure below.
From top to bottom, left to right: Daubechies 4, Daubechies 16, Haar, Coiflet 1, Symlet 4, Symlet 8, Biorthogonal 1.3, & Biorthogonal 3.1
Wavelet Transform
Wavelets are functions defined over a finite interval. The basic idea of the wavelet transform is to represent an arbitrary function f(x) as a linear combination of a set of such wavelets or basis functions. These basis functions are obtained from a single prototype wavelet, called the mother wavelet, by dilations (scaling) and translations (shifts). The purpose of the wavelet transform is to change the data from the time-space domain to the time-frequency domain, which yields better compression results. The simplest forms of wavelets are:

Morlet wavelet
It has a Fourier basis (complex sinusoid) modulated by a Gaussian function:

ψ(t) = exp(jω₀t) · exp(−t²/2)

Mexican hat wavelet
It is a second-order derivative of a Gaussian:

ψ(t) = (1 − t²) · exp(−t²/2)
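Both mother wavelets are one-liners in NumPy; this sketch just evaluates the two formulas (the Morlet centre frequency w0 = 5.0 is an arbitrary assumed choice, not fixed by the text):

```python
import numpy as np

def morlet(t, w0=5.0):
    """Morlet: a complex sinusoid under a Gaussian envelope."""
    return np.exp(1j * w0 * t) * np.exp(-t**2 / 2)

def mexican_hat(t):
    """Mexican hat: proportional to the 2nd derivative of a Gaussian."""
    return (1 - t**2) * np.exp(-t**2 / 2)

t = np.linspace(-5, 5, 1001)
psi_morlet, psi_hat = morlet(t), mexican_hat(t)
```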
Example: Detecting R-peaks in ECG Signal
In this example, I use a type of discrete wavelet transform to help detect R-
peaks from an Electrocardiogram (ECG) which measures heart activity. R-
peaks are typically the highest peak in an ECG signal. They are part of the
QRS-complex which is a characteristic oscillation that corresponds to the
contraction of the ventricles and expansion of the atria. Detecting R-peaks is
helpful in computing heart rate and heart rate variability (HRV).
In the real world, we rarely have ECG signals that look as clean as the above
graphic. As seen in this example, ECG data is typically noisy. For R-peak
detection, simple peak-finding algorithms will fail to generalize when applied
to raw data. The wavelet transform can help convert the signal into a form that
makes it much easier for our peak finder function.
Here I use the maximal overlap discrete wavelet transform (MODWT) to extract R-peaks from the ECG waveform. The Symlet wavelet with 4 vanishing moments (sym4) at 7 different scales is used.
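PyWavelets does not ship a MODWT function; its stationary (undecimated) wavelet transform pywt.swt is a closely related substitute, so the hedged sketch below uses that instead. The decomposition level and the choice of which detail scales carry the QRS energy are assumptions that would need tuning to the actual sampling rate:

```python
import numpy as np
import pywt
from scipy.signal import find_peaks

def detect_r_peaks(ecg, fs, level=5):
    """Keep mid-scale sym4 detail coefficients, square them, then run a
    simple peak finder with a ~250 ms refractory period."""
    n = len(ecg) - len(ecg) % 2**level    # swt needs length divisible by 2^level
    coeffs = pywt.swt(np.asarray(ecg[:n], dtype=float), 'sym4', level=level)
    # coeffs is ordered coarsest first: coeffs[1][1], coeffs[2][1] are the
    # level-4 and level-3 details (assumed to contain most of the QRS energy)
    d = coeffs[1][1] + coeffs[2][1]
    energy = d ** 2
    peaks, _ = find_peaks(energy,
                          height=energy.mean() + 2 * energy.std(),
                          distance=int(0.25 * fs))
    return peaks
```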
Block Diagram
Image Preprocessing
Before being used for model training and inference, pictures must first undergo image
preprocessing. This includes, but is not limited to, adjustments to the size, orientation,
and color. The purpose of pre-processing is to raise the image's quality so that we can
analyze it more effectively. Preprocessing allows us to eliminate unwanted distortions
and improve specific qualities that are essential for the application we are working on.
Those characteristics could change depending on the application. An image must be
preprocessed in order for software to function correctly and produce the desired results.
1. Orientation:
When a picture is taken, its metadata informs our computers how to show the input image in relation to how it is stored on disk. That metadata is called the EXIF orientation, and incorrect EXIF data handling has long been a source of frustration for developers everywhere. This also holds true for models: if we have created annotated bounding boxes based on how we viewed an image to be oriented, but our model is "seeing" the picture in a different orientation, the annotations no longer line up with the image content.
2. Resize:
Although altering an image's size may seem simple, there are things to keep in mind. Few
devices capture exactly square images, despite the fact that many model topologies require
square input images. Stretching an image's dimensions to make it square is one option,
as is maintaining the image's aspect ratio while adding additional pixels to fill in the
newly formed "dead space."
3. Random Flips:
By randomly reflecting an image about its x- or y-axis, our model learns that an object need not always be read from left to right or top to bottom. In order-dependent circumstances, such as when deciphering text, flipping may not make sense.
4. Grayscale:
One type of image transformation that can be applied to all images (train and test) is a
change in color. Random changes can also be made to images only during training as
augmentations. Every image is often subjected to grayscaling, which alters the color. While we may believe that "more signal is always better," we may actually observe more timely model performance when images are rendered in grayscale.
5. Different Exposure:
If a model might be expected to operate in a range of lighting conditions, changing
image brightness to be randomly brighter and darker is most appropriate. The
maximum and minimum levels of brightness in the space must be taken into account.
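A minimal Pillow-based sketch of the preprocessing steps discussed above; the 224-pixel target size, the flip probability and the brightness range are arbitrary assumed values:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

def preprocess(path, size=224, train=True):
    img = Image.open(path)
    img = ImageOps.exif_transpose(img)            # 1. honour EXIF orientation
    img = ImageOps.pad(img, (size, size))         # 2. letterbox to a square
    if train and random.random() < 0.5:           # 3. random horizontal flip
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    img = ImageOps.grayscale(img)                 # 4. grayscale
    factor = random.uniform(0.7, 1.3) if train else 1.0
    img = ImageEnhance.Brightness(img).enhance(factor)  # 5. exposure jitter
    return img
```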
Medical imaging: Image processing can provide sharp, high-quality images for
scientific and medical studies, ultimately assisting doctors in making diagnoses.
Convolutional Neural Networks (CNN) learn to do tasks like object detection, image
segmentation, and classification by taking in an input image and applying filters to it.
The retina is the light-sensitive layer at the back of the eye that can be visualized with specialist equipment when imaging through the pupil. The features of a typical view of the retina include the optic disc, where the blood vessels and nerves enter the retina from the back of the eye. The blood vessels emerge from the optic disc and branch out to cover most of the retina. The macula is the central region of the retina, about which the blood vessels circle and which they only partially penetrate; it is the most important region for vision. (In a typical view, the optic disc is on the left and the macula towards the centre-right.)
As the photographer does not have complete control over the patient’s eye which forms
a part of the imaging optical system, retinal images often contain artifacts and/or are
of poorer quality than desirable. Patients often have tears covering the eye and,
particularly the elderly, may have cataract that obscures and blurs the view of the
retina. In addition, patients often do not or cannot hold their eye still during the imaging process. Retinal images are therefore often unevenly illuminated, with parts of the retinal image brighter or darker than the rest of the image, or, in the worst cases, washed out with a substantial or complete loss of contrast.
Ultrasound of liver
A vascular ultrasound of the liver is performed to help evaluate the liver and its network of blood vessels (within the liver and entering and exiting the liver). Using vascular ultrasound can help physicians diagnose and review the outcome of treatments for various liver-related problems and diseases. A liver ultrasound is a noninvasive test that produces images of a person's liver and its blood vessels. It can help diagnose various liver conditions, such as fatty liver, liver cancer, and gallstones. A liver ultrasound is a type of transabdominal ultrasound. This means a technician scans the abdomen using a device that resembles a microphone. The process uses sound waves to create digital images. Liver ultrasounds are safe and usually do not take long.
Some common types of liver ultrasound scans include:
Contrast imaging: This involves injecting dye into the blood vessels to make it easier to
see the liver and its vessels. It can be especially helpful for diagnosing growths and lesions
on the liver and detecting liver cancer.
Elastography: This is a technique to see how stiff the liver tissue is, which could
signal cirrhosis or another problem. It involves delivering a series of pulses to the liver to
see the liver tissue. A doctor may compare elastography scores over time to detect changes
in liver health.
Combined techniques: A doctor may combine techniques, such as by doing an ultrasound
and an MRI scan.
A liver ultrasound may indicate structural changes consistent with the presence of
certain conditions, including:
• Gallstones
• Liver cancer
• Infections
• Blockages
Often, doctors will need to use a range of diagnostic tools to definitively identify the reason for these changes. This process may include a medical history, physical exam, blood tests, and potentially a biopsy.
Ultrasound scans use high frequency sound waves to create a picture of a part of
the body.
The ultrasound scanner has a probe, resembling a microphone, that gives off sound waves. The sound waves bounce off the organs inside your body, and the probe picks them up. The probe links to a computer that turns the sound waves into a picture.
You might need to stop eating for 6 hours beforehand. Let the scan team know if this will
be a problem for any reason, for example if you are diabetic.
They might ask you to drink plenty before your scan so that you have a comfortably full
bladder.
Kidney Ultrasound
A kidney ultrasound may be performed to assist in placement of needles used to biopsy
(obtain a tissue sample) the kidneys , to drain fluid from a cyst or abscess, or to place a
drainage tube. This procedure may also be used to determine blood flow to the kidneys
through the renal arteries and veins.
A kidney ultrasound can show physical signs of chronic kidney disease (CKD), which can lead to kidney failure. For example, the kidneys of someone with CKD may be smaller, have thinning of certain kidney tissues, or show the presence of cysts.
Other uses of a kidney ultrasound include:
• guiding your doctor to insert a needle for a tissue biopsy of your kidney
• helping your doctor to locate a kidney abscess or cyst
• helping your doctor place a drainage tube into your kidney
• allowing your doctor to check on a transplanted kidney
Mammogram
Many studies have found that 3D mammography appears to lower the chance of
being called back for follow-up testing after screening. It also appears to find more breast
cancers, and several studies have shown it can be helpful in women with dense breasts. A
large study is now in progress to better compare outcomes between 3D mammograms and
standard (2D) mammograms.
Mammograms expose the breasts to small amounts of radiation. But the benefits of
mammography outweigh any possible harm from the radiation exposure. Modern
machines use low radiation doses to get breast x-rays that are high in image quality. On
average the total dose for a typical mammogram with 2 views of each breast is about 0.4
millisieverts, or mSv. (A mSv is a measure of radiation dose.) The radiation dose from 3D
mammograms can range from slightly lower to slightly higher than that from standard 2D
mammograms.
Breast tomosynthesis may also result in:
Blood vessels circulate blood throughout your body. They help deliver oxygen to
vital organs and tissues, and also remove waste products. Blood vessels include veins,
arteries and capillaries.
Segmentation of ROI
Semantic segmentation is an approach that detects, for every pixel, the class it belongs to. For example, in a figure with many people, all the pixels belonging to persons will have the same class id, and the pixels in the background will be classified as background.
Instance segmentation is an approach that identifies, for every pixel, the specific instance of the object it belongs to. It detects each distinct object of interest in the image, for example, when each person in a figure is segmented as an individual object.
1. Arteries, which carry the blood away from the heart; the arterioles; and the capillaries, where the exchange of water and chemicals between the blood and the tissues occurs.
2. Venules and veins, which carry blood from the capillaries back towards the heart.
3. Tunica intima: The inner layer surrounds the blood as it flows through your
body. It regulates blood pressure, prevents blood clots and keeps toxins out of your
blood. It keeps your blood flowing smoothly.
4. Media: The middle layer contains elastic fibers that keep your blood flowing in
one direction. The media also helps vessels expand and contract.
5. Adventitia: The outer layer contains nerves and tiny vessels. It delivers
oxygen and nutrients from your blood to your cells and helps remove waste. It also
gives blood vessels their structure and support.
1. Locating all the blood vessels with an appropriate method and visualizing
them. This can, for instance, be done by ray casting the original vessel data or by creating
a mesh around the edges of the vessels. The problem with this approach is that the
number of vessels is huge. Their structure varies a lot, going from the neck area up to the
brain. The placement and structure of the vessels also differs a lot from person to person,
which can make the results more uncertain.
1. Matched Filtering Method
One of the earliest and reasonably effective proposals for the segmentation of blood vessels in retinal images is the use of oriented matched filters for the detection of long linear structures. Blood vessels often have a Gaussian-like cross-section that is fairly consistent along the length of vessel segments. Provided the vessels are not too tortuous, they can be approximated as elongated cylinders of Gaussian cross-section between the vessel branch points. Thus, a two-dimensional model consisting of an elongated cylinder of Gaussian cross-section should correlate well with a vessel segment, provided both have the same orientation. The model is moved to each possible position in the image, and the correlation of the local patch of image with the model is calculated to form a correlation image. Peaks in the correlation image occur at the locations of the blood vessels.
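A hedged NumPy/SciPy sketch of this idea (not the exact kernel of any published method): a zero-mean, elongated kernel with a Gaussian cross-section is correlated with the image at several orientations, and the maximum response at each pixel is kept:

```python
import numpy as np
from scipy.ndimage import correlate, rotate

def matched_kernel(sigma=2.0, length=9):
    """Elongated model: Gaussian cross-section, constant along the vessel.
    Negated because vessels are darker than the background; zero mean so
    flat regions give no response."""
    x = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
    profile = -np.exp(-x**2 / (2 * sigma**2))
    kernel = np.tile(profile, (length, 1))
    return kernel - kernel.mean()

def vessel_response(image, n_angles=12):
    """Correlate the kernel at n_angles orientations over 180 degrees."""
    k0 = matched_kernel()
    responses = [correlate(image.astype(float),
                           rotate(k0, angle, reshape=True, order=1))
                 for angle in np.arange(0, 180, 180 / n_angles)]
    return np.max(responses, axis=0)   # peaks sit on the blood vessels
```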
One of the most intuitive ways to find all vessels would be to locate one or
several places in the image where most of the blood must pass, and search outwards from
there. If the scan includes some of the chest area, the aorta is a good point to start.
Otherwise, one point in each common carotid vessel should be enough. If region growing
could be executed here, the problem would be more or less solved. However, as previously
discussed this is not entirely possible. Due to the intensity overlap and close proximity
of bone, the region growing will leak out and include bone parts such as vertebrae
and parts of the scull. In some few cases, where the vessels are better separated from
bone structure, it might be possible to minimize leakage to a point where it is no
longer critical. But by doing so, it is likely to have included too little of the vessels
themselves. The region growing method works very well though, if the only goal is to
strip the skull bone. It is possible to place a seed point at arbitrary locations inside the
brain, and then region grow the brain with an upper threshold set just so that none of the
skull bone is included. If there are any vessels in the head that were excluded as well, they
can be selected by some closing operations.
In order to actually find the vessels, it is essential to know the diameter of the vessels. Generally, the blood vessels in the neck are much thicker than those in the head. Even if the two are split up, it is still impossible to set an exact diameter that fits every vessel.
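A minimal 3-D region-growing sketch of the kind discussed above; the seed voxel and the intensity window [lower, upper] are exactly the inputs the text says are hard to choose well:

```python
from collections import deque
import numpy as np

def region_grow(volume, seed, lower, upper):
    """Flood fill from `seed`, accepting 6-connected voxels whose intensity
    lies in [lower, upper]; too high an upper bound leaks into bone."""
    grown = np.zeros(volume.shape, dtype=bool)
    grown[seed] = True
    queue = deque([seed])
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in neighbours:
            nz, ny, nx = z + dz, y + dy, x + dx
            if (0 <= nz < volume.shape[0] and 0 <= ny < volume.shape[1]
                    and 0 <= nx < volume.shape[2]
                    and not grown[nz, ny, nx]
                    and lower <= volume[nz, ny, nx] <= upper):
                grown[nz, ny, nx] = True
                queue.append((nz, ny, nx))
    return grown
```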
2. Lesion Based Segmentation
The process of delineating the boundary of a lesion from an image or image series either
by use of interactive computer tools (manual) or by automated image segmentation
algorithms.
Retinal Lesions
Skin Lesions
Skin lesion segmentation, which is one of the medical image segmentation areas, is
important for the detection of melanoma. Melanoma, the most life-threatening skin cancer, can suddenly occur on normal skin without warning and can develop on a preexisting lesion. Therefore, lesions must be carefully monitored.
It contains images of the classes melanoma (MEL), melanocytic nevus (NV), basal
cell carcinoma (BCC), actinic keratosis (AK), benign keratosis (BKL),
dermatofibroma (DF), vascular lesion (VASC) and squamous cell carcinoma
(SCC).
1. Primary and Secondary lesions. Primary skin lesions are abnormal skin conditions that
may be present at birth or acquired later.
Secondary skin lesions are a result of irritated or manipulated primary skin lesions.
Primary lesions may be present at birth or acquired later in a person's life. The most common primary skin lesions include:
Birthmarks: These are the most common primary skin lesions. They include moles,
port-wine stains, nevi, etc.
Blisters: Blisters are skin lesions that are less than half a centimeter in diameter and filled with clear fluid. Small blisters are called vesicles and larger ones are called bullae. Blisters may be caused by burns (including sunburns), viral infections (herpes zoster), friction due to shoes or clothes, insect bites, drug reactions, etc.
Macules: Macules are flat skin lesions. They are small (less than one centimeter in
diameter) and may be brownish or reddish. Freckles and flat moles are examples of
macules. A macular rash is commonly seen in measles.
Nodules: Nodules are soft or firm, raised skin lesions that are less than two centimeters
in diameter. The nodules are seen in certain diseases such as neurofibromatosis and
leprosy.
Papule: Papules are raised lesions and usually develop with other papules. A patch of
papules or nodules is called a plaque. Plaques are commonly seen in psoriasis. Papules
may be seen in viral infections, such as measles, or may occur due to mosquito bites.
Pustule: Pustules are pus-filled lesions. Boils and abscesses are examples of pustules.
Wheals: Wheals are swollen, raised bumps or plaques that appear suddenly on the skin.
They are mostly caused by an allergic reaction. For example, hives (also called
urticaria), insect bites, etc.
Secondary skin lesions develop after primary skin lesions become inflamed or irritated, or due to an injury. The most common secondary skin lesions include:
Crust: A crust or a scab is a type of skin lesion that forms over a scratched, injured or
irritated primary skin lesion. It is formed from the dried secretions over the skin.
Ulcer: Ulcers are a break in the continuity of the skin or mucosa. Skin ulcers are caused
by an infection or trauma. Poor blood circulation, diabetes, smoking and/or bedridden
status increase the risk of ulcers.
Scales: Scales are patches of skin cells that build up and flake off the skin. Patches are
often seen in psoriasis and cause bleeding when they are removed.
Scar: Injuries, such as scratches, cuts and scrapes, can leave scars. Some scars may be
thick and raised. These may cause itching or oozing and appear reddish or brownish.
These are called keloids.
Skin atrophy: Skin atrophy occurs when areas of the skin become thin and wrinkled.
This could occur due to the frequent use of steroid creams, radiation therapy or poor
blood circulation.
2. Morphological Operations
Morphology means the study of the shape and structure of living things from a biological
perspective. Morphology is a discipline of biology related to the study of the shape and
structure of the organism and its unique structural characteristics.
Cellular Morphology
Tissue Morphology
Organ Morphology
The Whole Organism
Three types of lesions:
A flat mark on your skin of a different color than your skin tone (macule or patch).
Image segmentation involves the process of partitioning image data into multiple sets of pixels/voxels. In other words, every pixel/voxel is assigned a label/value, where those with the same label/value belong to the same segment. There are a vast number of methods for doing image segmentation –
thresholding, clustering, region growing and edge detection, just to mention a few – and they can be applied to varying problems. Some of them can be used for blood vessel segmentation. The reason for doing segmentation is, in most cases, to get a different view of the image, mostly creating a more comprehensible representation of the original, making it easier to analyze.
1. Thresholding
The key of this method is to select the threshold value (or values, when multiple levels are selected). Several popular methods are used in industry, including the maximum entropy method and balanced histogram thresholding; a worked example is sketched below.
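As a concrete stand-in for these selectors, here is a short implementation of Otsu's method (another widely used automatic threshold selector, not one of the two named above), written directly from its definition of maximizing the between-class variance:

```python
import numpy as np

def otsu_threshold(image):
    """Pick the threshold that maximises the between-class variance of
    the two pixel groups it creates (8-bit image assumed)."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()              # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# binary = image > otsu_threshold(image)
```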
2. Clustering methods
3. Histogram-based methods
4. Edge detection
5. Region-growing methods
6. Partial differential equation-based methods: Using a partial differential equation (PDE)-based method and solving the PDE by a numerical scheme, one can segment the image. Curve propagation is a popular technique in this category, with numerous applications to object extraction, object tracking, stereo reconstruction, etc.
7. Graph partitioning methods: Graph partitioning methods are effective tools for image segmentation, since they model the impact of pixel neighborhoods on a given cluster of pixels or pixel, under the assumption of homogeneity in images.
8. Watershed transformation
3. ROI of Tumours
Tumours are groups of abnormal cells that form lumps or growths. They can start
in any one of the trillions of cells in our bodies. Tumours grow and behave differently,
depending on whether they are cancerous (malignant), non-cancerous (benign) or
precancerous.
A tumor is a solid mass of tissue that forms when abnormal cells group together.
Tumors can affect bones, skin, tissue, organs and glands. Many tumors are not cancer
(they’re benign). But they still may need treatment. Cancerous, or malignant, tumors can
be life-threatening and require cancer treatment.
A cyst is a small sac that may contain fluid, air or solid material. The majority of
cysts are not cancerous.
Cancerous: Malignant or cancerous tumors can spread into nearby tissue, glands
and other parts of the body.
Noncancerous: Benign tumors are not cancerous and are rarely life-threatening.
Uterine fibroids.
Cervical dysplasia.
Colon polyps.
Lung nodules are small masses of tissue in the lung that appear as round, white
spots on a chest X-ray or computed tomography (CT) scan. Because they rarely have
symptoms, they are usually found incidentally in 1 of every 500 chest X-rays taken for
other, unrelated ailments, like a respiratory illness.
Lung nodules are small clumps of cells in the lungs. They're very common. Most
lung nodules are scar tissue from past lung infections. Lung nodules usually don't cause
symptoms. They're often found by accident on a chest X-ray or CT scan done for some
other reason.
Pulmonary nodules, or lung nodules, are common, and are usually benign or
non-cancerous. Here’s what you need to know about these spots.
Most nodules are smaller than 10 mm, or the size of a cherry. Larger lung
nodules, or nodules located near an airway, may have symptoms such as a chronic
cough, blood-tinged mucus and saliva, shortness of breath, fever or wheezing.
“In our part of the world, very small (less than 6 mm) nodules are commonly
identified incidentally on chest CT scans for reasons like chest pain or shortness of
breath, or to evaluate for pulmonary embolism,” “The significant majority are benign,
although in certain instances they may require follow-up to prove that.”
The most common causes of lung nodules are tissue that has become inflamed
from infection or benign lung tumors. Causes of lung nodules can include:
Imaging, like an X-ray or a CT scan, can determine the size, shape and location of
your lung nodules. This can help your physician determine the cause and, as a result, the
treatment needed.
Though most lung nodules are not cancerous, it’s important to detect them early.
Northwestern Medicine offers a low-dose CT lung cancer screening program specifically
for individuals at high risk of lung cancer. To determine your eligibility for the program,
your physician will discuss your history, including your smoking history and age.
Feature Extraction
3. Kernel PCA
Feature extraction techniques give us new features that are a linear combination of the existing features. The new set of features will have different values as compared to the original feature values. The main aim is that fewer features will be required to capture the same information.
PCA (Principal Component Analysis) finds new axes, the principal components, that capture the maximum variation (spread) in the data. PCA is more useful when dealing with 3 or higher-dimensional data.
We can infer from the above figure that from the first 6 principal components we are able to capture 80% of the data. This shows the power of PCA: using only 6 features, we are able to capture most of the information in the data.
1. Mean
2. Variance
3. SD
4. Entropy
5. Skew
6. Kurtosis
We need to note that all the PCs will be perpendicular (orthogonal) to each other. The main intention behind this is that no information present in PC1 is repeated in PC2 when they are perpendicular to each other.
Though PCA is a very useful technique for extracting only the important features, it should be avoided for supervised algorithms, as it completely ignores the class labels. If we still wish to go for a feature extraction technique, then we should go for LDA instead.
The main difference between LDA and PCA is that LDA works in a similar manner to PCA, but requires class label information, unlike PCA.
Then I have used a linear model, Logistic Regression, to fit the data, and plotted the decision boundary for a better understanding of class separability.
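A hedged end-to-end sketch with scikit-learn; the built-in breast-cancer dataset is only a stand-in for the features discussed here, and keeping 6 components echoes the "6 principal components" observation above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pca_clf = make_pipeline(StandardScaler(), PCA(n_components=6),
                        LogisticRegression(max_iter=1000))
lda_clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())

print("PCA(6)+LogReg accuracy:", cross_val_score(pca_clf, X, y).mean())
print("LDA accuracy          :", cross_val_score(lda_clf, X, y).mean())

# Cumulative variance captured by the leading principal components
pca = PCA().fit(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_.cumsum()[:6])
```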
Feature extraction with Morphological features
Feature Selection
The features were selected using LNKnet package in order to identify the
ones that yield maximum discrimination capability thus achieving the
optimal diagnostic performance. Each parameter set was normalized to
have zero mean and unit variance before training. Forward search
strategy was applied to find the optimal feature subset, which was
obtained when the trained classifier produced the least error rate.
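LNKnet itself is a separate MIT Lincoln Laboratory package; the same normalize-then-forward-search recipe can be sketched with scikit-learn's SequentialFeatureSelector (the classifier, feature count and dataset below are arbitrary stand-ins):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # zero mean, unit variance, as in the text

sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=8,
                                direction='forward', cv=5)
sfs.fit(X, y)
print("selected feature indices:", sfs.get_support(indices=True))
```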
Well, if we compare the neural network to our brain, a node is a replica of a neuron that
receives a set of input signals—external stimuli.
The role of the Activation Function is to derive output from a set of input values fed to a node (or a layer).
K-nearest neighbors (KNN) is a type of supervised learning algorithm used for both regression and classification. KNN tries to predict the correct class for the test data by calculating the distance between the test data and all the training points, and then selecting the K points which are closest to the test data.
Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1, so
this data point will lie in which of these categories. To solve this type of problem, we need a K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a particular dataset.
Consider the below diagram:
The K-NN working can be explained on the basis of the below algorithm:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance between the new data point and each training point.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these K neighbors, count the number of the data points in each category.
Step-5: Assign the new data point to that category for which the number of neighbors is maximum.
Firstly, we will choose the number of neighbors: we choose k = 5. Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as:

d = √((x₂ − x₁)² + (y₂ − y₁)²)
By calculating the Euclidean distance, we got the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Hence, the new data point is assigned to category A.
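The five steps translate directly into a short NumPy sketch (in practice scikit-learn's KNeighborsClassifier would be used); the toy points below reproduce the 3-votes-to-2 situation just described:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5):
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # Euclidean distance
    nearest = np.argsort(dists)[:k]                        # K closest points
    votes = Counter(y_train[i] for i in nearest)           # count per category
    return votes.most_common(1)[0][0]                      # majority category

X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 7]])
y = np.array(['A', 'A', 'A', 'B', 'B'])
print(knn_predict(X, y, np.array([2, 2]), k=5))            # -> 'A' (3 votes to 2)
```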
An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters: the True Positive Rate and the False Positive Rate.
AUC - Area Under the Curve
AUC stands for "Area under the ROC Curve." That is, AUC measures the
entire two-dimensional area underneath the entire ROC curve (think integral
calculus) from (0,0) to (1,1).
AUC provides an aggregate measure of performance across all possible classification thresholds. One
way of interpreting AUC is as the probability that the model ranks a random positive example more
highly than a random negative example. For example, given the following examples, which are
arranged from left to right in ascending order of logistic regression predictions:
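A minimal scikit-learn sketch with made-up labels and scores, just to show how the ROC curve's two parameters and the AUC are obtained:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]                        # hypothetical labels
y_score = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.8, 0.9]   # model outputs

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) per threshold
print("AUC =", roc_auc_score(y_true, y_score))
```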
Shape and Texture
BOUNDARY DESCRIPTORS
Simple Descriptors
Length of a Contour
The length of a contour can be found by counting the number of pixels along the contour. For a chain-coded curve with unit spacing in both directions, the number of vertical and horizontal components plus √2 times the number of diagonal components gives the exact length of the curve.
● Boundary Diameter
It is defined as

Diam(B) = max over (i, j) of D(p_i, p_j)

where D is a distance measure, which can be either the Euclidean distance or the D4 distance, and p_i and p_j are points on the boundary B. The value of the diameter and the orientation of the major axis of the boundary are two useful descriptors.
● Curvature
Curvature can be determined by using the difference between the slopes of adjacent
boundary segments at the point of intersection of the segments.
Shape Numbers
Shape number is the smallest magnitude of the first difference of a chain code
representation.
The order of a shape number is defined as the number of digits in its representation.
Shape order is even for a closed boundary.
Chain codes are used to represent a boundary by a connected sequence of straight-line segments. This representation is based on 4-connectivity or 8-connectivity of the segments. The chain code works best with binary images and is a concise way of representing a shape contour. The chain code direction convention is given below:
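In addition to the direction convention, a small Python sketch of the shape-number computation from a chain code (4-connectivity assumed; the example square is hypothetical):

```python
def first_difference(chain, connectivity=4):
    """Counter-clockwise direction changes between successive codes,
    treating the chain as circular (closed boundary)."""
    return [(chain[i] - chain[i - 1]) % connectivity
            for i in range(len(chain))]

def shape_number(chain, connectivity=4):
    """Smallest circular rotation of the first difference."""
    d = first_difference(chain, connectivity)
    return min(d[i:] + d[:i] for i in range(len(d)))

# A unit square traversed with 4-direction codes
print(shape_number([0, 3, 2, 1]))   # -> [3, 3, 3, 3], order 4
```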
REGIONAL DESCRIPTORS
Compactness = (perimeter)² / area
Topological Descriptors
● Rubber-sheet Distortions
Topology is the study of properties of a figure that are unaffected by any deformation, as
long as there is no tearing or joining of the figure.
● Euler Number
The Euler number E is defined as E = C − H, where C is the number of connected components and H is the number of holes in the figure.
A connected component of a set is a subset of maximal size such that any two of its points can be joined by a connected curve lying entirely within the subset.
Texture
In image processing, texture can be defined as a function of the spatial variation of the brightness intensity of the pixels. Texture is the main term used to define objects or concepts of a given image.
– Tone is based on pixel intensity properties in the texel, whilst structure represents the spatial relationship between texels.
– If texels are small and tonal differences between texels are large, a fine texture results.
– If texels are large and consist of several pixels, a coarse texture results.
There are two primary issues in texture analysis:
1. Texture classification
2. Texture segmentation
Texture classification is concerned with identifying a given textured region from a given
set of texture classes.
Statistical methods are particularly useful when the texture primitives are small, resulting in microtextures.
• When the size of the texture primitive is large, first determine the shape and properties of the basic primitive and the rules which govern the placement of these primitives, forming macrotextures.
One of the simplest of the texture operators is the range or difference between maximum
and minimum intensity values in a neighborhood.
– The range operator converts the original image to one in which brightness represents
texture.
The statistical measures described so far are easy to calculate, but do not provide any information about the repeating nature of texture. A gray-level co-occurrence matrix contains information about the positions of pixels having similar gray level values.
A co-occurrence matrix is a two-dimensional array, P, in which both the rows and the columns represent a set of possible image values; the entry Pd[i, j] = nij, where nij is the number of occurrences of the pixel values (i, j) lying at distance d apart in the image.
– The co-occurrence matrix Pd has dimension n×n, where n is the number of gray levels
in the image.
The elements of Pd[i,j] can be normalized by dividing each entry by the total number of
pixel pairs.
Maximum Probability
This is simply the largest entry in the matrix, and corresponds to the strongest response.
Moments
Contrast
Homogeneity
Entropy
Correlation
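Several of these descriptors can be read off a normalized co-occurrence matrix with scikit-image; the tiny 4-level image below is a made-up example (entropy and general moments would be computed directly from the matrix entries):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]], dtype=np.uint8)

# P_d for d = one pixel to the right, n = 4 gray levels, normalized
glcm = graycomatrix(image, distances=[1], angles=[0], levels=4, normed=True)

print("max probability:", glcm[:, :, 0, 0].max())
print("contrast       :", graycoprops(glcm, 'contrast')[0, 0])
print("homogeneity    :", graycoprops(glcm, 'homogeneity')[0, 0])
print("correlation    :", graycoprops(glcm, 'correlation')[0, 0])
```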
Performance Measure
Digital image processing is the use of computer algorithms to enhance the properties of digital images. Digital image processing techniques include preprocessing (filtering), segmentation and classification techniques. The effectiveness of these techniques can be estimated using performance metrics.
Performance metrics are used to determine the effectiveness of image processing technique in
achieving expected results. They are the quantities that are used to compare the performances of
different systems. In image processing, there are pre-processing performance metrics, segmentation
performance metrics and classification performance metrics depending on the stage the metrics are
applied.
The Peak Signal-to-Noise Ratio (PSNR) of an image is the ratio of the maximum power of the signal to the maximum power of the noise distorting the image [39]. The PSNR is measured in decibels.
The mean square error (MSE) is the average of the squared intensity differences between the filtered image pixels and the reference (noiseless) image pixels. The metric assumes that the reduction in perceptual quality of an image is directly related to the visibility of the error signal.
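Both metrics are a few lines of NumPy (8-bit images assumed, so the peak signal value is 255):

```python
import numpy as np

def mse(reference, filtered):
    """Average squared intensity difference between the two images."""
    return np.mean((reference.astype(float) - filtered.astype(float)) ** 2)

def psnr(reference, filtered, peak=255.0):
    """Peak signal-to-noise ratio in decibels."""
    m = mse(reference, filtered)
    return float('inf') if m == 0 else 10 * np.log10(peak ** 2 / m)
```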
PSNR Gain
The PSNR gain of a new filter is the amount by which the PSNR of the new filter exceeds the PSNR of an existing filter.
When the value of the gain is positive, it means that the new filter is better than the existing filter. However, if the gain is negative, the existing filter is better. The gain in performance is measured in decibels.
True Acceptance Rate (TAR) is defined as the percentage of times a system correctly verifies a true claim of identity [42]. A filter whose output has the highest value of TAR when classified has the best performance; the higher the value, the better the technique.
False Acceptance Rate (FAR) is defined as the percentage of times a system incorrectly verifies a false claim of identity. A filter whose output has the lowest value of FAR when classified has the best performance; the lower the value, the better the technique.
Pixel Error Rate (PERR) is defined as the percentage of pixel errors in the filtered image with respect to the total number of pixels in the noiseless image. The pixel error is the difference between the number of black pixels in the noiseless image and the number of black pixels in the filtered image, after both are converted to binary images. It can also be defined as the total number of pixels in the output image that have the wrong colour. The parameters M and N are the row size and column size of the image, respectively. A classification technique with the lowest value of PERR has the best performance; the lower the value, the better the technique.
Recognition Accuracy
Recognition accuracy (RA) is the accuracy with which all the features in an image are recognized. A filter whose output has the highest value of RA when classified has the best performance; the higher the value, the better the technique.
Confusion Matrix
False Positive: (Type 1 Error) Interpretation: You predicted positive and it’s false.
False Negative: (Type 2 Error) Interpretation: You predicted negative and it’s false.
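A small scikit-learn example with hypothetical labels, showing how the four cells of the confusion matrix are recovered:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical classifier output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} (Type 1) FN={fn} (Type 2)")
```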
References
2. Anil K. Jain, "Fundamentals of Digital Image Processing", PHI Learning Pvt. Ltd., 2011.
Question Bank
PART-A
5. What is Redundancy?
6. Validate the types of data redundancy.
PART-B
5. Apply and analyze Shannon's second theorem (the noisy channel coding theorem).